PSC-STM-A7: Deployment of PSC Stream Ingester App #246
Comments
Static IP Experiment
Heroku doesn't support static IPs, at least not in its standard cloud offering. This prevents it from being used to host Transformer PSC, which needs a static IP to be whitelisted for OpenCorporates reconciliation. An experiment in routing requests through a proxy or similar will be conducted.
It's worth noting that some time was spent considering a similar approach in Jul 2023 during Register 1. That wasn't successful, mostly because SOCKS proxies configured via environment variables (or even specified explicitly in code) seemingly aren't supported by Net::HTTP::Persistent, which Sources OC ReconciliationClient uses. No Heroku plugins were tried at that time, however.
This time, it would be helpful to check not only SOCKS proxies, but also whether an HTTP proxy (ideally using TLS) would be sufficient for our purposes. Alternatively, perhaps some other method is available in Heroku via a network wrapper. |
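IP Test Case using Net::HTTP::Persistent
require 'net/http/persistent'

# Read the proxy settings from the environment (e.g. http_proxy / HTTP_PROXY)
http = Net::HTTP::Persistent.new
http.proxy = :ENV
p http # inspect the client, including any proxy settings it picked up
uri = URI('https://ipinfo.io')
res = http.request(uri)
puts res.body # when the proxy is in use, the reported IP should be the proxy's
This may be run via a console in Heroku. For this to work, ensure that |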
Fixie: https://elements.heroku.com/addons/fixie
Fixie Socks: https://elements.heroku.com/addons/fixie-socks
QuotaGuard Static IP's: https://elements.heroku.com/addons/quotaguardstatic
QuotaGuard Shield Static IP's: https://elements.heroku.com/addons/quotaguardshield
IPBurger Static IPs: https://elements.heroku.com/addons/ipburger
Proximo: https://elements.heroku.com/addons/proximo
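To compare the plugins above with the same test case, the proxy URL could be picked up from whichever variable the installed plugin sets. This is only a minimal sketch: the helper name proxy_uri_from_env is hypothetical, and the plugin-specific variable names (FIXIE_URL, PROXIMO_URL, QUOTAGUARDSTATIC_URL) are assumptions that should be checked against `heroku config` for the app in question.

```ruby
require 'uri'

# Candidate environment variables, checked in order. The plugin-specific
# names are assumptions; confirm them with `heroku config` for your app.
PROXY_ENV_VARS = %w[FIXIE_URL PROXIMO_URL QUOTAGUARDSTATIC_URL HTTP_PROXY http_proxy].freeze

# Returns the first configured proxy as a URI, or nil if none is set.
# Raises unless the URL is a plain http:// proxy, since SOCKS and HTTPS
# proxies did not work with Net::HTTP::Persistent in this experiment.
def proxy_uri_from_env(env = ENV)
  name = PROXY_ENV_VARS.find { |key| !env[key].to_s.empty? }
  return nil unless name

  uri = URI.parse(env[name])
  unless uri.is_a?(URI::HTTP) && uri.scheme == 'http'
    raise ArgumentError, "#{name} must be an http:// proxy URL, got scheme #{uri.scheme.inspect}"
  end
  uri
end

# Usage with the test case above (requires the net-http-persistent gem):
#   http = Net::HTTP::Persistent.new
#   http.proxy = proxy_uri_from_env   # a URI, or nil for no proxy
```

Passing a URI rather than :ENV keeps the choice of plugin in configuration only, so switching providers would not need a code change.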
|
Static IP Experiment Conclusion
SOCKS proxies are not supported by Net::HTTP::Persistent. Whilst these would usually be my preference, they are not necessary since we require only HTTP/HTTPS traffic. So, we can ignore those options which support only SOCKS, as well as the SOCKS configurations of those options which also provide HTTP or HTTPS proxies.
HTTPS proxies are also not supported by Net::HTTP::Persistent. This is unfortunate. However, note that it's still possible to access HTTPS sites using CONNECT, and that most of the connection is encrypted:
https://en.wikipedia.org/wiki/HTTP_tunnel
Most of the connection is not all of the connection, however. This leaves us with a couple of options:
Given (2), which particular Heroku plugin to use (that is, which third-party service) isn't so important. There are a number of options, and some prices are similar depending on the number of requests per month (which we don't currently know). So, we could start with one and switch to another within a few minutes, with no code changes required. If we proceed with HTTP proxies, I'd suggest Fixie or Proximo for this use case, in the first instance. Also given (2) (so, no rewrite), there is one small change needed, which I'll submit in a PR momentarily.
With one of these options in place, we could plan to host the new Ingester PSC and Transformer PSC apps in Heroku as new apps, configured with proxy plugins, and provide those IPs to be whitelisted. This could be 1 IP per app (Proximo and Fixie alternative config), or 2 IPs per app (Fixie default), resulting in 2-4 IPs to be whitelisted (since we would need stg and prd apps). If everything worked satisfactorily, there would be the option to reconsider the existing EC2 and dev whitelisted IPs, since using a proxy should theoretically be possible there, too (although a proxy provider would still have to be found).
However, it's useful to note that these Heroku plugins are not doing anything special; they are just proxy providers. So, it would in fact be possible to use other proxy providers instead, without installing the plugins, or alternatively to sign up directly for one of those services and share the usage across multiple applications. e.g. Fixie could be used (https://usefixie.com/), but there are many others, including ones not listed as Heroku plugins. |
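To make the "most, but not all" point about CONNECT concrete, here is a rough sketch of the preamble a client sends to a plain HTTP proxy before the TLS tunnel is established. The function name connect_preamble is hypothetical and the exact headers a given client sends may differ; the point is only that the target host and port, and any proxy credentials (Base64-encoded, not encrypted), travel unencrypted between the app and the proxy, while the request path, headers, and body go inside the tunnel.

```ruby
# Builds the plaintext preamble an HTTP client sends to an HTTP proxy
# before tunnelling TLS. Everything returned here is readable on the
# network path between the app and the proxy.
def connect_preamble(target_host, target_port, proxy_user = nil, proxy_pass = nil)
  lines = [
    "CONNECT #{target_host}:#{target_port} HTTP/1.1",
    "Host: #{target_host}:#{target_port}"
  ]
  if proxy_user
    # Basic auth is Base64 of "user:pass"; 'm0' encodes without newlines.
    token = ["#{proxy_user}:#{proxy_pass}"].pack('m0')
    lines << "Proxy-Authorization: Basic #{token}"
  end
  lines.join("\r\n") + "\r\n\r\n"
end

# Illustrative values only; these are not real credentials.
puts connect_preamble('ipinfo.io', 443, 'user', 'pass')
```

Anything after this preamble is the TLS session with the target site, which the proxy relays without being able to read it.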
To clarify something I muddled: Ingester PSC doesn't need to run OpenCorporates reconciliation; rather, Transformer PSC does. This means that Ingester PSC doesn't need access to OpenCorporates or any IPs whitelisted, but Transformer PSC will. All the IP-related experiments and notes in this ticket still stand, but apply to Transformer PSC, not Ingester PSC. Thus, they should have been done under #252 rather than this ticket. My apologies for the confusion. |
Ingester PSC has been deployed to Heroku. There is only a production app, since it's not possible for us to run staging apps on the Register data pipelines within our current setup. Ingester PSC on Heroku is now intentionally in a crash loop; this will be lifted when openownership/register-ingester-psc#33 is merged once the streaming code is ready to go live. |
Ingester PSC on Heroku is now live and streaming updates from the PSC datasource. |
Instead of a monthly job run on an EC2 instance, this will need to run continuously, so that new records through the PSC Stream API are detected and ingested immediately.
Subtasks include:
Estimate: 12 hours