Log datasette access IPs #3669
Conversation
@@ -0,0 +1 @@
+load_module modules/ngx_http_realip_module.so;
This could technically also live in `nginx.conf`, but this is more idiomatic.
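For context, here's a minimal sketch of how the loaded module then gets used inside the `http`/`server` config. The trusted address range below is an assumption, not the actual fly.io value from this PR:

```nginx
# Sketch only: tell the realip module which upstream(s) to trust and
# which header carries the original client IP. The CIDR below is an
# assumption, not the real fly.io load balancer range.
set_real_ip_from  172.16.0.0/12;
real_ip_header    X-Forwarded-For;
real_ip_recursive on;
```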
@@ -8,4 +8,5 @@ ls
 mv all_dbs.tar.zst /data
 zstd -f -d /data/all_dbs.tar.zst -o /data/all_dbs.tar
 tar -xf /data/all_dbs.tar --directory /data
-datasette serve --host 0.0.0.0 ${DATABASES} --cors --inspect-file inspect-data.json --metadata metadata.yml --setting sql_time_limit_ms 5000 --port $PORT
+datasette serve --host 0.0.0.0 ${DATABASES} --cors --inspect-file inspect-data.json --metadata metadata.yml --setting sql_time_limit_ms 5000 --port $DATASETTE_PORT > /dev/null &
We want to suppress the `datasette serve` logs, since we get the same information in the nginx access logs.
 && ln -sf /dev/stdout /var/log/nginx/access.log \
 && ln -sf /dev/stderr /var/log/nginx/error.log

COPY . /app
Moved this down to the bottom since all the other setup changes much less frequently than the contents of the Docker context - this helps with build caching.
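The layer-ordering idea, as a hypothetical sketch (base image and dependency files are made up, not the actual Dockerfile):

```dockerfile
# Hypothetical ordering sketch, not the real Dockerfile.
FROM python:3.11-slim

# Rarely-changing setup: cached across most rebuilds.
RUN apt-get update && apt-get install -y nginx

# Dependencies change less often than application code.
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# The Docker context changes on every commit, so it goes last:
# everything above stays cached.
COPY . /app
```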
(force-pushed from 8b37c9a to 635384a)
@@ -2216,7 +2216,7 @@ def from_data_source_ids(
         xbrl_resources=xbrl_resources,
     )

-    def to_yaml(self) -> None:
+    def to_yaml(self) -> str:
mypy was complaining about this.
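For anyone curious, a minimal illustration of the complaint (not the actual PUDL code): annotating `-> None` while the body returns the dumped string trips mypy's return-value check.

```python
import yaml


class Settings:
    def to_yaml(self) -> None:  # mypy: "No return value expected"
        return yaml.safe_dump({"example": True})

    def to_yaml_fixed(self) -> str:  # annotation now matches the body
        return yaml.safe_dump({"example": True})
```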
}
http {
    server {
        listen 8080;
In theory we could turn this (and the Dockerfile) into a jinja template, which might be cleaner, but it felt like gilding the lily to me.
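For reference, the proxying part of a config like this might look roughly as follows; the datasette port and the exact headers are assumptions, not the file from this PR:

```nginx
# Sketch of a reverse-proxy block; port numbers are assumed.
http {
    server {
        listen 8080;
        location / {
            proxy_pass http://127.0.0.1:8001;  # datasette, port assumed
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
```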
This all mostly makes sense to me! On the staging server I get a 502 but when I reload the page it works.
Also, how do we transfer the logs into a bucket?
check_tables_have_metadata(metadata_yml, databases)

if deploy == "metadata":
    logging.info("Only writing metadata. Aborting now.")
You maybe forgot to abort the script here?
Sure did, fixing! Good catch :) Fortunately we only actually deploy if we actively choose a valid deployment target 😅
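For the record, one way the fix might look (a sketch, not the actual deploy script):

```python
import logging
import sys


def main(deploy: str) -> None:
    # ... write metadata.yml, run checks ...
    if deploy == "metadata":
        logging.info("Only writing metadata. Aborting now.")
        sys.exit(0)  # the missing early exit, before any deploy happens
    # ... otherwise, continue with the actual deployment ...
```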
(force-pushed from 635384a to f2e3983)
Yeah, the 502 is happening because nginx will start up and become available before datasette is fully initialized. Since the staging server shuts down with inactivity, you'll see a 502 first every time the VM comes out of inactivity. In terms of how to transfer logs to a bucket: the official solution is fly-log-shipper: https://github.com/superfly/fly-log-shipper
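If the 502s become annoying, one possible mitigation (not part of this PR; the port is an assumption) is to delay nginx until datasette answers:

```bash
# Sketch: block until datasette responds, then run nginx in the foreground.
until curl -fs http://127.0.0.1:8001/ > /dev/null; do
  sleep 1
done
nginx -g 'daemon off;'
```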
We can't declare victory on tracking logs until we set up the sink, right? Any idea how fly-log-shipper actually runs? Also, looks like there isn't a GCS sink :(
We'd have to set up the sink to declare victory, yes. The quick-start in their docs seems reasonable. The confusing thing about the GCS sink is that it's clearly supported by Vector, the library that fly-log-shipper is built on, but it doesn't seem to be exposed as one of fly-log-shipper's preconfigured sinks.
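If we ever configure Vector directly instead of going through fly-log-shipper's presets, the GCS piece is a few lines of its TOML config. This is a sketch from memory of Vector's docs; the sink/source names and bucket are made up, and the field names should be checked against the current Vector release:

```toml
# Hypothetical vector.toml snippet; names and bucket are made up.
[sinks.gcs_logs]
type   = "gcp_cloud_storage"   # Vector's GCS sink
inputs = ["nginx_access"]      # whatever source captures the nginx logs
bucket = "our-datasette-logs"
encoding.codec = "json"
```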
Sounds good! Looks like those next steps are tracked in the issue 👍
* use nginx real IP module
* get nginx logs to show in stdout/err and suppress datasette/gunicorn logs
(force-pushed from f2e3983 to 317bea4)
Overview
Relates to catalyst-cooperative/pudl-usage-metrics#128
Right now, our datasette instance only logs out the IP of the load balancer it sits behind. Useless!
If we shove datasette behind nginx, we can then use nginx's "real IP" module to grab the IPs of the clients that the load balancer is forwarding.
This PR does that, and:
* makes the `--local` target actually use Docker - this makes the deploy slightly slower but also much closer to what is actually getting deployed on fly.io

Testing
How did you make sure this worked?
I did a fast deploy to our staging server:
Then I went to the staging url: https://catalyst-coop-pudl-staging.fly.dev/
And I saw my IP show up in the fly.io live logs:
https://fly.io/apps/catalyst-coop-pudl-staging/monitoring
Which I verified with a quick `curl icanhazip.com` / asking other people to do the same.

To-do list