DM-30446: Implement DMTN-183's alert database server #1
Conversation
Add a FastAPI-based server which provides alerts and schemas according to the routes laid out in DMTN-183. This is done in a package structure which is separate from the LSST stack because I do not expect it to depend upon the stack, and the stack's complexity is not warranted here.
Moved the Alert DB's backend implementations into alertdb/storage.py. Fixed the server's responses so they provide raw bytes rather than JSON-encoding the byte sequences. Added an integration test. Added requests dependency for the integration test.
Fix an invalid import, and correctly spell 'AssertionError'.
This is fantastic. I really appreciate how this architecture cleanly abstracts the storage backend so that storage can be implemented multiple ways without needing to change the web service or orchestration layers.
I noticed that the server is built with FastAPI but everything about the app is synchronous, from the request handlers to the backend interface. Would you consider building around asyncio? Like, build the FileBackend with aiofiles. For the S3-compatible backends there are aiobotocore, aioboto3, and ways to use boto3 itself concurrently. I honestly don't know what sort of performance gain you'd get from using asyncio, but it might be something to look into.
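For a concrete sense of it, something like this is what I have in mind for the file case; the directory layout and file naming here are guesses on my part, not the actual FileBackend interface:

```python
import os

import aiofiles  # third-party: pip install aiofiles


class AsyncFileBackend:
    """Hypothetical asyncio counterpart to FileBackend, for illustration only."""

    def __init__(self, root_dir: str):
        self.root_dir = root_dir

    async def get_alert(self, alert_id: str) -> bytes:
        # Assumes alerts are stored as gzipped files named <alert_id>.avro.gz
        # under root_dir; returns the stored (still gzip-compressed) payload,
        # matching what the existing backend appears to do.
        path = os.path.join(self.root_dir, f"{alert_id}.avro.gz")
        async with aiofiles.open(path, "rb") as f:
            return await f.read()
```

An async request handler could then `await backend.get_alert(alert_id)` without blocking the event loop on file I/O.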
I'd recommend getting some GitHub Actions-based CI up before going much further. In particular, linting with flake8 and type checking with mypy would be big wins (the type annotations are already in place). I also think it'd be beneficial to run tests against the FileBackend and have those run in GitHub Actions. You could either generate sample data on the fly or commit some sample data in a tests/data directory.
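For example, a test roughly like this could generate its data on the fly in a temporary directory; the FileBackend constructor signature and file layout are guesses here, so adjust to the real interface:

```python
import gzip
from pathlib import Path

from alertdb.storage import FileBackend  # import path assumed from this PR


def test_file_backend_get_alert(tmp_path: Path) -> None:
    # Hypothetical layout: one gzipped file per alert ID under the root directory.
    payload = b"not-really-an-avro-alert"
    (tmp_path / "alert-id.avro.gz").write_bytes(gzip.compress(payload))

    backend = FileBackend(str(tmp_path))  # constructor signature assumed
    assert gzip.decompress(backend.get_alert("alert-id")) == payload
```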
My last big comment is totally beyond the scope of this PR and more a comment for DMTN-183. I wonder if there's a use case for GET /v1/alerts/:alert_id to take an Accept header and either return the Confluent Wire Format or indeed decode it into JSON given an application/json MIME type. I say this because this alert server and alert database now serve to abstract the Kafka infrastructure away, so perhaps it makes sense to go all in and not force the Avro encoding on consumers? You could even start to subset fields in the response, etc. This is sort of pie-in-the-sky thinking, and in fact you may not want the server to take on extra compute load doing this.
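To make that concrete, I'm picturing the handler branching on the Accept header, something like the sketch below; the store and the Avro decoding are stubbed out just to keep it self-contained:

```python
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse

app = FastAPI()

# Stand-in store so the sketch runs on its own; the real handler would call
# the storage backend's get_alert().
_FAKE_STORE = {"alert-id": b"\x00\x00\x00\x00\x01fake-avro-body"}


def _decode_to_dict(payload: bytes) -> dict:
    # Placeholder: a real implementation would strip the 5-byte Confluent
    # wire-format header, look up the schema by its embedded ID, and decode
    # the Avro body (e.g. with fastavro).
    return {"decoded": True, "size": len(payload)}


@app.get("/v1/alerts/{alert_id}")
def get_alert(alert_id: str, request: Request) -> Response:
    payload = _FAKE_STORE[alert_id]
    if "application/json" in request.headers.get("accept", ""):
        return JSONResponse(_decode_to_dict(payload))
    return Response(content=payload, media_type="application/octet-stream")
```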
Everything else is sort of thoughts for making this service more amenable to containerization and/or opinions:
- Rather than maintaining your app's dependencies in your setup.cfg, consider a system that lets you pin dependencies (requirements.txt) for repeatable container builds, while still allowing you to only think about your abstract dependencies. In SQuaRE we've got a system based on pip-tools that compiles our abstract dependencies in a requirements/main.in file into concrete dependencies in a requirements/main.txt file.
- At some point, you may want a configuration object that gathers settings from environment variables (we like pydantic.BaseSettings). You might find this more convenient and maintainable than gathering settings via the alertdb.py script. In this mode, your application startup module would build the config object first and then build the backend based on that configuration. Specifically, I think the "FastAPI way" of doing this would be as a dependency, and that dependency would be available as an argument on the web handler functions (there's a rough sketch after this list). It's definitely different, but that's sort of the explanation for why FastAPI recommends the app be a global singleton, as you mention in your server.py comment.
- If you add more endpoints, you may want to split them out into another module. Something I also like to do is create a sub-app to encompass my versioned endpoints (so a sub-app for v1; see the router sketch after this list). This is something I'm still working out, and I don't know if it means all my dependencies and middleware need to be replicated, if necessary, between each sub-app.
- I don't know whether you prefer unittest or not, but we write all our tests directly as pytest functions and make use of fixtures for things like getting a test client or setting up a test dataset.
- Once you start running in Kubernetes, you'll want to add a health check endpoint that indicates whether the app is ready to take requests, and whether the application continues to be healthy (these can either be the same endpoint or two different endpoints). The simplest way to start is a GET / endpoint that just returns 200 (sketched below). A more sophisticated set of health check endpoints would also check whether the app can talk to the backend.
- For logging, we use structlog in combination with our own FastAPI middleware in Safir to inject context about the request. This may be more relevant if the service layer becomes more complex and you need to piece together log statements from multiple places and correlate them by request.
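To illustrate the configuration point, here's a rough sketch of the BaseSettings-as-dependency pattern; the setting name and environment variable prefix are invented for the example:

```python
from functools import lru_cache

from fastapi import Depends, FastAPI
from pydantic import BaseSettings


class Settings(BaseSettings):
    # Hypothetical setting; reads ALERTDB_STORAGE_ROOT from the environment.
    storage_root: str = "/var/lib/alertdb"

    class Config:
        env_prefix = "ALERTDB_"


@lru_cache()
def get_settings() -> Settings:
    # Cached, so every request shares one Settings instance (and whatever
    # backend you build from it can be cached the same way).
    return Settings()


app = FastAPI()


@app.get("/v1/alerts/{alert_id}")
def get_alert(alert_id: str, settings: Settings = Depends(get_settings)):
    # The handler receives the config as an ordinary argument.
    ...
```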
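And for the versioned-endpoints idea, a lightweight form of it is an APIRouter with a /v1 prefix (a true sub-app via app.mount is the other option); the route here is just a placeholder:

```python
from fastapi import APIRouter, FastAPI

v1 = APIRouter(prefix="/v1")


@v1.get("/alerts/{alert_id}")
def get_alert(alert_id: str):
    ...  # placeholder handler


app = FastAPI()
app.include_router(v1)
```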
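Finally, the minimal liveness endpoint really can be this small; a readiness variant might additionally exercise the storage backend:

```python
from fastapi import FastAPI, Response

app = FastAPI()


@app.get("/")
def healthcheck() -> Response:
    # Liveness: the process is up and serving requests.
    return Response(status_code=200)
```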
alertdb/storage.py
Outdated
>>> import io
>>> alert_payload = backend.get_alert("alert-id")
>>> alert_payload = gzip.decompress(alert_payload)
>>> alert_
Typo?
Yeah, this is worth looking into. There's no official support in Google's client (googleapis/python-storage#478). There's a third-party library that provides it though (https://github.com/talkiq/gcloud-aio). I don't know very much about how ASGI works, so I don't know how much this would matter. I would assume that the server will handle multiple concurrent requests in multiple processes; if so, then I don't think that an async backend would make much of a difference: storage is the only thing the request does, so the client has to wait for the storage backend work to be complete either way.
Definitely! Made https://jira.lsstcorp.org/browse/DM-31213.
Right, I'm happy with the server being as simple as possible, and putting that extra logic in the client. I'd rather push the compute load onto clients, rather than have all the clients share the server's compute load.
Yeah, I really like this.
This seems like it might be worth doing... but I'll know more once I sit down and write some code to actually deploy this thing somewhere. It's not obvious to me yet whether env vars or command-line params are going to be more convenient.
Yeah, I had a lot of trouble making sense of how FastAPI dependencies work. If I had a dependency on a backend, would it get instantiated on every request? Or would it be persisted across requests? I want it to persist, since the Google APIs use gRPC and expect persistence. Anyway, I decided that it's at least easy to know how this server.py mechanism works. If I understood FastAPI better, I'd probably do something different.
Sounds good. For now, I think there won't be many more endpoints.
Eh, yeah, I think unittest is easier to reason about. pytest is so magical that I find it hard to follow, sometimes. I like how unittest is just python. pytest modifies the AST and does its own separate compilation step, so it's really a different programming language, which just has confused me too many times.
Good point! https://jira.lsstcorp.org/projects/DM/issues/DM-31215
Oh, this looks terrific. I'd like to use this, yes: https://jira.lsstcorp.org/projects/DM/issues/DM-31216
I'm trusting the integration test pretty heavily. I haven't really run the command line application because there's no data to surface.