DM-30446: Implement DMTN-183's alert database server #1
Conversation
Add a FastAPI-based server which provides alerts and schemas according to the routes laid out in DMTN-183. This is done in a package structure which is separate from the LSST stack because I do not expect it to depend upon the stack, and the stack's complexity is not warranted here.
Moved the Alert DB's backend implementations into alertdb/storage.py. Fixed the server's responses so they provide raw bytes rather than JSON-encoding the byte sequences. Added an integration test. Added requests dependency for the integration test.
Fix an invalid import, and correctly spell 'AssertionError'.
This is fantastic. I really appreciate how this architecture cleanly abstracts the storage backend so that storage can be implemented multiple ways without needing to change the web service or orchestration layers.
I noticed that the server is built with FastAPI but everything about the app is synchronous, from the request handlers to the backend interface. Would you consider building around asyncio? Like, build the FileBackend with aiofiles. For the S3-compatible backends there are aiobotocore, aioboto3, and ways to use boto3 itself concurrently. I honestly don't know what sort of performance gain you'd get from using asyncio, but it might be something to look into.
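For a concrete sense of it, something like this is what I have in mind for the file case; the directory layout and file naming here are guesses on my part, not the actual FileBackend interface:

```python
import os

import aiofiles  # third-party: pip install aiofiles


class AsyncFileBackend:
    """Hypothetical asyncio counterpart to FileBackend, for illustration only."""

    def __init__(self, root_dir: str):
        self.root_dir = root_dir

    async def get_alert(self, alert_id: str) -> bytes:
        # Assumes alerts are stored as gzipped files named <alert_id>.avro.gz
        # under root_dir; returns the stored (still gzip-compressed) payload,
        # matching what the existing backend appears to do.
        path = os.path.join(self.root_dir, f"{alert_id}.avro.gz")
        async with aiofiles.open(path, "rb") as f:
            return await f.read()
```

An async request handler could then `await backend.get_alert(alert_id)` without blocking the event loop on file I/O.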
I'd recommend getting some GitHub Actions-based CI up before going much further. In particular, linting with flake8 and type checking with mypy would be big wins (the type annotations are already in place). I also think it'd be beneficial to run tests against the FileBackend and have those run in GitHub Actions. You could either generate sample data on the fly or commit some sample data in a tests/data directory.
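For example, a test roughly like this could generate its data on the fly in a temporary directory; the FileBackend constructor signature and file layout are guesses here, so adjust to the real interface:

```python
import gzip
from pathlib import Path

from alertdb.storage import FileBackend  # import path assumed from this PR


def test_file_backend_get_alert(tmp_path: Path) -> None:
    # Hypothetical layout: one gzipped file per alert ID under the root directory.
    payload = b"not-really-an-avro-alert"
    (tmp_path / "alert-id.avro.gz").write_bytes(gzip.compress(payload))

    backend = FileBackend(str(tmp_path))  # constructor signature assumed
    assert gzip.decompress(backend.get_alert("alert-id")) == payload
```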
My last big comment is totally beyond the scope of this PR and more a comment for DMTN-183. I wonder if there's a use case for GET /v1/alerts/:alert_id to take an Accept header and either return the Confluent Wire Format or indeed decode it into JSON given an application/json MIME type. I say this because this alert server and alert database now serve to abstract the Kafka infrastructure away, so perhaps it makes sense to go all in and not force the Avro encoding on consumers? You could even start to subset fields in the response, etc. This is sort of pie-in-the-sky thinking, and in fact you may not want the server to take on extra compute load doing this.
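To make that concrete, I'm picturing the handler branching on the Accept header, something like the sketch below; the store and the Avro decoding are stubbed out just to keep it self-contained:

```python
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse

app = FastAPI()

# Stand-in store so the sketch runs on its own; the real handler would call
# the storage backend's get_alert().
_FAKE_STORE = {"alert-id": b"\x00\x00\x00\x00\x01fake-avro-body"}


def _decode_to_dict(payload: bytes) -> dict:
    # Placeholder: a real implementation would strip the 5-byte Confluent
    # wire-format header, look up the schema by its embedded ID, and decode
    # the Avro body (e.g. with fastavro).
    return {"decoded": True, "size": len(payload)}


@app.get("/v1/alerts/{alert_id}")
def get_alert(alert_id: str, request: Request) -> Response:
    payload = _FAKE_STORE[alert_id]
    if "application/json" in request.headers.get("accept", ""):
        return JSONResponse(_decode_to_dict(payload))
    return Response(content=payload, media_type="application/octet-stream")
```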
Everything else is sort of thoughts for making this service more amenable to containerization and/or opinions:
- Rather than maintaining your app's dependencies in your setup.cfg, consider a system that lets you pin dependencies (requirements.txt) for repeatable container builds, while still allowing you to only think about your abstract dependencies. In SQuaRE we've got a system based on pip-tools that compiles our abstract dependencies in a requirements/main.in file into concrete dependencies in a requirements/main.txt file.
- At some point, you may want a configuration object that gathers settings from environment variables (we like pydantic.BaseSettings). You might find this more convenient and maintainable than gathering settings via the alertdb.py script. In this mode, your application startup module would build the config object first and then build the backend based on that configuration. Specifically, I think the "FastAPI way" of doing this would be as a dependency, and that dependency would be available as an argument on the web handler functions (there's a rough sketch after this list). It's definitely different, but that's sort of the explanation for why FastAPI recommends the app be a global singleton, as you mention in your server.py comment.
- If you add more endpoints, you may want to split them out into another module. Something I also like to do is create a sub-app to encompass my versioned endpoints (so a sub-app for v1; see the router sketch after this list). This is something I'm still working out, and I don't know if it means all my dependencies and middleware need to be replicated, if necessary, between each sub-app.
- I don't know whether you prefer unittest or not, but we write all our tests directly as pytest functions and make use of fixtures for things like getting a test client or setting up a test dataset.
- Once you start running in Kubernetes, you'll want to add a health check endpoint that indicates whether the app is ready to take requests, and whether the application continues to be healthy (these can either be the same endpoint or two different endpoints). The simplest way to start is a GET / endpoint that just returns 200 (sketched below). A more sophisticated set of health check endpoints would also check whether the app can talk to the backend.
- For logging, we use structlog in combination with our own FastAPI middleware in Safir to inject context about the request. This may be more relevant if the service layer becomes more complex and you need to piece together log statements from multiple places and correlate them by request.
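To illustrate the configuration point, here's a rough sketch of the BaseSettings-as-dependency pattern; the setting name and environment variable prefix are invented for the example:

```python
from functools import lru_cache

from fastapi import Depends, FastAPI
from pydantic import BaseSettings


class Settings(BaseSettings):
    # Hypothetical setting; reads ALERTDB_STORAGE_ROOT from the environment.
    storage_root: str = "/var/lib/alertdb"

    class Config:
        env_prefix = "ALERTDB_"


@lru_cache()
def get_settings() -> Settings:
    # Cached, so every request shares one Settings instance (and whatever
    # backend you build from it can be cached the same way).
    return Settings()


app = FastAPI()


@app.get("/v1/alerts/{alert_id}")
def get_alert(alert_id: str, settings: Settings = Depends(get_settings)):
    # The handler receives the config as an ordinary argument.
    ...
```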
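And for the versioned-endpoints idea, a lightweight form of it is an APIRouter with a /v1 prefix (a true sub-app via app.mount is the other option); the route here is just a placeholder:

```python
from fastapi import APIRouter, FastAPI

v1 = APIRouter(prefix="/v1")


@v1.get("/alerts/{alert_id}")
def get_alert(alert_id: str):
    ...  # placeholder handler


app = FastAPI()
app.include_router(v1)
```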
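Finally, the minimal liveness endpoint really can be this small; a readiness variant might additionally exercise the storage backend:

```python
from fastapi import FastAPI, Response

app = FastAPI()


@app.get("/")
def healthcheck() -> Response:
    # Liveness: the process is up and serving requests.
    return Response(status_code=200)
```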
alertdb/storage.py
Outdated
>>> import io
>>> alert_payload = backend.get_alert("alert-id")
>>> alert_payload = gzip.decompress(alert_payload)
>>> alert_
Typo?
Yeah, this is worth looking into. There's no official support in Google's client (googleapis/python-storage#478). There's a third-party library that provides it though (https://github.com/talkiq/gcloud-aio). I don't know very much about how ASGI works, so I don't know how much this would matter. I would assume that the server will handle multiple concurrent requests in multiple processes; if so, then I don't think that an async backend would make much of a difference: storage is the only thing the request does, so the client has to wait for the storage backend work to be complete either way.
Definitely! Made https://jira.lsstcorp.org/browse/DM-31213.
Right, I'm happy with the server being as simple as possible, and putting that extra logic in the client. I'd rather push the compute load onto clients, rather than have all the clients share the server's compute load.
Yeah, I really like this.
This seems like it might be worth doing... but I'll know more once I sit down and write some code to actually deploy this thing somewhere. It's not obvious to me yet whether env vars or command-line params are going to be more convenient.
Yeah, I had a lot of trouble making sense of how FastAPI dependencies work. If I had a dependency on a backend, would it get instantiated on every request? Or would it be persisted across requests? I want it to persist, since the Google APIs use gRPC and expect persistence. Anyway, I decided that it's at least easy to know how this server.py mechanism works. If I understood FastAPI better, I'd probably do something different.
Sounds good. For now, I think there won't be many more endpoints.
Eh, yeah, I think unittest is easier to reason about. pytest is so magical that I find it hard to follow, sometimes. I like how unittest is just python. pytest modifies the AST and does its own separate compilation step, so it's really a different programming language, which just has confused me too many times.
Good point! https://jira.lsstcorp.org/projects/DM/issues/DM-31215
Oh, this looks terrific. I'd like to use this, yes: https://jira.lsstcorp.org/projects/DM/issues/DM-31216
I'm trusting the integration test pretty heavily. I haven't really run the command line application because there's no data to surface.