Polish on books #916
Conversation
puts "[PopulateController] the size of the serialized data is #{number_to_human_size(serialized_data.bytesize)}" | ||
compressed_data = Zlib::Deflate.deflate(serialized_data) | ||
puts "[PopulateController] the size of the compressed data is #{number_to_human_size(compressed_data.bytesize)}" | ||
@book.populate_file.attach(io: StringIO.new(compressed_data), filename: "book_populate_#{@book.id}.bin.zip", |
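For context, a minimal sketch of what the consuming side of this flow could look like. `PopulateBookJob` and the populate step are hypothetical names; `Zlib::Inflate.inflate` and ActiveStorage's `download`/`purge` are standard APIs.

```ruby
# Hypothetical background job that consumes the compressed attachment.
class PopulateBookJob < ApplicationJob
  queue_as :default

  def perform(book)
    compressed_data = book.populate_file.download            # pull the blob back out of storage
    serialized_data = Zlib::Inflate.inflate(compressed_data) # reverse the controller's deflate
    data = JSON.parse(serialized_data)

    # ... build queries, query_doc_pairs, judgements, etc. from data ...

    book.populate_file.purge # the blob only lives while the job runs
  end
end
```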
@epugh I am not versed in ActiveJob, or Rails generally.
Storing compressed blobs in the database is probably fine given what I imagine are relatively small deployments of Quepid. For the hosted Quepid service, it is a bit more worrying.
My web experience is mostly Django and Celery. I would typically save a large file to a blob store like S3, then download it in the async task. Or, if I could guarantee a file was only required by one async task, or that all related tasks would run on the same machine, I would save it to local disk.
Btw, the extra documentation of the API is awesome.
Thanks for the feedback. Yeah, generally with ActiveStorage you back it with S3. I went with the DB for now to avoid adding "yet another Quepid setup task", and because the blobs shouldn't last a long time. app.quepid.com running on Heroku means local disk doesn't work, since each web worker has its own local disk :-(. We'll see!
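If the hosted service ever outgrows the DB-backed blobs, the switch in stock Rails is mostly configuration. A sketch of the standard config/storage.yml S3 entry (bucket name and credential keys are placeholders):

```yaml
# config/storage.yml -- standard Rails S3 service entry; names are placeholders.
amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: your-quepid-bucket
# Then point ActiveStorage at it in config/environments/production.rb:
#   config.active_storage.service = :amazon
```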
Description
Moving from a few hundred queries at most to supporting thousands of queries requires rethinking a number of the interaction models. The amount of processing you can complete within the couple of seconds of a web request/response cycle is limited. This large PR does a number of things:
Combining ActiveJob for background processing with ActiveStorage for managing temporary large files lets us move processing work to a background thread, freeing up the web request/response cycle. This has been done for populating a book from a case and for importing a book from JSON.
In testing, populating a book with 5,000 queries of 10 docs each (50,000 objects), the web frontend easily posts an 87 MB JSON payload to the backend; it's the processing of all that data that takes a long time. With compression, we take the 87 MB of JSON down to a 21 MB zipped file and store that. Once the background job finishes, it deletes the file, so the fact that we are storing a big binary in our database for a period of time is probably okay.
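The handoff itself follows the standard ActiveJob pattern. A minimal sketch of the controller side (`book_params` and `PopulateBookJob` are illustrative names; the attach call mirrors the diff above):

```ruby
serialized_data = book_params.to_json                     # ~87 MB in the PR's test
compressed_data = Zlib::Deflate.deflate(serialized_data)  # ~21 MB after deflate
@book.populate_file.attach(io: StringIO.new(compressed_data),
                           filename: "book_populate_#{@book.id}.bin.zip")
PopulateBookJob.perform_later(@book) # returns immediately; the heavy lifting happens off the request cycle
```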
In the vein of being more cognizant of what lots of queries means, when you close the Judgements modal we now only rerun the queries if you have imported new ratings from the book. Just opening it and then closing it won't prompt a rerun of the queries.
We've also made the integration from a Case to creating a Book a bit better by passing along the teams that a Case is already shared with to pre-populate the Book creation screen.
We're starting to see more demand to interact with Quepid via its APIs, so we documented them more, specifically the ones for interacting with a book, query doc pairs, and judgements. So check out /apipie for more info! You can now round trip all the attributes of a query to a book and back to a new query in a new case; we preserve the information_need, notes, and any options defined on a query through that lifecycle.
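An illustrative call from Ruby (the endpoint path and auth header are assumptions; consult /apipie for the real routes and parameters):

```ruby
require "net/http"
require "json"

# Hypothetical endpoint -- check /apipie for the actual API surface.
uri = URI("http://localhost:3000/api/books/1/query_doc_pairs")
request = Net::HTTP::Get.new(uri)
request["Authorization"] = "Bearer YOUR_API_TOKEN" # placeholder credential
response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
puts JSON.parse(response.body)
```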
We now expose a proper screen for importing an exported book's JSON file, plus better handling of the exported JSON file itself.