
Polish on books #916

Merged · 36 commits merged into main on Jan 17, 2024
Conversation

@epugh epugh (Member) commented Jan 3, 2024

Description

Moving from a few hundred queries at most to supporting thousands of queries requires rethinking a number of the interaction models. The amount of processing you can complete within the couple of seconds of a web request/response cycle is limited. This large PR does a number of things:

  • Combining ActiveJob for background processing with ActiveStorage for managing temporary large files lets us move processing work to a background job, freeing up the web request/response cycle. This has been done for populating a book from a case and for importing a book from JSON (see the job sketch after this list).

  • In testing populating a book with 5,000 queries, each with 10 docs (50,000 objects in total), the web frontend easily posts an 87 MB JSON payload to the backend. It's the processing of all that data that takes a long time. With compression, we can take the 87 MB of JSON down to a 21 MB zipped file and store that. Once the background job finishes, it deletes that file, so the fact that we are storing a big binary in our database for a period of time is probably okay.

  • In the vein of being more cognizant of what lots of queries means, closing the Judgements modal now re-runs the queries only if you have imported new ratings from the book. Just opening it and then closing it won't prompt a re-run.

  • We've also made the integration from a Case to creating a Book a bit better by passing along the teams that a Case is already shared with to pre-populate the Book creation screen.

  • We're starting to see more demand to interact with Quepid via its APIs, so we've documented them more, specifically the ones for interacting with a book, query doc pairs, and judgements. So check out /apipie for more info!

  • You can now round-trip all the attributes of a query to a book and back to a new query in a new case. We preserve the information_need, notes, and any options defined on a query through that lifecycle.

  • We now expose a proper screen for importing an exported book's JSON file, plus better handling of the exported JSON file itself.
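
To make the first two bullets concrete, here is a minimal sketch of the background side of the flow, assuming a hypothetical PopulateBookJob and an import_queries helper (the actual job and method names in the PR may differ):

require 'zlib'
require 'json'

class PopulateBookJob < ApplicationJob
  queue_as :default

  def perform(book)
    # Download the compressed payload the controller attached via ActiveStorage.
    compressed_data = book.populate_file.download

    # Reverse the controller's Zlib::Deflate to recover the original JSON.
    serialized_data = Zlib::Inflate.inflate(compressed_data)
    payload = JSON.parse(serialized_data)

    # Hypothetical helper that creates the queries and docs described by the payload.
    import_queries(book, payload)

    # The blob is only needed while the job runs, so clean it up afterwards.
    book.populate_file.purge
  end
end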

Motivation and Context

How Has This Been Tested?

Screenshots or GIFs (if appropriate):

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Improvement (non-breaking change which improves existing functionality)
  • [ ] New feature (non-breaking change which adds new functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [ ] My code follows the code style of this project.
  • [ ] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.
  • [ ] I have read the CONTRIBUTING document.
  • [ ] I have added tests to cover my changes.
  • [ ] All new and existing tests passed.

@epugh epugh merged commit 56c1c48 into main Jan 17, 2024
4 checks passed
puts "[PopulateController] the size of the serialized data is #{number_to_human_size(serialized_data.bytesize)}"
compressed_data = Zlib::Deflate.deflate(serialized_data)
puts "[PopulateController] the size of the compressed data is #{number_to_human_size(compressed_data.bytesize)}"
@book.populate_file.attach(io: StringIO.new(compressed_data), filename: "book_populate_#{@book.id}.bin.zip",
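
The attach call in the excerpt is cut off; for reference, a complete call would typically also pass a content type, something like this (the exact arguments in the PR may differ):

@book.populate_file.attach(io: StringIO.new(compressed_data),
                           filename: "book_populate_#{@book.id}.bin.zip",
                           content_type: 'application/zip')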
@dacox dacox commented Jan 17, 2024
@epugh I am not versed in ActiveJob, or Rails generally.

Storing compressed blobs in the database is probably fine considering what I imagine are relatively small deployments of Quepid.

For the hosted Quepid service, it is a bit more worrying.

My web experience is mostly Django and Celery. I would typically save a large file to a blob store like S3, and then download it in the async task.

Or, if I could guarantee a file was only required by one async task, or that all related tasks would run on the same machine, I would save to local disk.

Btw, the extra documentation of the API is awesome.

@epugh epugh (Member, Author)

Thanks for the feedback. Yeah, generally with ActiveStorage you back it with S3. I went with the DB for now just to avoid adding "yet another Quepid setup task," and because the blobs shouldn't last a long time. Running app.quepid.com on Heroku means local disk doesn't work, because each web worker has its own local disk :-(. We'll see!
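
For the record, switching ActiveStorage over to S3 later is mostly configuration; a minimal sketch, assuming a standard Rails setup where an "amazon" service is declared in config/storage.yml (the bucket name and credential keys below are placeholders):

# config/storage.yml would declare the service, e.g.:
#   amazon:
#     service: S3
#     access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
#     secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
#     region: us-east-1
#     bucket: quepid-active-storage   # placeholder bucket name
#
# config/environments/production.rb then points ActiveStorage at it:
Rails.application.configure do
  config.active_storage.service = :amazon
end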
