Upgrading oxidicom to support "folder-based" CUBE #2

Open
jennydaman opened this issue Jun 3, 2024 · 0 comments · May be fixed by #3
jennydaman commented Jun 3, 2024

FNNDSC/ChRIS_ultron_backEnd#551 introduced major breaking changes to how CUBE works and how PACSFiles are stored in CUBE. Here, the issues regarding oxidicom and the new "folder-based" CUBE are described.

Will it be easy to change oxidicom?

Yes.

  • Jorge implemented a new "bulk registration API endpoint."
  • registration_synchronizer.rs tracks the state of all tasks for each DICOM series being received. Currently, this is used to guarantee that the "Oxidicom Custom Metadata" OxidicomAttemptedPushCount=N file is the last file to appear for a given series. The code here will be easy to change so that it sends a HTTP request to the "bulk registration API endpoint" as the final action instead.
  • chrisdb_client.rs will become obsolete.
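The change to registration_synchronizer.rs might look something like the following sketch. This is a hypothetical illustration, not oxidicom's actual code: the JSON body shape and the idea of building it per-series are assumptions about what the "bulk registration API endpoint" could accept. It replaces the old final action (writing the OxidicomAttemptedPushCount=N file last) with producing a single bulk-registration request body once a series is complete.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: tracks received files per series and, as the final
/// action for a series, builds the JSON body that could be POSTed to the
/// "bulk registration API endpoint". The JSON shape is an assumption.
struct SeriesTracker {
    /// SeriesInstanceUID -> storage paths of files received so far
    received: HashMap<String, Vec<String>>,
}

impl SeriesTracker {
    fn new() -> Self {
        Self { received: HashMap::new() }
    }

    /// Record that one DICOM file of the series was written to storage.
    fn on_file_stored(&mut self, series_uid: &str, path: &str) {
        self.received
            .entry(series_uid.to_string())
            .or_default()
            .push(path.to_string());
    }

    /// Build the bulk-registration request body for one series. This would
    /// replace writing OxidicomAttemptedPushCount=N as the last file.
    fn bulk_registration_body(&self, series_uid: &str) -> Option<String> {
        let paths = self.received.get(series_uid)?;
        let quoted: Vec<String> = paths.iter().map(|p| format!("\"{}\"", p)).collect();
        Some(format!(
            "{{\"SeriesInstanceUID\":\"{}\",\"paths\":[{}]}}",
            series_uid,
            quoted.join(",")
        ))
    }
}
```

The actual HTTP call (and retry handling) is omitted; the point is that the synchronizer's existing per-series task tracking already provides the right place to fire one request at the end.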

How will the performance compare between oxidicom v2 and the new CUBE?

As a wild guess, I estimate performance will be 4-20x worse, based on comparing oxidicom's Rust implementation to the Python code here:

https://github.com/FNNDSC/ChRIS_ultron_backEnd/pull/551/files#diff-e3063c776a0f053b209d03d96505628a527e2ded3acfd362fd4aa2155cdb12dfR45-R156

  • Oxidicom is written in Rust, a high-performance language, whereas ChRIS_ultron_backEnd is written in Python, which is infamously slow.
  • Oxidicom's code is highly optimized and aggressively parallel, whereas the code in ChRIS_ultron_backEnd would need to perform costly checks that are synchronous and superfluous, such as polling storage a hard-coded 30 times to check whether the given value of ndicom equals the number of files in storage under the given path.
  • Overhead is introduced by the need to serialize data as JSON, send over HTTP, parse, and deserialize.
  • The batch size of oxidicom is configurable (default 20), whereas the new "bulk registration API endpoint" only supports registering a series in its entirety in one go, whether the series contains 1 file or 1,000 files.
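The batching difference in the last bullet point can be sketched as follows. The function below models the size of each registration request for a series of `n_files`; the batch size of 20 is oxidicom's documented default, and the all-at-once case models the new bulk endpoint.

```rust
/// Sketch: sizes of the registration requests made for a series of
/// `n_files`, given a maximum batch size. oxidicom v2 uses a configurable
/// batch size (default 20); the new bulk endpoint effectively forces
/// batch_size == n_files.
fn registration_batches(n_files: usize, batch_size: usize) -> Vec<usize> {
    let full = n_files / batch_size;
    let rem = n_files % batch_size;
    let mut batches = vec![batch_size; full];
    if rem > 0 {
        batches.push(rem);
    }
    batches
}
```

For example, a 45-file series becomes three requests of 20, 20, and 5 files under oxidicom's default, versus a single 45-file request against the bulk endpoint. Smaller batches let registration overlap with file reception; one big request cannot start until the whole series has arrived.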

Even though the performance will be worse, I do not think performance should be a big concern.

Performance with the "new CUBE" will be worse than with oxidicom v2, but better than with oxidicom v1. The most crucial aspect is that the PACSFiles of a series are now created in bulk, using the Django method call PACSFile.objects.bulk_create(files).

| Solution     | Optimized | Bulk Creation |
|--------------|-----------|---------------|
| oxidicom v1  | ✔️        |               |
| oxidicom v2  | ✔️        | ✔️            |
| "new CUBE"   |           | ✔️            |

The importance of bulk creation >> optimization. The lack of optimization is a worthwhile trade-off for improved maintainability (it is bad practice for two services to be coupled by sharing a database schema).

Note that oxidicom v2 is fast. In ideal conditions, it can register a series (200 files) in 50ms. 10x slower than 50ms is half a second, so my estimate of "the new CUBE"'s performance would still be acceptable. However, that is only the case under ideal conditions. Our internal CUBE deployment runs on NFS storage and interfaces with a slow, proprietary PACS server. With these bottlenecks, oxidicom might take 1 second (of busy time) to register the series, and 10x slower means a (wildly) estimated 10 seconds per series.
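The back-of-envelope arithmetic above amounts to scaling an observed per-series time by an estimated slowdown factor:

```rust
use std::time::Duration;

/// Scale an observed per-series registration time by an estimated
/// slowdown factor (the 10x figure above is a wild guess, not a benchmark).
fn estimated_time(observed: Duration, slowdown: u32) -> Duration {
    observed * slowdown
}
```

Ideal conditions: 50ms observed, 10x slower gives 500ms. Internal deployment: 1s observed, 10x slower gives 10s.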

Should oxidicom register files directly to the database?

Oxidicom v2 achieves its optimized performance by writing to the database directly, bypassing the slow APIs provided by CUBE. This approach is less advisable in the "new CUBE," because the new CUBE's schema and semantics are much more complicated.

  • In the old CUBE, oxidicom needed to write to 2 tables (pacsfiles_pacs and pacsfiles_pacsfile).
  • In the new CUBE, oxidicom would need to write to 4 (or more) tables: PACS identifiers, PACS series, files, and folders.

How will "Oxidicom Custom Metadata" work in the "new CUBE?"

In the "new CUBE," SeriesInstanceUID must be unique for series. This new restriction is very much reasonable, however a new restriction is always going to be a backwards-incompatible change. Solving this problem will be easy: "Oxidicom Custom Metadata" files (NumberOfSeriesRelatedInstances=M and OxidicomAttemptedPushCount=N files) will need to be registered with a prefix or suffix, e.g. SeriesInstanceUID=oxidicom-{SeriesInstanceUID}

What are some problematic changes or blockers?

If we use the new PACSFiles "bulk registration API endpoint," there will be no way to obtain progress information on the registration of files to CUBE. ChRIS_ui will not be able to show progress bars for PACS retrieve operations.

Major changes to ChRIS_ui and/or oxidicom can be made so that ChRIS_ui can show progress bars for the reception of DICOM data from PACS. It will, however, still not be possible to show progress information on the registration of files to CUBE. Hopefully, this will be fine, because ideally the bulk registration of files to CUBE should be "fast."

Currently, users are on oxidicom v2 and are accustomed to its performance. When we upgrade to "the new CUBE," users will be upset about the lack of progress bars and the inferior performance. These two problems are worse because they coincide.

What would a better solution be?

See FNNDSC/ChRIS_ultron_backEnd#557

Conclusion?

None; I don't have any real arguments to make. The purpose of this issue is simply to describe the current situation in an unbiased manner. However, the act of writing this implies I am hesitant about "the new CUBE." I advocate that we spend more effort discussing solutions before implementation rather than cutting corners to make the release happen sooner. I think the predicted target of deploying "the new CUBE" in one month is much too soon.

Or, even more concretely:

  • I don't think oxidicom should use the "new PACSFiles bulk registration API."
  • I don't think oxidicom should interface with Postgres directly in the new CUBE like how it does in oxidicom v2.
  • I endorse "PACS retrieve jobs" (ChRIS_ultron_backEnd#557).
@jennydaman jennydaman linked a pull request Sep 7, 2024 that will close this issue