Prototype for storing single-cell data #1020

Draft: wants to merge 416 commits into development

@arteymix (Member) commented Feb 5, 2024

TODO

  • add a CLI for loading/reloading single-cell data and cell type assignments
  • extend the GEO loader to detect MEX and other supported formats and apply the appropriate strategy for loading vectors
  • add support for HDF5-based single-cell data formats #1039
  • support saving single-cell data to disk (I already added a SingleCellExpressionDataMatrix; we need to finish the work and write it to a file). I think MEX is a pretty decent output format for this.
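
MEX (the Market Exchange layout used by 10x Genomics) stores counts as a MatrixMarket coordinate file (matrix.mtx) next to barcodes.tsv and features.tsv (genes.tsv in older Cell Ranger releases). A minimal sketch of the matrix part only, assuming each single-cell vector holds one feature's non-zero values together with the 0-based indices of the cells they belong to; MexMatrixWriter and its parameters are hypothetical, and the barcode and feature files are left out:

```java
/**
 * Hypothetical helper: serializes sparse single-cell vectors to the MatrixMarket
 * text that goes in matrix.mtx. Rows are features (genes), columns are cells.
 */
final class MexMatrixWriter {

    private MexMatrixWriter() {}

    /**
     * @param numCells    number of cells (columns), i.e. lines in barcodes.tsv
     * @param cellIndices cellIndices[i] = 0-based cell indices of the non-zero values of feature i
     * @param values      values[i][k] = value of feature i in cell cellIndices[i][k]
     */
    static String toMatrixMarket(int numCells, int[][] cellIndices, double[][] values) {
        if (cellIndices.length != values.length) {
            throw new IllegalArgumentException("cellIndices and values need one entry per feature");
        }
        int nnz = 0;
        for (int i = 0; i < cellIndices.length; i++) {
            if (cellIndices[i].length != values[i].length) {
                throw new IllegalArgumentException("Length mismatch for feature " + i);
            }
            nnz += cellIndices[i].length;
        }
        StringBuilder sb = new StringBuilder();
        // Banner line, then the size line: "rows columns non-zeros".
        sb.append("%%MatrixMarket matrix coordinate real general\n");
        sb.append(cellIndices.length).append(' ').append(numCells).append(' ').append(nnz).append('\n');
        for (int i = 0; i < cellIndices.length; i++) {
            for (int k = 0; k < cellIndices[i].length; k++) {
                // MatrixMarket coordinates are 1-based.
                sb.append(i + 1).append(' ')
                  .append(cellIndices[i][k] + 1).append(' ')
                  .append(values[i][k]).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Raw UMI counts would more naturally use the integer field type in the banner; real is used here only to keep the sketch generic.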

REST API

  • review which fields should be exposed on the REST API for filtering purposes
  • add aliases in the REST API to refer to the preferred single-cell dimension and cell assignment

@arteymix force-pushed the feature-single-cell branch 6 times, most recently from 9222a95 to 788a61b, on February 7, 2024 20:13
@arteymix force-pushed the feature-single-cell branch 2 times, most recently from f04ae2f to d791771, on February 7, 2024 20:23
@arteymix force-pushed the feature-single-cell branch from b60a8de to 0ce142a on February 8, 2024 00:23
@arteymix force-pushed the feature-single-cell branch from 80c6409 to e804d92 on February 13, 2024 03:40
@arteymix force-pushed the feature-single-cell branch 4 times, most recently from b2c8a8b to 6a993b7, on February 19, 2024 20:47
@arteymix added the "single cell" label (Issues related to single-cell data support) on Feb 20, 2024
@arteymix self-assigned this on Feb 21, 2024
@arteymix force-pushed the feature-single-cell branch from b7d4810 to 7c29995 on February 21, 2024 23:26
@arteymix linked an issue (3 tasks) on Feb 25, 2024 that may be closed by this pull request
@arteymix (Member, Author)

I'm in the process of merging the dev branch to bring this work up to date.

@@ -130,6 +130,13 @@
<!-- cannot be non-null because subsets and generic experiments don't have curation details -->
<column name="CURATION_DETAILS_FK" not-null="false" sql-type="BIGINT" unique="true"/>
</many-to-one>
<set name="singleCellExpressionDataVectors" lazy="true" fetch="select" inverse="true"
cascade="all-delete-orphan">
Member Author


We should drop the delete-orphan part of the cascade and manage single-cell vectors the same way we do raw and processed ones.

Member Author


That includes bulk insertion, removal, etc.

Move classes that analyze single-cell data into ubic.gemma.core.analysis.singleCell.

Extract the logic that converts data between different scale types.
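
One way to centralize such a conversion is to pass through the linear scale: undo the source transform, then apply the target one. A minimal sketch, assuming a simplified, hypothetical Scale enum; the project's actual scale-type model is not shown here and has cases this sketch does not handle:

```java
/** Hypothetical, simplified scale enum used only for this sketch. */
enum Scale { LINEAR, LN, LOG2, LOG10, LOG1P }

final class ScaleConversion {

    private ScaleConversion() {}

    /** Converts one value from {@code from} to {@code to} by passing through the linear scale. */
    static double convert(double value, Scale from, Scale to) {
        if (from == to) {
            return value; // avoid rounding error from a needless round trip
        }
        return fromLinear(toLinear(value, from), to);
    }

    /** Converts a whole vector; returns a new array and leaves the input untouched. */
    static double[] convert(double[] values, Scale from, Scale to) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = convert(values[i], from, to);
        }
        return out;
    }

    private static double toLinear(double v, Scale s) {
        switch (s) {
            case LINEAR: return v;
            case LN: return Math.exp(v);
            case LOG2: return Math.pow(2.0, v);
            case LOG10: return Math.pow(10.0, v);
            case LOG1P: return Math.expm1(v); // inverse of log(1 + x)
            default: throw new IllegalArgumentException("Unsupported scale: " + s);
        }
    }

    private static double fromLinear(double v, Scale s) {
        switch (s) {
            case LINEAR: return v;
            case LN: return Math.log(v);
            case LOG2: return Math.log(v) / Math.log(2.0);
            case LOG10: return Math.log10(v);
            case LOG1P: return Math.log1p(v);
            default: throw new IllegalArgumentException("Unsupported scale: " + s);
        }
    }
}
```

Going through the linear scale keeps the number of cases proportional to the number of scales rather than to the number of scale pairs, at the cost of one extra exp/log per value.
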

Don't require -qtName to be supplied; if no set of vectors exactly matches
the one being imported, find one by name.
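
The fallback amounts to a small lookup. A sketch under stated assumptions: Qt is a hypothetical stand-in for a quantitation type reduced to a name and a comparison signature, "matching exactly" is modeled as both fields being equal, and a supplied -qtName is assumed to take precedence; the actual matching criteria are not given here:

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a quantitation type: a name plus a signature used for exact matching. */
final class Qt {
    final String name;
    final String signature; // hypothetical: summarizes type, scale and representation as a string

    Qt(String name, String signature) {
        this.name = name;
        this.signature = signature;
    }
}

final class QtResolver {

    private QtResolver() {}

    /**
     * Picks the existing quantitation type that imported vectors should be attached to.
     * If qtName is given (as with -qtName), it wins; otherwise an exact match is preferred,
     * and a match on the imported type's name is the fallback.
     */
    static Optional<Qt> resolve(Qt imported, List<Qt> existing, String qtName) {
        if (qtName != null) {
            return existing.stream().filter(q -> q.name.equals(qtName)).findFirst();
        }
        Optional<Qt> exact = existing.stream()
                .filter(q -> q.name.equals(imported.name) && q.signature.equals(imported.signature))
                .findFirst();
        if (exact.isPresent()) {
            return exact;
        }
        return existing.stream().filter(q -> q.name.equals(imported.name)).findFirst();
    }
}
```

An empty Optional signals that nothing matched, leaving the caller to decide whether to create a new quantitation type or fail.
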

Add a new interface BioAssaySetValueObject and use it in the code so
that we don't assume that an ExpressionExperimentSubsetValueObject
derives from an ExpressionExperimentValueObject.
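
The change replaces an inheritance assumption with a shared interface. A minimal sketch with simplified, hypothetical versions of the three types (the getId/getName getters are assumptions, not the project's actual methods): code written against the interface handles experiments and subsets alike, and a subset no longer needs to be an experiment value object:

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the common interface; the two getters are assumptions. */
interface BioAssaySetValueObject {
    Long getId();
    String getName();
}

/** Simplified stand-in for the experiment value object. */
class ExpressionExperimentValueObject implements BioAssaySetValueObject {
    private final Long id;
    private final String name;

    ExpressionExperimentValueObject(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override public Long getId() { return id; }
    @Override public String getName() { return name; }
}

/** Simplified stand-in for the subset value object: it implements the interface directly. */
class ExpressionExperimentSubsetValueObject implements BioAssaySetValueObject {
    private final Long id;
    private final String name;
    private final Long sourceExperimentId; // refers to the parent experiment instead of inheriting from it

    ExpressionExperimentSubsetValueObject(Long id, String name, Long sourceExperimentId) {
        this.id = id;
        this.name = name;
        this.sourceExperimentId = sourceExperimentId;
    }

    @Override public Long getId() { return id; }
    @Override public String getName() { return name; }
    Long getSourceExperimentId() { return sourceExperimentId; }
}

final class BioAssaySetLabels {
    private BioAssaySetLabels() {}

    /** Depends only on the interface, so it accepts experiments and subsets alike. */
    static List<String> labels(List<? extends BioAssaySetValueObject> sets) {
        List<String> out = new ArrayList<>();
        for (BioAssaySetValueObject s : sets) {
            out.add(s.getId() + ":" + s.getName());
        }
        return out;
    }
}
```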

Reintroduce the DoubleVectorValueObject cache. Since we no longer cache
processed vectors in Hibernate, this now makes sense to do.
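A cache of this kind usually works read-through: the first request for an experiment loads its vectors, and later requests are served from memory until the entry is evicted. A minimal sketch using a ConcurrentHashMap; the class is hypothetical, V stands in for something like a collection of DoubleVectorValueObject, and nothing here reflects which cache library the project actually uses:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache of vector value objects keyed by experiment ID. */
final class VectorValueObjectCache<V> {
    private final Map<Long, V> entries = new ConcurrentHashMap<>();
    private final Function<Long, V> loader; // e.g. loads processed vectors from the database

    VectorValueObjectCache(Function<Long, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading and storing it on a miss. */
    V get(long experimentId) {
        return entries.computeIfAbsent(experimentId, loader);
    }

    /** Drops the entry so that the next get() reloads it. */
    void evict(long experimentId) {
        entries.remove(experimentId);
    }
}
```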

Make sure that operations on ProcessedExpressionDataVectorService evict
both the vectors cache and the vectors-by-gene cache.
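Both caches have to be evicted together, otherwise one of them keeps serving stale vectors after a write. A minimal sketch with two hypothetical in-memory maps, one keyed by experiment and one keyed by (experiment, gene); the key class and field names are illustrative only:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical key of the "vectors by gene" cache: one entry per (experiment, gene) pair. */
final class ExperimentGeneKey {
    final long experimentId;
    final long geneId;

    ExperimentGeneKey(long experimentId, long geneId) {
        this.experimentId = experimentId;
        this.geneId = geneId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ExperimentGeneKey)) return false;
        ExperimentGeneKey k = (ExperimentGeneKey) o;
        return experimentId == k.experimentId && geneId == k.geneId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(experimentId, geneId);
    }
}

/** Hypothetical pair of caches that a processed-vector service would keep consistent. */
final class ProcessedVectorCaches {
    final Map<Long, List<double[]>> vectors = new ConcurrentHashMap<>();
    final Map<ExperimentGeneKey, List<double[]>> vectorsByGene = new ConcurrentHashMap<>();

    /** To be called by every operation that creates, replaces or removes an experiment's vectors. */
    void evict(long experimentId) {
        vectors.remove(experimentId);
        // Remove every per-gene entry that belongs to this experiment.
        vectorsByGene.keySet().removeIf(k -> k.experimentId == experimentId);
    }
}
```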

Make sure that we create copies when setting a minimum P-value or rank
on a vector so as not to contaminate the contents of the cache. Ideally,
those models should be immutable.
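The pattern is copy-on-write: instead of mutating a vector that may be shared through the cache, return a new object. A minimal sketch with a hypothetical, immutable VectorValueObject that carries only a P-value (a rank would be handled the same way); it is not the project's DoubleVectorValueObject:

```java
/** Hypothetical, pared-down vector value object; field names are illustrative only. */
final class VectorValueObject {
    private final long designElementId;
    private final double[] data;
    private final Double pvalue; // null until a P-value is attached

    VectorValueObject(long designElementId, double[] data, Double pvalue) {
        this.designElementId = designElementId;
        this.data = data.clone(); // defensive copy: callers cannot mutate our array
        this.pvalue = pvalue;
    }

    long getDesignElementId() { return designElementId; }

    /** Returns a copy so that callers cannot modify the (possibly cached) data. */
    double[] getData() { return data.clone(); }

    Double getPvalue() { return pvalue; }

    /** Returns a new object carrying the P-value; this instance is left untouched. */
    VectorValueObject withPvalue(Double newPvalue) {
        return new VectorValueObject(designElementId, data, newPvalue);
    }
}
```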

Regenerate the DWR client code as we're adding a new model.

Add an ExpressionDataPrimitiveDoubleMatrix interface to provide unboxed
doubles from bulk and single-cell data matrices.
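An unboxed accessor lets callers read a cell as a primitive double instead of a boxed Double, which matters for large single-cell matrices that are mostly zeros. A minimal sketch, assuming a hypothetical PrimitiveDoubleMatrix interface (its method names are illustrative, not those of ExpressionDataPrimitiveDoubleMatrix) with a dense implementation for bulk data and a compressed sparse row (CSR) implementation for single-cell data:

```java
import java.util.Arrays;

/** Hypothetical interface: primitive double access, no Double allocated per cell. */
interface PrimitiveDoubleMatrix {
    int rows();
    int columns();
    double getAsDouble(int row, int column);

    default double[] getRowAsDoubles(int row) {
        double[] out = new double[columns()];
        for (int j = 0; j < out.length; j++) {
            out[j] = getAsDouble(row, j);
        }
        return out;
    }
}

/** Dense storage, as for bulk data. */
final class DenseMatrix implements PrimitiveDoubleMatrix {
    private final double[][] data;

    DenseMatrix(double[][] data) { this.data = data; }

    public int rows() { return data.length; }
    public int columns() { return data.length == 0 ? 0 : data[0].length; }
    public double getAsDouble(int row, int column) { return data[row][column]; }
}

/** CSR storage, as often used for sparse single-cell counts. */
final class CsrMatrix implements PrimitiveDoubleMatrix {
    private final int columns;
    private final int[] rowPtr;   // row i's entries live at [rowPtr[i], rowPtr[i + 1])
    private final int[] colIdx;   // column indices, sorted within each row
    private final double[] values;

    CsrMatrix(int columns, int[] rowPtr, int[] colIdx, double[] values) {
        this.columns = columns;
        this.rowPtr = rowPtr;
        this.colIdx = colIdx;
        this.values = values;
    }

    public int rows() { return rowPtr.length - 1; }
    public int columns() { return columns; }

    public double getAsDouble(int row, int column) {
        int k = Arrays.binarySearch(colIdx, rowPtr[row], rowPtr[row + 1], column);
        return k >= 0 ? values[k] : 0.0; // absent entries are implicit zeros
    }
}

final class Matrices {
    private Matrices() {}

    /** Works on either layout through the unboxed accessor. */
    static double sum(PrimitiveDoubleMatrix m) {
        double total = 0.0;
        for (int i = 0; i < m.rows(); i++) {
            for (int j = 0; j < m.columns(); j++) {
                total += m.getAsDouble(i, j);
            }
        }
        return total;
    }
}
```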

Rename DoubleSingleCellExpressionDataMatrix to
SingleCellExpressionDataDoubleMatrix for consistency with other
matrix names.

Make sure that services are transient in tags.
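
Assuming "tags" here means JSP tag handler classes, which are Serializable because TagSupport implements Serializable, the concern is that injected services should not be part of a tag's serialized state. A minimal sketch with hypothetical classes showing that a transient service field is skipped during a serialization round trip and comes back null:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical service bean; like most Spring-managed services, it is not Serializable. */
class ExperimentService {
    String describe(long id) { return "experiment " + id; }
}

/** Hypothetical stand-in for a JSP tag handler, which is Serializable via TagSupport. */
class ExperimentTag implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient ExperimentService experimentService; // skipped by serialization
    private final String label;

    ExperimentTag(ExperimentService experimentService, String label) {
        this.experimentService = experimentService;
        this.label = label;
    }

    ExperimentService getExperimentService() { return experimentService; }
    String getLabel() { return label; }
}

final class SerializationRoundTrip {
    private SerializationRoundTrip() {}

    /** Serializes and deserializes obj; fails if a non-transient field is not serializable. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without the transient modifier, writeObject would throw a NotSerializableException because ExperimentService does not implement Serializable. A deserialized tag would need its services re-injected before use.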