You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
AI in biology is becoming a big thing. One thing that would be a huge benefit to the field that Huggingface Datasets doesn't currently have is native support for sparse arrays.
It would be a big deal if Hugging Face Datasets supported sparse tensors as a feature type, natively.
Motivation
This is important for example in the field of transcriptomics (modeling and understanding gene expression), because a large fraction of the genes are not expressed (zero). More generally, in science, sparse arrays are very common, so adding support for them would be very benefitial, it would make just using Hugging Face Dataset objects a lot more straightforward and clean.
Your contribution
We can discuss this further once the team comments of what they think about the feature, and if there were previous attempts at making it work, and understanding their evaluation of how hard it would be. My intuition is that it should be fairly straightforward, as the Arrow backend already supports it.
The text was updated successfully, but these errors were encountered:
Feature request
AI in biology is becoming a big thing. One thing that would be a huge benefit to the field that Huggingface Datasets doesn't currently have is native support for sparse arrays.
Arrow has support for sparse tensors.
https://arrow.apache.org/docs/format/Other.html#sparse-tensor
It would be a big deal if Hugging Face Datasets supported sparse tensors as a feature type, natively.
Motivation
This is important for example in the field of transcriptomics (modeling and understanding gene expression), because a large fraction of the genes are not expressed (zero). More generally, in science, sparse arrays are very common, so adding support for them would be very benefitial, it would make just using Hugging Face Dataset objects a lot more straightforward and clean.
Your contribution
We can discuss this further once the team comments of what they think about the feature, and if there were previous attempts at making it work, and understanding their evaluation of how hard it would be. My intuition is that it should be fairly straightforward, as the Arrow backend already supports it.
The text was updated successfully, but these errors were encountered: