Support for sparse arrays with the Arrow Sparse Tensor format? #7377

Open
JulesGM opened this issue Jan 21, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

JulesGM commented Jan 21, 2025

Feature request

AI in biology is growing quickly. One feature that would greatly benefit the field, and that Hugging Face Datasets currently lacks, is native support for sparse arrays.

Arrow has support for sparse tensors.
https://arrow.apache.org/docs/format/Other.html#sparse-tensor

It would be a big deal if Hugging Face Datasets supported sparse tensors as a feature type, natively.
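
To make the idea concrete, here is a minimal sketch of the coordinate (COO) layout, one of the sparse index formats described in the Arrow spec linked above. It uses plain Python only and is not the pyarrow or Datasets API; the helper names `dense_to_coo` and `coo_to_dense` and the toy `expression` matrix are made up for illustration.

```python
from typing import List, Tuple

Coords = List[Tuple[int, int]]


def dense_to_coo(matrix: List[List[float]]) -> Tuple[Coords, List[float], Tuple[int, int]]:
    """Keep only the non-zero entries as (row, col) coordinates plus values."""
    coords: Coords = []
    values: List[float] = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                coords.append((i, j))
                values.append(value)
    shape = (len(matrix), len(matrix[0]) if matrix else 0)
    return coords, values, shape


def coo_to_dense(coords: Coords, values: List[float], shape: Tuple[int, int]) -> List[List[float]]:
    """Rebuild the dense matrix, filling every missing entry with zero."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, j), value in zip(coords, values):
        dense[i][j] = value
    return dense


# Hypothetical toy data: 3 cells x 6 genes, mostly unexpressed (zero).
expression = [
    [0.0, 0.0, 3.5, 0.0, 0.0, 0.0],
    [0.0, 1.2, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 7.0],
]

coords, values, shape = dense_to_coo(expression)
roundtrip = coo_to_dense(coords, values, shape)
```

Only 3 of the 18 entries are stored here. A native sparse feature type would presumably let a dataset keep this compact form on disk and in memory instead of materializing the zeros.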

Motivation

This matters, for example, in transcriptomics (modeling and understanding gene expression), because a large fraction of genes are not expressed (their value is zero). More generally, sparse arrays are very common in science, so supporting them would be very beneficial: it would make working with Hugging Face Dataset objects a lot more straightforward and clean.

Your contribution

We can discuss this further once the team has commented on what they think of the feature, whether there were previous attempts at making it work, and how hard they estimate it would be. My intuition is that it should be fairly straightforward, since the Arrow backend already supports it.

@JulesGM JulesGM added the enhancement New feature or request label Jan 21, 2025