
CZI Essential OSS Cycle 3 #1

Open
FrancescAlted opened this issue Jul 29, 2020 · 7 comments

Comments

@FrancescAlted
Member

FrancescAlted commented Jul 29, 2020

We are planning to apply for CZI Essential OSS Cycle 3 funding via NumFOCUS. This issue is meant as a place to discuss the proposal with the community.

The deadline is tight (Aug 4th), but thanks to a good handful of people, the goals are already fairly well defined:

  1. Create a codec specific to n-dim data in Blosc2
  2. Create interfaces for the existing Python ecosystem
  3. Approach existing and new communities
  4. Support new biomedical applications

We still need to break down the tasks and draw up a budget. After that, we will send the application to NumFOCUS for review. With your collaboration and a bit of luck, I think we can manage to apply on time.

@Blosc/core-devs
@scopatz

@FrancescAlted
Member Author

For goal 1, creating a codec specific to n-dim data, @oscargm98 has already started exploratory work at https://github.com/oscargm98/c-blosc2/tree/Blosc-ndlz. Hopefully this can be consolidated next year.

@FrancescAlted
Member Author

For goal 2, we are hoping that our work on Blosc2/Caterva will benefit the Zarr community (@zarr-developers/core-devs), but we would also like to facilitate plugins for xarray (@shoyer), dask (@mrocklin) and napari (@jni). In particular, we hope that the new backend system for xarray will be finished soon so that we can leverage it. For its part, napari seems to have a nice plugin system already in place, and I think that providing the necessary interfaces to dask should be feasible as well.

@FrancescAlted
Member Author

FrancescAlted commented Jul 29, 2020

Goal 3 will require quite a lot of work on docs and community interaction. @albertosm27 has already done a good job of setting up a nice initial site for Blosc-related docs, but documentation is still scattered across many different places. We also need more tutorials to help newcomers easily grasp the basics of Blosc2/Caterva. Finally, API/format safety issues are important here, and even though @nmoinvaz is doing a good job, we still need quite a bit more work in this area.

@FrancescAlted
Member Author

Regarding goal 4, biomedical applications are important for CZI, and I am happy that we have on board Brent Pedersen (@brentp) and Josh Moore (@joshmoore), who are strong in the fields of genomics and microscopy applications and can guide us on the requirements in these fields, so that we make our software more useful for them.

@FrancescAlted
Member Author

Maybe a bit late, but @kif would be interested in this initiative too.

@shoyer

shoyer commented Jul 29, 2020

> For goal 2, we are hoping that our work on Blosc2/Caterva will benefit the Zarr community (@zarr-developers/core-devs),

This sounds very exciting! Ping @alimanfoo for zarr.

> In particular, we hope that the new backend system for xarray will be finished soon so that we can leverage it.

To clarify: do you hope to implement something more like a new file format for storing xarray data on disk, or a new computation backend for working with xarray arrays in memory? We already have pretty good support for the latter via NumPy's __array_function__ interface. See xarray's roadmap for more elaboration on these ("flexible storage" vs "flexible arrays").
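For context on the "flexible arrays" route: under NumPy's __array_function__ protocol (NEP 18), a non-NumPy array class can intercept calls such as np.sum and route them to its own implementation. Below is a minimal sketch, assuming a hypothetical CompressedArray class that simply wraps an in-memory ndarray as a stand-in for Blosc2-compressed chunks. Neither the class nor the helper names (implements, HANDLED_FUNCTIONS) come from Blosc2, Caterva or xarray, and a duck array that xarray could actually wrap would need more than this (for example __array_ufunc__ and indexing support).

```python
import numpy as np

# Hypothetical registry mapping NumPy functions to CompressedArray overrides.
HANDLED_FUNCTIONS = {}


def implements(np_function):
    """Register an override of a NumPy function for CompressedArray."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator


class CompressedArray:
    """Hypothetical duck array; a plain ndarray stands in for compressed chunks."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def ndim(self):
        return self._data.ndim

    def __array_function__(self, func, types, args, kwargs):
        # Decline functions we have not implemented; NumPy then raises TypeError.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        # Only interoperate with our own type and plain ndarrays.
        if not all(issubclass(t, (CompressedArray, np.ndarray)) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def _sum(arr, axis=None):
    # A real compressed backend would reduce chunk by chunk instead.
    return np.sum(arr._data, axis=axis)


@implements(np.mean)
def _mean(arr, axis=None):
    return np.mean(arr._data, axis=axis)


a = CompressedArray([[1, 2], [3, 4]])
print(np.sum(a))          # 10, dispatched to _sum
print(np.sum(a, axis=0))  # [4 6], keyword arguments are forwarded as-is
print(np.mean(a))         # 2.5
```

Functions that are not registered (np.median, say) make __array_function__ return NotImplemented, so NumPy raises a TypeError rather than silently materializing the data as a regular ndarray.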

@FrancescAlted
Member Author

> For goal 2, we are hoping that our work on Blosc2/Caterva will benefit the Zarr community (@zarr-developers/core-devs),

> This sounds very exciting! Ping @alimanfoo for zarr.

To clarify: I expect Zarr to benefit mainly from the new features in Blosc2. Caterva is essentially a multidimensional container with its own format, so adopting it inside Zarr would mean breaking forward compatibility, and I am not sure this is a good thing. In any case, it is up to the Zarr devs to decide whether they would like to adopt Caterva inside Zarr.

> In particular, we hope that the new backend system for xarray will be finished soon so that we can leverage it.

> To clarify: do you hope to implement something more like a new file format for storing xarray data on disk, or a new computation backend for working with xarray arrays in memory? We already have pretty good support for the latter via NumPy's __array_function__ interface. See xarray's roadmap for more elaboration on these ("flexible storage" vs "flexible arrays").

I was referring more to the former: adding a new file format for storing xarray data on disk. My understanding is that this process is a bit involved currently, and I am hoping that you are working on making support for new storage backends easier.
