XArrays, torch.Tensor, metadata and named dimensions #369

Open
jdeschamps opened this issue Jan 23, 2025 · 0 comments
Labels
feature New feature or request

jdeschamps commented Jan 23, 2025

Problem

Very early on, we explored using XArrays rather than plain numpy.ndarray. The goal was to make the dimensions explicit, but the same object could also carry the TileInformation:

import xarray as xr
import numpy as np

# Create a DataArray with metadata
data = xr.DataArray(
    np.zeros((1, 1, 64, 64)),
    dims=("s", "c", "y", "x"),
    # one could also make use of data.coords to store some info
    attrs={"file": "/path/to/file.tiff", "tile_ID": 0, "tile_coords": [0, 0, 128, 128]},
)

print(data.attrs)

Now, I don't remember why we ended up not using it, but clearly one of the problems we would run into is that torch.Tensor has no equivalent of attrs for storing the metadata. I just learned that torch.Tensor can store named dimensions, which is nice, but maybe not strictly necessary for us.
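For reference, a minimal sketch of the named dimensions mentioned above, reusing the `("s", "c", "y", "x")` axes from the XArray example. Named tensors are documented by PyTorch as a prototype feature, so creating one emits a warning, and not every operation supports them.

```python
import warnings

import torch

# Named tensors are a prototype feature: PyTorch warns on creation,
# so the warning is silenced here to keep the example quiet.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    t = torch.zeros(1, 1, 64, 64, names=("s", "c", "y", "x"))

print(t.names)  # dimension names, as a tuple of strings
print(t.shape)  # the shape is unaffected by the names
```

Note that this only labels the axes. It does not give us anywhere to put the file path or the tile coordinates.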

Crazy idea

Could we create a subclass of torch.Tensor that keeps the Tensor API for the computational part (and can therefore be passed to any PyTorch computation), but that also holds metadata equivalent to what is stored in the XArray?

Of course, one problem is making sure that PyTorch does not drop the metadata somewhere internally (when a tensor is copied, for instance). But it seems that torch has a mechanism for that.

[edit]: they actually have an example: https://pytorch.org/docs/main/notes/extending.html#extending-torch-with-a-tensor-wrapper-type ! They do warn about potential issues with some operations, but we could try it out to see whether anything we do is affected.
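A rough sketch of what such a subclass could look like, assuming we go through the `__torch_function__` hook described in the linked notes. The class name `AnnotatedTensor`, the `attrs` attribute and the helper `_find_attrs` are hypothetical names chosen to mirror the XArray example, not an existing API. The propagation rule here (copy the metadata of the first annotated input) is only one possible choice, and operations returning tuples of tensors are not handled.

```python
import torch


def _find_attrs(args):
    """Return the attrs of the first AnnotatedTensor found in args (hypothetical helper)."""
    for arg in args:
        if isinstance(arg, AnnotatedTensor):
            return getattr(arg, "attrs", None)
        if isinstance(arg, (list, tuple)):  # e.g. the list passed to torch.cat
            found = _find_attrs(arg)
            if found is not None:
                return found
    return None


class AnnotatedTensor(torch.Tensor):
    """Hypothetical tensor subclass carrying an xarray-like attrs dict."""

    @staticmethod
    def __new__(cls, data, attrs=None):
        # Reinterpret the data as an instance of this subclass.
        return torch.as_tensor(data).as_subclass(cls)

    def __init__(self, data, attrs=None):
        self.attrs = dict(attrs or {})

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # The default implementation already returns instances of the subclass,
        # but it knows nothing about attrs, so they are re-attached here.
        result = super().__torch_function__(func, types, args, kwargs)
        if isinstance(result, AnnotatedTensor):
            result.attrs = dict(_find_attrs(args) or {})
        return result


tile = AnnotatedTensor(
    torch.zeros(1, 1, 64, 64),
    attrs={"file": "/path/to/file.tiff", "tile_coords": [0, 0, 128, 128]},
)
out = torch.relu(tile * 2 + 1)  # ordinary PyTorch computation
print(type(out).__name__, out.attrs)
```

Whether the metadata survives everything we do (collation in a DataLoader, moving to another device, model forward passes) would still need to be checked case by case.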

The advantages for us:

  • We could remove TileInformation, no need to collate it
  • Tiling could be detected through checking the metadata for the Tensors
  • May solve the problem of the file ID and tile ID mix up (Enable stitching of non-sequentially ordered tiles #358)
  • May help carry info on the origin of Zarr arrays
  • No need to carry axes everywhere: we know at every point what they are, which also helps visualize the data afterwards
  • We could introduce scale metadata (e.g. pixel size) which is very useful to scientists and helpful for viewing the images
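To make two of these points concrete (detecting tiling from the metadata, and carrying scale information), here is a small sketch on the XArray side. The `pixel_size` key, its value and unit, and the `is_tile` helper are assumptions made for illustration. A tensor subclass holding the same dictionary could be checked in the same way.

```python
import numpy as np
import xarray as xr

PIXEL_SIZE = 0.1  # hypothetical pixel size, unit assumed to be micrometers

tile = xr.DataArray(
    np.zeros((1, 1, 64, 64)),
    dims=("s", "c", "y", "x"),
    # physical coordinates along y and x, derived from the pixel size
    coords={"y": np.arange(64) * PIXEL_SIZE, "x": np.arange(64) * PIXEL_SIZE},
    attrs={
        "file": "/path/to/file.tiff",
        "tile_coords": [0, 0, 64, 64],
        "pixel_size": PIXEL_SIZE,
    },
)


def is_tile(array):
    """Hypothetical check: an array is a tile if it carries tile coordinates."""
    return "tile_coords" in array.attrs


# Indexing keeps both the attrs and the matching physical coordinates.
crop = tile.isel(y=slice(0, 32), x=slice(0, 32))
print(is_tile(crop), crop.attrs["file"], dict(crop.sizes))
```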
@jdeschamps jdeschamps added the feature New feature or request label Jan 23, 2025