XArrays, torch.Tensor, metadata and named dimensions #369

jdeschamps · 2025-01-23T09:15:35Z

Problem

Very early on, we explored using XArrays (xarray.DataArray) rather than plain numpy.ndarray. The aim was to make the dimensions explicit, but a DataArray could also carry the TileInformation:

```python
import xarray as xr
import numpy as np

# Create a DataArray with labeled dimensions and metadata
data = xr.DataArray(
    np.zeros((1, 1, 64, 64)),
    dims=("s", "c", "y", "x"),
    # one could also make use of data.coords to store some info
    attrs={"file": "/path/to/file.tiff", "tile_ID": 0, "tile_coords": [0, 0, 128, 128]},
)

print(data.attrs)
```

Now, I don't remember why we ended up not using it, but clearly one of the problems we would run into is that torch.Tensor has no equivalent of attrs to store the metadata. However, I just learned that torch.Tensor supports labeled dimensions (named tensors, still a prototype feature), which is nice but maybe not strictly necessary for us.
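For reference, this is roughly what labeled dimensions look like with PyTorch's named tensors, reusing the s/c/y/x layout from the xarray example above. This is a minimal sketch: named tensors are documented as a prototype API and emit a UserWarning, which the sketch silences, and not every operator accepts named inputs.

```python
import warnings

import torch

with warnings.catch_warnings():
    # Named tensors are a prototype feature and emit a UserWarning when used
    warnings.simplefilter("ignore", UserWarning)
    t = torch.zeros(1, 1, 64, 64, names=("s", "c", "y", "x"))
    # Reductions can refer to a dimension by name instead of by index
    profile = t.sum("y")
    # Operators without named-tensor support need the names dropped first
    plain = t.rename(None)

print(t.names)        # ('s', 'c', 'y', 'x')
print(profile.names)  # ('s', 'c', 'x')
print(plain.names)    # (None, None, None, None)
```

Note that names only cover the axes; there is still nowhere to put the file path or tile coordinates.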

Crazy idea

Could we create a subclass of torch.Tensor that keeps the Tensor API for the computational part (and can therefore be passed to any PyTorch operation), but that also holds metadata equivalent to the metadata stored in the XArray?

Of course, one of the problems is making sure that PyTorch does not drop the metadata somewhere internally (through a copy, for instance). But it seems that torch has a mechanism for that (the __torch_function__ protocol).

[edit]: they actually have an example: https://pytorch.org/docs/main/notes/extending.html#extending-torch-with-a-tensor-wrapper-type ! They do warn about potential issues with some operations, but we could try it out and see whether anything we do is a problem.
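A minimal sketch of the subclass idea, assuming we subclass torch.Tensor directly (rather than writing a wrapper) and override __torch_function__ so that every tensor returned by an operation receives a copy of the attrs of the first MetaTensor among its inputs. MetaTensor, with_attrs and the attrs attribute name are hypothetical choices for illustration, not an existing API.

```python
import torch


class MetaTensor(torch.Tensor):
    """Hypothetical torch.Tensor subclass carrying an xarray-like `attrs` dict."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # Run the real operation; the default implementation re-wraps tensor
        # results as MetaTensor, but without any attrs attached
        result = super().__torch_function__(func, types, args, kwargs)
        attrs = _find_attrs(args)
        if attrs is None:
            attrs = _find_attrs(kwargs.values())
        if attrs is not None:
            _attach_attrs(result, attrs)
        return result


def _find_attrs(items):
    """Return the attrs of the first MetaTensor found, searching nested lists/tuples."""
    for item in items:
        if isinstance(item, MetaTensor):
            attrs = getattr(item, "attrs", None)
            if attrs is not None:
                return attrs
        elif isinstance(item, (list, tuple)):
            attrs = _find_attrs(item)
            if attrs is not None:
                return attrs
    return None


def _attach_attrs(obj, attrs):
    """Copy attrs onto every MetaTensor in a result (results can be tuples, e.g. torch.split)."""
    if isinstance(obj, MetaTensor):
        obj.attrs = dict(attrs)  # shallow copy, so results do not share one dict
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            _attach_attrs(item, attrs)


def with_attrs(data, attrs):
    """Wrap data as a MetaTensor holding a copy of attrs."""
    tensor = torch.as_tensor(data).as_subclass(MetaTensor)
    tensor.attrs = dict(attrs)
    return tensor


x = with_attrs(
    torch.zeros(1, 1, 64, 64),
    {"file": "/path/to/file.tiff", "tile_coords": [0, 0, 128, 128]},
)
# Arithmetic, clone (a copy), concatenation and slicing all keep the metadata
y = torch.cat([x * 2 + 1, x.clone()])[:, :, :32]
print(type(y).__name__, y.attrs["file"])  # MetaTensor /path/to/file.tiff
```

Merging metadata from several inputs is a policy question this sketch sidesteps: torch.stack, which batch collation relies on, would keep only the first sample's attrs here, so per-sample tile information would still need dedicated handling. The caveats from the linked docs apply as well.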

The advantages for us:

- We could remove TileInformation, so there would be no need to collate it
- Tiling could be detected by checking the metadata of the Tensors
- May solve the file ID and tile ID mix-up (Enable stitching of non-sequentially ordered tiles #358)
- May help carry info on the origin of Zarr arrays
- No need to carry axes everywhere: we know at every point what they are, which also helps visualize the data afterwards
- We could introduce scale metadata (e.g. pixel size), which is very useful to scientists and helpful for viewing the images
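To make the tiling and stitching points (#358) concrete, here is a minimal NumPy sketch that is independent of the Tensor-subclass question: if every tile carries its source file and position, tiling can be detected from the metadata alone and stitching no longer depends on the order in which tiles arrive. The "tile_origin" key and the is_tile and stitch helpers are hypothetical, and overlapping tiles (which would need cropping) are ignored.

```python
import numpy as np


def is_tile(attrs):
    """Detect tiling from metadata alone (hypothetical "tile_origin" key)."""
    return "tile_origin" in attrs


def stitch(tiles):
    """Stitch (array, attrs) pairs back into one 2D image per source file.

    Assumes non-overlapping 2D tiles whose attrs hold the source "file" and a
    hypothetical "tile_origin" = (y, x) of the tile's top-left pixel.
    """
    # First pass: each image extends to its furthest tile edge
    extents = {}
    for array, attrs in tiles:
        y, x = attrs["tile_origin"]
        h, w = extents.get(attrs["file"], (0, 0))
        extents[attrs["file"]] = (max(h, y + array.shape[0]), max(w, x + array.shape[1]))
    # Second pass: place each tile at its origin, so input order is irrelevant
    images = {}
    for array, attrs in tiles:
        if attrs["file"] not in images:
            images[attrs["file"]] = np.zeros(extents[attrs["file"]], dtype=array.dtype)
        y, x = attrs["tile_origin"]
        images[attrs["file"]][y:y + array.shape[0], x:x + array.shape[1]] = array
    return images


full = np.arange(16, dtype=float).reshape(4, 4)
tiles = [
    (full[y:y + 2, x:x + 2], {"file": "a.tiff", "tile_origin": (y, x)})
    for y in (0, 2)
    for x in (0, 2)
]
tiles.reverse()  # scramble the order: positions come from the metadata
stitched = stitch(tiles)
print(np.array_equal(stitched["a.tiff"], full))  # True
```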


jdeschamps added the "feature" (New feature or request) label on Jan 23, 2025
