ValueError: index must be monotonic increasing or decreasing when indexing a Zarr Dataset with the sel() method #8812
-
Thanks for your question @dimplejaiin. This ValueError is expected when using `sel()` with `method='nearest'` on a coordinate whose index is neither monotonic increasing nor monotonic decreasing, which is the case when the coordinate contains NaN values.
I suspect you're on the right track: by modifying your dataset creation code, you can likely avoid NaNs in your coordinates. If you can share fully reproducible code, that would be helpful. I'd also suggest looking at https://corteva.github.io/rioxarray/html/examples/reproject_match.html and https://github.com/opendatacube/odc-stac. It can also be helpful to construct synthetic datasets to home in on the root of the problem in situations like these. For example, you can ignore Zarr and focus only on the in-memory behavior of Xarray:

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng()
shape = (4143, 4793, 13, 1)
data1 = rng.random(shape, dtype='float32')
data2 = rng.random(shape, dtype='float32')

# Create dimensions and coordinates
dims = ['y', 'x', 'band', 'time']
x = np.linspace(-106.4, -105.9, 4793)
y = np.linspace(35.52, 35.1, 4143)
# Note: the last coordinate appears to be a NaN
x[-1] = np.nan
coords = {'y': y,
          'x': x,
          'band': np.arange(13),
          'time': np.arange(1)}

# Create synthetic dataset
da1 = xr.DataArray(data1, dims=dims, coords=coords)
da2 = xr.DataArray(data2, dims=dims, coords=coords)
ds = xr.concat([da1, da2], dim='time').to_dataset(name='data')
ds['time'] = ['2024-03-03', '2024-03-01']
# ds['time'] = pd.to_datetime(['2024-03-03', '2024-03-01'])  # Note: better to use datetime dtypes

# Access to Pandas Indexes and methods
print(ds.indexes['y'].is_monotonic_decreasing)  # True
print(ds.indexes['x'].is_monotonic_increasing)  # False due to NaN

# ValueError: index must be monotonic increasing or decreasing
ds.sel(x=-106.39, y=35.525, method='nearest')
```
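When the full-size arrays are too heavy to experiment with, a scaled-down version of the same situation can help. The sketch below uses a hypothetical helper, `index_report` (not part of xarray), that lists for every dimension index how many labels are NaN and whether the index is monotonic. A coordinate with a NaN label shows up as non-monotonic, and the nearest-neighbour lookup on it fails with the same ValueError.

```python
import numpy as np
import xarray as xr


def index_report(ds: xr.Dataset) -> dict:
    """For each dimension index, count NaN labels and check monotonicity.

    Hypothetical diagnostic helper; not part of xarray's API.
    """
    report = {}
    for name, index in ds.indexes.items():
        report[name] = {
            "n_nan": int(index.isna().sum()),
            "monotonic": bool(index.is_monotonic_increasing
                              or index.is_monotonic_decreasing),
        }
    return report


# Scaled-down stand-in for the synthetic dataset above
x = np.linspace(-106.4, -105.9, 6)
x[-1] = np.nan  # same NaN in the last 'x' label
y = np.linspace(35.52, 35.1, 5)
ds = xr.Dataset(
    {"data": (("y", "x"), np.zeros((5, 6), dtype="float32"))},
    coords={"y": y, "x": x},
)

report = index_report(ds)
print(report)

# The nearest-neighbour lookup on the NaN-containing 'x' index fails
try:
    ds.sel(x=-106.39, y=35.5, method="nearest")
    nearest_failed = False
except ValueError as err:
    nearest_failed = True
    print(f"ValueError: {err}")
```

Running `index_report` on a dataset opened with `xr.open_zarr` is a quick way to see which of `x` and `y` carries the NaN labels.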
-
Thank you for your response @scottyhq. Below are detailed steps to reproduce the issue. Here I'm not creating multiple data arrays; I'm only updating the Zarr store and applying padding based on the maximum dimensions of two Sentinel-2 tiles.
Steps to Reproduce the Issue
Here's the code to reproduce the issue:

```python
import os
from typing import Dict

import rioxarray
import xarray
from fsspec.implementations.local import LocalFileSystem
from fsspec.mapping import FSMap


def get_max_dimension(mosaic_files: Dict[str, str]):
    y_max = max(
        [rioxarray.open_rasterio(file).sizes["y"] for file in mosaic_files.values()]
    )
    x_max = max(
        [rioxarray.open_rasterio(file).sizes["x"] for file in mosaic_files.values()]
    )
    return x_max, y_max


def _update_zarr_store_with_mosaic(
    path_to_mosaic_file: str,
    date: str,
    index: int,
    y_max,
    x_max
) -> bool:
    local_fs: LocalFileSystem = LocalFileSystem()
    parent_path = "/Users/work/test/"
    zarr_store_path = os.path.join(parent_path, "datacube")
    zarr_store: FSMap = local_fs.get_mapper(root=zarr_store_path)
    data = rioxarray.open_rasterio(path_to_mosaic_file, chunks=True)
    # pad() defaults to mode='constant': new data cells get a missing value (NaN,
    # promoting integer data to float), and the new 'y'/'x' coordinate labels are NaN too.
    # Pad the data for the 'y' dimension if it's smaller than y_max
    if data.sizes["y"] < y_max:
        padding_y = y_max - data.sizes["y"]
        data = data.pad(y=(0, padding_y))
    # Pad the data for the 'x' dimension if it's smaller than x_max
    if data.sizes["x"] < x_max:
        padding_x = x_max - data.sizes["x"]
        data = data.pad(x=(0, padding_x))
    data = data.expand_dims({"time": 1})
    data = data.assign_coords(time=[date])
    data.attrs = {}
    dataset = data.to_dataset(name="data")
    try:
        if index == 0:
            # During the first push, create the time dimension, so don't pass append_dim
            zarr_out = dataset.to_zarr(store=zarr_store)
        else:
            # During subsequent pushes, append the new data along the time dimension
            zarr_out = dataset.to_zarr(store=zarr_store, append_dim="time")
    except Exception:
        return False
    return True


if __name__ == "__main__":
    date_to_mosaic_file_path_dict = {
        "2024-03-01": "2024_03_01_mosaic.tif",
        "2024-03-03": "2024_03_03_mosaic.tif"
    }
    xarray.show_versions()
    x_max, y_max = get_max_dimension(date_to_mosaic_file_path_dict)
    for idx, date in enumerate(sorted(date_to_mosaic_file_path_dict.keys())):
        _update_zarr_store_with_mosaic(date_to_mosaic_file_path_dict[date], date, idx, y_max, x_max)
```

Code to read and fetch the value at a particular coordinate:

```python
import xarray as xr

data_set = xr.open_zarr(store='/Users/work/test/datacube/', chunks=None)
data_set.sel({'x': -106.40856573808874, 'y': 35.51543260935861}, method='nearest').to_dict()
```

I hope this information is sufficient to diagnose the issue. I'm looking forward to your insights or suggestions on how to resolve it. Thank you for your assistance.
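One possible write-side fix, sketched under two assumptions: each tile lies on a regular grid, and it shares its top-left origin with the largest tile (the same assumption the end-only padding in the script above already makes). The idea is to pad as before and then replace the NaN labels that `pad()` writes into the new `x`/`y` positions with values that continue the tile's own pixel spacing, so both indexes stay monotonic. `pad_regular_grid` is a hypothetical helper, not an xarray or rioxarray function, and the small array stands in for a real mosaic.

```python
import numpy as np
import xarray as xr


def pad_regular_grid(da: xr.DataArray, x_max: int, y_max: int) -> xr.DataArray:
    """Pad 'x'/'y' at the end and extend their labels on the tile's regular grid.

    Hypothetical helper. Assumes 'x' and 'y' each have at least two evenly
    spaced labels and that the tile shares its origin with the largest tile.
    """
    out = da
    for dim, target in (("x", x_max), ("y", y_max)):
        n_extra = target - out.sizes[dim]
        if n_extra <= 0:
            continue
        labels = out[dim].values  # original labels, captured before padding
        step = labels[1] - labels[0]  # pixel spacing (negative for a decreasing 'y')
        extra = labels[-1] + step * np.arange(1, n_extra + 1)
        # pad() fills the new cells with NaN (for float data) and gives them NaN labels...
        out = out.pad({dim: (0, n_extra)})
        # ...so overwrite the labels with a continuation of the regular grid
        out = out.assign_coords({dim: np.concatenate([labels, extra])})
    return out


# Small stand-in for a mosaic tile that is smaller than the largest tile
x = np.linspace(-106.4, -106.3, 5)
y = np.linspace(35.5, 35.4, 4)
tile = xr.DataArray(np.ones((4, 5)), dims=("y", "x"), coords={"y": y, "x": x})

naive = tile.pad(x=(0, 2), y=(0, 1))  # what plain padding produces
fixed = pad_regular_grid(tile, x_max=7, y_max=5)

print(naive.indexes["x"].is_monotonic_increasing)  # False: NaN labels at the end
print(fixed.indexes["x"].is_monotonic_increasing)  # True
print(fixed.sel(x=-106.26, y=35.37, method="nearest").item())  # nan: a padded cell
```

In the script above, this would replace the two `data.pad(...)` calls with a single `data = pad_regular_grid(data, x_max, y_max)` before `expand_dims`. If the tiles do not actually share an origin, aligning them onto a common grid (for example with rioxarray's `reproject_match`, as suggested earlier) is the more robust route.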
-
When attempting to index a Zarr dataset, constructed from two timestamped Sentinel-2 tiles, using xarray's sel() method, I encounter a ValueError stating that the index must be monotonic increasing or decreasing. Intriguingly, reversing the order of the tiles when creating the Zarr dataset and then indexing it resolves the problem. The error does not appear to stem from the dataset's time dimension; it seems more likely to be related to how NaN values in the dataset are handled.
Steps to Reproduce
Expected Behavior
The sel() method should successfully index the dataset without requiring the indices to be in a strictly monotonic order, or it should handle non-monotonic indices in a more intuitive way.
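Until the store is rewritten with clean coordinates, one read-side workaround (a sketch, not a built-in xarray option) is to drop the positions whose `x` or `y` label is NaN before the nearest-neighbour lookup, so that the remaining indexes are monotonic again. `drop_nan_labels` is a hypothetical helper. It also discards any pixels stored at NaN-labelled positions, which may include real data from the larger tile, so fixing the coordinates at write time is the cleaner option.

```python
import numpy as np
import xarray as xr


def drop_nan_labels(ds: xr.Dataset, dims=("x", "y")) -> xr.Dataset:
    """Keep only positions whose label along each of `dims` is not NaN.

    Hypothetical helper for reading an already-written store; not an xarray API.
    """
    keep = {dim: np.flatnonzero(~np.isnan(ds[dim].values)) for dim in dims}
    return ds.isel(keep)


# Stand-in for a store whose 'x'/'y' labels were NaN-padded at the end
x = np.array([-106.4, -106.3, -106.2, np.nan, np.nan])
y = np.array([35.5, 35.4, np.nan])
ds = xr.Dataset(
    {"data": (("y", "x"), np.arange(15.0).reshape(3, 5))},
    coords={"y": y, "x": x},
)

clean = drop_nan_labels(ds)
value = clean.sel(x=-106.29, y=35.41, method="nearest")["data"].item()
print(value)  # 6.0 (row y=35.4, column x=-106.3)
```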
Actual Behavior
A ValueError is raised, stating:
ValueError: index must be monotonic increasing or decreasing
Environment
Python Version: 3.9.7
Xarray Version: 2024.2.0
Pandas Version: 2.1.1
NumPy Version: 1.24.3
Zarr Version: 2.16.1
Additional Information
Original tile order dataset:
Reversed tile order dataset:
Seeking Suggestions
I appreciate any insights or recommendations on how to address this issue.