Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling a ROI when loading raster data #642

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

vschaffn
Copy link
Contributor

@vschaffn vschaffn commented Dec 17, 2024

Resolves GlacioHack/xdem#602

Description

This PR introduces the functionality for handling a Region of Interest (ROI) when loading raster data. Specifically, it ensures that rasters can be loaded with a specific subset of their data, as defined by a ROI. This functionality is available during the raster initialization. The ROI can be either pixel-based or georeferenced.

Changes

  1. New roi parameter in the Raster class constructor:
  • The roiparameter is introduced to specify the region of interest when loading a raster. It is a dictionary with the keys:
    • x, y, w, and h if pixel-based (where (x, y) top-left corner coordinates and (w, h) the dimensions (width, height) in pixels).
    • left, bottom, right, top and optional crs (defaults to EPSG:4326) if georeferenced, representing the bounding box of the region to load.
  • The parameter is used in the Raster class to crop the raster data during initialization. If provided, the data will be sliced accordingly, while also adjusting the transform and metadata to match the cropped area.
  1. New convert_roi method:
  • Converts the ROI from georeferenced coordinates (e.g., latitude/longitude) or pixel-based coordinates into the appropriate coordinate system and pixel indices for the target raster.
  • Ensures the ROI is correctly aligned with the raster's CRS and verifies that the resulting pixel indices are valid and within the raster dimensions.
  • Includes validation logic to ensure the converted ROI is well-formed, i.e., the bounding box is valid, and the ROI fits within the raster bounds without inverting the top/bottom or left/right limits.
  1. _load_rio function modification:
  • The _load_rio function now supports the roi parameter to load a specific subset of raster data directly from the disk.
  • It handles the creation of a Window for reading the data in the specified region, ensuring that only the relevant part of the raster is loaded into memory.
  1. Support for ROI during raster creation:
  • The from_array method now supports an optional roiparameter. This allows users to define a ROI when creating a raster from an in-memory numpy array.
  • When a ROI is provided, the data is sliced accordingly before the raster is created.

Tests

  • New test test_raster_with_roi: This test verifies the correct behavior when loading raster data with a specified ROI. It checks that the raster is properly cropped and ensures the output data matches expectations when applying either a georeferenced or pixel-based ROI. This ensures the convert_roi method functions as intended in real-world use cases.
  • Additional tests in test_init and test_from_array: These tests have been extended to validate behavior when a ROI is provided but the raster is initialized without a file path. The tests ensure that the ROI is correctly handled when working with arrays directly, confirming the method's compatibility with non-file-based rasters.

Documentation

  • The section crop in the raster documentation has been updated with the possibility to pass a ROI in parameter and crop the raster during initialization.

Example

# Create a raster with a region of interest

# ROI pixel-based
raster = Raster('path/to/raster.tif', load_data=True, roi={'x': 50, 'y': 100, 'w': 400, 'h': 300})

# ROI georeferenced
raster = Raster('path/to/raster.tif', load_data=True, roi={'left': 0, 'right': 1, 'bottom': 43, 'top': 44})

Test with xDEM

image

@rhugonnet
Copy link
Member

Great addition, thanks @vschaffn! 🙂

I have a main remark about the roi addition to _load_rio() specifically:

As it happens, the crop() function already worked with the data not loaded on disk, by reading the ROI from a window in the same way you did (but it wasn't very clear from the documentation), see here:

window=final_window_disk,

In short, doing:

rst = Raster("file.tif")
rst_cropped = rst.crop(my_roi)

worked without loading the full dataset.

We had been debating about adding the windowed reading of crop directly to _load_rio() to avoid the loading of crop being separate from all other loading of the Raster class. This would allow to use an roi/crop_geom from __init__() or from_array() directly, just as you did.

Maybe this is the occasion to merge things together?
We could modify _crop to call your new _load_rio(roi=) (might require some small modifs following the code in _crop), and merge the roi/crop_geom arguments into one, to have everything consistent?

@rhugonnet
Copy link
Member

Additionally, by merging like this, all the existing tests we have for crop() would also help ensure it works properly through __init__ and from_array! So less work on tests 😛

@vschaffn
Copy link
Contributor Author

vschaffn commented Jan 23, 2025

Great addition, thanks @vschaffn! 🙂

I have a main remark about the roi addition to _load_rio() specifically:

As it happens, the crop() function already worked with the data not loaded on disk, by reading the ROI from a window in the same way you did (but it wasn't very clear from the documentation), see here:

window=final_window_disk,

In short, doing:

rst = Raster("file.tif")
rst_cropped = rst.crop(my_roi)

worked without loading the full dataset.

We had been debating about adding the windowed reading of crop directly to _load_rio() to avoid the loading of crop being separate from all other loading of the Raster class. This would allow to use an roi/crop_geom from __init__() or from_array() directly, just as you did.

Maybe this is the occasion to merge things together? We could modify _crop to call your new _load_rio(roi=) (might require some small modifs following the code in _crop), and merge the roi/crop_geom arguments into one, to have everything consistent?

@rhugonnet I had not noticed that the crop() method already worked with the data not loaded on disk. After studying _crop() function, I think the simplest thing is to restore _load_rio() without ROI, and to use crop, as it is, before _load_rio() if a ROI is passed when the raster is initialised. This will effectively mean that there is only one function dealing with the ROI, and therefore simplified code maintenance 😃

@rhugonnet
Copy link
Member

@vschaffn Yes that's true, this would work as well 🙂. The difference would be only internal to make _crop depend on _load_rio(), so we don't need to do it (could at a later stage if we feel it is useful to avoid having independent loading calls, but that's not a priority).

@adebardo
Copy link

adebardo commented Jan 27, 2025

Okay for me, we can open an issue for the _crop dependance. #645

# Check if ROI is georeferenced or pixel_based and convert to crs
if roi_georef.issubset(roi.keys()):
crs_roi = roi.get("crs", "EPSG:4326")
transformer = Transformer.from_crs(crs_roi, crs, always_xy=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we have a function to convert points from one CRS to another, which saves a few lines: https://github.com/GlacioHack/geoutils/blob/main/geoutils/projtools.py#L246

@@ -303,6 +312,41 @@ def test_load(self) -> None:
r.filename = None
r.load()

@pytest.mark.parametrize("example", [landsat_b4_path, aster_dem_path, landsat_rgb_path]) # type: ignore
def test_raster_with_roi(self, example: str) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test! But you also need to include tests that could fail, like e.g. with ROI that are out of bounds etc, to make sure that these are caught correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically, we check that errors are raised properly, such as the ones raised at L 1075 in raster.py.

@adehecq
Copy link
Member

adehecq commented Jan 29, 2025

Hi all,
Thanks for implementing this!

I'm slowly trying to catch up, and sorry if some of it should have come up earlier during the conception...
As raised by rhugonnet, it would have helped if you had known that crop was already working on the unloaded dataset (in general, we only load the data when strictly needed, i.e., for printing the data, changing the data or plotting, otherwise only metadata are loaded). Sorry, we could have clarified this earlier if we had been more available.
Considering this, I wonder if it would not be easier to update the method crop to take a geometry in pixels? Then either simply suggest people to use the crop method directly, or have an argument crop to Raster to enable direct cropping. In either case, I am a bit concerned that having an "ROI" functionality and "crop" functionality may create some confusion as both are somewhat redundant. What do you think @rhugonnet ?

One more thing to consider. We should make sure that crop and ROI (if kept separate) behave the same way when the ROI is partially or fully out of the raster bounds. I had to check with crop, but it looks like it does not raise any warning or error if the bound is larger (but silently crops to the maximum extent) and raises an error if there is no intersection at all between the two. See example below. Maybe we should raise a warning when the requested crop partially overlap and an error when it does not overlap at all?
I made a comment directly in the code, but I think these edge cases should be tested in the tests, to make sure we don't end up with unexpected errors.

import geoutils as gu
img = gu.Raster(gu.examples.get_path("everest_landsat_b4"))
print(img.bounds)
# -> BoundingBox(left=478000.0, bottom=3088490.0, right=502000.0, top=3108140.0)

# Test with crop geometry too large on the right
img_crop = img.crop((478000, 3088490, 502100, 3108140), inplace=False)
print(img_crop.bounds)
# -> same extent as original

# Test with crop geometry outside of bounds 
img_crop = img.crop((502000, 3088490, 502500, 3108140), inplace=False)
# -> raises a rasterio WindowError -> we could actually catch this earlier and raise a clearer error 

@rhugonnet
Copy link
Member

What do you think @rhugonnet ?

I agree, we should make the naming and behaviour fully consistent between Raster(roi=) and Raster.crop(crop_geom=), as they are the same functionality.
I'd suggest simply renaming roi into crop_geom in Raster.__init__. Then, as @vschaffn already proposed, calling crop(crop_geom=) directly. This will ensure everything is the same, and rely on all existing tests to reduce maintenance of this addition 😉 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[POC] Simplify the creation of an ROI for a user.
4 participants