Skip to content

Commit

Permalink
Merge pull request #18 from quantifyearth/mwd-auto-window
Browse files Browse the repository at this point in the history
Automated result window inference.

I was going to add more to this PR in terms of API refinement, but already I've found myself taking advantage of the improvements here in code I write, so I think whilst not perfect, it is better than what was there before and more useful, so 1.0.0 it is.
  • Loading branch information
mdales authored Feb 21, 2025
2 parents ff78ca4 + 32e81e0 commit 07eeb01
Show file tree
Hide file tree
Showing 13 changed files with 537 additions and 148 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/python-package.yml
Original file line number Diff line number Diff line change
Expand Up @@ -32,10 +32,10 @@ jobs:
run: |
python -m pip install --upgrade pip
python -m pip install "numpy<2" gdal[numpy]==3.9.2 scikit-image "h3==4.0.0b5"
python -m pip install pylint mypy pytest types-setuptools
python -m pip install pylint mypy pytest types-setuptools pytest-cov
- name: Lint with pylint
run: |
python3 -m pylint yirgacheffe
- name: Test with pytest
run: |
python3 -m pytest -vv
python3 -m pytest --cov=yirgacheffe -vv
99 changes: 48 additions & 51 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,82 +2,79 @@

## Overview

Yirgacheffe is an attempt to wrap gdal datasets such that you can do computational work on them without having to worry about common tasks:
Yirgacheffe is an attempt to wrap raster and polygon geospatial datasets such that you can do computational work on them as a whole or at the pixel level, but without having to do a lot of the grunt work of working out where you need to be in rasters, or managing how much you can load into memory safely.

Example common use-cases:

* Do the datasets overlap? Yirgacheffe will let you define either the intersection or the union of a set of different datasets, scaling up or down the area as required.
* Rasterisation of vector layers: if you have a vector dataset then you can add that to your computation and yirgaceffe will rasterize it on demand, so you never need to store more data in memory than necessary.
* Do the raster layers get big and take up large amounts of memory? Yirgacheffe will let you do simple numerical operations with layers directly and then worry about the memory management behind the scenes for you.


## Basic layer usage
## Basic usage

They main unit of data in Yirgacheffe is a "layer", which wraps either a raster dataset or polygon data, and then you can do work on layers without having to worry (unless you choose to) about how they align - Yirgacheffe will work out all the details around overlapping

The motivation for Yirgacheffe layers is to make working with gdal data slightly easier - it just hides some common operations, such as incremental loading to save memory, or letting you align layers to generate either the intersection result or the union result.

For example, say we had three layers that overlapped and we wanted to know the
For example, if we wanted to do a simple [Area of Habitat](https://github.com/quantifyearth/aoh-calculator/) calculation, whereby we find the pixels where a species resides by combining its range polygon, its habitat preferences, and its elevation preferences, the code would be like this:

```python
from yirgacheffe.layer import Layer, UniformAreaLayer

elevation_layer = RasterLayer.layer_from_file('elecation.tiff')
area_layer = UniformAreaLayer('area.tiff')
validity_layer = RasterLayer.layer_from_file('validity.tiff')

# Work out the common subsection of all these and apply it to the layers
intersection = RasterLayer.find_intersection([elecation_layer, area_layer, validity_layer])
elevation_layer.set_window_for_intersection(intersection)
area_layer.set_window_for_intersection(intersection)
validity_layer.set_window_for_intersection(intersection)

# Create a layer that is just the size of the intersection to store the results
result = RasterLayer.empty_raster_layer(
intersection,
area_layer.pixel_scale,
area_layer.datatype,
"result.tif",
area_layer.projection
)

# Work out the area where the data is valid and over 3000ft
def is_munro(data):
return numpy.where(data > 3000.0, 0.0, 1.0)
calc = validity_layer * area_layer * elevation_layer.numpy_apply(is_munro)
from yirgacheffe.layer import RasterLayer, VectorLayer

calc.save(result)
result.close()
habitat_map = RasterLayer.layer_from_file("habitats.tif")
elevation_map = RasterLayer.layer_from_file('elevation.tif')
range_polygon = VectorLayer.layer_from_file('species123.geojson', raster_like=habitat_map)
area_per_pixel_map = RasterLayer.layer_from_file('area_per_pixel.tif')

refined_habitat = habitat_map.isin([...species habitat codes...])
refined_elevation = (elevation_map >= species_min) && (elevation_map <= species_max)

aoh = refined_habitat * refined_elevation * range_polygon * area_per_pixel_map

print(f'area for species 123: {aoh.sum()}')
```

If you want the union then you can simply swap do:
Similarly, you could save the result to a new raster layer:

```python
intersection = RasterLayer.find_union(elecation_layer, area_layer, validity_layer)
elevation_layer.set_window_for_union(intersection)
area_layer.set_window_for_union(intersection)
validity_layer.set_window_for_union(intersection)
with RasterLayer.empty_raster_layer_like(aoh, filename="result.tif") as result:
aoh.save(result)
```

If you want to work on the data in a layer directly you can call `read_array`, as you would with gdal. The data you access will be relative to the specified window - that is, if you've called either `set_window_for_intersection` or `set_window_for_union` then `read_array` will be relative to that and Yirgacheffe will clip or expand the data with zero values as necessary.
Yirgacheffe will automatically infer if you want to do an intersection of maps or a union of the maps based on the operators you use (see below for a full table). You can explicitly override that if you want.

### Lazy loading and evaluation

Yirgacheffe uses a technique from computer science called "lazy evaluation", which means that only when you resolve a calculation will Yirgacheffe do any work. So in the first code example given, the work is only calculated when you call the `sum()` method. All the other intermediary results such as `refined_habitat` and `refined_elevation` are not calculated either until that final `sum()` is called. You could easily call sum on those intermediaries if you wanted and get their results, and that would cause them to be evaluated then.

Similarly, when you load a layer, be it a raster layer or a vector layer from polygon data, the full data for the file isn't loaded until it's needed for a calculation, and even then only the part of the data necessary will be loaded or rasterized. Furthermore, Yirgacheffe will load the data in chunks, letting you work with rasters bigger than those that would otherwise fit within your computer's memory.

If you have set either the intersection window or union window on a layer and you wish to undo that restriction, then you can simply call `reset_window()` on the layer.

Layers also work with Python's `with` statement, which helps prevent you forgetting to close a layer, which is important both for ensuring results are written to disk when you want, and that memory used by layers is freed (Yirgacheffe doesn't cache data in memory, but it uses GDAL which does).
### Automatic expanding and contracting layers

So for example you can turn the above example to:
When you load raster layers that aren't of equal geographic area (that is, they have a different origin, size, or both)then Yirgacheffe will do all the math internally to ensure that it aligns the pixels geospatially when doing calculations.

If size adjustments are needed, then Yirgacheffe will infer from the calculations you're doing if it needs to either crop or enlarge layers. For instance, if you're summing two rasters it'll expand them to be the union of their two areas before adding them, filling in the missing parts with zeros. If you're multiplying or doing a logical AND of pixels then it'll find the intersection between the two rasters (as areas missing in one would cause the other layer to result in zero anyway).

Whilst it is hoped that the default behaviour makes sense in most cases, we can't anticipate all usages, and so if you want to be explicit about the result of any maps you can specify it yourself.

For example, to tell Yirgacheffe to make a union of a set of layers you'd write:

```python
with RasterLayer.empty_raster_layer(
intersection,
area_layer.pixel_scale,
area_layer.datatype,
"result.tif",
area_layer.projection
) as result:
# Work out the area where the data is valid and over 3000ft
def is_munro(data):
return numpy.where(data > 3000.0, 0.0, 1.0)
calc = validity_layer * area_layer * elevation_layer.numpy_apply(is_munro)
layers = [habitat_map, elevation_map, range_polygon]
union_area = YirgacheffeLayer.find_union(layers)
for layer in layers:
layer.set_window_for_union(union_area)
```

And you no longer need to explicitly call `result.close()` to ensure the result is written to disk.
There is a similar set of methods for using the intersection.

If you have set either the intersection window or union window on a layer and you wish to undo that restriction, then you can simply call `reset_window()` on the layer.

### Direct access to data

If doing per-layer operations isn't applicable for your application, you can read the pixel values for all layers (including VectorLayers) by calling `read_array` similarly to how you would for gdal. The data you access will be relative to the specified window - that is, if you've called either `set_window_for_intersection` or `set_window_for_union` then `read_array` will be relative to that and Yirgacheffe will clip or expand the data with zero values as necessary.


### Todo but not supported
Expand Down
4 changes: 2 additions & 2 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "yirgacheffe"
version = "0.9.4"
version = "1.0.0"
description = "Abstraction of gdal datasets for doing basic math operations"
readme = "README.md"
authors = [{ name = "Michael Dales", email = "[email protected]" }]
Expand All @@ -25,7 +25,7 @@ dependencies = [
requires-python = ">=3.10"

[project.optional-dependencies]
dev = ["mypy", "pylint", "pytest", "h3"]
dev = ["mypy", "pylint", "pytest", "h3", "pytest-cov"]

[project.urls]
Homepage = "https://github.com/quantifyearth/yirgacheffe"
Expand Down
Loading

0 comments on commit 07eeb01

Please sign in to comment.