Add Land cover mapping tutorial #2449

Open
wants to merge 24 commits into
base: main

Conversation

burakekim
Contributor

@burakekim burakekim commented Dec 5, 2024

Adding a new tutorial according to #2418

This tutorial demonstrates how to combine the Sentinel-2 and EuroCrops datasets using the Sentinel2EuroCropsDataModule. It covers training a semantic segmentation model, along with evaluation and inference steps.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 5, 2024
@adamjstewart adamjstewart mentioned this pull request Dec 6, 2024
25 tasks
@nilsleh
Collaborator

nilsleh commented Dec 6, 2024

@burakekim Can you add the tutorial to the table of contents in the rst file? Then it can be viewed in the CI docs for review as well :)

@adamjstewart adamjstewart modified the milestones: 0.6.2, 0.6.3 Dec 8, 2024
Review thread on docs/index.rst (outdated, resolved)
@burakekim
Contributor Author

burakekim commented Jan 4, 2025

The initial and somewhat end-to-end draft is now out:

  • downloads a Sentinel-2 patch with rasterio's windowed reading
  • prepares EuroCrops
  • visualizes the Sentinel-2 patch and its corresponding EuroCrops mask (with matplotlib and on a dynamic map)
  • trains and loads a dummy model for qualitative and quantitative evaluation
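
The windowed read in the first bullet comes down to turning a geographic bounding box into pixel offsets. Below is a minimal pure-Python sketch of that arithmetic, assuming a north-up raster with no rotation. The `Window` tuple, `window_from_bounds` helper, origin, pixel size, and bounds are all made-up for illustration and are not taken from the tutorial:

```python
import math
from typing import NamedTuple


class Window(NamedTuple):
    col_off: int
    row_off: int
    width: int
    height: int


def window_from_bounds(left, bottom, right, top, origin_x, origin_y, pixel_size):
    """Pixel window covering a bounding box in a north-up raster.

    origin_x / origin_y are the map coordinates of the raster's top-left
    corner; pixel_size is the ground size of one pixel (hypothetical values).
    """
    col_start = math.floor((left - origin_x) / pixel_size)
    col_stop = math.ceil((right - origin_x) / pixel_size)
    # Rows count downward from the top edge, so y is subtracted from origin_y.
    row_start = math.floor((origin_y - top) / pixel_size)
    row_stop = math.ceil((origin_y - bottom) / pixel_size)
    return Window(col_start, row_start, col_stop - col_start, row_stop - row_start)


# Made-up 10 m raster whose top-left corner sits at (500000, 5400000).
win = window_from_bounds(
    left=502560, bottom=5394880, right=505120, top=5397440,
    origin_x=500000, origin_y=5400000, pixel_size=10,
)
print(win)
```

With rasterio, `rasterio.windows.from_bounds(left, bottom, right, top, transform)` builds the equivalent window, which is then passed to `dataset.read(window=...)` so only that patch is fetched.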

There are quite a few things I want to correct and improve:

  • train on GPU for a reasonable number of epochs, with proper dataloader and Trainer hyperparameters
  • maybe host the pretrained model + Sentinel-2 patch on HF?
  • use a bigger Sentinel-2 patch for training and possibly download another patch for inference or use some sort of opportunistic sampling (can we do that with GridSampler?) for proper evaluation that tones down potential spatial autocorrelation
  • EuroCrops has over 300 labels, but each country has its own distinct subset. The number of classes Slovakia has is still high. Shall we just turn this into a binary crop classification?
  • addressing the question above comes down to what we want to do with the trained model, i.e., does it add value to form multi-class classification?
  • there is a skimage dependency to visualize Sentinel-2 with percentile normalization; and folium, pyproj, shapely for plotting the Sentinel-2 and EuroCrops bounds on a dynamic map -- or is it fine to download 3rd party libraries for individual case studies?

P.S. In the next iteration, I am thinking of renaming the case study to Crop Type Classification. That would describe the task better

@adamjstewart
Collaborator

Still need to actually look at the code, but here are responses to your TODOs:

> train on GPU for a reasonable number of epochs, with proper dataloader and Trainer hyperparameters

Note that this needs to run in CI, preferably in seconds, not days. We can monkeypatch certain hyperparams to make this faster, but it shouldn't require a GPU.

> maybe host the pretrained model + Sentinel-2 patch on HF?

Happy to do this if it makes the above faster while still getting good results.

> use a bigger Sentinel-2 patch for training and possibly download another patch for inference or use some sort of opportunistic sampling (can we do that with GridSampler?) for proper evaluation that tones down potential spatial autocorrelation

Avoid big data: this needs to run in CI, where we have very limited storage, and we don't want to wait 10 min to download data during a tutorial. Not sure what you mean by opportunistic sampling, but there are various GeoDataset splitting methods that you can use to chop a tile into east/west splits, grids, etc.
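
To make the east/west idea concrete, here is a minimal sketch that cuts a tile's bounding box in two so training and evaluation come from disjoint areas. It is plain Python with hypothetical names (`Bounds`, `split_east_west`) and made-up coordinates, not TorchGeo's own splitting API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    minx: float
    miny: float
    maxx: float
    maxy: float


def split_east_west(tile: Bounds, west_fraction: float = 0.5) -> tuple[Bounds, Bounds]:
    """Cut a tile along a north-south line into west and east parts."""
    if not 0.0 < west_fraction < 1.0:
        raise ValueError("west_fraction must be strictly between 0 and 1")
    cut_x = tile.minx + west_fraction * (tile.maxx - tile.minx)
    west = Bounds(tile.minx, tile.miny, cut_x, tile.maxy)
    east = Bounds(cut_x, tile.miny, tile.maxx, tile.maxy)
    return west, east


# Made-up tile extent in projected metres.
tile = Bounds(minx=500000, miny=5390000, maxx=510000, maxy=5400000)
train_area, test_area = split_east_west(tile, west_fraction=0.75)
print(train_area)
print(test_area)
```

In TorchGeo itself, helpers such as `roi_split` and `random_grid_cell_assignment` in `torchgeo.datasets` cover this kind of spatial split for a GeoDataset.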

> EuroCrops has over 300 labels, but each country has its own distinct subset. The number of classes Slovakia has is still high. Shall we just turn this into a binary crop classification?
> addressing the question above comes down to what we want to do with the trained model, i.e., does it add value to form multi-class classification?

I think both add value. Basically, we should have some kind of binary semantic segmentation application, and some kind of multiclass semantic segmentation application. They don't both have to be for agriculture though. For binary, something like building mapping may make more sense.
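
Either framing can be derived from the same labels. Here is a minimal NumPy sketch, assuming the mask stores 0 for background and positive integers for crop classes; that encoding and the class IDs below are assumptions for illustration:

```python
import numpy as np

# Made-up 3x4 label mask: 0 = background, other values = crop class IDs.
mask = np.array(
    [
        [0, 0, 7, 7],
        [0, 12, 12, 7],
        [3, 3, 0, 0],
    ]
)

# Binary framing: crop vs. non-crop.
binary_mask = (mask > 0).astype(np.uint8)

# Multi-class framing: remap sparse class IDs to contiguous indices 0..K-1,
# which is what most segmentation losses expect.
class_ids, contiguous_mask = np.unique(mask, return_inverse=True)
contiguous_mask = contiguous_mask.reshape(mask.shape)

print(binary_mask)
print(class_ids)
print(contiguous_mask)
```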

Also, tasks involving agriculture benefit greatly from time-series data. I'm planning on extending this tutorial for time series once we add support for it. So don't worry too much about the details right now, they will change in the future. This will also make the big data problem even worse, so keep the images small for now.

> there is a skimage dependency to visualize Sentinel-2 with percentile normalization; and folium, pyproj, shapely for plotting the Sentinel-2 and EuroCrops bounds on a dynamic map -- or is it fine to download 3rd party libraries for individual case studies?

Would prefer to avoid any additional dependencies if we can. Any reason we can't plot a static map with matplotlib? eurocrops.plot(sample) and sentinel2.plot(sample) should get you pretty far. If we do need to add additional deps, they need to be installed in .github/workflows/tutorials.yaml and .github/workflows/release.yaml like we did with planetary_computer. But I'm trying to get rid of those too, since they aren't absolutely necessary and aren't tracked by dependabot like our formal deps.
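
Percentile normalization in particular does not strictly need skimage. Here is a minimal NumPy sketch, assuming a (bands, height, width) array and a 2-98% stretch; the `percentile_stretch` helper, the cut-offs, and the random stand-in data are illustrative choices, not values from the tutorial:

```python
import numpy as np


def percentile_stretch(image, lower=2.0, upper=98.0):
    """Rescale each band to [0, 1] between its lower and upper percentiles."""
    image = image.astype(np.float32)
    lo = np.percentile(image, lower, axis=(1, 2), keepdims=True)
    hi = np.percentile(image, upper, axis=(1, 2), keepdims=True)
    # Guard against flat bands where hi == lo.
    scale = np.maximum(hi - lo, 1e-6)
    return np.clip((image - lo) / scale, 0.0, 1.0)


# Random stand-in for an RGB Sentinel-2 patch (3 bands, 64 x 64 pixels).
rng = np.random.default_rng(0)
patch = rng.integers(0, 10000, size=(3, 64, 64))
rgb = percentile_stretch(patch)

# matplotlib expects (height, width, bands), so move the band axis last.
rgb_for_imshow = np.transpose(rgb, (1, 2, 0))
print(rgb_for_imshow.shape, rgb_for_imshow.min(), rgb_for_imshow.max())
```

The result can be passed directly to `plt.imshow(rgb_for_imshow)` for a static plot.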

> P.S. In the next iteration, I am thinking of renaming the case study to Crop Type Classification. That would describe the task better

I agree with the rename. Both "Crop Classification" and "Crop Type Mapping" are common names. I think the latter may actually be even more common, and more technically correct. A computer vision person may argue that this is semantic segmentation, not classification. Of course, semantic segmentation is just pixelwise classification, so the distinction isn't too important.
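
That equivalence is easy to see in code: a segmentation model emits one score per class per pixel, and the label map is an argmax over the class axis. Here is a minimal NumPy sketch with made-up logits for 3 classes on a 2x2 patch:

```python
import numpy as np

# Made-up logits with shape (classes, height, width) = (3, 2, 2).
logits = np.array(
    [
        [[2.0, 0.1], [0.3, 0.2]],  # class 0 scores
        [[0.5, 1.5], [0.1, 0.9]],  # class 1 scores
        [[0.1, 0.2], [2.2, 0.4]],  # class 2 scores
    ]
)

# Classifying every pixel independently yields the segmentation mask.
pred_mask = logits.argmax(axis=0)
print(pred_mask)
```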
