
Progress #1
Open · 4 of 11 tasks
jonathom opened this issue Jul 3, 2021 · 12 comments
jonathom commented Jul 3, 2021

An issue to follow progress and make notes. As we're doing this on the VITO backend, some things might be VITO-specific and not applicable generally.

Work plan so far (a sketch of step 1 follows the list):

  1. Preprocess Sentinel-2 data in openEO

    • Time series 2018
    • Cloud mask with the SCL band
    • Median composite
    • Remove the 60 m bands
    • Resample the 20 m bands to 10 m
    • NDVI, EVI
    • Download
  2. Import LUCAS data into openEO

    • Reduce LUCAS data to the relevant area (Germany)
    • Is it possible to upload our own data?
    • Which format should the data be in?
    • Probably GeoJSON
  3. Extract training data
    a. Train the model locally
    b. Train the model in openEO

  4. Predict in openEO
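
For reference, step 1 could look roughly like this in the openEO Python client. This is only a sketch: the backend URL, the collection id (TERRASCOPE_S2_TOC_V2), the band names, the test bounding box and the temporal dimension name "t" are assumptions and may differ per backend.

import openeo

# connect to the VITO backend (assumed URL) and authenticate
connection = openeo.connect("https://openeo.vito.be").authenticate_oidc()

# load one year of Sentinel-2 with the bands we need plus SCL (assumed collection id);
# a small bounding box keeps computation time down while testing
cube = connection.load_collection(
    "TERRASCOPE_S2_TOC_V2",
    spatial_extent={"west": 7.0, "south": 51.0, "east": 7.1, "north": 51.1},
    temporal_extent=["2018-01-01", "2018-12-31"],
    bands=["B02", "B03", "B04", "B08", "SCL"],
)

# SCL-based cloud mask: mask everything that is not vegetation (4),
# non-vegetated (5), water (6), unclassified (7) or snow (11)
scl = cube.band("SCL")
mask = (scl == 0) | (scl == 1) | (scl == 2) | (scl == 3) | (scl == 8) | (scl == 9) | (scl == 10)
masked = cube.filter_bands(["B02", "B03", "B04", "B08"]).mask(mask)

# yearly median composite, resampled to 10 m
composite = masked.reduce_dimension(dimension="t", reducer="median")
composite = composite.resample_spatial(resolution=10)

# append NDVI as an extra band (EVI would be analogous band math)
composite = composite.ndvi(nir="B08", red="B04", target_band="NDVI")

composite.download("composite_2018.nc", format="netCDF")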

jonathom commented Jul 3, 2021

This is a first test: Band 8, 2018-01-01 to 2018-01-20, median composite with an SCL cloud mask (white = nodata). I think it shows that SCL is missing cloud shadows here and there. This is less of a problem when longer timeframes are used.
[Screenshot: Band 8 median composite with SCL cloud mask]

jonathom commented Jul 3, 2021

The SCL classification table is below. Right now the script allows the classes vegetation, non-vegetation, water, snow and unclassified, and masks all others. Maybe also filtering out unclassified pixels can help with the cloud shadow issue above.
[SCL classification table; the image showed the standard Sen2Cor classes:]
  0 NO_DATA
  1 SATURATED_OR_DEFECTIVE
  2 DARK_AREA_PIXELS
  3 CLOUD_SHADOWS
  4 VEGETATION
  5 NOT_VEGETATED
  6 WATER
  7 UNCLASSIFIED
  8 CLOUD_MEDIUM_PROBABILITY
  9 CLOUD_HIGH_PROBABILITY
  10 THIN_CIRRUS
  11 SNOW
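
Following up on that idea: in the sketch after the work plan above, also masking unclassified pixels would just mean adding class 7 to the mask (again only a sketch, reusing the hypothetical scl, mask and cube from there):

# additionally treat class 7 (unclassified) as invalid, to catch missed cloud shadows
mask_strict = mask | (scl == 7)
masked_strict = cube.filter_bands(["B02", "B03", "B04", "B08"]).mask(mask_strict)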

jonathom commented Jul 3, 2021

2018 yearly median RGB composite.
So far so good, except that the forest in the lower part seems to vary in brightness... maybe cloud shadows?
[Screenshot: 2018 yearly median RGB composite]

jonathom commented
Short update on things:

  • Cloudless composite with NDVI and EVI works; tested with 3 bands on small areas (a few km²) to keep computation time down.

  • Computations this large would be better run as batch jobs, but there's a problem with downloading the results; see geopyspark#83.

  • Next step is aggregation: multiple issues, see geopyspark#84 for an overview.

jdries commented Jul 14, 2021

Hi @jonathom, about the aggregation issue: we recently added a feature that allows you to sample raster values to NetCDF files. It's also based on polygons, but you get back NetCDF, so you can extract any exact pixel value as preferred:
https://open-eo.github.io/openeo-python-client/cookbook/sampling.html

Not sure if this helps you?
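
For readers following along, the sampling approach from that cookbook page boils down to roughly this (a sketch; the one-feature FeatureCollection is a hypothetical stand-in for the many reference geometries you would normally use, e.g. buffered LUCAS points):

# reference geometries; normally a large FeatureCollection
geometries = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[7.0, 51.0], [7.001, 51.0], [7.001, 51.001], [7.0, 51.001], [7.0, 51.0]]],
        },
    }],
}

# clip the cube to the geometries; with sample_by_feature enabled,
# the batch job writes one NetCDF file per input feature
cube = cube.filter_spatial(geometries)
job = cube.create_job(out_format="netCDF", sample_by_feature=True)
job.start_job()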

jonathom commented
@jdries Thanks for the suggestion. Ultimately we want to do model training and prediction; currently I'm not sure how much filter_spatial can help with that.
I do get rather large raster results though, many times larger than the submitted polygons, covering the surrounding area. Is that expected behaviour? My understanding was that only polygon pixels would end up in the NetCDF.

jdries commented Jul 14, 2021

The idea is that it generates one NetCDF per polygon; it should indeed not be much larger than the polygon itself.
Assuming that your polygon corresponds to a reference point, the timeseries in the netcdf file can be used as input to model training.
If you have a job id, I can have a look at what output got generated.

jonathom commented
I actually just used download on the example you provided, and I selected a single polygon from the given collection:

pol = {"type": "FeatureCollection","features": [{ "type": "Feature", "properties": { "fieldid": "00002806640E4676", "croptype": "60", "area": 14693.208386633429 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 2.894263811783079, 50.848173569416495 ], [ 2.895779792874445, 50.848291515531237 ], [ 2.896474603080004, 50.848327210217398 ], [ 2.896550154265974, 50.848409079717108 ], [ 2.897357980946039, 50.848464169736296 ], [ 2.897349549296917, 50.848279742901632 ], [ 2.897236890499088, 50.848022078692175 ], [ 2.897116150015971, 50.847795782433344 ], [ 2.896788572218617, 50.847598360692821 ], [ 2.896671134945062, 50.847493487588153 ], [ 2.896637612570767, 50.847430128039896 ], [ 2.896367029700992, 50.847453706473239 ], [ 2.895891076053742, 50.847686001351036 ], [ 2.895716831972371, 50.848034473674765 ], [ 2.895590948786306, 50.848199231782012 ], [ 2.895477232008031, 50.848202299534613 ], [ 2.89527945976356, 50.848168346467965 ], [ 2.895195545269909, 50.848126831827749 ], [ 2.895388753310959, 50.848079806219175 ], [ 2.895593505496588, 50.847664282831524 ], [ 2.895687146017587, 50.847390774985918 ], [ 2.8955570587023, 50.847390580216668 ], [ 2.895545670325787, 50.847391426258298 ], [ 2.895485451919841, 50.84739597495745 ], [ 2.895320365992804, 50.847447579775434 ], [ 2.894981950998298, 50.847587189779794 ], [ 2.894768328477734, 50.847687456356056 ], [ 2.89469268155421, 50.847731102718768 ], [ 2.894674725271703, 50.847742116219059 ], [ 2.894633594743594, 50.847767494338349 ], [ 2.894632575605076, 50.84776829074886 ], [ 2.894587838815454, 50.847804502992183 ], [ 2.894472881471012, 50.847887936225831 ], [ 2.894413285985372, 50.847936010010976 ], [ 2.894383563408518, 50.847959912967859 ], [ 2.894366866028001, 50.847976067272363 ], [ 2.894335529257079, 50.848006153916529 ], [ 2.894263811783079, 50.848173569416495 ] ] ] } }]}

s2_bands = s2_bands.filter_spatial(pol)

s2_bands.download(outputfile="out.nc", format="netCDF", options={"sample_by_feature": False})

I thought batch job or not wouldn't make a difference if I only want a single output anyway, so I don't have a job ID...

jdries commented Jul 14, 2021

Aha, the idea is to use it with lots of polygons, as you usually need to gather a lot of reference data to train a model.
In that sense, I only made it work for batch jobs, as a synchronous request can't really download more than one file; batch jobs should also work better with clipping to the actual polygon.

jonathom commented
Batch download doesn't work through the Web Editor (python-driver#50); does it work from inside Python? Are there other ways to get results from batch jobs?

jdries commented Jul 14, 2021

Yes, it works from the Python client, and in fact that bug is also fixed; I tried it myself earlier today.
For downloading lots of samples, though, you'll want a tool that supports it, like the Python client.
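
A sketch of what that could look like from the Python client, assuming job is the batch job created as in the sampling snippet above (not yet started):

job.start_and_wait()                # run the batch job, blocking until it finishes
results = job.get_results()
results.download_files("samples/")  # one NetCDF per polygon ends up in this folder
# an existing job can also be re-attached via connection.job("<job id>")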

jonathom commented
filter_spatial works in batch jobs with the option sample_by_feature = True enabled 👍
@Ludwigm6 What's possible now is to download one NetCDF file with a few pixels (due to the buffer) per LUCAS point - probably useful for data download (less space required, but many, many small files).
