
Add marine infrastructure dataset and model config #49

Open · wants to merge 58 commits into base: master
Conversation

favyen2 (Collaborator) commented Oct 18, 2024

This adds dataset/model configs for solar farm and wind turbine models, along with corresponding inference pipelines.

It also:

  • Moves convert_satlas_webmercator_to_rslearn to one_off_projects, since it is no longer needed now that the datasets have been converted from the old format and WebMercator projection to the new rslearn format and UTM projection.
  • Adds the rslp.common.worker system, which launches workers that read tasks from a queue (a Google Pub/Sub topic) and execute them. In the future, all task systems (such as the Landsat and Sentinel-2 job launchers) should be converted to this system so that they share the same way of executing workers; each individual project then only needs to implement writing its tasks to the topic.
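The worker loop described above can be sketched roughly as follows. This is not the actual rslp.common.worker implementation: the in-memory queue.Queue stands in for a Google Pub/Sub subscription, and the task format and handlers mapping are hypothetical.

```python
import json
import queue


def run_worker(task_queue, handlers, results):
    """Pull JSON-encoded tasks from a queue and dispatch them by project name.

    In the real system the queue would be a Pub/Sub subscription, and a worker
    would ack each message only after the handler succeeds; here an in-memory
    queue.Queue stands in so the loop is easy to test.
    """
    while True:
        try:
            message = task_queue.get_nowait()
        except queue.Empty:
            # No more tasks; a real worker would keep polling the subscription.
            return
        task = json.loads(message)
        handler = handlers[task["project"]]
        results.append(handler(task["args"]))
        task_queue.task_done()
```

With this shape, a project like the Landsat or Sentinel-2 launcher would only need to publish JSON task messages to the topic; the worker side stays shared.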

favyen2 (Collaborator, Author) commented Oct 23, 2024

Changing back to draft; I'm going to add the work on deploying the prediction pipeline on Beaker here.

@favyen2 favyen2 marked this pull request as draft October 23, 2024 21:40
@favyen2 favyen2 requested a review from Hgherzog January 9, 2025 20:01
rslp/common/worker.py (Outdated)
```diff
@@ -104,7 +100,9 @@ def launch_job(
             config_path,
             "--autoresume=true",
         ],
-        constraints=Constraints(cluster=["ai2/jupiter-cirrascale-2"]),
+        constraints=Constraints(
+            cluster=["ai2/jupiter-cirrascale-2", "ai2/augusta-google-1"]
+        ),
```
Collaborator

Suggestion: make the cluster names constants, or better, a class containing all the cluster names.

Collaborator Author

I will add refactoring of this to #100 since that PR consolidates some of the Beaker stuff.

rslp/satlas/bkt.py
rslp/common/worker.py
rslp/satlas/data_sources.py (Outdated)
```python
    return cur_groups


class MonthlySentinel2(DataSource):
```
Collaborator

Would this code make more sense in rslearn? I would imagine there are other use cases for monthly Sentinel-2 data.

Collaborator

Same comment for the rest of the data sources here. Add integration tests as well.

Collaborator Author

Added tests. I think it makes sense for now to keep it in rslearn_projects; we will see how many applications use the same thing. If we do add it to rslearn, we should refactor it so we don't need this wrapper class; ideally it would be something in the QueryConfig.
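As a rough illustration of the kind of grouping a monthly wrapper around a data source performs, a helper could bucket candidate items by calendar month before selecting the best item from each bucket. The function name and item representation below are invented for illustration, not the code under review.

```python
from collections import defaultdict
from datetime import datetime


def group_items_by_month(items):
    """Bucket (item_name, sensing_time) pairs by calendar month.

    Hypothetical helper: a monthly wrapper could group candidate scenes like
    this and then pick, e.g., the least cloudy item from each bucket.
    """
    groups = defaultdict(list)
    for name, ts in items:
        groups[(ts.year, ts.month)].append(name)
    return dict(groups)
```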

```python
        return fname, json.load(f)


def apply_nms(
```
Collaborator

Add tests for the postprocessing functions.
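For context, point-based postprocessing like this is often implemented as greedy distance-based non-maximum suppression. The sketch below is an illustrative version of that technique, not the apply_nms under review; the function name and point representation are assumptions.

```python
import math


def nms_points(points, distance_threshold):
    """Greedy distance-based NMS over (x, y, score) point detections.

    Keep the highest-scoring point, drop any remaining point closer than
    distance_threshold to an already-kept point, and repeat.
    """
    kept = []
    for x, y, score in sorted(points, key=lambda p: -p[2]):
        # Keep this detection only if it is far enough from every kept one.
        if all(math.hypot(x - kx, y - ky) >= distance_threshold for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept
```

A test for such a function only needs a handful of hand-placed points with known distances and scores, which is what makes these postprocessing steps cheap to cover.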

```python
    applying the model globally but can be disabled for small regions to avoid
    the time to create the index.
    """
    dataset_config_fname = DATASET_CONFIG_FNAME.format(application=application.value)
```
Collaborator

Suggestion: split this into a couple of functions for readability.

```python
    # Smoothing is handled by a Go script.
    subprocess.check_call(
        [
            "rslp/satlas/scripts/smooth_point_labels_viterbi",
```
Collaborator

Why can't the smooth point labels be written in Python?

Collaborator Author

It may be feasible, I'm not sure; this is borrowed from https://github.com/allenai/remote-sensing-data-hub/blob/main/go_scripts/smooth_point_labels_viterbi.go and there is not much reason to rewrite it at this time, I think.
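For reference, the core of such a smoother is small enough to sketch in Python. This is a hypothetical two-state (absent/present) Viterbi pass, not a port of the Go script; the transition probability and emission model here are invented for illustration.

```python
import math


def viterbi_smooth(confidences, switch_prob=0.1):
    """Two-state Viterbi smoothing of per-timestep detection confidences.

    State 1 means "object present", state 0 means "absent". A score s emits
    with probability s under state 1 and (1 - s) under state 0; staying in the
    same state has probability (1 - switch_prob), switching has switch_prob.
    """
    eps = 1e-9
    stay, switch = math.log(1 - switch_prob), math.log(switch_prob)
    # Log-probability of each state at the first timestep.
    scores = [
        math.log(max(1 - confidences[0], eps)),
        math.log(max(confidences[0], eps)),
    ]
    backptrs = []
    for s in confidences[1:]:
        emit = [math.log(max(1 - s, eps)), math.log(max(s, eps))]
        new_scores, ptrs = [], []
        for state in (0, 1):
            cands = [scores[prev] + (stay if prev == state else switch) for prev in (0, 1)]
            best = max((0, 1), key=lambda prev: cands[prev])
            new_scores.append(cands[best] + emit[state])
            ptrs.append(best)
        scores = new_scores
        backptrs.append(ptrs)
    # Backtrack from the best final state.
    state = max((0, 1), key=lambda st: scores[st])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```

The effect is that a single low-confidence timestep surrounded by high-confidence ones is smoothed over rather than treated as the object disappearing and reappearing.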

```python
    # Apply the pipeline. It will ingest data and apply the model.
    # We disable rtree index so that it doesn't need an hour to create it.
    predict_pipeline(
        application=Application.MARINE_INFRA,
```
Collaborator

Would it make sense to have tests for the other applications as well?

Collaborator Author

Currently it is only MARINE_INFRA / WIND_TURBINE, and these are similar enough that I don't think we need to run both (the code is the same, just a different model; this test is just making sure the code runs correctly).

I think once we have SOLAR_FARM it could use its own test since there will be code specific to the segmentation.

3 participants