Gaia integration with Minerva/Girder #72

Open
mbertrand opened this issue Jul 25, 2016 · 12 comments
@mbertrand
Collaborator

Goal

Run Gaia processes from within Minerva

General requirements

  • Gaia needs to be able to read from/write to data stored in Girder. Possible approaches:
    • Minerva sends Gaia the Girder item ID (and an authentication token?). Gaia then loads the data directly from the Girder backend using the REST API and writes the result back to the same folder.
    • The Minerva front end sends Gaia the input data directly as GeoJSON (this works for vector data, but what about raster data?). Would the REST API still be used to write the output?
  • Gaia should run processes for Minerva using the same framework as used for other jobs/analyses. This likely means integration with girder-worker.
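
As a rough illustration of the first approach, Minerva would only need to hand Gaia an item ID and a token, and Gaia could build the authenticated download request itself. The sketch below uses only the Python standard library and does not send anything. `build_item_download_request`, the example host, the item ID, and the token are all hypothetical; the `item/{id}/download` route and the `Girder-Token` header are taken from Girder's REST API as commonly documented, so check them against the Girder version in use.

```python
from urllib.request import Request


def build_item_download_request(api_root, item_id, token):
    """Build (but do not send) a request for the contents of a Girder item.

    api_root is assumed to look like 'http://localhost:8080/api/v1'.
    """
    url = "{}/item/{}/download".format(api_root.rstrip("/"), item_id)
    # Assumption: the token is passed in the Girder-Token request header.
    return Request(url, headers={"Girder-Token": token})


# Hypothetical values, for illustration only.
req = build_item_download_request(
    "http://localhost:8080/api/v1/", "57a0e1f2", "secret-token"
)
print(req.full_url)
```

Writing the result back would need a second, authenticated upload call; that part is left out here because it depends on which of the two approaches is chosen.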

@jbeezley @kotfic @aashish24 @dorukozturk let me know if you have any thoughts/suggestions on this, thanks.

@mbertrand mbertrand self-assigned this Jul 25, 2016
@jbeezley
Contributor

One decision that will inform what to do here is determining how data is stored and transferred. There are several different conventions being used here and elsewhere.

  1. Vector data stored as item metadata. This is used for GeoJSON feature types within Minerva. There is also the geospatial plugin that does something similar, but stores the vector data on the geo property of the item model.
  2. Data stored as files. This would be mostly compatible with the semantics of girder_worker, so we could provide item and file IDs directly, and it would handle the data transfer. The limitation of this approach is that you can't run queries on the data itself.
  3. Something else such as WMS, OpenDAP, WFS, girder_db_items, etc. Perhaps we could add girder_worker "format" types to handle these in the same way girder items are handled.
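
To make the first two conventions concrete, here is a minimal sketch of how a reader might decide where an item's vector data lives. It treats the item as a plain dict. The `meta.minerva.geojson` key path is an assumption made for illustration (the exact layout Minerva uses is not shown in this thread), and `locate_vector_data` is a hypothetical helper; the `geo` property is the one the geospatial plugin is described as using above. The third convention (WMS, WFS, and so on) is not covered.

```python
def locate_vector_data(item):
    """Return (convention, payload) for a Girder item document (a dict).

    The key names under 'meta' are assumptions for illustration only.
    """
    minerva = item.get("meta", {}).get("minerva", {})
    if "geojson" in minerva:
        # Convention 1: GeoJSON embedded in Minerva's item metadata.
        return "minerva_metadata", minerva["geojson"]
    if "geo" in item:
        # Geospatial plugin variant: data on the item's 'geo' property.
        return "geo_property", item["geo"]
    # Convention 2: fall back to the item's files; return the item ID so a
    # worker can fetch them.
    return "file", item["_id"]


print(locate_vector_data({"_id": "c3"}))
```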

I think in general we should always route data through Minerva's server rather than allow the client to get data directly from (or post to) Gaia. Direct client access would add complexity in a way that wouldn't scale to real-world datasets.

@mbertrand
Collaborator Author

Thanks @jbeezley. By routing data "through Minerva's server rather than allow the client to get data directly from (or post to) Gaia", do you mean that Minerva should create a girder-worker job for Gaia that includes the data's item ID(s), and possibly a format type to distinguish between data stored as metadata and data stored in files? Girder-worker would then retrieve the datasets as GeoJSON, use them as input to a Gaia process, and save the process output to a new item?
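
The flow being asked about could be sketched as a job description that Minerva builds and hands to a worker. Everything below is hypothetical: `make_gaia_job`, the field names, the process name, and the storage labels are illustrative only and are not the girder-worker task schema.

```python
import json


def make_gaia_job(process_name, item_ids, storage="file", output_folder_id=None):
    """Describe a Gaia run over Girder items (hypothetical structure)."""
    if storage not in ("file", "metadata"):
        raise ValueError("storage must be 'file' or 'metadata'")
    return {
        "process": process_name,
        # One entry per input dataset; 'storage' tells the worker whether the
        # GeoJSON lives in a file or in the item's metadata.
        "inputs": [
            {"item_id": item_id, "storage": storage, "format": "geojson"}
            for item_id in item_ids
        ],
        # The worker would save the process output as a new item here.
        "output": {"folder_id": output_folder_id, "format": "geojson"},
    }


job = make_gaia_job("within", ["a1", "b2"], storage="metadata",
                    output_folder_id="f9")
print(json.dumps(job, indent=2))
```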

@mbertrand
Collaborator Author

@aashish24 based on today's phone call, does this accurately represent your ideas on how to move forward?

  • Gaia should read/write data from and to Girder directly, given a folder/item id and authentication token. There will be a GirderIO class to implement this, or perhaps multiple classes to deal with geodata stored as files vs metadata vs girder_db_items etc.
  • The celery_jobs plugin should be copied over to Gaia and modified to allow running and monitoring of gaia processes as girder jobs. At this point I’m still unclear as to what needs to be modified from the current plugin to allow this. There are currently some gaia celery wrapper tasks defined here.
  • There should be a REST API call that lists the classes (IO and Process) available through Girder.
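
For the last point, one possible shape for the listing is a small registry that walks the subclasses of the IO and process base classes and returns their names as JSON-ready data. The classes below are empty stand-ins defined only for this sketch, not imports from Gaia; `GirderIO` is the class proposed above and `WithinProcess` is a placeholder process name.

```python
import json


class GaiaIO(object):
    """Stand-in base class for readers/writers."""


class GaiaProcess(object):
    """Stand-in base class for processes."""


class VectorFileIO(GaiaIO):
    pass


class FeatureIO(GaiaIO):
    pass


class GirderIO(GaiaIO):  # the class proposed in this comment
    pass


class WithinProcess(GaiaProcess):  # placeholder process
    pass


def all_subclasses(cls):
    """Return the sorted names of every direct and indirect subclass."""
    names = []
    for sub in cls.__subclasses__():
        names.append(sub.__name__)
        names.extend(all_subclasses(sub))
    return sorted(names)


def list_available_classes():
    """Body a REST endpoint could return, grouped by kind."""
    return {"io": all_subclasses(GaiaIO), "process": all_subclasses(GaiaProcess)}


print(json.dumps(list_available_classes()))
```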

@jbeezley @dorukozturk @kotfic thoughts?

@kotfic
Contributor

kotfic commented Aug 4, 2016

@mbertrand You may want to look at the girder_worker girder_io plugin for some ideas about how to leverage the girder-client library to achieve this kind of functionality. Obviously that code is tied up with girder_worker's architecture, but it essentially does what you're describing.

@aashish24 rather than creating another custom distributed job management plugin, wouldn't it be better to coordinate with the girder dev team and see if we can put together a PR that meets gaia's needs through an existing piece of infrastructure?

@mbertrand
Collaborator Author

Thanks @kotfic I will take a look at that plugin. It would also be my preference to use a standard girder celery/job management framework if possible.

@mbertrand
Collaborator Author

mbertrand commented Aug 5, 2016

@aashish24 geospatial vector data in minerva can come from a few sources right now:

  • Mongo database
  • WMS server
  • Uploaded geojson, stored as files in girder

Each of these sources would need some strategy for processing in Gaia:

  • Mongo: Either a 'MongoIO' class in Gaia, to read from mongo directly; or pass the geojson already retrieved from mongo in minerva to Gaia's FeatureIO or VectorFileIO reader; or read the geojson from the girder item's minerva metadata.
  • WMS: A 'WfsIO' class in Gaia to retrieve the GeoJSON from the WMS server via a WFS GetFeature request. Minerva does not retrieve the actual data; it only displays map tiles.
  • Uploaded geojson: Either a GirderIO class in Gaia to retrieve the file from girder, or pass the geojson already retrieved in minerva to Gaia's FeatureIO or VectorFileIO reader.
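
For the WMS case, the request a 'WfsIO' reader would issue might look like the sketch below, which only builds the URL. The parameter names follow WFS 1.1.0 (`typeName`, `maxFeatures`); `outputFormat=application/json` is an assumption that holds for servers such as GeoServer but is not guaranteed by the WFS standard. `build_wfs_getfeature_url`, the endpoint, and the layer name are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_wfs_getfeature_url(base_url, type_name, max_features=None):
    """Build a WFS 1.1.0 GetFeature URL that asks for GeoJSON output."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        # Assumption: the server can emit GeoJSON under this format name.
        "outputFormat": "application/json",
    }
    if max_features is not None:
        params["maxFeatures"] = str(max_features)
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name.
url = build_wfs_getfeature_url(
    "http://example.com/geoserver/ows", "topp:states", max_features=10
)
query = parse_qs(urlparse(url).query)
print(url)
```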

Am I missing any other sources? Any particular source I should give priority to?

@mbertrand
Collaborator Author

PS @aashish24 in girder there's also the ability to store a 'geo' field in an item's metadata via the geospatial plugin, but AFAIK this isn't used in Minerva.

@mbertrand
Collaborator Author

Some exploratory code in a notebook here:

https://gist.github.com/mbertrand/c153d0019441ef0fc2298e5359d73f2d

It uses a combination of girder-client and girder-worker to retrieve GeoJSON data from Minerva items and use it as input to a Gaia process run via girder-worker.

@aashish24
Collaborator

Thanks @mbertrand, I am getting back to this. I will have a look.

@mbertrand
Collaborator Author

@aashish24 Here is an alternative approach, based on several of the supplied analyses in Minerva. It is a girder plugin that allows Gaia processes to be run on Minerva geojson datasets as Girder jobs (with or without celery): https://github.com/mbertrand/gaia_minerva

@kotfic
Contributor

kotfic commented Aug 31, 2016

@mbertrand Just an FYI, once girder/girder#1553 clears you should be able to modify gaia_minerva to use remote celery jobs via the worker plugin instead of using girder local jobs.

@mbertrand
Collaborator Author

The repo for this is https://github.com/OpenDataAnalytics/gaia_minerva
