Refactor database for datasets #5

freeman-lab · 2015-09-11T16:36:44Z

To begin addressing #4, we should use a more flexible approach for managing datasets and their metadata. Currently we keep datasets in a MongoDB collection (within Meteor), but it's populated by querying all datasets stored in a dedicated bucket on S3 and periodically refreshed, so it's more or less transient.

We should instead use a more persistent db that's initialized from the S3 bucket, and provide methods for users to submit datasets directly in the web app and update the db, with some form validation / checking.
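
A rough sketch of that seed-then-submit flow, in Python for brevity rather than as Meteor code. It uses the standard-library `sqlite3` module purely as a stand-in for whichever persistent store we choose; the `datasets` table, its columns, and the function names are all hypothetical. The bucket listing is passed in as a plain list of `(name, url)` pairs, so nothing here talks to S3.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the store and create the (hypothetical) datasets table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets ("
        "name TEXT PRIMARY KEY, url TEXT NOT NULL, source TEXT NOT NULL)"
    )
    return conn


def seed_from_bucket(conn, bucket_entries):
    """Insert datasets listed in the bucket, leaving existing rows untouched."""
    conn.executemany(
        "INSERT OR IGNORE INTO datasets (name, url, source) "
        "VALUES (?, ?, 'bucket')",
        bucket_entries,
    )
    conn.commit()


def submit_dataset(conn, name, url):
    """Add or update a user-submitted dataset."""
    conn.execute(
        "INSERT OR REPLACE INTO datasets (name, url, source) "
        "VALUES (?, ?, 'user')",
        (name, url),
    )
    conn.commit()
```

The point of `INSERT OR IGNORE` in the seeding step is that re-running it against the bucket should not overwrite anything users have submitted since, which is the main difference from the current refresh-everything behavior.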

We'll assume anyone submitting data is already hosting it publicly on S3, and the validation during submission can check that the specified resources exist. Submissions can also specify any Jupyter notebooks associated with the data. We'll need to separately address how to include the new notebooks in our notebook deployments (will create another issue for that).
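
The submission check could look something like the sketch below, again in Python rather than Meteor code. The field names (`name`, `data_urls`, `notebook_urls`) and both function names are assumptions for illustration, not an agreed schema. The existence check is passed in as a function, so the form logic can be tested without network access; `head_exists` is one possible implementation using an HTTP HEAD request, which assumes the resources are reachable over public HTTP(S) URLs.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def head_exists(url, timeout=5):
    """Return True if an HTTP HEAD request for url gets a 2xx response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def validate_submission(submission, resource_exists):
    """Return a list of error messages; an empty list means it passed."""
    errors = []
    if not submission.get("name", "").strip():
        errors.append("name is required")

    data_urls = list(submission.get("data_urls", []))
    if not data_urls:
        errors.append("at least one data URL is required")

    # Notebooks are optional, but any that are listed must also exist.
    for url in data_urls + list(submission.get("notebook_urls", [])):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"not a valid public URL: {url}")
        elif not resource_exists(url):
            errors.append(f"resource not found: {url}")
    return errors
```

In the app this would be called as `validate_submission(form_data, head_exists)` before the db is updated, with any returned messages shown back to the user in the form.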

cc @bcipolli
