More database structure musings #134

Open
aidanheerdegen opened this issue Jun 21, 2019 · 3 comments
Labels
🥞 database Related to the structure of the database itself 🧜🏽‍♀️ enhancement

@aidanheerdegen
Collaborator

Currently it is possible to decouple the experiment name from its location, which I think is potentially a good thing. name could be added to the metadata file, so an experiment name need not be tied to a particular choice of directory name. This would allow, say, spinup to be the name of the most recent spinup.

This is both good and bad: convenient, but difficult for reproducibility. How would users know they are using the same dataset as they had in the past when specifying just a name?

Which brings me back to thinking about versioning, and uniquely identifying datasets.

Possible solution (or beginning of one): generate a uuid for each dataset and save it in the metadata file (create if it doesn't already exist).
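
A minimal sketch of the "create if it doesn't already exist" idea, for illustration only. It assumes a JSON metadata file and a hypothetical helper name `ensure_dataset_uuid`; the real metadata file may use a different format and key name.

```python
import json
import uuid
from pathlib import Path


def ensure_dataset_uuid(metadata_path):
    """Return the dataset's UUID, generating and saving one if it is missing.

    Assumption: the metadata file is JSON and stores the id under "uuid".
    """
    path = Path(metadata_path)
    # Load the existing metadata, or start from an empty mapping if there is no file yet.
    metadata = json.loads(path.read_text()) if path.exists() else {}
    if "uuid" not in metadata:
        # First time this dataset is seen: assign a random (version 4) UUID and persist it.
        metadata["uuid"] = str(uuid.uuid4())
        path.write_text(json.dumps(metadata, indent=2))
    return metadata["uuid"]
```

Calling the helper twice on the same file returns the same id, so the id follows the dataset rather than its name or directory.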

This would solve another issue I have been thinking about: uniqueness. It should be possible to have the same experiment name for different models. Not only possible, but potentially desirable. That way every model can have a spinup experiment, say. This is possible if the uuid is the only column on which we force uniqueness.

It does make data discovery trickier. If you ask for all the experiments it may return 7 all named spinup. So I would advocate for adding a model column to the experiments table, and a corresponding field in the metadata file, unless there is a better way to extract model name from the outputs.
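
To make the discovery problem concrete, here is a rough sketch using an in-memory SQLite table. The table layout, the `model` column, the helper names and the placeholder model names are all assumptions for illustration, not the current schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiments (
        id    INTEGER PRIMARY KEY,
        uuid  TEXT NOT NULL UNIQUE,  -- the only column on which uniqueness is forced
        name  TEXT NOT NULL,         -- deliberately not unique
        model TEXT                   -- hypothetical column proposed above
    )
    """
)


def add_experiment(name, model):
    """Insert an experiment with a freshly generated uuid and return that uuid."""
    exp_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO experiments (uuid, name, model) VALUES (?, ?, ?)",
        (exp_uuid, name, model),
    )
    return exp_uuid


def find_experiments(name, model=None):
    """Return (name, model) rows matching a name, optionally narrowed by model."""
    query = "SELECT name, model FROM experiments WHERE name = ?"
    params = [name]
    if model is not None:
        query += " AND model = ?"
        params.append(model)
    return conn.execute(query + " ORDER BY model", params).fetchall()


# Every (placeholder) model gets its own experiment called "spinup".
for model_name in ("model_a", "model_b", "model_c"):
    add_experiment("spinup", model_name)
```

Asking for `spinup` alone returns one row per model; passing `model` as well narrows it to a single experiment.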

We can also easily handle versioning in this way. Multiple experiments can have the same name even for the same model, as long as they have unique ids. For this reason I would also advocate adding a version column to the experiments table. The version could be specified in the metadata, or we could require uniqueness for experiment+model+version and auto-increment the version if there is a clash.
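
One way the auto-increment-on-clash behaviour could look, again as a sketch against an in-memory SQLite table. The schema, the `register` helper and the default starting version of 1 are assumptions, not an existing API.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiments (
        uuid    TEXT PRIMARY KEY,
        name    TEXT NOT NULL,
        model   TEXT NOT NULL,
        version INTEGER NOT NULL,
        UNIQUE (name, model, version)
    )
    """
)

INSERT = "INSERT INTO experiments (uuid, name, model, version) VALUES (?, ?, ?, ?)"


def register(name, model, version=1):
    """Register an experiment, bumping the version if name+model+version clashes.

    Returns the (uuid, version) actually stored.
    """
    exp_uuid = str(uuid.uuid4())
    try:
        conn.execute(INSERT, (exp_uuid, name, model, version))
    except sqlite3.IntegrityError:
        # The requested version is taken: use one past the highest existing version.
        (latest,) = conn.execute(
            "SELECT MAX(version) FROM experiments WHERE name = ? AND model = ?",
            (name, model),
        ).fetchone()
        version = latest + 1
        conn.execute(INSERT, (exp_uuid, name, model, version))
    return exp_uuid, version
```

Registering `spinup` twice for the same model yields versions 1 and 2, while the same name under a different model starts again at 1.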

I know I said don't change this anymore, and maybe this could be done after the winter school, but I would strongly advocate for this, or something like this before publishing to a wider audience.

@angus-g angus-g added this to the v0.4 milestone Jun 24, 2019
@angus-g angus-g added 🥞 database Related to the structure of the database itself 🧜🏽‍♀️ enhancement labels Jun 24, 2019
@aidanheerdegen
Collaborator Author

Related #168

@aidanheerdegen
Collaborator Author

I'm not sure how useful a lot of these ideas are any longer, though I still think adding a model field to the DB was a good idea. The model is now inferred in the explorer, but it probably should be a field in the DB. See #182

@angus-g
Collaborator

angus-g commented Sep 1, 2020

We could probably add model as a heuristic (hybrid?) property on the NCFile or NCVar models -- I'm not such a fan of storing it in the database itself, but we can easily derive it from data available on those models (you pull it out of the file's path, right?)
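
A rough sketch of deriving the model from the path rather than storing it. A plain Python property on a stand-in class is used here in place of a SQLAlchemy hybrid property, and both the `path` attribute and the `.../output<NNN>/<model>/<file>` directory layout are assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical layout: .../output<NNN>/<model>/<file>.nc
_MODEL_IN_PATH = re.compile(r"/output\d+/([^/]+)/")


@dataclass
class NCFile:
    """Stand-in for the database model; only the file path is needed here."""

    path: str

    @property
    def model(self):
        """Heuristically derive the model name from the path, or None if unknown."""
        match = _MODEL_IN_PATH.search(self.path)
        return match.group(1) if match else None
```

With this layout, `NCFile("/data/exp/output000/ocean/ocean.nc").model` gives `"ocean"`, and a path that does not match the pattern gives `None` instead of a guess.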
