More database structure musings #134

aidanheerdegen · 2019-06-21T00:05:10Z

Currently it is possible to decouple the experiment name from its location, which I think is potentially a good thing. A name field could be added to the metadata file, so an experiment name need not be tied to a particular choice of directory name. This would allow, say, spinup to be the name of the most recent spinup.

This is both a good and a bad thing: convenient, but difficult for reproducibility. How would users know they are using the same dataset as they had in the past when specifying just a name?

Which brings me back to thinking about versioning, and uniquely identifying datasets.

Possible solution (or beginning of one): generate a uuid for each dataset and save it in the metadata file (create if it doesn't already exist).
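A minimal sketch of the "create if it doesn't already exist" step. It assumes, purely for illustration, a JSON metadata file and a top-level `uuid` key; the function name `ensure_dataset_uuid` is hypothetical and the real metadata format may differ.

```python
import json
import uuid
from pathlib import Path


def ensure_dataset_uuid(metadata_path):
    """Return the dataset's uuid, generating and saving one if it is missing.

    Assumption: the metadata file is JSON with a top-level "uuid" key.
    """
    path = Path(metadata_path)
    # Load existing metadata, or start from an empty mapping if there is no file yet.
    metadata = json.loads(path.read_text()) if path.exists() else {}
    if "uuid" not in metadata:
        # Only write when a uuid is missing, so an existing id is never replaced.
        metadata["uuid"] = str(uuid.uuid4())
        path.write_text(json.dumps(metadata, indent=2))
    return metadata["uuid"]
```

Calling it repeatedly on the same file should return the same id, which is what lets a user check that a name still points at the dataset they used before.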

This would solve another issue I have been thinking about: uniqueness. It should be possible to have the same experiment name for different models. Not only possible, but potentially desirable. That way every model can have a spinup experiment, say. This is possible if the uuid is the only column on which we force uniqueness.

It does make data discovery trickier. If you ask for all the experiments it may return 7 all named spinup. So I would advocate for adding a model column to the experiments table, and a corresponding field in the metadata file, unless there is a better way to extract model name from the outputs.
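As a rough illustration of the two points above, here is a sketch using an in-memory SQLite table where the uuid is the only column with a uniqueness constraint and a model column narrows the search. The table layout, the helper names and the model names `model_a` and `model_b` are assumptions for the example, not the project's actual schema.

```python
import sqlite3
import uuid


def create_table(conn):
    # Hypothetical schema: uuid is the only column forced to be unique.
    conn.execute(
        """
        CREATE TABLE experiments (
            uuid  TEXT PRIMARY KEY,
            name  TEXT NOT NULL,  -- may repeat across models
            model TEXT            -- proposed column to aid discovery
        )
        """
    )


def add_experiment(conn, name, model):
    """Insert an experiment and return its freshly generated uuid."""
    new_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO experiments (uuid, name, model) VALUES (?, ?, ?)",
        (new_id, name, model),
    )
    return new_id


def find_experiments(conn, name, model=None):
    """Return (uuid, name, model) rows matching a name, optionally for one model."""
    query = "SELECT uuid, name, model FROM experiments WHERE name = ?"
    params = [name]
    if model is not None:
        query += " AND model = ?"
        params.append(model)
    return conn.execute(query, params).fetchall()
```

Searching by name alone returns every spinup; passing a model as well picks out one of them.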

We can also easily handle versioning in this way. Multiple experiments can have the same name even for the same model, but unique ids. For this reason I would also advocate adding a version column to the experiments table. The version could be specified in the metadata, or we could require uniqueness for experiment+model+version and auto-increment version if there is a clash.
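One way the auto-increment option could look, again sketched with SQLite. The schema and the `add_experiment` helper are hypothetical, and the rule of bumping to one past the highest existing version on a clash is an assumption about what "auto-increment" should mean here.

```python
import sqlite3
import uuid


def create_table(conn):
    # Hypothetical schema: unique uuid, plus uniqueness on name+model+version.
    conn.execute(
        """
        CREATE TABLE experiments (
            uuid    TEXT PRIMARY KEY,
            name    TEXT NOT NULL,
            model   TEXT NOT NULL,
            version INTEGER NOT NULL,
            UNIQUE (name, model, version)
        )
        """
    )


def add_experiment(conn, name, model, version=1):
    """Insert an experiment, bumping the version if the requested one is taken.

    Returns the (uuid, version) actually stored.
    """
    clash = conn.execute(
        "SELECT 1 FROM experiments WHERE name = ? AND model = ? AND version = ?",
        (name, model, version),
    ).fetchone()
    if clash:
        # Assumed policy: use one more than the highest version already present.
        (latest,) = conn.execute(
            "SELECT MAX(version) FROM experiments WHERE name = ? AND model = ?",
            (name, model),
        ).fetchone()
        version = latest + 1
    new_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO experiments (uuid, name, model, version) VALUES (?, ?, ?, ?)",
        (new_id, name, model, version),
    )
    return new_id, version
```

With this policy, indexing a second spinup for the same model stores it as version 2 rather than failing, while a spinup for a different model starts again at version 1.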

I know I said don't change this anymore, and maybe this could be done after the winter school, but I would strongly advocate for this, or something like this before publishing to a wider audience.


aidanheerdegen · 2020-09-01T05:22:56Z

Related #168

aidanheerdegen · 2020-09-01T05:24:12Z

I'm not sure how useful a lot of these ideas are any longer, though I still think adding a model field to the DB was a good idea. The model is now inferred in the explorer, but it probably should be a field in the DB. See #182

angus-g · 2020-09-01T05:27:42Z

We could probably add model as a heuristic (hybrid?) property on the NCFile or NCVar models -- I'm not such a fan of storing it in the database itself, but we can easily derive it from data available on those models (you pull it out of the file's path, right?)
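A plain-Python sketch of the derived-property idea, to make the suggestion concrete. Everything specific here is an assumption: the class `NCFileSketch` stands in for the real model class, the `KNOWN_MODELS` names are placeholders, and matching path components against a list is only one possible heuristic. With SQLAlchemy the same idea could be expressed with `hybrid_property` from `sqlalchemy.ext.hybrid`.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical list of model names to look for in a file's path.
KNOWN_MODELS = ("model_a", "model_b")


@dataclass
class NCFileSketch:
    """Stand-in for a file record; only the path is stored."""

    path: str

    @property
    def model(self) -> Optional[str]:
        # Derive the model from the path rather than storing it:
        # return the first path component that matches a known model name.
        for part in PurePosixPath(self.path).parts:
            if part in KNOWN_MODELS:
                return part
        return None
```

Nothing extra is stored with this approach, at the cost of the answer depending on how the directories are laid out.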

aidanheerdegen mentioned this issue Jun 21, 2019

Experiment versioning payu-org/payu#191

Closed

angus-g added this to the v0.4 milestone Jun 24, 2019

angus-g added the 🥞 database and 🧜🏽‍♀️ enhancement labels Jun 24, 2019
