Currently it is possible to decouple the experiment name from its location, which I think is potentially a good thing. `name` could be added to the metadata file, so an experiment name need not be tied to a particular choice of directory name. This would allow, say, `spinup` to be the name of the most recent spinup.
This is both good and bad: convenient, but difficult for reproducibility. How would users know they are using the same dataset as they had in the past when specifying just a name?
Which brings me back to thinking about versioning, and uniquely identifying datasets.
Possible solution (or beginning of one): generate a `uuid` for each dataset and save it in the metadata file (create if it doesn't already exist).
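A minimal sketch of the "create if it doesn't already exist" step might look like the following. The function name `ensure_uuid`, the `uuid` key, and the use of JSON for the metadata file are all assumptions made for illustration; the actual metadata format may differ.

```python
import json
import uuid
from pathlib import Path


def ensure_uuid(metadata_path):
    """Return the dataset's uuid, generating and saving one if it is missing.

    Hypothetical helper: assumes the metadata file is JSON holding a
    top-level mapping. The file is created if it does not exist.
    """
    path = Path(metadata_path)
    metadata = json.loads(path.read_text()) if path.exists() else {}

    if "uuid" not in metadata:
        # uuid4 is random, so no coordination between datasets is needed.
        metadata["uuid"] = str(uuid.uuid4())
        path.write_text(json.dumps(metadata, indent=2))

    return metadata["uuid"]
```

Calling it repeatedly on the same file returns the same id, so the id stays stable even if the experiment's `name` or directory changes later.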
This would solve another issue I have been thinking about: uniqueness. It should be possible to have the same experiment name for different models. Not only possible, but potentially desirable. That way every model can have a spinup experiment, say. This is possible if the `uuid` is the only column on which we force uniqueness.
It does make data discovery trickier. If you ask for all the experiments it may return 7 all named `spinup`. So I would advocate for adding a `model` column to the `experiments` table, and a corresponding field in the metadata file, unless there is a better way to extract model name from the outputs.
We can also easily handle versioning in this way. Multiple experiments can have the same name even for the same model, but unique ids. For this reason I would also advocate adding a `version` column to the `experiments` table. It could be specified in the metadata, or could require uniqueness for `experiment`+`model`+`version` and auto-increment `version` if there is a clash.
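To make the uniqueness proposal concrete, here is a rough sketch using the standard-library `sqlite3` module. The table layout, the `register` helper, and the model names in the usage note are hypothetical; this is not the project's actual schema. It treats `uuid` as the primary key, requires `experiment`+`model`+`version` to be unique, and picks the next free version when none is given.

```python
import sqlite3
import uuid


def create_table(conn):
    # Hypothetical schema: uuid identifies a dataset; the name triple must be unique.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS experiments (
            uuid       TEXT PRIMARY KEY,
            experiment TEXT NOT NULL,
            model      TEXT NOT NULL,
            version    INTEGER NOT NULL,
            UNIQUE (experiment, model, version)
        )
        """
    )


def register(conn, experiment, model, version=None):
    """Insert an experiment and return (uuid, version).

    If version is None, use one more than the highest existing version for
    this experiment+model pair. An explicit version that clashes raises
    sqlite3.IntegrityError.
    """
    if version is None:
        row = conn.execute(
            "SELECT COALESCE(MAX(version), 0) + 1 FROM experiments "
            "WHERE experiment = ? AND model = ?",
            (experiment, model),
        ).fetchone()
        version = row[0]

    new_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO experiments (uuid, experiment, model, version) "
        "VALUES (?, ?, ?, ?)",
        (new_id, experiment, model, version),
    )
    return new_id, version
```

With this layout, registering `spinup` twice for `model_a` yields versions 1 and 2, while `spinup` for `model_b` starts again at version 1, and all three rows carry distinct ids.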
I know I said don't change this anymore, and maybe this could be done after the winter school, but I would strongly advocate for this, or something like this, before publishing to a wider audience.
I'm not sure how useful a lot of these ideas are any longer, though I still think adding a model field to the DB was a good idea. The model is now inferred in the explorer, but it should probably be a field in the DB. See #182
We could probably add model as a heuristic (hybrid?) property on the NCFile or NCVar models -- I'm not such a fan of storing it in the database itself, but we can easily derive it from data available on those models (you pull it out of the file's path, right?)
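As a rough illustration of deriving the model from a file's path, the sketch below scans the path components for a known model name. Everything here is an assumption: the `infer_model` helper, the idea that a directory in the path is named after the model, and the example model names. The real heuristic used in the explorer may work differently.

```python
from pathlib import PurePosixPath


def infer_model(file_path, known_models):
    """Return the first known model name found among the path components.

    Hypothetical heuristic: assumes one directory in the path is named
    after the model. Matching is case-insensitive; returns None if no
    component matches.
    """
    known = {name.lower(): name for name in known_models}
    for part in PurePosixPath(file_path).parts:
        match = known.get(part.lower())
        if match is not None:
            return match
    return None
```

A function like this could back a derived property on the file model, so the value is computed from existing data and not stored as a separate column.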