Experiment versioning #191

Closed
aidanheerdegen opened this issue Jun 21, 2019 · 5 comments

@aidanheerdegen
Collaborator

aidanheerdegen commented Jun 21, 2019

It would be great to be able to uniquely identify an entire experiment.

Could generate a UUID for this purpose and save it in a metadata YAML file in the archive, if there was not already a metadata file.

It would break if the metadata file was copied from another experiment. Maybe just live with that?
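
A minimal sketch of the idea above, not an implementation from this project. The file name `metadata.yaml`, the key `experiment_uuid`, and the function `ensure_experiment_uuid` are all assumptions made for illustration. The file is written by hand to stay within the standard library; a real implementation would more likely use a YAML library.

```python
import uuid
from pathlib import Path


def ensure_experiment_uuid(archive_dir, filename="metadata.yaml"):
    """Return the experiment UUID, creating it only if none is stored yet.

    The file name and the `experiment_uuid` key are hypothetical.
    """
    metadata_path = Path(archive_dir) / filename
    existing = metadata_path.read_text() if metadata_path.exists() else ""

    # Reuse a stored ID if there is one. A metadata file copied from another
    # experiment would carry its old ID along, which is the weakness noted above.
    for line in existing.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "experiment_uuid":
            return value.strip()

    # No ID found: generate a random (version 4) UUID and append it.
    expt_id = str(uuid.uuid4())
    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    metadata_path.write_text(existing + separator + f"experiment_uuid: {expt_id}\n")
    return expt_id
```

Calling the function twice on the same archive directory returns the same ID, so the ID only changes when the metadata file is absent.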

Would want to be consistent with whatever is done for the cosima cookbook:

COSIMA/cosima-cookbook#134

Would the metadata file be tracked by the git repo? That would mean generating the metadata file in the control directory and copying it to archive. It would make it more ambiguous when it was necessary to generate a new metadata file.

@marshallward
Collaborator

Could the git hash for the experiment act as a unique ID? Not sure what metadata is referring to here.

@aidanheerdegen
Collaborator Author

Sorry, need more background. There was a discussion in this cosima-cookbook PR about providing some metadata for the cookbook database.

COSIMA/cosima-cookbook#130 (comment)

An example was:

contact: Andrew Kiss 
contact_email: [email protected]

created: 2018-01-01

description: "Attempted spinup, using Russ' salt flux fix https://arccss.slack.com/archives/C6PP0GU9Y/p1515460656000124 and https://github.com/mom-ocean/MOM5/pull/208/commits/9f4ee6f8b72b76c96a25bf26f3f6cdf773b424d2 from the start. Used mushy ice from July year 1 onwards to avoid vertical thermo error in cice https://arccss.slack.com/archives/C6PP0GU9Y/p1515842016000079"

notes: "Stripy salt restoring: https://github.com/OceansAus/access-om2/issues/74  tripole seam bug: https://github.com/OceansAus/access-om2/issues/86 requires dt=300s in May, dt=240s in Aug to maintain CFL in CICE near tripoles (storms in those months in 8485RYF); all other months work with dt=400s"

I then suggested we could generate a uuid to uniquely identify an experiment.

We'd need just a single hash, right? So which one? Well, I guess it would be the hash at the time a new experiment was being started? I don't think we could guarantee that was unique. If an experiment is forked and the user specified a reproducible run, I don't think this would trigger a commit, and therefore there would be no new hash.

@aidanheerdegen
Collaborator Author

Was thinking about this the other day and definitely want an exptID generated separately from the git repo hashes of the control directory.

This exptID needs to be regenerated when various criteria are met. Some I thought of:

  • run counter is reset
  • experiment name (directory name) changed
  • archive directory created
  • no existing metadata.yaml file

There is redundancy in the above list, as making an archive directory pretty much implies there is no existing metadata file, and the run counter will be reset. Equally, when the experiment name is changed it is likely that a new archive directory will be created. If it is not, for example because a user manually renames the archive directory, would the exptID need to change?
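
The criteria above could be sketched as a single check. This is only an illustration: `ExperimentState`, its fields, and `regeneration_reasons` are hypothetical names, and treating a run counter of 0 as "reset" is an assumption about how the counter is tracked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExperimentState:
    """Hypothetical snapshot of an experiment just before a run starts."""
    run_counter: int              # assumed to be 0 when the counter has been reset
    experiment_name: str          # current control directory name
    recorded_name: Optional[str]  # name stored in the metadata, None if unknown
    archive_exists: bool          # False means the archive is about to be created
    metadata_exists: bool         # whether a metadata.yaml file was found


def regeneration_reasons(state: ExperimentState) -> List[str]:
    """Return every criterion from the list above that currently applies."""
    reasons = []
    if not state.metadata_exists:
        reasons.append("no existing metadata.yaml file")
    if not state.archive_exists:
        reasons.append("archive directory created")
    if state.run_counter == 0:
        reasons.append("run counter is reset")
    if state.recorded_name is not None and state.recorded_name != state.experiment_name:
        reasons.append("experiment name changed")
    return reasons


def needs_new_expt_id(state: ExperimentState) -> bool:
    """A new exptID is needed if any single criterion is met."""
    return bool(regeneration_reasons(state))
```

Returning the list of reasons rather than a bare boolean makes the redundancy visible: a brand-new experiment typically reports the first three reasons at once, while a manually renamed archive directory would report only the name change.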

Some examples of when exptID would (and should) change:

  • Clone an experiment but change some parameters/forcing and start from initial conditions
  • Clone an experiment, but change some parameters/forcing and start from restarts: this is an experiment fork, like a perturbation run
  • Re-run after payu sweep --hard (so maybe a failed run, or incorrect inputs etc)

If a user clones their own experiment they are pretty much forced to change the experiment name, otherwise they will have a directory name clash in their laboratory. A user cloning someone else's config has no such restriction, but again, the clone will not bring over a metadata.yaml file, so one should be created as desired.

@aidanheerdegen
Collaborator Author

Some portion of this ID could be appended to work and archive directories to disambiguate between identical experiment names. In this case that would be another reason to regenerate an experiment ID: if that experiment name already exists. Now that I think about it, that case is probably implicitly covered above when there is no archive directory ... but to check for that you'd need the experiment ID. This is getting circular ...
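
As a rough illustration of appending part of the ID to a directory name, assuming the ID is a UUID string. The function name, the `-` separator, and the 8-character default are all arbitrary choices for this sketch.

```python
import uuid


def disambiguated_name(experiment_name, expt_id, length=8):
    """Append the first `length` hex characters of the ID to the name."""
    # uuid.UUID validates the string; .hex is the 32-character form without dashes.
    short_id = uuid.UUID(expt_id).hex[:length]
    return f"{experiment_name}-{short_id}"


print(disambiguated_name("my_expt", "12345678-1234-5678-1234-567812345678"))
# prints "my_expt-12345678"
```

Two experiments with the same name but different IDs would then get different work and archive directory names, provided the truncated prefixes do not happen to collide.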

@aidanheerdegen
Collaborator Author

Closed by #384
