This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

feature: config validation #157

Draft
wants to merge 69 commits into
base: develop

Conversation

theissenhelen
Collaborator

@theissenhelen theissenhelen commented Nov 22, 2024

Currently, the configurations are passed via hydra from yaml files. This PR adds structured configs (or schemas) and basic config validation via Pydantic base models.

Some advantages are:

  • validation and feedback to the user
  • syntax highlighting
  • data transformations

Main changes are:

  • schemas in utils that represent the structure of the yamls
  • a new command: anemoi-training config validate config_name
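As a rough illustration of what schema-based validation gives the user, here is a minimal sketch assuming Pydantic is installed. The class name `RolloutSketch` and the helper `validate_rollout` are hypothetical and are not the PR's actual code; the field types mirror the kind of constraints a schema can express.

```python
from pydantic import BaseModel, Field, NonNegativeInt, PositiveInt, ValidationError


class RolloutSketch(BaseModel):
    """Illustrative schema only; names are assumptions, not the PR's final layout."""

    start: PositiveInt = Field(default=1)
    epoch_increment: NonNegativeInt = Field(default=0)
    max: PositiveInt = Field(default=1)


def validate_rollout(raw: dict) -> list[str]:
    """Return readable error messages; an empty list means the config is valid."""
    try:
        RolloutSketch(**raw)
    except ValidationError as err:
        # Each error carries the field location and a message describing the problem.
        return [f"{'.'.join(str(p) for p in e['loc'])}: {e['msg']}" for e in err.errors()]
    return []


ok = validate_rollout({"start": 1, "epoch_increment": 0, "max": 12})
bad = validate_rollout({"start": 0, "max": 12})  # start must be positive
```

Here `ok` is an empty list, while `bad` holds one message pointing at the `start` field, which is the kind of feedback the user would otherwise only get as a failure deep inside a training run.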

For developers:
If you make changes to the configs, these need to be represented in the structured configs/schemas.

This is still a work in progress, but I wanted to get feedback on, e.g., where important validations are missing.


📚 Documentation preview 📚: https://anemoi-training--157.org.readthedocs.build/en/157/

Member

@HCookie HCookie left a comment

Overall I like the way this has been implemented, but I do have some concerns.

  1. Where should defaults be specified? I worry about the visibility of having them set in the schema, with values being filled in automagically for the user.
  2. We use hydra instantiate to allow users to bring some of their own classes and work them into the run. Some of the checks herein limit what can be given to the _target_. I wonder if we could instead take an approach where, if the target is a hard subclass, we enforce the parent init args but allow whatever extra kwargs alongside any _target_. Alternatively, we could have a custom model validator which, on validation, loads the _target_, creates a schema for it, and then runs validate. That way the config is still validated, but we allow any class to be used.
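One possible approximation of the second idea, sketched with the standard library only: resolve the `_target_` path and check that the remaining keys can be bound to the target's call signature, without instantiating anything. The helper names are hypothetical, `logging.Formatter` is just a stand-in target, and other Hydra special keys (such as `_partial_`) are not handled here.

```python
import importlib
import inspect


def resolve_target(path: str):
    """Import the object named by a dotted path such as 'package.module.ClassName'."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def kwargs_match_target(config: dict) -> bool:
    """Return True if the config's kwargs can be bound to the target's signature."""
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    target = resolve_target(config["_target_"])
    try:
        # bind() checks argument names and arity only; the target is never called.
        inspect.signature(target).bind(**kwargs)
    except TypeError:
        return False
    return True


# Stand-in target from the standard library, used purely for demonstration.
good = kwargs_match_target({"_target_": "logging.Formatter", "fmt": "%(message)s"})
wrong = kwargs_match_target({"_target_": "logging.Formatter", "colour": True})
```

This only checks names and arity, not value types, so it would complement rather than replace a generated schema. It does, however, accept any importable class rather than a fixed list.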

Comment on lines 40 to 44
assert target in [
"anemoi.models.preprocessing.normalizer.InputNormalizer",
"anemoi.models.preprocessing.imputer.InputImputer",
"anemoi.models.preprocessing.remapper.Remapper",
]
Member

I have concerns that this hard limitation to the preprocessors provided by anemoi.models is not scalable.
Say I build another processor in another package: with hydra.utils.instantiate I can bring it along by providing the path, but I would get blocked here.

Collaborator Author

There are three options I can think of that we should discuss: 1. no check for valid targets; 2. strict validation, in the sense that we support only preprocessors that are implemented in Anemoi and thus tested; and 3. instantiate the targets and check whether they are subclassed from the BaseProcessor class. The third option requires additional instantiation of the target classes, which might not be optimal.
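For the third option, a subclass check may not need an instance at all: importing the class and calling `issubclass` could be enough. The sketch below is an assumption about how that might look, not the PR's code. It uses `dict` as a stand-in for a base class such as BaseProcessor, and standard-library classes as stand-in targets.

```python
import importlib


def is_subclass_target(path: str, base: type) -> bool:
    """Check whether the dotted path names a class deriving from `base`.

    Only the module is imported; the class itself is never instantiated.
    """
    module_name, _, attr = path.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr)
    return isinstance(obj, type) and issubclass(obj, base)


# Stand-ins: `dict` plays the role of the base class, the paths play the role of _target_.
accepted = is_subclass_target("collections.OrderedDict", dict)
rejected = is_subclass_target("collections.deque", dict)
```

Importing the module still runs its top-level code, so this is not free, but it avoids constructing the processor with real arguments during validation.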

src/anemoi/training/utils/schemas/hardware.py Outdated Show resolved Hide resolved
src/anemoi/training/utils/schemas/training.py Outdated Show resolved Hide resolved
@FussyDuck

FussyDuck commented Dec 2, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ theissenhelen
❌ chebertpinard
You have signed the CLA already but the status is still pending? Let us recheck it.

@theissenhelen theissenhelen force-pushed the 1-feature-improved-configuration-and-data-structures branch from 176e652 to 74c7373 Compare December 5, 2024 12:45
@theissenhelen theissenhelen force-pushed the 1-feature-improved-configuration-and-data-structures branch from 3762500 to 844cd24 Compare December 6, 2024 12:02
Comment on lines +49 to +57
class Rollout(BaseModel):
    """Rollout configuration."""

    start: PositiveInt = Field(default=1)
    "Number of rollouts to start with."
    epoch_increment: NonNegativeInt = Field(default=0)
    "Number of epochs to increment the rollout."
    max: PositiveInt = Field(default=1)
    "Maximum number of rollouts."
Member

To let you know, #206 will change these options.


Successfully merging this pull request may close these issues.

[FEATURE] improved configuration and data structures
4 participants