This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

Refactor Callbacks #60

Merged
merged 58 commits into from
Oct 29, 2024

Conversation

Member

@HCookie HCookie commented Sep 24, 2024

  • Split into separate files
  • Use list in config to add callbacks
  • Provide legacy config enabled approach
  • Fix ruff issues

New Usage

Set config.diagnostics.callbacks to a list of callback names to include
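
As a rough illustration of the list-based usage, the sketch below resolves a list of callback names against a registry. The registry, the callback classes, and the `build_callbacks` helper are hypothetical stand-ins chosen for this example; they are not the code introduced by this PR.

```python
class Callback:
    """Hypothetical stand-in for a training callback base class."""


class LearningRateLogger(Callback):
    pass


class RolloutEval(Callback):
    pass


# Hypothetical registry: name used in the config -> callback class.
CALLBACK_REGISTRY = {
    "learning_rate": LearningRateLogger,
    "rollout_eval": RolloutEval,
}


def build_callbacks(names):
    """Return one callback instance per configured name, in config order."""
    unknown = [name for name in names if name not in CALLBACK_REGISTRY]
    if unknown:
        raise ValueError(f"Unknown callbacks: {unknown}")
    return [CALLBACK_REGISTRY[name]() for name in names]


# Mirrors `config.diagnostics.callbacks` holding a list of names.
config = {"diagnostics": {"callbacks": ["learning_rate", "rollout_eval"]}}
callbacks = build_callbacks(config["diagnostics"]["callbacks"])
```

Failing loudly on an unknown name is a design choice of this sketch; it surfaces typos in the config at start-up rather than silently skipping a callback.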

Closes #59, #45


📚 Documentation preview 📚: https://anemoi-training--60.org.readthedocs.build/en/60/

@HCookie HCookie self-assigned this Sep 24, 2024
@FussyDuck

FussyDuck commented Sep 24, 2024

CLA assistant check
All committers have signed the CLA.

@HCookie
Member Author

HCookie commented Sep 24, 2024

At the moment, this is the proposed refactor; I have yet to complete an exhaustive test of the changes.

@HCookie HCookie removed the request for review from JesperDramsch September 24, 2024 09:57
@JesperDramsch
Member

Great work, thank you for taking this on.

I was thinking that it might be nice to make this fully configurable through instantiate.

For example, no one is really using the stochastic weight averaging as far as I know, so having specific config entries for this is a bit of feature bloat.

Then the list of callbacks would just look like this:

callbacks:
  swa:
    _target_: pytorch_lightning.callbacks.stochastic_weight_avg.StochasticWeightAveraging
    swa_lrs: 1e-4
    swa_epoch_start: 123
    annealing_epochs: 5
    annealing_strategy: cos
    device: null
  blabla:
    _target_: blabla_callback
    blabla: bla

This makes it more extensible and actually reduces some of our less-used config entries.
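
To make the `_target_` idea concrete, here is a minimal sketch of how such entries could be turned into objects. It is a simplified, dependency-free imitation of what Hydra's `instantiate` does (presumably the "instantiate" meant above), not the PR's implementation. The targets point at `types.SimpleNamespace` purely so the example runs without pytorch_lightning; a real config would name callback classes.

```python
import importlib


def instantiate(node):
    """Build an object from a dict holding a `_target_` dotted path (simplified)."""
    kwargs = {key: value for key, value in node.items() if key != "_target_"}
    module_path, _, attr = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    return target(**kwargs)


# Stand-in targets from the standard library (an assumption for this sketch).
callbacks_cfg = {
    "swa": {
        "_target_": "types.SimpleNamespace",
        "annealing_epochs": 5,
        "annealing_strategy": "cos",
    },
    "blabla": {"_target_": "types.SimpleNamespace", "blabla": "bla"},
}

callbacks = {name: instantiate(node) for name, node in callbacks_cfg.items()}
print(callbacks["swa"].annealing_epochs)  # 5
```

With this shape, adding a callback means adding a config entry; no dedicated config keys or code changes are needed per callback.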

Additionally, we can keep the standard callbacks, like model checkpoints, as "permanent callbacks" (I don't think we have to make everything optional).

One idea I also had is that we could make a special list for "plot_callbacks" in the same style. Then we can easily keep the super convenient "plots.enabled = False" as a shortcut to disable them?
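
A small sketch of how the "permanent" callbacks and the `plots.enabled` shortcut could combine. The function name, the callback names, and the dict layout are assumptions made for illustration only.

```python
def collect_callbacks(diagnostics):
    """Combine permanent, configured, and (optionally) plot callbacks."""
    # Hypothetical "permanent" callback that is always included.
    selected = ["model_checkpoint"]
    selected += diagnostics.get("callbacks", [])
    plots = diagnostics.get("plots", {})
    # `plots.enabled = False` drops every plot callback in one go.
    if plots.get("enabled", True):
        selected += plots.get("callbacks", [])
    return selected


diagnostics = {
    "callbacks": ["learning_rate"],
    "plots": {"enabled": False, "callbacks": ["plot_loss", "plot_sample"]},
}
print(collect_callbacks(diagnostics))  # ['model_checkpoint', 'learning_rate']
```

Keeping plot callbacks in their own list is what makes the single switch possible: the regular list never has to be edited to turn plotting off.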

@HCookie HCookie marked this pull request as ready for review September 25, 2024 09:41
Member

@JesperDramsch JesperDramsch left a comment

Hi @HCookie, thanks for taking on the callbacks!

It's already much better, great work on that. I think we can take the refactor even further and make the callbacks (almost?) fully modular, which would be incredible for future extensibility.

One comment regarding the file names. So far we haven't used <xyz>-ing.py as a naming convention. "checkpointing" in particular would be easy to confuse with activation checkpointing (although that is and will stay confusing, honestly). Can we rename these, please?

src/anemoi/training/diagnostics/callbacks/plotting.py (outdated)
src/anemoi/training/diagnostics/callbacks/learning_rate.py (outdated)
src/anemoi/training/diagnostics/callbacks/__init__.py (outdated)
src/anemoi/training/diagnostics/callbacks/__init__.py (outdated)
src/anemoi/training/diagnostics/callbacks/__init__.py (outdated)
src/anemoi/training/diagnostics/callbacks/__init__.py (outdated)
src/anemoi/training/diagnostics/callbacks/__init__.py (outdated)
src/anemoi/training/config/diagnostics/eval_rollout.yaml (outdated)
src/anemoi/training/diagnostics/callbacks/__init__.py (outdated)
pre-commit-ci bot and others added 7 commits October 2, 2024 10:54
- Prefill config with callbacks
- Warn on deprecations for old config
- Expand config enabled
- Add back SWA
- Fix logging callback
- Add flag to disable checkpointing
- Add testing
[feature] Fix trainable attribute callbacks
@sahahner
Member

sahahner commented Oct 28, 2024

In general, this looks good to me. The new layout of the config files is intuitive. Thank you for the work that you have put into this.
I have one concern: The configuration of the callbacks is not traceable via MLFlow, as the list of targets is cut after a certain number of characters in the mlflow parameters.
Is there a way to work around this?

@HCookie
Member Author

HCookie commented Oct 28, 2024

The configuration of the callbacks is not traceable via MLFlow, as the list of targets is cut after a certain number of characters in the mlflow parameters.

That issue with mlflow is addressed in #91. Once that is merged, the config will be accessible as a dump or fully expanded.
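
One way to read "fully expanded" is that nested config is flattened into one short parameter per leaf, so no single value runs into the length limit. The sketch below shows that idea under this assumption; it is not taken from #91, and the `flatten` helper and sample config are hypothetical.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts and lists into {dotted.key: value} pairs."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        # Leaf value: emit it under the accumulated dotted key.
        return {prefix: node}
    flat = {}
    for key, value in items:
        child = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, child))
    return flat


config = {
    "diagnostics": {
        "callbacks": [
            {"_target_": "a.B", "every_n": 5},
            {"_target_": "c.D"},
        ]
    }
}
params = flatten(config)
print(params["diagnostics.callbacks.0._target_"])  # a.B
```

The resulting flat dict has the shape that a call such as `mlflow.log_params` accepts, with each callback target logged as its own entry rather than as part of one long list.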

JPXKQX previously approved these changes Oct 28, 2024
@HCookie HCookie force-pushed the fxi/refactor_callbacks branch from a3f7e00 to 30dfd45 Compare October 28, 2024 13:51
sahahner previously approved these changes Oct 29, 2024
Member

@sahahner sahahner left a comment

Thank you for incorporating the requested changes. This looks good to me now.

@mchantry mchantry added the ATS approved Approved by ATS label Oct 29, 2024
Member

@sahahner sahahner left a comment

Looks good to me.

@HCookie HCookie merged commit 6433fa3 into develop Oct 29, 2024
116 checks passed
@HCookie HCookie deleted the fxi/refactor_callbacks branch October 29, 2024 13:18
JesperDramsch pushed a commit that referenced this pull request Oct 29, 2024
* Refactor Callbacks
- Split into separate files
- Use list in config to add callbacks
- Split out plotting callbacks config

* Refactor rollout (#87)
- New rollout central function

---------

Co-authored-by: Mario Santa Cruz
Co-authored-by: Sara Hahner
8 participants