Merge branch 'develop' into pre-training-check

ecmwf · Nov 21, 2024 · 8565cbb
2 parents a1dcb64 + 923b266

Showing 5 changed files with 326 additions and 67 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,47 +8,67 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 Please add your functional changes to the appropriate section in the PR.
 Keep it human-readable, your future self will thank you!
 
-## [Unreleased](https://github.com/ecmwf/anemoi-training/compare/0.2.2...HEAD)
+## [Unreleased](https://github.com/ecmwf/anemoi-training/compare/0.3.0...HEAD)
+### Fixed
+
+### Added
+- Added a check for the variable sorting on pre-trained/finetuned models [#120](https://github.com/ecmwf/anemoi-training/pull/120)
+
+### Changed
+
+### Removed
+- Removed the resolution config entry [#120](https://github.com/ecmwf/anemoi-training/pull/120)
+
+
+
+## [0.3.0 - Loss & Callback Refactors](https://github.com/ecmwf/anemoi-training/compare/0.2.2...0.3.0) - 2024-11-14
+
+### Changed
+- Increase the default MlFlow HTTP max retries [#111](https://github.com/ecmwf/anemoi-training/pull/111)
 
 ### Fixed
+
 - Rename loss_scaling to variable_loss_scaling [#138](https://github.com/ecmwf/anemoi-training/pull/138)
 - Refactored callbacks. [#60](https://github.com/ecmwf/anemoi-training/pulls/60)
-    - Updated docs [#115](https://github.com/ecmwf/anemoi-training/pull/115)
-    - Fix enabling LearningRateMonitor [#119](https://github.com/ecmwf/anemoi-training/pull/119)
+  - Updated docs [#115](https://github.com/ecmwf/anemoi-training/pull/115)
+  - Fix enabling LearningRateMonitor [#119](https://github.com/ecmwf/anemoi-training/pull/119)
+
 - Refactored rollout [#87](https://github.com/ecmwf/anemoi-training/pulls/87)
-    - Enable longer validation rollout than training
+  - Enable longer validation rollout than training
+
 - Expand iterables in logging [#91](https://github.com/ecmwf/anemoi-training/pull/91)
-    - Save entire config in mlflow
+  - Save entire config in mlflow
+
+
 ### Added
+
 - Included more loss functions and allowed configuration [#70](https://github.com/ecmwf/anemoi-training/pull/70)
 - Include option to use datashader and optimised asyncronohous callbacks [#102](https://github.com/ecmwf/anemoi-training/pull/102)
    - Fix that applies the metric_ranges in the post-processed variable space [#116](https://github.com/ecmwf/anemoi-training/pull/116)
 - Allow updates to scalars [#137](https://github.com/ecmwf/anemoi-training/pulls/137)
-    - Add without subsetting in ScaleTensor
+  - Add without subsetting in ScaleTensor
+
 - Sub-hour datasets [#63](https://github.com/ecmwf/anemoi-training/pull/63)
 - Add synchronisation workflow [#92](https://github.com/ecmwf/anemoi-training/pull/92)
 - Feat: Anemoi Profiler compatible with mlflow and using Pytorch (Kineto) Profiler for memory report [38](https://github.com/ecmwf/anemoi-training/pull/38/)
-- Added a check for the variable sorting on pre-trained/finetuned models [#120](https://github.com/ecmwf/anemoi-training/pull/120)
+- Feat: Save a gif for longer rollouts in validation [#65](https://github.com/ecmwf/anemoi-training/pull/65)
 - New limited area config file added, limited_area.yaml. [#134](https://github.com/ecmwf/anemoi-training/pull/134/)
 - New stretched grid config added, stretched_grid.yaml [#133](https://github.com/ecmwf/anemoi-training/pull/133)
 
 ### Changed
+
 - Renamed frequency keys in callbacks configuration. [#118](https://github.com/ecmwf/anemoi-training/pull/118)
 - Modified training configuration to support max_steps and tied lr iterations to max_steps by default [#67](https://github.com/ecmwf/anemoi-training/pull/67)
 - Merged node & edge trainable feature callbacks into one. [#135](https://github.com/ecmwf/anemoi-training/pull/135)
 
 ### Removed
-- Removed the resolution config entry [#120](https://github.com/ecmwf/anemoi-training/pull/120)
 
 ## [0.2.2 - Maintenance: pin python <3.13](https://github.com/ecmwf/anemoi-training/compare/0.2.1...0.2.2) - 2024-10-28
 
-
 ### Changed
 
 - Lock python version <3.13 [#107](https://github.com/ecmwf/anemoi-training/pull/107)
 
-
-
 ## [0.2.1 - Bugfix: resuming mlflow runs](https://github.com/ecmwf/anemoi-training/compare/0.2.0...0.2.1) - 2024-10-24
 
 ### Added
@@ -90,6 +110,7 @@ Keep it human-readable, your future self will thank you!
 
 - Variable Bounding as configurable model layers [#13](https://github.com/ecmwf/anemoi-models/issues/13)
 
+
 #### Functionality
 
 - Enable the callback for plotting a histogram for variables containing NaNs
@@ -101,7 +122,6 @@ Keep it human-readable, your future self will thank you!
 - Feature: `AnemoiMlflowClient`, an mlflow client with authentication support [#86](https://github.com/ecmwf/anemoi-training/pull/86)
 - Long Rollout Plots
 
-
 ### Fixed
 
 - Fix `TypeError` raised when trying to JSON serialise `datetime.timedelta` object - [#43](https://github.com/ecmwf/anemoi-training/pull/43)

diff --git a/src/anemoi/training/config/diagnostics/plot/rollout_eval.yaml b/src/anemoi/training/config/diagnostics/plot/rollout_eval.yaml
@@ -60,8 +60,10 @@ callbacks:
     - 10u
     - 10v
   - _target_:  anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
+    # for rollout and video_rollout pick any integers below dataloader.validation_rollout
     rollout:
       - ${dataloader.validation_rollout}
+    video_rollout: ${dataloader.validation_rollout}
     every_n_epochs: 20
     sample_idx: ${diagnostics.plot.sample_idx}
     parameters: ${diagnostics.plot.parameters}
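
The comment added in `rollout_eval.yaml` asks that `rollout` and `video_rollout` stay within `dataloader.validation_rollout`. A minimal sketch of that constraint follows. The helper `check_rollout_steps` is hypothetical and not part of anemoi-training, and it reads "below" as "at most", since the defaults in the diff set both entries equal to `${dataloader.validation_rollout}`.

```python
from typing import Iterable


def check_rollout_steps(
    rollout: Iterable[int],
    video_rollout: int,
    validation_rollout: int,
) -> None:
    """Raise ValueError if a requested plot step exceeds the validation rollout.

    Hypothetical helper for illustration only; the names mirror the config
    keys shown in the diff, not an actual anemoi-training function.
    """
    # Collect every rollout step the plotting callback would be asked for.
    requested = list(rollout) + [video_rollout]

    # Assumption: a step equal to validation_rollout is allowed.
    too_long = [step for step in requested if step > validation_rollout]
    if too_long:
        raise ValueError(
            f"Rollout steps {too_long} exceed "
            f"dataloader.validation_rollout={validation_rollout}"
        )


# Values matching the defaults in the diff (all equal) pass silently.
check_rollout_steps(rollout=[12], video_rollout=12, validation_rollout=12)
```

Running a check like this when the config is loaded would surface a mismatch before training starts, rather than at the first validation epoch that triggers the plot.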