Bug description

`--lr-decay-strategy epoch+stalled` does not decay the learning rate after a stalled validation.

How to reproduce

Set `--lr-decay 0.5 --lr-decay-strategy epoch+stalled --lr-decay-start 1 1` and wait until one validation stalls: the learning rate is not decayed. If `--lr-decay 0.5 --lr-decay-strategy stalled --lr-decay-start 1` is set instead, the learning rate is correctly decayed after the first stalled validation.

Context

Marian version: v1.12.0 65bf82f 2023-02-21 09:56:29 -0800

cmake command:

```
cmake .. -DCOMPILE_CPU=OFF -DCOMPILE_TURING=OFF -DCOMPILE_VOLTA=OFF -DCOMPILE_PASCAL=OFF -DCOMPILE_MAXWELL=OFF -DCMAKE_INSTALL_PREFIX:PATH=~/.local
```
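For reference, each decay event multiplies the current rate by `--lr-decay` (and, as I read the documentation, with `epoch+stalled` the two values of `--lr-decay-start` are the first epoch at which decay may start and the number of stalled validations that should trigger a decay). A minimal Python sketch of the expected arithmetic — an illustration only, not Marian's code — reproduces the rates visible in the logs below:

```python
def lr_after_decays(base_lr: float, decay: float, n_decays: int) -> float:
    """Learning rate after n multiplicative decay events (lr *= decay per event)."""
    return base_lr * decay ** n_decays

# With --learn-rate 1e-05 and --lr-decay 0.5, as in the logs:
print(lr_after_decays(1e-05, 0.5, 2))  # 2.5e-06, the L.r. reported before the stall
print(lr_after_decays(1e-05, 0.5, 3))  # 1.25e-06, the rate after "Decaying learning rate"
```

With `epoch+stalled` the third decay event (the stalled one) never fires, so the rate stays at 2.5e-06 instead of dropping to 1.25e-06.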
Training logs after one stalled validation with 'epoch+stalled' strategy
Training logs after one stalled validation with 'stalled' strategy
```
[2025-01-27 14:02:05] [marian] Marian v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
[2025-01-27 14:02:05] [marian] Running on a1e8062f4fc4 as process 40103 with command line:
[2025-01-27 14:02:05] [marian] marian -c teacher.finetune.yml -c teacher-finetuned/train.yml -m teacher-finetuned/model1.npz -v teacher-finetuned/vocab.spm teacher-finetuned/vocab.spm --tsv -t train_teacher.gz --valid-sets dev.tsv --seed 1111 --log teacher-finetuned/model1.npz.finetune.log --learn-rate 1e-05 --lr-decay-inv-sqrt 0 --lr-decay 0.5 --lr-decay-strategy stalled --lr-decay-start 1 --valid-freq 100
[2025-01-27 14:02:05] [config] after: 30e
[2025-01-27 14:02:05] [config] after-batches: 0
[2025-01-27 14:02:05] [config] after-epochs: 0
[2025-01-27 14:02:05] [config] all-caps-every: 0
[2025-01-27 14:02:05] [config] allow-unk: false
[2025-01-27 14:02:05] [config] authors: false
[2025-01-27 14:02:05] [config] beam-size: 4
[2025-01-27 14:02:05] [config] bert-class-symbol: "[CLS]"
[2025-01-27 14:02:05] [config] bert-mask-symbol: "[MASK]"
[2025-01-27 14:02:05] [config] bert-masking-fraction: 0.15
[2025-01-27 14:02:05] [config] bert-sep-symbol: "[SEP]"
[2025-01-27 14:02:05] [config] bert-train-type-embeddings: true
[2025-01-27 14:02:05] [config] bert-type-vocab-size: 2
[2025-01-27 14:02:05] [config] build-info: ""
[2025-01-27 14:02:05] [config] check-gradient-nan: false
[2025-01-27 14:02:05] [config] check-nan: false
[2025-01-27 14:02:05] [config] cite: false
[2025-01-27 14:02:05] [config] clip-norm: 0
[2025-01-27 14:02:05] [config] cost-scaling:
[2025-01-27 14:02:05] [config] - 8.f
[2025-01-27 14:02:05] [config] - 10000
[2025-01-27 14:02:05] [config] - 1.f
[2025-01-27 14:02:05] [config] - 8.f
[2025-01-27 14:02:05] [config] cost-type: ce-sum
[2025-01-27 14:02:05] [config] cpu-threads: 0
[2025-01-27 14:02:05] [config] data-threads: 8
[2025-01-27 14:02:05] [config] data-weighting: ""
[2025-01-27 14:02:05] [config] data-weighting-type: sentence
[2025-01-27 14:02:05] [config] dec-cell: gru
[2025-01-27 14:02:05] [config] dec-cell-base-depth: 2
[2025-01-27 14:02:05] [config] dec-cell-high-depth: 1
[2025-01-27 14:02:05] [config] dec-depth: 6
[2025-01-27 14:02:05] [config] devices:
[2025-01-27 14:02:05] [config] - 0
[2025-01-27 14:02:05] [config] - 1
[2025-01-27 14:02:05] [config] - 2
[2025-01-27 14:02:05] [config] - 3
[2025-01-27 14:02:05] [config] dim-emb: 1024
[2025-01-27 14:02:05] [config] dim-rnn: 1024
[2025-01-27 14:02:05] [config] dim-vocabs:
[2025-01-27 14:02:05] [config] - 32000
[2025-01-27 14:02:05] [config] - 32000
[2025-01-27 14:02:05] [config] disp-first: 10
[2025-01-27 14:02:05] [config] disp-freq: 50
[2025-01-27 14:02:05] [config] disp-label-counts: true
[2025-01-27 14:02:05] [config] dropout-rnn: 0
[2025-01-27 14:02:05] [config] dropout-src: 0
[2025-01-27 14:02:05] [config] dropout-trg: 0
[2025-01-27 14:02:05] [config] dump-config: ""
[2025-01-27 14:02:05] [config] dynamic-gradient-scaling:
[2025-01-27 14:02:05] [config] []
[2025-01-27 14:02:05] [config] early-stopping: 20
[2025-01-27 14:02:05] [config] early-stopping-on: first
[2025-01-27 14:02:05] [config] embedding-fix-src: false
[2025-01-27 14:02:05] [config] embedding-fix-trg: false
[2025-01-27 14:02:05] [config] embedding-normalization: false
[2025-01-27 14:02:05] [config] embedding-vectors:
[2025-01-27 14:02:05] [config] []
[2025-01-27 14:02:05] [config] enc-cell: gru
[2025-01-27 14:02:05] [config] enc-cell-depth: 1
[2025-01-27 14:02:05] [config] enc-depth: 6
[2025-01-27 14:02:05] [config] enc-type: bidirectional
[2025-01-27 14:02:05] [config] english-title-case-every: 0
[2025-01-27 14:02:05] [config] exponential-smoothing: 0.0001
[2025-01-27 14:02:05] [config] factor-weight: 1
[2025-01-27 14:02:05] [config] factors-combine: sum
[2025-01-27 14:02:05] [config] factors-dim-emb: 0
[2025-01-27 14:02:05] [config] gradient-checkpointing: false
[2025-01-27 14:02:05] [config] gradient-norm-average-window: 100
[2025-01-27 14:02:05] [config] guided-alignment: none
[2025-01-27 14:02:05] [config] guided-alignment-cost: ce
[2025-01-27 14:02:05] [config] guided-alignment-weight: 0.1
[2025-01-27 14:02:05] [config] ignore-model-config: false
[2025-01-27 14:02:05] [config] input-types:
[2025-01-27 14:02:05] [config] []
[2025-01-27 14:02:05] [config] interpolate-env-vars: false
[2025-01-27 14:02:05] [config] keep-best: True
[2025-01-27 14:02:05] [config] label-smoothing: 0
[2025-01-27 14:02:05] [config] layer-normalization: false
[2025-01-27 14:02:05] [config] learn-rate: 1e-05
[2025-01-27 14:02:05] [config] lemma-dependency: ""
[2025-01-27 14:02:05] [config] lemma-dim-emb: 0
[2025-01-27 14:02:05] [config] log: teacher-finetuned/model1.npz.finetune.log
[2025-01-27 14:02:05] [config] log-level: info
[2025-01-27 14:02:05] [config] log-time-zone: ""
[2025-01-27 14:02:05] [config] logical-epoch:
[2025-01-27 14:02:05] [config] - 1e
[2025-01-27 14:02:05] [config] - 0
[2025-01-27 14:02:05] [config] lr-decay: 0.5
[2025-01-27 14:02:05] [config] lr-decay-freq: 50000
[2025-01-27 14:02:05] [config] lr-decay-inv-sqrt:
[2025-01-27 14:02:05] [config] - 0
[2025-01-27 14:02:05] [config] lr-decay-repeat-warmup: false
[2025-01-27 14:02:05] [config] lr-decay-reset-optimizer: false
[2025-01-27 14:02:05] [config] lr-decay-start:
[2025-01-27 14:02:05] [config] - 1
[2025-01-27 14:02:05] [config] lr-decay-strategy: stalled
[2025-01-27 14:02:05] [config] lr-report: True
[2025-01-27 14:02:05] [config] lr-warmup: 200
[2025-01-27 14:02:05] [config] lr-warmup-at-reload: false
[2025-01-27 14:02:05] [config] lr-warmup-cycle: false
[2025-01-27 14:02:05] [config] lr-warmup-start-rate: 0
[2025-01-27 14:02:05] [config] max-length: 300
[2025-01-27 14:02:05] [config] max-length-crop: false
[2025-01-27 14:02:05] [config] max-length-factor: 3
[2025-01-27 14:02:05] [config] maxi-batch: 100
[2025-01-27 14:02:05] [config] maxi-batch-sort: trg
[2025-01-27 14:02:05] [config] mini-batch: 64
[2025-01-27 14:02:05] [config] mini-batch-fit: True
[2025-01-27 14:02:05] [config] mini-batch-fit-step: 10
[2025-01-27 14:02:05] [config] mini-batch-round-up: true
[2025-01-27 14:02:05] [config] mini-batch-track-lr: false
[2025-01-27 14:02:05] [config] mini-batch-warmup: 0
[2025-01-27 14:02:05] [config] mini-batch-words: 0
[2025-01-27 14:02:05] [config] mini-batch-words-ref: 0
[2025-01-27 14:02:05] [config] model: teacher-finetuned/model1.npz
[2025-01-27 14:02:05] [config] multi-loss-type: sum
[2025-01-27 14:02:05] [config] n-best: false
[2025-01-27 14:02:05] [config] no-nccl: false
[2025-01-27 14:02:05] [config] no-reload: false
[2025-01-27 14:02:05] [config] no-restore-corpus: false
[2025-01-27 14:02:05] [config] normalize: 1.0
[2025-01-27 14:02:05] [config] normalize-gradient: false
[2025-01-27 14:02:05] [config] num-devices: 0
[2025-01-27 14:02:05] [config] optimizer: adam
[2025-01-27 14:02:05] [config] optimizer-delay: 1
[2025-01-27 14:02:05] [config] optimizer-params:
[2025-01-27 14:02:05] [config] - 0.9
[2025-01-27 14:02:05] [config] - 0.98
[2025-01-27 14:02:05] [config] - 1e-09
[2025-01-27 14:02:05] [config] output-omit-bias: false
[2025-01-27 14:02:05] [config] overwrite: True
[2025-01-27 14:02:05] [config] precision:
[2025-01-27 14:02:05] [config] - float16
[2025-01-27 14:02:05] [config] - float32
[2025-01-27 14:02:05] [config] pretrained-model: ""
[2025-01-27 14:02:05] [config] quantize-biases: false
[2025-01-27 14:02:05] [config] quantize-bits: 0
[2025-01-27 14:02:05] [config] quantize-log-based: false
[2025-01-27 14:02:05] [config] quantize-optimization-steps: 0
[2025-01-27 14:02:05] [config] quiet: false
[2025-01-27 14:02:05] [config] quiet-translation: true
[2025-01-27 14:02:05] [config] relative-paths: false
[2025-01-27 14:02:05] [config] right-left: false
[2025-01-27 14:02:05] [config] save-freq: 200
[2025-01-27 14:02:05] [config] seed: 1111
[2025-01-27 14:02:05] [config] sentencepiece-alphas:
[2025-01-27 14:02:05] [config] []
[2025-01-27 14:02:05] [config] sentencepiece-max-lines: 2000000
[2025-01-27 14:02:05] [config] sentencepiece-options: ""
[2025-01-27 14:02:05] [config] sharding: global
[2025-01-27 14:02:05] [config] shuffle: data
[2025-01-27 14:02:05] [config] shuffle-in-ram: true
[2025-01-27 14:02:05] [config] sigterm: save-and-exit
[2025-01-27 14:02:05] [config] skip: false
[2025-01-27 14:02:05] [config] sqlite: ""
[2025-01-27 14:02:05] [config] sqlite-drop: false
[2025-01-27 14:02:05] [config] sync-freq: 200u
[2025-01-27 14:02:05] [config] sync-sgd: true
[2025-01-27 14:02:05] [config] tempdir: /tmp
[2025-01-27 14:02:05] [config] tied-embeddings: false
[2025-01-27 14:02:05] [config] tied-embeddings-all: true
[2025-01-27 14:02:05] [config] tied-embeddings-src: false
[2025-01-27 14:02:05] [config] train-embedder-rank:
[2025-01-27 14:02:05] [config] []
[2025-01-27 14:02:05] [config] train-sets:
[2025-01-27 14:02:05] [config] - train_teacher.gz
[2025-01-27 14:02:05] [config] transformer-aan-activation: swish
[2025-01-27 14:02:05] [config] transformer-aan-depth: 2
[2025-01-27 14:02:05] [config] transformer-aan-nogate: false
[2025-01-27 14:02:05] [config] transformer-decoder-autoreg: self-attention
[2025-01-27 14:02:05] [config] transformer-decoder-dim-ffn: 0
[2025-01-27 14:02:05] [config] transformer-decoder-ffn-depth: 0
[2025-01-27 14:02:05] [config] transformer-depth-scaling: false
[2025-01-27 14:02:05] [config] transformer-dim-aan: 2048
[2025-01-27 14:02:05] [config] transformer-dim-ffn: 4096
[2025-01-27 14:02:05] [config] transformer-dropout: 0
[2025-01-27 14:02:05] [config] transformer-dropout-attention: 0
[2025-01-27 14:02:05] [config] transformer-dropout-ffn: 0
[2025-01-27 14:02:05] [config] transformer-ffn-activation: relu
[2025-01-27 14:02:05] [config] transformer-ffn-depth: 2
[2025-01-27 14:02:05] [config] transformer-guided-alignment-layer: last
[2025-01-27 14:02:05] [config] transformer-heads: 16
[2025-01-27 14:02:05] [config] transformer-no-projection: false
[2025-01-27 14:02:05] [config] transformer-pool: false
[2025-01-27 14:02:05] [config] transformer-postprocess: dan
[2025-01-27 14:02:05] [config] transformer-postprocess-emb: d
[2025-01-27 14:02:05] [config] transformer-postprocess-top: ""
[2025-01-27 14:02:05] [config] transformer-preprocess: ""
[2025-01-27 14:02:05] [config] transformer-rnn-projection: false
[2025-01-27 14:02:05] [config] transformer-tied-layers:
[2025-01-27 14:02:05] [config] []
[2025-01-27 14:02:05] [config] transformer-train-position-embeddings: false
[2025-01-27 14:02:05] [config] tsv: true
[2025-01-27 14:02:05] [config] tsv-fields: 2
[2025-01-27 14:02:05] [config] type: transformer
[2025-01-27 14:02:05] [config] ulr: false
[2025-01-27 14:02:05] [config] ulr-dim-emb: 0
[2025-01-27 14:02:05] [config] ulr-dropout: 0
[2025-01-27 14:02:05] [config] ulr-keys-vectors: ""
[2025-01-27 14:02:05] [config] ulr-query-vectors: ""
[2025-01-27 14:02:05] [config] ulr-softmax-temperature: 1
[2025-01-27 14:02:05] [config] ulr-trainable-transformation: false
[2025-01-27 14:02:05] [config] unlikelihood-loss: false
[2025-01-27 14:02:05] [config] valid-freq: 100
[2025-01-27 14:02:05] [config] valid-log: ""
[2025-01-27 14:02:05] [config] valid-max-length: 1000
[2025-01-27 14:02:05] [config] valid-metrics:
[2025-01-27 14:02:05] [config] - ce-mean-words
[2025-01-27 14:02:05] [config] - chrf
[2025-01-27 14:02:05] [config] - bleu-detok
[2025-01-27 14:02:05] [config] valid-mini-batch: 32
[2025-01-27 14:02:05] [config] valid-reset-all: false
[2025-01-27 14:02:05] [config] valid-reset-stalled: false
[2025-01-27 14:02:05] [config] valid-script-args:
[2025-01-27 14:02:05] [config] []
[2025-01-27 14:02:05] [config] valid-script-path: ""
[2025-01-27 14:02:05] [config] valid-sets:
[2025-01-27 14:02:05] [config] - dev.tsv
[2025-01-27 14:02:05] [config] valid-translation-output: ""
[2025-01-27 14:02:05] [config] version: v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
[2025-01-27 14:02:05] [config] vocabs:
[2025-01-27 14:02:05] [config] - teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [config] - teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [config] word-penalty: 0
[2025-01-27 14:02:05] [config] word-scores: false
[2025-01-27 14:02:05] [config] workspace: -8000
[2025-01-27 14:02:05] [config] Loaded model has been created with Marian v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
[2025-01-27 14:02:05] Using synchronous SGD
[2025-01-27 14:02:05] [comm] Compiled without MPI support. Running as a single process on a1e8062f4fc4
[2025-01-27 14:02:05] Synced seed 1111
[2025-01-27 14:02:05] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [data] Setting vocabulary size for input 0 to 32,000
[2025-01-27 14:02:05] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [data] Setting vocabulary size for input 1 to 32,000
[2025-01-27 14:02:05] [batching] Collecting statistics for batch fitting with step size 10
[2025-01-27 14:02:05] Training with cost scaling - factor: 8, frequency: 10000, multiplier: 1, minimum: 8
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 14:02:06] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 14:02:06] [comm] Using global sharding
[2025-01-27 14:02:07] [comm] NCCLCommunicators constructed successfully
[2025-01-27 14:02:07] [training] Using 4 GPUs
[2025-01-27 14:02:07] [logits] Applying loss function for 1 factor(s)
[2025-01-27 14:02:07] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:07] [gpu] 16-bit TensorCores enabled for float32 matrix operations
[2025-01-27 14:02:07] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:39] [batching] Done. Typical MB size is 62,372 target words
[2025-01-27 14:02:39] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 14:02:40] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 14:02:40] [comm] Using global sharding
[2025-01-27 14:02:40] [comm] NCCLCommunicators constructed successfully
[2025-01-27 14:02:40] [training] Using 4 GPUs
[2025-01-27 14:02:40] Loading model from teacher-finetuned/model1.npz
[2025-01-27 14:02:47] Allocating memory for general optimizer shards
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu0
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu1
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu2
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu3
[2025-01-27 14:02:47] Loading Adam parameters
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:48] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:48] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:49] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:50] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:50] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:50] [training] Master parameters and optimizers restored from training checkpoint teacher-finetuned/model1.npz and teacher-finetuned/model1.npz.optimizer.npz
[2025-01-27 14:02:50] [data] Restoring the corpus state to epoch 4, batch 5400
[2025-01-27 14:02:50] [data] Shuffling data
[2025-01-27 14:02:51] [data] Done reading 1,535,277 sentences
[2025-01-27 14:02:51] [data] Done shuffling 1,535,277 sentences (cached in RAM)
[2025-01-27 14:02:51] Training started
[2025-01-27 14:02:51] [training] Batches are processed as 1 process(es) x 4 devices/process
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:52] Parameter type float16, optimization type float32, casting types true
[2025-01-27 14:03:20] Ep. 4 : Up. 5450 : Sen. 57,593 : Cost 0.69126970 * 921,566 @ 25,911 after 98,507,941 : Time 40.64s : 22676.10 words/s : gNorm 0.6406 : L.r. 2.5000e-06
[2025-01-27 14:03:48] Ep. 4 : Up. 5500 : Sen. 100,288 : Cost 0.67408943 * 885,373 @ 16,279 after 99,393,314 : Time 28.13s : 31471.62 words/s : gNorm 0.6485 : L.r. 2.5000e-06
[2025-01-27 14:03:48] [valid] Ep. 4 : Up. 5500 : ce-mean-words : 0.813664 : stalled 2 times (last best: 0.813412)
[2025-01-27 14:03:50] [valid] Ep. 4 : Up. 5500 : chrf : 66.2683 : stalled 2 times (last best: 66.3262)
[2025-01-27 14:03:52] [valid] Ep. 4 : Up. 5500 : bleu-detok : 41.3057 : stalled 2 times (last best: 41.4621)
[2025-01-27 14:03:52] Decaying learning rate to 1.25e-06 after having stalled 2 time(s)
```