lr-decay-strategy epoch+stalled not working #424

Open
ZJaume opened this issue Jan 27, 2025 · 0 comments
ZJaume commented Jan 27, 2025

Bug description

lr-decay-strategy epoch+stalled does not decay the learning rate after stalled validation.

How to reproduce

Set --lr-decay 0.5 --lr-decay-strategy epoch+stalled --lr-decay-start 1 1 and wait until one validation stalls.

In contrast, if --lr-decay 0.5 --lr-decay-strategy stalled --lr-decay-start 1 is set, the learning rate is correctly decayed after the first stalled validation.
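For reference, here is a minimal sketch of how I would expect the stalled-based decay to behave (my own model of the intended semantics, not Marian's code; the function `decayed_lr` is hypothetical): the learning rate is multiplied by `--lr-decay` once per stalled validation at or beyond `--lr-decay-start`.

```python
# Hypothetical model of the expected "stalled" decay semantics, not
# Marian's actual implementation: multiply the learning rate by
# lr_decay for every stalled validation at or beyond lr-decay-start.

def decayed_lr(base_lr: float, lr_decay: float, stalled: int, start: int = 1) -> float:
    """Learning rate after `stalled` consecutive stalled validations."""
    decays = max(0, stalled - start + 1)  # number of decay steps applied so far
    return base_lr * lr_decay ** decays
```

Under this model, with base_lr=1e-5 and lr_decay=0.5, one stall gives 5e-6 and two stalls give 2.5e-6. As I understand it, epoch+stalled should apply the same decay on the stalled trigger as well, which is what does not happen here.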

Context

  • Marian version: v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
  • CMake command: cmake .. -DCOMPILE_CPU=OFF -DCOMPILE_TURING=OFF -DCOMPILE_VOLTA=OFF -DCOMPILE_PASCAL=OFF -DCOMPILE_MAXWELL=OFF -DCMAKE_INSTALL_PREFIX:PATH=~/.local
  • Log file:
Training logs after one stalled validation with 'epoch+stalled' strategy
[2025-01-27 13:58:45] [marian] Marian v1.12.0 65bf82ff 2023-02-21 09:56:29 -0800
[2025-01-27 13:58:45] [marian] Running on a1e8062f4fc4 as process 40011 with command line:
[2025-01-27 13:58:45] [marian] marian -c teacher.finetune.yml -c teacher-finetuned/train.yml -m teacher-finetuned/model1.npz -v teacher-finetuned/vocab.spm teacher-finetuned/vocab.spm --tsv -t train_teacher.gz --valid-sets dev.tsv --seed 1111 --log teacher-finetuned/model1.npz.finetune.log --learn-rate 1e-05 --lr-decay-inv-sqrt 0 --lr-decay 0.5 --lr-decay-strategy stalled+epoch --lr-decay-start 1 1 --valid-freq 100
[2025-01-27 13:58:45] [config] after: 30e
[2025-01-27 13:58:45] [config] after-batches: 0
[2025-01-27 13:58:45] [config] after-epochs: 0
[2025-01-27 13:58:45] [config] all-caps-every: 0
[2025-01-27 13:58:45] [config] allow-unk: false
[2025-01-27 13:58:45] [config] authors: false
[2025-01-27 13:58:45] [config] beam-size: 4
[2025-01-27 13:58:45] [config] bert-class-symbol: "[CLS]"
[2025-01-27 13:58:45] [config] bert-mask-symbol: "[MASK]"
[2025-01-27 13:58:45] [config] bert-masking-fraction: 0.15
[2025-01-27 13:58:45] [config] bert-sep-symbol: "[SEP]"
[2025-01-27 13:58:45] [config] bert-train-type-embeddings: true
[2025-01-27 13:58:45] [config] bert-type-vocab-size: 2
[2025-01-27 13:58:45] [config] build-info: ""
[2025-01-27 13:58:45] [config] check-gradient-nan: false
[2025-01-27 13:58:45] [config] check-nan: false
[2025-01-27 13:58:45] [config] cite: false
[2025-01-27 13:58:45] [config] clip-norm: 0
[2025-01-27 13:58:45] [config] cost-scaling:
[2025-01-27 13:58:45] [config]   - 8.f
[2025-01-27 13:58:45] [config]   - 10000
[2025-01-27 13:58:45] [config]   - 1.f
[2025-01-27 13:58:45] [config]   - 8.f
[2025-01-27 13:58:45] [config] cost-type: ce-sum
[2025-01-27 13:58:45] [config] cpu-threads: 0
[2025-01-27 13:58:45] [config] data-threads: 8
[2025-01-27 13:58:45] [config] data-weighting: ""
[2025-01-27 13:58:45] [config] data-weighting-type: sentence
[2025-01-27 13:58:45] [config] dec-cell: gru
[2025-01-27 13:58:45] [config] dec-cell-base-depth: 2
[2025-01-27 13:58:45] [config] dec-cell-high-depth: 1
[2025-01-27 13:58:45] [config] dec-depth: 6
[2025-01-27 13:58:45] [config] devices:
[2025-01-27 13:58:45] [config]   - 0
[2025-01-27 13:58:45] [config]   - 1
[2025-01-27 13:58:45] [config]   - 2
[2025-01-27 13:58:45] [config]   - 3
[2025-01-27 13:58:45] [config] dim-emb: 1024
[2025-01-27 13:58:45] [config] dim-rnn: 1024
[2025-01-27 13:58:45] [config] dim-vocabs:
[2025-01-27 13:58:45] [config]   - 32000
[2025-01-27 13:58:45] [config]   - 32000
[2025-01-27 13:58:45] [config] disp-first: 10
[2025-01-27 13:58:45] [config] disp-freq: 50
[2025-01-27 13:58:45] [config] disp-label-counts: true
[2025-01-27 13:58:45] [config] dropout-rnn: 0
[2025-01-27 13:58:45] [config] dropout-src: 0
[2025-01-27 13:58:45] [config] dropout-trg: 0
[2025-01-27 13:58:45] [config] dump-config: ""
[2025-01-27 13:58:45] [config] dynamic-gradient-scaling:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] early-stopping: 20
[2025-01-27 13:58:45] [config] early-stopping-on: first
[2025-01-27 13:58:45] [config] embedding-fix-src: false
[2025-01-27 13:58:45] [config] embedding-fix-trg: false
[2025-01-27 13:58:45] [config] embedding-normalization: false
[2025-01-27 13:58:45] [config] embedding-vectors:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] enc-cell: gru
[2025-01-27 13:58:45] [config] enc-cell-depth: 1
[2025-01-27 13:58:45] [config] enc-depth: 6
[2025-01-27 13:58:45] [config] enc-type: bidirectional
[2025-01-27 13:58:45] [config] english-title-case-every: 0
[2025-01-27 13:58:45] [config] exponential-smoothing: 0.0001
[2025-01-27 13:58:45] [config] factor-weight: 1
[2025-01-27 13:58:45] [config] factors-combine: sum
[2025-01-27 13:58:45] [config] factors-dim-emb: 0
[2025-01-27 13:58:45] [config] gradient-checkpointing: false
[2025-01-27 13:58:45] [config] gradient-norm-average-window: 100
[2025-01-27 13:58:45] [config] guided-alignment: none
[2025-01-27 13:58:45] [config] guided-alignment-cost: ce
[2025-01-27 13:58:45] [config] guided-alignment-weight: 0.1
[2025-01-27 13:58:45] [config] ignore-model-config: false
[2025-01-27 13:58:45] [config] input-types:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] interpolate-env-vars: false
[2025-01-27 13:58:45] [config] keep-best: True
[2025-01-27 13:58:45] [config] label-smoothing: 0
[2025-01-27 13:58:45] [config] layer-normalization: false
[2025-01-27 13:58:45] [config] learn-rate: 1e-05
[2025-01-27 13:58:45] [config] lemma-dependency: ""
[2025-01-27 13:58:45] [config] lemma-dim-emb: 0
[2025-01-27 13:58:45] [config] log: teacher-finetuned/model1.npz.finetune.log
[2025-01-27 13:58:45] [config] log-level: info
[2025-01-27 13:58:45] [config] log-time-zone: ""
[2025-01-27 13:58:45] [config] logical-epoch:
[2025-01-27 13:58:45] [config]   - 1e
[2025-01-27 13:58:45] [config]   - 0
[2025-01-27 13:58:45] [config] lr-decay: 0.5
[2025-01-27 13:58:45] [config] lr-decay-freq: 50000
[2025-01-27 13:58:45] [config] lr-decay-inv-sqrt:
[2025-01-27 13:58:45] [config]   - 0
[2025-01-27 13:58:45] [config] lr-decay-repeat-warmup: false
[2025-01-27 13:58:45] [config] lr-decay-reset-optimizer: false
[2025-01-27 13:58:45] [config] lr-decay-start:
[2025-01-27 13:58:45] [config]   - 1
[2025-01-27 13:58:45] [config]   - 1
[2025-01-27 13:58:45] [config] lr-decay-strategy: stalled+epoch
[2025-01-27 13:58:45] [config] lr-report: True
[2025-01-27 13:58:45] [config] lr-warmup: 200
[2025-01-27 13:58:45] [config] lr-warmup-at-reload: false
[2025-01-27 13:58:45] [config] lr-warmup-cycle: false
[2025-01-27 13:58:45] [config] lr-warmup-start-rate: 0
[2025-01-27 13:58:45] [config] max-length: 300
[2025-01-27 13:58:45] [config] max-length-crop: false
[2025-01-27 13:58:45] [config] max-length-factor: 3
[2025-01-27 13:58:45] [config] maxi-batch: 100
[2025-01-27 13:58:45] [config] maxi-batch-sort: trg
[2025-01-27 13:58:45] [config] mini-batch: 64
[2025-01-27 13:58:45] [config] mini-batch-fit: True
[2025-01-27 13:58:45] [config] mini-batch-fit-step: 10
[2025-01-27 13:58:45] [config] mini-batch-round-up: true
[2025-01-27 13:58:45] [config] mini-batch-track-lr: false
[2025-01-27 13:58:45] [config] mini-batch-warmup: 0
[2025-01-27 13:58:45] [config] mini-batch-words: 0
[2025-01-27 13:58:45] [config] mini-batch-words-ref: 0
[2025-01-27 13:58:45] [config] model: teacher-finetuned/model1.npz
[2025-01-27 13:58:45] [config] multi-loss-type: sum
[2025-01-27 13:58:45] [config] n-best: false
[2025-01-27 13:58:45] [config] no-nccl: false
[2025-01-27 13:58:45] [config] no-reload: false
[2025-01-27 13:58:45] [config] no-restore-corpus: false
[2025-01-27 13:58:45] [config] normalize: 1.0
[2025-01-27 13:58:45] [config] normalize-gradient: false
[2025-01-27 13:58:45] [config] num-devices: 0
[2025-01-27 13:58:45] [config] optimizer: adam
[2025-01-27 13:58:45] [config] optimizer-delay: 1
[2025-01-27 13:58:45] [config] optimizer-params:
[2025-01-27 13:58:45] [config]   - 0.9
[2025-01-27 13:58:45] [config]   - 0.98
[2025-01-27 13:58:45] [config]   - 1e-09
[2025-01-27 13:58:45] [config] output-omit-bias: false
[2025-01-27 13:58:45] [config] overwrite: True
[2025-01-27 13:58:45] [config] precision:
[2025-01-27 13:58:45] [config]   - float16
[2025-01-27 13:58:45] [config]   - float32
[2025-01-27 13:58:45] [config] pretrained-model: ""
[2025-01-27 13:58:45] [config] quantize-biases: false
[2025-01-27 13:58:45] [config] quantize-bits: 0
[2025-01-27 13:58:45] [config] quantize-log-based: false
[2025-01-27 13:58:45] [config] quantize-optimization-steps: 0
[2025-01-27 13:58:45] [config] quiet: false
[2025-01-27 13:58:45] [config] quiet-translation: true
[2025-01-27 13:58:45] [config] relative-paths: false
[2025-01-27 13:58:45] [config] right-left: false
[2025-01-27 13:58:45] [config] save-freq: 200
[2025-01-27 13:58:45] [config] seed: 1111
[2025-01-27 13:58:45] [config] sentencepiece-alphas:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] sentencepiece-max-lines: 2000000
[2025-01-27 13:58:45] [config] sentencepiece-options: ""
[2025-01-27 13:58:45] [config] sharding: global
[2025-01-27 13:58:45] [config] shuffle: data
[2025-01-27 13:58:45] [config] shuffle-in-ram: true
[2025-01-27 13:58:45] [config] sigterm: save-and-exit
[2025-01-27 13:58:45] [config] skip: false
[2025-01-27 13:58:45] [config] sqlite: ""
[2025-01-27 13:58:45] [config] sqlite-drop: false
[2025-01-27 13:58:45] [config] sync-freq: 200u
[2025-01-27 13:58:45] [config] sync-sgd: true
[2025-01-27 13:58:45] [config] tempdir: /tmp
[2025-01-27 13:58:45] [config] tied-embeddings: false
[2025-01-27 13:58:45] [config] tied-embeddings-all: true
[2025-01-27 13:58:45] [config] tied-embeddings-src: false
[2025-01-27 13:58:45] [config] train-embedder-rank:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] train-sets:
[2025-01-27 13:58:45] [config]   - train_teacher.gz
[2025-01-27 13:58:45] [config] transformer-aan-activation: swish
[2025-01-27 13:58:45] [config] transformer-aan-depth: 2
[2025-01-27 13:58:45] [config] transformer-aan-nogate: false
[2025-01-27 13:58:45] [config] transformer-decoder-autoreg: self-attention
[2025-01-27 13:58:45] [config] transformer-decoder-dim-ffn: 0
[2025-01-27 13:58:45] [config] transformer-decoder-ffn-depth: 0
[2025-01-27 13:58:45] [config] transformer-depth-scaling: false
[2025-01-27 13:58:45] [config] transformer-dim-aan: 2048
[2025-01-27 13:58:45] [config] transformer-dim-ffn: 4096
[2025-01-27 13:58:45] [config] transformer-dropout: 0
[2025-01-27 13:58:45] [config] transformer-dropout-attention: 0
[2025-01-27 13:58:45] [config] transformer-dropout-ffn: 0
[2025-01-27 13:58:45] [config] transformer-ffn-activation: relu
[2025-01-27 13:58:45] [config] transformer-ffn-depth: 2
[2025-01-27 13:58:45] [config] transformer-guided-alignment-layer: last
[2025-01-27 13:58:45] [config] transformer-heads: 16
[2025-01-27 13:58:45] [config] transformer-no-projection: false
[2025-01-27 13:58:45] [config] transformer-pool: false
[2025-01-27 13:58:45] [config] transformer-postprocess: dan
[2025-01-27 13:58:45] [config] transformer-postprocess-emb: d
[2025-01-27 13:58:45] [config] transformer-postprocess-top: ""
[2025-01-27 13:58:45] [config] transformer-preprocess: ""
[2025-01-27 13:58:45] [config] transformer-rnn-projection: false
[2025-01-27 13:58:45] [config] transformer-tied-layers:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] transformer-train-position-embeddings: false
[2025-01-27 13:58:45] [config] tsv: true
[2025-01-27 13:58:45] [config] tsv-fields: 2
[2025-01-27 13:58:45] [config] type: transformer
[2025-01-27 13:58:45] [config] ulr: false
[2025-01-27 13:58:45] [config] ulr-dim-emb: 0
[2025-01-27 13:58:45] [config] ulr-dropout: 0
[2025-01-27 13:58:45] [config] ulr-keys-vectors: ""
[2025-01-27 13:58:45] [config] ulr-query-vectors: ""
[2025-01-27 13:58:45] [config] ulr-softmax-temperature: 1
[2025-01-27 13:58:45] [config] ulr-trainable-transformation: false
[2025-01-27 13:58:45] [config] unlikelihood-loss: false
[2025-01-27 13:58:45] [config] valid-freq: 100
[2025-01-27 13:58:45] [config] valid-log: ""
[2025-01-27 13:58:45] [config] valid-max-length: 1000
[2025-01-27 13:58:45] [config] valid-metrics:
[2025-01-27 13:58:45] [config]   - ce-mean-words
[2025-01-27 13:58:45] [config]   - chrf
[2025-01-27 13:58:45] [config]   - bleu-detok
[2025-01-27 13:58:45] [config] valid-mini-batch: 32
[2025-01-27 13:58:45] [config] valid-reset-all: false
[2025-01-27 13:58:45] [config] valid-reset-stalled: false
[2025-01-27 13:58:45] [config] valid-script-args:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] valid-script-path: ""
[2025-01-27 13:58:45] [config] valid-sets:
[2025-01-27 13:58:45] [config]   - dev.tsv
[2025-01-27 13:58:45] [config] valid-translation-output: ""
[2025-01-27 13:58:45] [config] version: v1.12.0 65bf82ff 2023-02-21 09:56:29 -0800
[2025-01-27 13:58:45] [config] vocabs:
[2025-01-27 13:58:45] [config]   - teacher-finetuned/vocab.spm
[2025-01-27 13:58:45] [config]   - teacher-finetuned/vocab.spm
[2025-01-27 13:58:45] [config] word-penalty: 0
[2025-01-27 13:58:45] [config] word-scores: false
[2025-01-27 13:58:45] [config] workspace: -8000
[2025-01-27 13:58:45] [config] Loaded model has been created with Marian v1.12.0 65bf82ff 2023-02-21 09:56:29 -0800
[2025-01-27 13:58:45] Using synchronous SGD
[2025-01-27 13:58:45] [comm] Compiled without MPI support. Running as a single process on a1e8062f4fc4
[2025-01-27 13:58:45] Synced seed 1111
[2025-01-27 13:58:45] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 13:58:45] [data] Setting vocabulary size for input 0 to 32,000
[2025-01-27 13:58:45] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 13:58:45] [data] Setting vocabulary size for input 1 to 32,000
[2025-01-27 13:58:45] [batching] Collecting statistics for batch fitting with step size 10
[2025-01-27 13:58:45] Training with cost scaling - factor: 8, frequency: 10000, multiplier: 1, minimum: 8
[2025-01-27 13:58:45] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 13:58:45] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 13:58:46] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 13:58:46] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 13:58:46] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 13:58:46] [comm] Using global sharding
[2025-01-27 13:58:46] [comm] NCCLCommunicators constructed successfully
[2025-01-27 13:58:46] [training] Using 4 GPUs
[2025-01-27 13:58:46] [logits] Applying loss function for 1 factor(s)
[2025-01-27 13:58:46] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:58:46] [gpu] 16-bit TensorCores enabled for float32 matrix operations
[2025-01-27 13:58:46] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:59:19] [batching] Done. Typical MB size is 62,372 target words
[2025-01-27 13:59:19] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 13:59:19] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 13:59:19] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 13:59:19] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 13:59:19] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 13:59:19] [comm] Using global sharding
[2025-01-27 13:59:19] [comm] NCCLCommunicators constructed successfully
[2025-01-27 13:59:19] [training] Using 4 GPUs
[2025-01-27 13:59:19] Loading model from teacher-finetuned/model1.npz
[2025-01-27 13:59:26] Allocating memory for general optimizer shards
[2025-01-27 13:59:26] [memory] Reserving 598 MB, device gpu0
[2025-01-27 13:59:26] [memory] Reserving 598 MB, device gpu1
[2025-01-27 13:59:26] [memory] Reserving 598 MB, device gpu2
[2025-01-27 13:59:26] [memory] Reserving 598 MB, device gpu3
[2025-01-27 13:59:27] Loading Adam parameters
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu1
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu2
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu3
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:59:29] [memory] Reserving 398 MB, device gpu1
[2025-01-27 13:59:29] [memory] Reserving 398 MB, device gpu2
[2025-01-27 13:59:29] [memory] Reserving 398 MB, device gpu3
[2025-01-27 13:59:29] [training] Master parameters and optimizers restored from training checkpoint teacher-finetuned/model1.npz and teacher-finetuned/model1.npz.optimizer.npz
[2025-01-27 13:59:29] [data] Restoring the corpus state to epoch 3, batch 5200
[2025-01-27 13:59:29] [data] Shuffling data
[2025-01-27 13:59:31] [data] Done reading 1,535,277 sentences
[2025-01-27 13:59:31] [data] Done shuffling 1,535,277 sentences (cached in RAM)
[2025-01-27 13:59:38] Training started
[2025-01-27 13:59:38] [training] Batches are processed as 1 process(es) x 4 devices/process
[2025-01-27 13:59:38] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:59:38] [memory] Reserving 398 MB, device gpu1
[2025-01-27 13:59:38] [memory] Reserving 398 MB, device gpu2
[2025-01-27 13:59:38] [memory] Reserving 398 MB, device gpu3
[2025-01-27 13:59:39] Parameter type float16, optimization type float32, casting types true
[2025-01-27 14:00:07] Ep. 3 : Up. 5250 : Sen. 212,322 : Cost 0.68544847 * 885,744 @ 25,279 after 94,894,463 : Time 47.93s : 18480.48 words/s : gNorm 0.6284 : L.r. 2.5000e-06
[2025-01-27 14:00:35] Ep. 3 : Up. 5300 : Sen. 256,577 : Cost 0.68347019 * 920,439 @ 29,186 after 95,814,902 : Time 28.20s : 32642.03 words/s : gNorm 0.7331 : L.r. 2.5000e-06
[2025-01-27 14:00:35] [valid] Ep. 3 : Up. 5300 : ce-mean-words : 0.813648 : stalled 1 times (last best: 0.813412)
[2025-01-27 14:00:37] [valid] Ep. 3 : Up. 5300 : chrf : 66.2634 : stalled 1 times (last best: 66.3262)
[2025-01-27 14:00:39] [valid] Ep. 3 : Up. 5300 : bleu-detok : 41.2837 : stalled 1 times (last best: 41.4621)
[2025-01-27 14:01:07] Ep. 3 : Up. 5350 : Sen. 299,352 : Cost 0.68636936 * 881,133 @ 23,996 after 96,696,035 : Time 32.34s : 27242.83 words/s : gNorm 0.8900 : L.r. 2.5000e-06
[2025-01-27 14:01:25] Seen 326,239 samples
[2025-01-27 14:01:25] Starting data epoch 4 in logical epoch 4
[2025-01-27 14:01:25] [data] Shuffling data
[2025-01-27 14:01:25] [data] Done shuffling 1,535,277 sentences (cached in RAM)
[2025-01-27 14:01:36] Ep. 4 : Up. 5400 : Sen. 14,270 : Cost 0.69162977 * 890,340 @ 25,287 after 97,586,375 : Time 28.43s : 31317.39 words/s : gNorm 0.7633 : L.r. 2.5000e-06
[2025-01-27 14:01:36] Saving model weights and runtime parameters to teacher-finetuned/model1.npz
[2025-01-27 14:01:38] Saving Adam parameters
[2025-01-27 14:01:40] [training] Saving training checkpoint to teacher-finetuned/model1.npz and teacher-finetuned/model1.npz.optimizer.npz
Training logs after one stalled validation with 'stalled' strategy ``` [2025-01-27 14:02:05] [marian] Marian v1.12.0 65bf82f 2023-02-21 09:56:29 -0800 [2025-01-27 14:02:05] [marian] Running on a1e8062f4fc4 as process 40103 with command line: [2025-01-27 14:02:05] [marian] marian -c teacher.finetune.yml -c teacher-finetuned/train.yml -m teacher-finetuned/model1.npz -v teacher-finetuned/vocab.spm teacher-finetuned/vocab.spm --tsv -t train_teacher.gz --valid-sets dev.tsv --seed 1111 --log teacher-finetuned/model1.npz.finetune.log --learn-rate 1e-05 --lr-decay-inv-sqrt 0 --lr-decay 0.5 --lr-decay-strategy stalled --lr-decay-start 1 --valid-freq 100 [2025-01-27 14:02:05] [config] after: 30e [2025-01-27 14:02:05] [config] after-batches: 0 [2025-01-27 14:02:05] [config] after-epochs: 0 [2025-01-27 14:02:05] [config] all-caps-every: 0 [2025-01-27 14:02:05] [config] allow-unk: false [2025-01-27 14:02:05] [config] authors: false [2025-01-27 14:02:05] [config] beam-size: 4 [2025-01-27 14:02:05] [config] bert-class-symbol: "[CLS]" [2025-01-27 14:02:05] [config] bert-mask-symbol: "[MASK]" [2025-01-27 14:02:05] [config] bert-masking-fraction: 0.15 [2025-01-27 14:02:05] [config] bert-sep-symbol: "[SEP]" [2025-01-27 14:02:05] [config] bert-train-type-embeddings: true [2025-01-27 14:02:05] [config] bert-type-vocab-size: 2 [2025-01-27 14:02:05] [config] build-info: "" [2025-01-27 14:02:05] [config] check-gradient-nan: false [2025-01-27 14:02:05] [config] check-nan: false [2025-01-27 14:02:05] [config] cite: false [2025-01-27 14:02:05] [config] clip-norm: 0 [2025-01-27 14:02:05] [config] cost-scaling: [2025-01-27 14:02:05] [config] - 8.f [2025-01-27 14:02:05] [config] - 10000 [2025-01-27 14:02:05] [config] - 1.f [2025-01-27 14:02:05] [config] - 8.f [2025-01-27 14:02:05] [config] cost-type: ce-sum [2025-01-27 14:02:05] [config] cpu-threads: 0 [2025-01-27 14:02:05] [config] data-threads: 8 [2025-01-27 14:02:05] [config] data-weighting: "" [2025-01-27 14:02:05] [config] 
data-weighting-type: sentence [2025-01-27 14:02:05] [config] dec-cell: gru [2025-01-27 14:02:05] [config] dec-cell-base-depth: 2 [2025-01-27 14:02:05] [config] dec-cell-high-depth: 1 [2025-01-27 14:02:05] [config] dec-depth: 6 [2025-01-27 14:02:05] [config] devices: [2025-01-27 14:02:05] [config] - 0 [2025-01-27 14:02:05] [config] - 1 [2025-01-27 14:02:05] [config] - 2 [2025-01-27 14:02:05] [config] - 3 [2025-01-27 14:02:05] [config] dim-emb: 1024 [2025-01-27 14:02:05] [config] dim-rnn: 1024 [2025-01-27 14:02:05] [config] dim-vocabs: [2025-01-27 14:02:05] [config] - 32000 [2025-01-27 14:02:05] [config] - 32000 [2025-01-27 14:02:05] [config] disp-first: 10 [2025-01-27 14:02:05] [config] disp-freq: 50 [2025-01-27 14:02:05] [config] disp-label-counts: true [2025-01-27 14:02:05] [config] dropout-rnn: 0 [2025-01-27 14:02:05] [config] dropout-src: 0 [2025-01-27 14:02:05] [config] dropout-trg: 0 [2025-01-27 14:02:05] [config] dump-config: "" [2025-01-27 14:02:05] [config] dynamic-gradient-scaling: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] early-stopping: 20 [2025-01-27 14:02:05] [config] early-stopping-on: first [2025-01-27 14:02:05] [config] embedding-fix-src: false [2025-01-27 14:02:05] [config] embedding-fix-trg: false [2025-01-27 14:02:05] [config] embedding-normalization: false [2025-01-27 14:02:05] [config] embedding-vectors: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] enc-cell: gru [2025-01-27 14:02:05] [config] enc-cell-depth: 1 [2025-01-27 14:02:05] [config] enc-depth: 6 [2025-01-27 14:02:05] [config] enc-type: bidirectional [2025-01-27 14:02:05] [config] english-title-case-every: 0 [2025-01-27 14:02:05] [config] exponential-smoothing: 0.0001 [2025-01-27 14:02:05] [config] factor-weight: 1 [2025-01-27 14:02:05] [config] factors-combine: sum [2025-01-27 14:02:05] [config] factors-dim-emb: 0 [2025-01-27 14:02:05] [config] gradient-checkpointing: false [2025-01-27 14:02:05] [config] gradient-norm-average-window: 100 
[2025-01-27 14:02:05] [config] guided-alignment: none [2025-01-27 14:02:05] [config] guided-alignment-cost: ce [2025-01-27 14:02:05] [config] guided-alignment-weight: 0.1 [2025-01-27 14:02:05] [config] ignore-model-config: false [2025-01-27 14:02:05] [config] input-types: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] interpolate-env-vars: false [2025-01-27 14:02:05] [config] keep-best: True [2025-01-27 14:02:05] [config] label-smoothing: 0 [2025-01-27 14:02:05] [config] layer-normalization: false [2025-01-27 14:02:05] [config] learn-rate: 1e-05 [2025-01-27 14:02:05] [config] lemma-dependency: "" [2025-01-27 14:02:05] [config] lemma-dim-emb: 0 [2025-01-27 14:02:05] [config] log: teacher-finetuned/model1.npz.finetune.log [2025-01-27 14:02:05] [config] log-level: info [2025-01-27 14:02:05] [config] log-time-zone: "" [2025-01-27 14:02:05] [config] logical-epoch: [2025-01-27 14:02:05] [config] - 1e [2025-01-27 14:02:05] [config] - 0 [2025-01-27 14:02:05] [config] lr-decay: 0.5 [2025-01-27 14:02:05] [config] lr-decay-freq: 50000 [2025-01-27 14:02:05] [config] lr-decay-inv-sqrt: [2025-01-27 14:02:05] [config] - 0 [2025-01-27 14:02:05] [config] lr-decay-repeat-warmup: false [2025-01-27 14:02:05] [config] lr-decay-reset-optimizer: false [2025-01-27 14:02:05] [config] lr-decay-start: [2025-01-27 14:02:05] [config] - 1 [2025-01-27 14:02:05] [config] lr-decay-strategy: stalled [2025-01-27 14:02:05] [config] lr-report: True [2025-01-27 14:02:05] [config] lr-warmup: 200 [2025-01-27 14:02:05] [config] lr-warmup-at-reload: false [2025-01-27 14:02:05] [config] lr-warmup-cycle: false [2025-01-27 14:02:05] [config] lr-warmup-start-rate: 0 [2025-01-27 14:02:05] [config] max-length: 300 [2025-01-27 14:02:05] [config] max-length-crop: false [2025-01-27 14:02:05] [config] max-length-factor: 3 [2025-01-27 14:02:05] [config] maxi-batch: 100 [2025-01-27 14:02:05] [config] maxi-batch-sort: trg [2025-01-27 14:02:05] [config] mini-batch: 64 [2025-01-27 14:02:05] [config] 
mini-batch-fit: True [2025-01-27 14:02:05] [config] mini-batch-fit-step: 10 [2025-01-27 14:02:05] [config] mini-batch-round-up: true [2025-01-27 14:02:05] [config] mini-batch-track-lr: false [2025-01-27 14:02:05] [config] mini-batch-warmup: 0 [2025-01-27 14:02:05] [config] mini-batch-words: 0 [2025-01-27 14:02:05] [config] mini-batch-words-ref: 0 [2025-01-27 14:02:05] [config] model: teacher-finetuned/model1.npz [2025-01-27 14:02:05] [config] multi-loss-type: sum [2025-01-27 14:02:05] [config] n-best: false [2025-01-27 14:02:05] [config] no-nccl: false [2025-01-27 14:02:05] [config] no-reload: false [2025-01-27 14:02:05] [config] no-restore-corpus: false [2025-01-27 14:02:05] [config] normalize: 1.0 [2025-01-27 14:02:05] [config] normalize-gradient: false [2025-01-27 14:02:05] [config] num-devices: 0 [2025-01-27 14:02:05] [config] optimizer: adam [2025-01-27 14:02:05] [config] optimizer-delay: 1 [2025-01-27 14:02:05] [config] optimizer-params: [2025-01-27 14:02:05] [config] - 0.9 [2025-01-27 14:02:05] [config] - 0.98 [2025-01-27 14:02:05] [config] - 1e-09 [2025-01-27 14:02:05] [config] output-omit-bias: false [2025-01-27 14:02:05] [config] overwrite: True [2025-01-27 14:02:05] [config] precision: [2025-01-27 14:02:05] [config] - float16 [2025-01-27 14:02:05] [config] - float32 [2025-01-27 14:02:05] [config] pretrained-model: "" [2025-01-27 14:02:05] [config] quantize-biases: false [2025-01-27 14:02:05] [config] quantize-bits: 0 [2025-01-27 14:02:05] [config] quantize-log-based: false [2025-01-27 14:02:05] [config] quantize-optimization-steps: 0 [2025-01-27 14:02:05] [config] quiet: false [2025-01-27 14:02:05] [config] quiet-translation: true [2025-01-27 14:02:05] [config] relative-paths: false [2025-01-27 14:02:05] [config] right-left: false [2025-01-27 14:02:05] [config] save-freq: 200 [2025-01-27 14:02:05] [config] seed: 1111 [2025-01-27 14:02:05] [config] sentencepiece-alphas: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] 
sentencepiece-max-lines: 2000000 [2025-01-27 14:02:05] [config] sentencepiece-options: "" [2025-01-27 14:02:05] [config] sharding: global [2025-01-27 14:02:05] [config] shuffle: data [2025-01-27 14:02:05] [config] shuffle-in-ram: true [2025-01-27 14:02:05] [config] sigterm: save-and-exit [2025-01-27 14:02:05] [config] skip: false [2025-01-27 14:02:05] [config] sqlite: "" [2025-01-27 14:02:05] [config] sqlite-drop: false [2025-01-27 14:02:05] [config] sync-freq: 200u [2025-01-27 14:02:05] [config] sync-sgd: true [2025-01-27 14:02:05] [config] tempdir: /tmp [2025-01-27 14:02:05] [config] tied-embeddings: false [2025-01-27 14:02:05] [config] tied-embeddings-all: true [2025-01-27 14:02:05] [config] tied-embeddings-src: false [2025-01-27 14:02:05] [config] train-embedder-rank: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] train-sets: [2025-01-27 14:02:05] [config] - train_teacher.gz [2025-01-27 14:02:05] [config] transformer-aan-activation: swish [2025-01-27 14:02:05] [config] transformer-aan-depth: 2 [2025-01-27 14:02:05] [config] transformer-aan-nogate: false [2025-01-27 14:02:05] [config] transformer-decoder-autoreg: self-attention [2025-01-27 14:02:05] [config] transformer-decoder-dim-ffn: 0 [2025-01-27 14:02:05] [config] transformer-decoder-ffn-depth: 0 [2025-01-27 14:02:05] [config] transformer-depth-scaling: false [2025-01-27 14:02:05] [config] transformer-dim-aan: 2048 [2025-01-27 14:02:05] [config] transformer-dim-ffn: 4096 [2025-01-27 14:02:05] [config] transformer-dropout: 0 [2025-01-27 14:02:05] [config] transformer-dropout-attention: 0 [2025-01-27 14:02:05] [config] transformer-dropout-ffn: 0 [2025-01-27 14:02:05] [config] transformer-ffn-activation: relu [2025-01-27 14:02:05] [config] transformer-ffn-depth: 2 [2025-01-27 14:02:05] [config] transformer-guided-alignment-layer: last [2025-01-27 14:02:05] [config] transformer-heads: 16 [2025-01-27 14:02:05] [config] transformer-no-projection: false [2025-01-27 14:02:05] [config] 
transformer-pool: false
[2025-01-27 14:02:05] [config] transformer-postprocess: dan
[2025-01-27 14:02:05] [config] transformer-postprocess-emb: d
[2025-01-27 14:02:05] [config] transformer-postprocess-top: ""
[2025-01-27 14:02:05] [config] transformer-preprocess: ""
[2025-01-27 14:02:05] [config] transformer-rnn-projection: false
[2025-01-27 14:02:05] [config] transformer-tied-layers:
[2025-01-27 14:02:05] [config]   []
[2025-01-27 14:02:05] [config] transformer-train-position-embeddings: false
[2025-01-27 14:02:05] [config] tsv: true
[2025-01-27 14:02:05] [config] tsv-fields: 2
[2025-01-27 14:02:05] [config] type: transformer
[2025-01-27 14:02:05] [config] ulr: false
[2025-01-27 14:02:05] [config] ulr-dim-emb: 0
[2025-01-27 14:02:05] [config] ulr-dropout: 0
[2025-01-27 14:02:05] [config] ulr-keys-vectors: ""
[2025-01-27 14:02:05] [config] ulr-query-vectors: ""
[2025-01-27 14:02:05] [config] ulr-softmax-temperature: 1
[2025-01-27 14:02:05] [config] ulr-trainable-transformation: false
[2025-01-27 14:02:05] [config] unlikelihood-loss: false
[2025-01-27 14:02:05] [config] valid-freq: 100
[2025-01-27 14:02:05] [config] valid-log: ""
[2025-01-27 14:02:05] [config] valid-max-length: 1000
[2025-01-27 14:02:05] [config] valid-metrics:
[2025-01-27 14:02:05] [config]   - ce-mean-words
[2025-01-27 14:02:05] [config]   - chrf
[2025-01-27 14:02:05] [config]   - bleu-detok
[2025-01-27 14:02:05] [config] valid-mini-batch: 32
[2025-01-27 14:02:05] [config] valid-reset-all: false
[2025-01-27 14:02:05] [config] valid-reset-stalled: false
[2025-01-27 14:02:05] [config] valid-script-args:
[2025-01-27 14:02:05] [config]   []
[2025-01-27 14:02:05] [config] valid-script-path: ""
[2025-01-27 14:02:05] [config] valid-sets:
[2025-01-27 14:02:05] [config]   - dev.tsv
[2025-01-27 14:02:05] [config] valid-translation-output: ""
[2025-01-27 14:02:05] [config] version: v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
[2025-01-27 14:02:05] [config] vocabs:
[2025-01-27 14:02:05] [config]   - teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [config]   - teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [config] word-penalty: 0
[2025-01-27 14:02:05] [config] word-scores: false
[2025-01-27 14:02:05] [config] workspace: -8000
[2025-01-27 14:02:05] [config] Loaded model has been created with Marian v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
[2025-01-27 14:02:05] Using synchronous SGD
[2025-01-27 14:02:05] [comm] Compiled without MPI support. Running as a single process on a1e8062f4fc4
[2025-01-27 14:02:05] Synced seed 1111
[2025-01-27 14:02:05] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [data] Setting vocabulary size for input 0 to 32,000
[2025-01-27 14:02:05] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [data] Setting vocabulary size for input 1 to 32,000
[2025-01-27 14:02:05] [batching] Collecting statistics for batch fitting with step size 10
[2025-01-27 14:02:05] Training with cost scaling - factor: 8, frequency: 10000, multiplier: 1, minimum: 8
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 14:02:06] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 14:02:06] [comm] Using global sharding
[2025-01-27 14:02:07] [comm] NCCLCommunicators constructed successfully
[2025-01-27 14:02:07] [training] Using 4 GPUs
[2025-01-27 14:02:07] [logits] Applying loss function for 1 factor(s)
[2025-01-27 14:02:07] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:07] [gpu] 16-bit TensorCores enabled for float32 matrix operations
[2025-01-27 14:02:07] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:39] [batching] Done. Typical MB size is 62,372 target words
[2025-01-27 14:02:39] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 14:02:40] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 14:02:40] [comm] Using global sharding
[2025-01-27 14:02:40] [comm] NCCLCommunicators constructed successfully
[2025-01-27 14:02:40] [training] Using 4 GPUs
[2025-01-27 14:02:40] Loading model from teacher-finetuned/model1.npz
[2025-01-27 14:02:47] Allocating memory for general optimizer shards
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu0
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu1
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu2
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu3
[2025-01-27 14:02:47] Loading Adam parameters
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:48] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:48] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:49] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:50] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:50] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:50] [training] Master parameters and optimizers restored from training checkpoint teacher-finetuned/model1.npz and teacher-finetuned/model1.npz.optimizer.npz
[2025-01-27 14:02:50] [data] Restoring the corpus state to epoch 4, batch 5400
[2025-01-27 14:02:50] [data] Shuffling data
[2025-01-27 14:02:51] [data] Done reading 1,535,277 sentences
[2025-01-27 14:02:51] [data] Done shuffling 1,535,277 sentences (cached in RAM)
[2025-01-27 14:02:51] Training started
[2025-01-27 14:02:51] [training] Batches are processed as 1 process(es) x 4 devices/process
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:52] Parameter type float16, optimization type float32, casting types true
[2025-01-27 14:03:20] Ep. 4 : Up. 5450 : Sen. 57,593 : Cost 0.69126970 * 921,566 @ 25,911 after 98,507,941 : Time 40.64s : 22676.10 words/s : gNorm 0.6406 : L.r. 2.5000e-06
[2025-01-27 14:03:48] Ep. 4 : Up. 5500 : Sen. 100,288 : Cost 0.67408943 * 885,373 @ 16,279 after 99,393,314 : Time 28.13s : 31471.62 words/s : gNorm 0.6485 : L.r. 2.5000e-06
[2025-01-27 14:03:48] [valid] Ep. 4 : Up. 5500 : ce-mean-words : 0.813664 : stalled 2 times (last best: 0.813412)
[2025-01-27 14:03:50] [valid] Ep. 4 : Up. 5500 : chrf : 66.2683 : stalled 2 times (last best: 66.3262)
[2025-01-27 14:03:52] [valid] Ep. 4 : Up. 5500 : bleu-detok : 41.3057 : stalled 2 times (last best: 41.4621)
[2025-01-27 14:03:52] Decaying learning rate to 1.25e-06 after having stalled 2 time(s)
```
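For reference, the decay arithmetic expected from `--lr-decay 0.5` can be sketched as follows. This is only an illustrative Python sketch of the intended behavior, not Marian's actual C++ scheduler code; `decay_lr` is a hypothetical helper name. It matches the step the `stalled` strategy produces in the log above (2.5000e-06 halved to 1.25e-06), which is the step missing under `epoch+stalled`:

```python
def decay_lr(lr, decay_factor, stalled, decay_start):
    """Return the decayed learning rate once validation has stalled
    at least `decay_start` times; otherwise return `lr` unchanged."""
    if stalled >= decay_start:
        return lr * decay_factor
    return lr

# With --lr-decay 0.5 and --lr-decay-start 1, two stalled validations
# should halve the rate, as the working 'stalled' run shows:
print(decay_lr(2.5e-06, 0.5, stalled=2, decay_start=1))  # → 1.25e-06
```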
@ZJaume ZJaume added the bug label Jan 27, 2025