lr-decay-strategy epoch+stalled not working #424

Open
ZJaume opened this issue Jan 27, 2025 · 0 comments
ZJaume commented Jan 27, 2025

Bug description

lr-decay-strategy epoch+stalled does not decay the learning rate after stalled validation.

How to reproduce

Set --lr-decay 0.5 --lr-decay-strategy epoch+stalled --lr-decay-start 1 1 and wait until one validation stalls.

In contrast, if --lr-decay 0.5 --lr-decay-strategy stalled --lr-decay-start 1 is set, the learning rate is correctly decayed after the first stalled validation.
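For reference, here is a minimal sketch of how I would expect the stalled-based decay to behave (my own model of the intended semantics, not Marian's code; the function `decayed_lr` is hypothetical): the learning rate is multiplied by `--lr-decay` once per stalled validation at or beyond `--lr-decay-start`.

```python
# Hypothetical model of the expected "stalled" decay semantics, not
# Marian's actual implementation: multiply the learning rate by
# lr_decay for every stalled validation at or beyond lr-decay-start.

def decayed_lr(base_lr: float, lr_decay: float, stalled: int, start: int = 1) -> float:
    """Learning rate after `stalled` consecutive stalled validations."""
    decays = max(0, stalled - start + 1)  # number of decay steps applied so far
    return base_lr * lr_decay ** decays
```

Under this model, with base_lr=1e-5 and lr_decay=0.5, one stall gives 5e-6 and two stalls give 2.5e-6. As I understand it, epoch+stalled should apply the same decay on the stalled trigger as well, which is what does not happen here.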

Context

  • Marian version: v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
  • CMake command: cmake .. -DCOMPILE_CPU=OFF -DCOMPILE_TURING=OFF -DCOMPILE_VOLTA=OFF -DCOMPILE_PASCAL=OFF -DCOMPILE_MAXWELL=OFF -DCMAKE_INSTALL_PREFIX:PATH=~/.local
  • Log file:
Training logs after one stalled validation with 'epoch+stalled' strategy
[2025-01-27 13:58:45] [marian] Marian v1.12.0 65bf82ff 2023-02-21 09:56:29 -0800
[2025-01-27 13:58:45] [marian] Running on a1e8062f4fc4 as process 40011 with command line:
[2025-01-27 13:58:45] [marian] marian -c teacher.finetune.yml -c teacher-finetuned/train.yml -m teacher-finetuned/model1.npz -v teacher-finetuned/vocab.spm teacher-finetuned/vocab.spm --tsv -t train_teacher.gz --valid-sets dev.tsv --seed 1111 --log teacher-finetuned/model1.npz.finetune.log --learn-rate 1e-05 --lr-decay-inv-sqrt 0 --lr-decay 0.5 --lr-decay-strategy stalled+epoch --lr-decay-start 1 1 --valid-freq 100
[2025-01-27 13:58:45] [config] after: 30e
[2025-01-27 13:58:45] [config] after-batches: 0
[2025-01-27 13:58:45] [config] after-epochs: 0
[2025-01-27 13:58:45] [config] all-caps-every: 0
[2025-01-27 13:58:45] [config] allow-unk: false
[2025-01-27 13:58:45] [config] authors: false
[2025-01-27 13:58:45] [config] beam-size: 4
[2025-01-27 13:58:45] [config] bert-class-symbol: "[CLS]"
[2025-01-27 13:58:45] [config] bert-mask-symbol: "[MASK]"
[2025-01-27 13:58:45] [config] bert-masking-fraction: 0.15
[2025-01-27 13:58:45] [config] bert-sep-symbol: "[SEP]"
[2025-01-27 13:58:45] [config] bert-train-type-embeddings: true
[2025-01-27 13:58:45] [config] bert-type-vocab-size: 2
[2025-01-27 13:58:45] [config] build-info: ""
[2025-01-27 13:58:45] [config] check-gradient-nan: false
[2025-01-27 13:58:45] [config] check-nan: false
[2025-01-27 13:58:45] [config] cite: false
[2025-01-27 13:58:45] [config] clip-norm: 0
[2025-01-27 13:58:45] [config] cost-scaling:
[2025-01-27 13:58:45] [config]   - 8.f
[2025-01-27 13:58:45] [config]   - 10000
[2025-01-27 13:58:45] [config]   - 1.f
[2025-01-27 13:58:45] [config]   - 8.f
[2025-01-27 13:58:45] [config] cost-type: ce-sum
[2025-01-27 13:58:45] [config] cpu-threads: 0
[2025-01-27 13:58:45] [config] data-threads: 8
[2025-01-27 13:58:45] [config] data-weighting: ""
[2025-01-27 13:58:45] [config] data-weighting-type: sentence
[2025-01-27 13:58:45] [config] dec-cell: gru
[2025-01-27 13:58:45] [config] dec-cell-base-depth: 2
[2025-01-27 13:58:45] [config] dec-cell-high-depth: 1
[2025-01-27 13:58:45] [config] dec-depth: 6
[2025-01-27 13:58:45] [config] devices:
[2025-01-27 13:58:45] [config]   - 0
[2025-01-27 13:58:45] [config]   - 1
[2025-01-27 13:58:45] [config]   - 2
[2025-01-27 13:58:45] [config]   - 3
[2025-01-27 13:58:45] [config] dim-emb: 1024
[2025-01-27 13:58:45] [config] dim-rnn: 1024
[2025-01-27 13:58:45] [config] dim-vocabs:
[2025-01-27 13:58:45] [config]   - 32000
[2025-01-27 13:58:45] [config]   - 32000
[2025-01-27 13:58:45] [config] disp-first: 10
[2025-01-27 13:58:45] [config] disp-freq: 50
[2025-01-27 13:58:45] [config] disp-label-counts: true
[2025-01-27 13:58:45] [config] dropout-rnn: 0
[2025-01-27 13:58:45] [config] dropout-src: 0
[2025-01-27 13:58:45] [config] dropout-trg: 0
[2025-01-27 13:58:45] [config] dump-config: ""
[2025-01-27 13:58:45] [config] dynamic-gradient-scaling:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] early-stopping: 20
[2025-01-27 13:58:45] [config] early-stopping-on: first
[2025-01-27 13:58:45] [config] embedding-fix-src: false
[2025-01-27 13:58:45] [config] embedding-fix-trg: false
[2025-01-27 13:58:45] [config] embedding-normalization: false
[2025-01-27 13:58:45] [config] embedding-vectors:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] enc-cell: gru
[2025-01-27 13:58:45] [config] enc-cell-depth: 1
[2025-01-27 13:58:45] [config] enc-depth: 6
[2025-01-27 13:58:45] [config] enc-type: bidirectional
[2025-01-27 13:58:45] [config] english-title-case-every: 0
[2025-01-27 13:58:45] [config] exponential-smoothing: 0.0001
[2025-01-27 13:58:45] [config] factor-weight: 1
[2025-01-27 13:58:45] [config] factors-combine: sum
[2025-01-27 13:58:45] [config] factors-dim-emb: 0
[2025-01-27 13:58:45] [config] gradient-checkpointing: false
[2025-01-27 13:58:45] [config] gradient-norm-average-window: 100
[2025-01-27 13:58:45] [config] guided-alignment: none
[2025-01-27 13:58:45] [config] guided-alignment-cost: ce
[2025-01-27 13:58:45] [config] guided-alignment-weight: 0.1
[2025-01-27 13:58:45] [config] ignore-model-config: false
[2025-01-27 13:58:45] [config] input-types:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] interpolate-env-vars: false
[2025-01-27 13:58:45] [config] keep-best: True
[2025-01-27 13:58:45] [config] label-smoothing: 0
[2025-01-27 13:58:45] [config] layer-normalization: false
[2025-01-27 13:58:45] [config] learn-rate: 1e-05
[2025-01-27 13:58:45] [config] lemma-dependency: ""
[2025-01-27 13:58:45] [config] lemma-dim-emb: 0
[2025-01-27 13:58:45] [config] log: teacher-finetuned/model1.npz.finetune.log
[2025-01-27 13:58:45] [config] log-level: info
[2025-01-27 13:58:45] [config] log-time-zone: ""
[2025-01-27 13:58:45] [config] logical-epoch:
[2025-01-27 13:58:45] [config]   - 1e
[2025-01-27 13:58:45] [config]   - 0
[2025-01-27 13:58:45] [config] lr-decay: 0.5
[2025-01-27 13:58:45] [config] lr-decay-freq: 50000
[2025-01-27 13:58:45] [config] lr-decay-inv-sqrt:
[2025-01-27 13:58:45] [config]   - 0
[2025-01-27 13:58:45] [config] lr-decay-repeat-warmup: false
[2025-01-27 13:58:45] [config] lr-decay-reset-optimizer: false
[2025-01-27 13:58:45] [config] lr-decay-start:
[2025-01-27 13:58:45] [config]   - 1
[2025-01-27 13:58:45] [config]   - 1
[2025-01-27 13:58:45] [config] lr-decay-strategy: stalled+epoch
[2025-01-27 13:58:45] [config] lr-report: True
[2025-01-27 13:58:45] [config] lr-warmup: 200
[2025-01-27 13:58:45] [config] lr-warmup-at-reload: false
[2025-01-27 13:58:45] [config] lr-warmup-cycle: false
[2025-01-27 13:58:45] [config] lr-warmup-start-rate: 0
[2025-01-27 13:58:45] [config] max-length: 300
[2025-01-27 13:58:45] [config] max-length-crop: false
[2025-01-27 13:58:45] [config] max-length-factor: 3
[2025-01-27 13:58:45] [config] maxi-batch: 100
[2025-01-27 13:58:45] [config] maxi-batch-sort: trg
[2025-01-27 13:58:45] [config] mini-batch: 64
[2025-01-27 13:58:45] [config] mini-batch-fit: True
[2025-01-27 13:58:45] [config] mini-batch-fit-step: 10
[2025-01-27 13:58:45] [config] mini-batch-round-up: true
[2025-01-27 13:58:45] [config] mini-batch-track-lr: false
[2025-01-27 13:58:45] [config] mini-batch-warmup: 0
[2025-01-27 13:58:45] [config] mini-batch-words: 0
[2025-01-27 13:58:45] [config] mini-batch-words-ref: 0
[2025-01-27 13:58:45] [config] model: teacher-finetuned/model1.npz
[2025-01-27 13:58:45] [config] multi-loss-type: sum
[2025-01-27 13:58:45] [config] n-best: false
[2025-01-27 13:58:45] [config] no-nccl: false
[2025-01-27 13:58:45] [config] no-reload: false
[2025-01-27 13:58:45] [config] no-restore-corpus: false
[2025-01-27 13:58:45] [config] normalize: 1.0
[2025-01-27 13:58:45] [config] normalize-gradient: false
[2025-01-27 13:58:45] [config] num-devices: 0
[2025-01-27 13:58:45] [config] optimizer: adam
[2025-01-27 13:58:45] [config] optimizer-delay: 1
[2025-01-27 13:58:45] [config] optimizer-params:
[2025-01-27 13:58:45] [config]   - 0.9
[2025-01-27 13:58:45] [config]   - 0.98
[2025-01-27 13:58:45] [config]   - 1e-09
[2025-01-27 13:58:45] [config] output-omit-bias: false
[2025-01-27 13:58:45] [config] overwrite: True
[2025-01-27 13:58:45] [config] precision:
[2025-01-27 13:58:45] [config]   - float16
[2025-01-27 13:58:45] [config]   - float32
[2025-01-27 13:58:45] [config] pretrained-model: ""
[2025-01-27 13:58:45] [config] quantize-biases: false
[2025-01-27 13:58:45] [config] quantize-bits: 0
[2025-01-27 13:58:45] [config] quantize-log-based: false
[2025-01-27 13:58:45] [config] quantize-optimization-steps: 0
[2025-01-27 13:58:45] [config] quiet: false
[2025-01-27 13:58:45] [config] quiet-translation: true
[2025-01-27 13:58:45] [config] relative-paths: false
[2025-01-27 13:58:45] [config] right-left: false
[2025-01-27 13:58:45] [config] save-freq: 200
[2025-01-27 13:58:45] [config] seed: 1111
[2025-01-27 13:58:45] [config] sentencepiece-alphas:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] sentencepiece-max-lines: 2000000
[2025-01-27 13:58:45] [config] sentencepiece-options: ""
[2025-01-27 13:58:45] [config] sharding: global
[2025-01-27 13:58:45] [config] shuffle: data
[2025-01-27 13:58:45] [config] shuffle-in-ram: true
[2025-01-27 13:58:45] [config] sigterm: save-and-exit
[2025-01-27 13:58:45] [config] skip: false
[2025-01-27 13:58:45] [config] sqlite: ""
[2025-01-27 13:58:45] [config] sqlite-drop: false
[2025-01-27 13:58:45] [config] sync-freq: 200u
[2025-01-27 13:58:45] [config] sync-sgd: true
[2025-01-27 13:58:45] [config] tempdir: /tmp
[2025-01-27 13:58:45] [config] tied-embeddings: false
[2025-01-27 13:58:45] [config] tied-embeddings-all: true
[2025-01-27 13:58:45] [config] tied-embeddings-src: false
[2025-01-27 13:58:45] [config] train-embedder-rank:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] train-sets:
[2025-01-27 13:58:45] [config]   - train_teacher.gz
[2025-01-27 13:58:45] [config] transformer-aan-activation: swish
[2025-01-27 13:58:45] [config] transformer-aan-depth: 2
[2025-01-27 13:58:45] [config] transformer-aan-nogate: false
[2025-01-27 13:58:45] [config] transformer-decoder-autoreg: self-attention
[2025-01-27 13:58:45] [config] transformer-decoder-dim-ffn: 0
[2025-01-27 13:58:45] [config] transformer-decoder-ffn-depth: 0
[2025-01-27 13:58:45] [config] transformer-depth-scaling: false
[2025-01-27 13:58:45] [config] transformer-dim-aan: 2048
[2025-01-27 13:58:45] [config] transformer-dim-ffn: 4096
[2025-01-27 13:58:45] [config] transformer-dropout: 0
[2025-01-27 13:58:45] [config] transformer-dropout-attention: 0
[2025-01-27 13:58:45] [config] transformer-dropout-ffn: 0
[2025-01-27 13:58:45] [config] transformer-ffn-activation: relu
[2025-01-27 13:58:45] [config] transformer-ffn-depth: 2
[2025-01-27 13:58:45] [config] transformer-guided-alignment-layer: last
[2025-01-27 13:58:45] [config] transformer-heads: 16
[2025-01-27 13:58:45] [config] transformer-no-projection: false
[2025-01-27 13:58:45] [config] transformer-pool: false
[2025-01-27 13:58:45] [config] transformer-postprocess: dan
[2025-01-27 13:58:45] [config] transformer-postprocess-emb: d
[2025-01-27 13:58:45] [config] transformer-postprocess-top: ""
[2025-01-27 13:58:45] [config] transformer-preprocess: ""
[2025-01-27 13:58:45] [config] transformer-rnn-projection: false
[2025-01-27 13:58:45] [config] transformer-tied-layers:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] transformer-train-position-embeddings: false
[2025-01-27 13:58:45] [config] tsv: true
[2025-01-27 13:58:45] [config] tsv-fields: 2
[2025-01-27 13:58:45] [config] type: transformer
[2025-01-27 13:58:45] [config] ulr: false
[2025-01-27 13:58:45] [config] ulr-dim-emb: 0
[2025-01-27 13:58:45] [config] ulr-dropout: 0
[2025-01-27 13:58:45] [config] ulr-keys-vectors: ""
[2025-01-27 13:58:45] [config] ulr-query-vectors: ""
[2025-01-27 13:58:45] [config] ulr-softmax-temperature: 1
[2025-01-27 13:58:45] [config] ulr-trainable-transformation: false
[2025-01-27 13:58:45] [config] unlikelihood-loss: false
[2025-01-27 13:58:45] [config] valid-freq: 100
[2025-01-27 13:58:45] [config] valid-log: ""
[2025-01-27 13:58:45] [config] valid-max-length: 1000
[2025-01-27 13:58:45] [config] valid-metrics:
[2025-01-27 13:58:45] [config]   - ce-mean-words
[2025-01-27 13:58:45] [config]   - chrf
[2025-01-27 13:58:45] [config]   - bleu-detok
[2025-01-27 13:58:45] [config] valid-mini-batch: 32
[2025-01-27 13:58:45] [config] valid-reset-all: false
[2025-01-27 13:58:45] [config] valid-reset-stalled: false
[2025-01-27 13:58:45] [config] valid-script-args:
[2025-01-27 13:58:45] [config]   []
[2025-01-27 13:58:45] [config] valid-script-path: ""
[2025-01-27 13:58:45] [config] valid-sets:
[2025-01-27 13:58:45] [config]   - dev.tsv
[2025-01-27 13:58:45] [config] valid-translation-output: ""
[2025-01-27 13:58:45] [config] version: v1.12.0 65bf82ff 2023-02-21 09:56:29 -0800
[2025-01-27 13:58:45] [config] vocabs:
[2025-01-27 13:58:45] [config]   - teacher-finetuned/vocab.spm
[2025-01-27 13:58:45] [config]   - teacher-finetuned/vocab.spm
[2025-01-27 13:58:45] [config] word-penalty: 0
[2025-01-27 13:58:45] [config] word-scores: false
[2025-01-27 13:58:45] [config] workspace: -8000
[2025-01-27 13:58:45] [config] Loaded model has been created with Marian v1.12.0 65bf82ff 2023-02-21 09:56:29 -0800
[2025-01-27 13:58:45] Using synchronous SGD
[2025-01-27 13:58:45] [comm] Compiled without MPI support. Running as a single process on a1e8062f4fc4
[2025-01-27 13:58:45] Synced seed 1111
[2025-01-27 13:58:45] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 13:58:45] [data] Setting vocabulary size for input 0 to 32,000
[2025-01-27 13:58:45] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 13:58:45] [data] Setting vocabulary size for input 1 to 32,000
[2025-01-27 13:58:45] [batching] Collecting statistics for batch fitting with step size 10
[2025-01-27 13:58:45] Training with cost scaling - factor: 8, frequency: 10000, multiplier: 1, minimum: 8
[2025-01-27 13:58:45] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 13:58:45] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 13:58:46] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 13:58:46] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 13:58:46] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 13:58:46] [comm] Using global sharding
[2025-01-27 13:58:46] [comm] NCCLCommunicators constructed successfully
[2025-01-27 13:58:46] [training] Using 4 GPUs
[2025-01-27 13:58:46] [logits] Applying loss function for 1 factor(s)
[2025-01-27 13:58:46] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:58:46] [gpu] 16-bit TensorCores enabled for float32 matrix operations
[2025-01-27 13:58:46] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:59:19] [batching] Done. Typical MB size is 62,372 target words
[2025-01-27 13:59:19] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 13:59:19] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 13:59:19] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 13:59:19] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 13:59:19] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 13:59:19] [comm] Using global sharding
[2025-01-27 13:59:19] [comm] NCCLCommunicators constructed successfully
[2025-01-27 13:59:19] [training] Using 4 GPUs
[2025-01-27 13:59:19] Loading model from teacher-finetuned/model1.npz
[2025-01-27 13:59:26] Allocating memory for general optimizer shards
[2025-01-27 13:59:26] [memory] Reserving 598 MB, device gpu0
[2025-01-27 13:59:26] [memory] Reserving 598 MB, device gpu1
[2025-01-27 13:59:26] [memory] Reserving 598 MB, device gpu2
[2025-01-27 13:59:26] [memory] Reserving 598 MB, device gpu3
[2025-01-27 13:59:27] Loading Adam parameters
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu1
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu2
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu3
[2025-01-27 13:59:27] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:59:29] [memory] Reserving 398 MB, device gpu1
[2025-01-27 13:59:29] [memory] Reserving 398 MB, device gpu2
[2025-01-27 13:59:29] [memory] Reserving 398 MB, device gpu3
[2025-01-27 13:59:29] [training] Master parameters and optimizers restored from training checkpoint teacher-finetuned/model1.npz and teacher-finetuned/model1.npz.optimizer.npz
[2025-01-27 13:59:29] [data] Restoring the corpus state to epoch 3, batch 5200
[2025-01-27 13:59:29] [data] Shuffling data
[2025-01-27 13:59:31] [data] Done reading 1,535,277 sentences
[2025-01-27 13:59:31] [data] Done shuffling 1,535,277 sentences (cached in RAM)
[2025-01-27 13:59:38] Training started
[2025-01-27 13:59:38] [training] Batches are processed as 1 process(es) x 4 devices/process
[2025-01-27 13:59:38] [memory] Reserving 398 MB, device gpu0
[2025-01-27 13:59:38] [memory] Reserving 398 MB, device gpu1
[2025-01-27 13:59:38] [memory] Reserving 398 MB, device gpu2
[2025-01-27 13:59:38] [memory] Reserving 398 MB, device gpu3
[2025-01-27 13:59:39] Parameter type float16, optimization type float32, casting types true
[2025-01-27 14:00:07] Ep. 3 : Up. 5250 : Sen. 212,322 : Cost 0.68544847 * 885,744 @ 25,279 after 94,894,463 : Time 47.93s : 18480.48 words/s : gNorm 0.6284 : L.r. 2.5000e-06
[2025-01-27 14:00:35] Ep. 3 : Up. 5300 : Sen. 256,577 : Cost 0.68347019 * 920,439 @ 29,186 after 95,814,902 : Time 28.20s : 32642.03 words/s : gNorm 0.7331 : L.r. 2.5000e-06
[2025-01-27 14:00:35] [valid] Ep. 3 : Up. 5300 : ce-mean-words : 0.813648 : stalled 1 times (last best: 0.813412)
[2025-01-27 14:00:37] [valid] Ep. 3 : Up. 5300 : chrf : 66.2634 : stalled 1 times (last best: 66.3262)
[2025-01-27 14:00:39] [valid] Ep. 3 : Up. 5300 : bleu-detok : 41.2837 : stalled 1 times (last best: 41.4621)
[2025-01-27 14:01:07] Ep. 3 : Up. 5350 : Sen. 299,352 : Cost 0.68636936 * 881,133 @ 23,996 after 96,696,035 : Time 32.34s : 27242.83 words/s : gNorm 0.8900 : L.r. 2.5000e-06
[2025-01-27 14:01:25] Seen 326,239 samples
[2025-01-27 14:01:25] Starting data epoch 4 in logical epoch 4
[2025-01-27 14:01:25] [data] Shuffling data
[2025-01-27 14:01:25] [data] Done shuffling 1,535,277 sentences (cached in RAM)
[2025-01-27 14:01:36] Ep. 4 : Up. 5400 : Sen. 14,270 : Cost 0.69162977 * 890,340 @ 25,287 after 97,586,375 : Time 28.43s : 31317.39 words/s : gNorm 0.7633 : L.r. 2.5000e-06
[2025-01-27 14:01:36] Saving model weights and runtime parameters to teacher-finetuned/model1.npz
[2025-01-27 14:01:38] Saving Adam parameters
[2025-01-27 14:01:40] [training] Saving training checkpoint to teacher-finetuned/model1.npz and teacher-finetuned/model1.npz.optimizer.npz
Training logs after one stalled validation with 'stalled' strategy ``` [2025-01-27 14:02:05] [marian] Marian v1.12.0 65bf82f 2023-02-21 09:56:29 -0800 [2025-01-27 14:02:05] [marian] Running on a1e8062f4fc4 as process 40103 with command line: [2025-01-27 14:02:05] [marian] marian -c teacher.finetune.yml -c teacher-finetuned/train.yml -m teacher-finetuned/model1.npz -v teacher-finetuned/vocab.spm teacher-finetuned/vocab.spm --tsv -t train_teacher.gz --valid-sets dev.tsv --seed 1111 --log teacher-finetuned/model1.npz.finetune.log --learn-rate 1e-05 --lr-decay-inv-sqrt 0 --lr-decay 0.5 --lr-decay-strategy stalled --lr-decay-start 1 --valid-freq 100 [2025-01-27 14:02:05] [config] after: 30e [2025-01-27 14:02:05] [config] after-batches: 0 [2025-01-27 14:02:05] [config] after-epochs: 0 [2025-01-27 14:02:05] [config] all-caps-every: 0 [2025-01-27 14:02:05] [config] allow-unk: false [2025-01-27 14:02:05] [config] authors: false [2025-01-27 14:02:05] [config] beam-size: 4 [2025-01-27 14:02:05] [config] bert-class-symbol: "[CLS]" [2025-01-27 14:02:05] [config] bert-mask-symbol: "[MASK]" [2025-01-27 14:02:05] [config] bert-masking-fraction: 0.15 [2025-01-27 14:02:05] [config] bert-sep-symbol: "[SEP]" [2025-01-27 14:02:05] [config] bert-train-type-embeddings: true [2025-01-27 14:02:05] [config] bert-type-vocab-size: 2 [2025-01-27 14:02:05] [config] build-info: "" [2025-01-27 14:02:05] [config] check-gradient-nan: false [2025-01-27 14:02:05] [config] check-nan: false [2025-01-27 14:02:05] [config] cite: false [2025-01-27 14:02:05] [config] clip-norm: 0 [2025-01-27 14:02:05] [config] cost-scaling: [2025-01-27 14:02:05] [config] - 8.f [2025-01-27 14:02:05] [config] - 10000 [2025-01-27 14:02:05] [config] - 1.f [2025-01-27 14:02:05] [config] - 8.f [2025-01-27 14:02:05] [config] cost-type: ce-sum [2025-01-27 14:02:05] [config] cpu-threads: 0 [2025-01-27 14:02:05] [config] data-threads: 8 [2025-01-27 14:02:05] [config] data-weighting: "" [2025-01-27 14:02:05] [config] 
data-weighting-type: sentence [2025-01-27 14:02:05] [config] dec-cell: gru [2025-01-27 14:02:05] [config] dec-cell-base-depth: 2 [2025-01-27 14:02:05] [config] dec-cell-high-depth: 1 [2025-01-27 14:02:05] [config] dec-depth: 6 [2025-01-27 14:02:05] [config] devices: [2025-01-27 14:02:05] [config] - 0 [2025-01-27 14:02:05] [config] - 1 [2025-01-27 14:02:05] [config] - 2 [2025-01-27 14:02:05] [config] - 3 [2025-01-27 14:02:05] [config] dim-emb: 1024 [2025-01-27 14:02:05] [config] dim-rnn: 1024 [2025-01-27 14:02:05] [config] dim-vocabs: [2025-01-27 14:02:05] [config] - 32000 [2025-01-27 14:02:05] [config] - 32000 [2025-01-27 14:02:05] [config] disp-first: 10 [2025-01-27 14:02:05] [config] disp-freq: 50 [2025-01-27 14:02:05] [config] disp-label-counts: true [2025-01-27 14:02:05] [config] dropout-rnn: 0 [2025-01-27 14:02:05] [config] dropout-src: 0 [2025-01-27 14:02:05] [config] dropout-trg: 0 [2025-01-27 14:02:05] [config] dump-config: "" [2025-01-27 14:02:05] [config] dynamic-gradient-scaling: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] early-stopping: 20 [2025-01-27 14:02:05] [config] early-stopping-on: first [2025-01-27 14:02:05] [config] embedding-fix-src: false [2025-01-27 14:02:05] [config] embedding-fix-trg: false [2025-01-27 14:02:05] [config] embedding-normalization: false [2025-01-27 14:02:05] [config] embedding-vectors: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] enc-cell: gru [2025-01-27 14:02:05] [config] enc-cell-depth: 1 [2025-01-27 14:02:05] [config] enc-depth: 6 [2025-01-27 14:02:05] [config] enc-type: bidirectional [2025-01-27 14:02:05] [config] english-title-case-every: 0 [2025-01-27 14:02:05] [config] exponential-smoothing: 0.0001 [2025-01-27 14:02:05] [config] factor-weight: 1 [2025-01-27 14:02:05] [config] factors-combine: sum [2025-01-27 14:02:05] [config] factors-dim-emb: 0 [2025-01-27 14:02:05] [config] gradient-checkpointing: false [2025-01-27 14:02:05] [config] gradient-norm-average-window: 100 
[2025-01-27 14:02:05] [config] guided-alignment: none [2025-01-27 14:02:05] [config] guided-alignment-cost: ce [2025-01-27 14:02:05] [config] guided-alignment-weight: 0.1 [2025-01-27 14:02:05] [config] ignore-model-config: false [2025-01-27 14:02:05] [config] input-types: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] interpolate-env-vars: false [2025-01-27 14:02:05] [config] keep-best: True [2025-01-27 14:02:05] [config] label-smoothing: 0 [2025-01-27 14:02:05] [config] layer-normalization: false [2025-01-27 14:02:05] [config] learn-rate: 1e-05 [2025-01-27 14:02:05] [config] lemma-dependency: "" [2025-01-27 14:02:05] [config] lemma-dim-emb: 0 [2025-01-27 14:02:05] [config] log: teacher-finetuned/model1.npz.finetune.log [2025-01-27 14:02:05] [config] log-level: info [2025-01-27 14:02:05] [config] log-time-zone: "" [2025-01-27 14:02:05] [config] logical-epoch: [2025-01-27 14:02:05] [config] - 1e [2025-01-27 14:02:05] [config] - 0 [2025-01-27 14:02:05] [config] lr-decay: 0.5 [2025-01-27 14:02:05] [config] lr-decay-freq: 50000 [2025-01-27 14:02:05] [config] lr-decay-inv-sqrt: [2025-01-27 14:02:05] [config] - 0 [2025-01-27 14:02:05] [config] lr-decay-repeat-warmup: false [2025-01-27 14:02:05] [config] lr-decay-reset-optimizer: false [2025-01-27 14:02:05] [config] lr-decay-start: [2025-01-27 14:02:05] [config] - 1 [2025-01-27 14:02:05] [config] lr-decay-strategy: stalled [2025-01-27 14:02:05] [config] lr-report: True [2025-01-27 14:02:05] [config] lr-warmup: 200 [2025-01-27 14:02:05] [config] lr-warmup-at-reload: false [2025-01-27 14:02:05] [config] lr-warmup-cycle: false [2025-01-27 14:02:05] [config] lr-warmup-start-rate: 0 [2025-01-27 14:02:05] [config] max-length: 300 [2025-01-27 14:02:05] [config] max-length-crop: false [2025-01-27 14:02:05] [config] max-length-factor: 3 [2025-01-27 14:02:05] [config] maxi-batch: 100 [2025-01-27 14:02:05] [config] maxi-batch-sort: trg [2025-01-27 14:02:05] [config] mini-batch: 64 [2025-01-27 14:02:05] [config] 
mini-batch-fit: True [2025-01-27 14:02:05] [config] mini-batch-fit-step: 10 [2025-01-27 14:02:05] [config] mini-batch-round-up: true [2025-01-27 14:02:05] [config] mini-batch-track-lr: false [2025-01-27 14:02:05] [config] mini-batch-warmup: 0 [2025-01-27 14:02:05] [config] mini-batch-words: 0 [2025-01-27 14:02:05] [config] mini-batch-words-ref: 0 [2025-01-27 14:02:05] [config] model: teacher-finetuned/model1.npz [2025-01-27 14:02:05] [config] multi-loss-type: sum [2025-01-27 14:02:05] [config] n-best: false [2025-01-27 14:02:05] [config] no-nccl: false [2025-01-27 14:02:05] [config] no-reload: false [2025-01-27 14:02:05] [config] no-restore-corpus: false [2025-01-27 14:02:05] [config] normalize: 1.0 [2025-01-27 14:02:05] [config] normalize-gradient: false [2025-01-27 14:02:05] [config] num-devices: 0 [2025-01-27 14:02:05] [config] optimizer: adam [2025-01-27 14:02:05] [config] optimizer-delay: 1 [2025-01-27 14:02:05] [config] optimizer-params: [2025-01-27 14:02:05] [config] - 0.9 [2025-01-27 14:02:05] [config] - 0.98 [2025-01-27 14:02:05] [config] - 1e-09 [2025-01-27 14:02:05] [config] output-omit-bias: false [2025-01-27 14:02:05] [config] overwrite: True [2025-01-27 14:02:05] [config] precision: [2025-01-27 14:02:05] [config] - float16 [2025-01-27 14:02:05] [config] - float32 [2025-01-27 14:02:05] [config] pretrained-model: "" [2025-01-27 14:02:05] [config] quantize-biases: false [2025-01-27 14:02:05] [config] quantize-bits: 0 [2025-01-27 14:02:05] [config] quantize-log-based: false [2025-01-27 14:02:05] [config] quantize-optimization-steps: 0 [2025-01-27 14:02:05] [config] quiet: false [2025-01-27 14:02:05] [config] quiet-translation: true [2025-01-27 14:02:05] [config] relative-paths: false [2025-01-27 14:02:05] [config] right-left: false [2025-01-27 14:02:05] [config] save-freq: 200 [2025-01-27 14:02:05] [config] seed: 1111 [2025-01-27 14:02:05] [config] sentencepiece-alphas: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] 
sentencepiece-max-lines: 2000000 [2025-01-27 14:02:05] [config] sentencepiece-options: "" [2025-01-27 14:02:05] [config] sharding: global [2025-01-27 14:02:05] [config] shuffle: data [2025-01-27 14:02:05] [config] shuffle-in-ram: true [2025-01-27 14:02:05] [config] sigterm: save-and-exit [2025-01-27 14:02:05] [config] skip: false [2025-01-27 14:02:05] [config] sqlite: "" [2025-01-27 14:02:05] [config] sqlite-drop: false [2025-01-27 14:02:05] [config] sync-freq: 200u [2025-01-27 14:02:05] [config] sync-sgd: true [2025-01-27 14:02:05] [config] tempdir: /tmp [2025-01-27 14:02:05] [config] tied-embeddings: false [2025-01-27 14:02:05] [config] tied-embeddings-all: true [2025-01-27 14:02:05] [config] tied-embeddings-src: false [2025-01-27 14:02:05] [config] train-embedder-rank: [2025-01-27 14:02:05] [config] [] [2025-01-27 14:02:05] [config] train-sets: [2025-01-27 14:02:05] [config] - train_teacher.gz [2025-01-27 14:02:05] [config] transformer-aan-activation: swish [2025-01-27 14:02:05] [config] transformer-aan-depth: 2 [2025-01-27 14:02:05] [config] transformer-aan-nogate: false [2025-01-27 14:02:05] [config] transformer-decoder-autoreg: self-attention [2025-01-27 14:02:05] [config] transformer-decoder-dim-ffn: 0 [2025-01-27 14:02:05] [config] transformer-decoder-ffn-depth: 0 [2025-01-27 14:02:05] [config] transformer-depth-scaling: false [2025-01-27 14:02:05] [config] transformer-dim-aan: 2048 [2025-01-27 14:02:05] [config] transformer-dim-ffn: 4096 [2025-01-27 14:02:05] [config] transformer-dropout: 0 [2025-01-27 14:02:05] [config] transformer-dropout-attention: 0 [2025-01-27 14:02:05] [config] transformer-dropout-ffn: 0 [2025-01-27 14:02:05] [config] transformer-ffn-activation: relu [2025-01-27 14:02:05] [config] transformer-ffn-depth: 2 [2025-01-27 14:02:05] [config] transformer-guided-alignment-layer: last [2025-01-27 14:02:05] [config] transformer-heads: 16 [2025-01-27 14:02:05] [config] transformer-no-projection: false [2025-01-27 14:02:05] [config] 
transformer-pool: false
[2025-01-27 14:02:05] [config] transformer-postprocess: dan
[2025-01-27 14:02:05] [config] transformer-postprocess-emb: d
[2025-01-27 14:02:05] [config] transformer-postprocess-top: ""
[2025-01-27 14:02:05] [config] transformer-preprocess: ""
[2025-01-27 14:02:05] [config] transformer-rnn-projection: false
[2025-01-27 14:02:05] [config] transformer-tied-layers:
[2025-01-27 14:02:05] [config]   []
[2025-01-27 14:02:05] [config] transformer-train-position-embeddings: false
[2025-01-27 14:02:05] [config] tsv: true
[2025-01-27 14:02:05] [config] tsv-fields: 2
[2025-01-27 14:02:05] [config] type: transformer
[2025-01-27 14:02:05] [config] ulr: false
[2025-01-27 14:02:05] [config] ulr-dim-emb: 0
[2025-01-27 14:02:05] [config] ulr-dropout: 0
[2025-01-27 14:02:05] [config] ulr-keys-vectors: ""
[2025-01-27 14:02:05] [config] ulr-query-vectors: ""
[2025-01-27 14:02:05] [config] ulr-softmax-temperature: 1
[2025-01-27 14:02:05] [config] ulr-trainable-transformation: false
[2025-01-27 14:02:05] [config] unlikelihood-loss: false
[2025-01-27 14:02:05] [config] valid-freq: 100
[2025-01-27 14:02:05] [config] valid-log: ""
[2025-01-27 14:02:05] [config] valid-max-length: 1000
[2025-01-27 14:02:05] [config] valid-metrics:
[2025-01-27 14:02:05] [config]   - ce-mean-words
[2025-01-27 14:02:05] [config]   - chrf
[2025-01-27 14:02:05] [config]   - bleu-detok
[2025-01-27 14:02:05] [config] valid-mini-batch: 32
[2025-01-27 14:02:05] [config] valid-reset-all: false
[2025-01-27 14:02:05] [config] valid-reset-stalled: false
[2025-01-27 14:02:05] [config] valid-script-args:
[2025-01-27 14:02:05] [config]   []
[2025-01-27 14:02:05] [config] valid-script-path: ""
[2025-01-27 14:02:05] [config] valid-sets:
[2025-01-27 14:02:05] [config]   - dev.tsv
[2025-01-27 14:02:05] [config] valid-translation-output: ""
[2025-01-27 14:02:05] [config] version: v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
[2025-01-27 14:02:05] [config] vocabs:
[2025-01-27 14:02:05] [config]   - teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [config]   - teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [config] word-penalty: 0
[2025-01-27 14:02:05] [config] word-scores: false
[2025-01-27 14:02:05] [config] workspace: -8000
[2025-01-27 14:02:05] [config] Loaded model has been created with Marian v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
[2025-01-27 14:02:05] Using synchronous SGD
[2025-01-27 14:02:05] [comm] Compiled without MPI support. Running as a single process on a1e8062f4fc4
[2025-01-27 14:02:05] Synced seed 1111
[2025-01-27 14:02:05] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [data] Setting vocabulary size for input 0 to 32,000
[2025-01-27 14:02:05] [data] Loading SentencePiece vocabulary from file teacher-finetuned/vocab.spm
[2025-01-27 14:02:05] [data] Setting vocabulary size for input 1 to 32,000
[2025-01-27 14:02:05] [batching] Collecting statistics for batch fitting with step size 10
[2025-01-27 14:02:05] Training with cost scaling - factor: 8, frequency: 10000, multiplier: 1, minimum: 8
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 14:02:06] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 14:02:06] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 14:02:06] [comm] Using global sharding
[2025-01-27 14:02:07] [comm] NCCLCommunicators constructed successfully
[2025-01-27 14:02:07] [training] Using 4 GPUs
[2025-01-27 14:02:07] [logits] Applying loss function for 1 factor(s)
[2025-01-27 14:02:07] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:07] [gpu] 16-bit TensorCores enabled for float32 matrix operations
[2025-01-27 14:02:07] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:39] [batching] Done. Typical MB size is 62,372 target words
[2025-01-27 14:02:39] [memory] Extending reserved space to 16128 MB (device gpu0)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu1)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu2)
[2025-01-27 14:02:40] [memory] Extending reserved space to 16128 MB (device gpu3)
[2025-01-27 14:02:40] [comm] Using NCCL 2.8.3 for GPU communication
[2025-01-27 14:02:40] [comm] Using global sharding
[2025-01-27 14:02:40] [comm] NCCLCommunicators constructed successfully
[2025-01-27 14:02:40] [training] Using 4 GPUs
[2025-01-27 14:02:40] Loading model from teacher-finetuned/model1.npz
[2025-01-27 14:02:47] Allocating memory for general optimizer shards
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu0
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu1
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu2
[2025-01-27 14:02:47] [memory] Reserving 598 MB, device gpu3
[2025-01-27 14:02:47] Loading Adam parameters
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:47] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:48] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:48] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:49] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:50] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:50] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:50] [training] Master parameters and optimizers restored from training checkpoint teacher-finetuned/model1.npz and teacher-finetuned/model1.npz.optimizer.npz
[2025-01-27 14:02:50] [data] Restoring the corpus state to epoch 4, batch 5400
[2025-01-27 14:02:50] [data] Shuffling data
[2025-01-27 14:02:51] [data] Done reading 1,535,277 sentences
[2025-01-27 14:02:51] [data] Done shuffling 1,535,277 sentences (cached in RAM)
[2025-01-27 14:02:51] Training started
[2025-01-27 14:02:51] [training] Batches are processed as 1 process(es) x 4 devices/process
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu0
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu3
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu1
[2025-01-27 14:02:52] [memory] Reserving 398 MB, device gpu2
[2025-01-27 14:02:52] Parameter type float16, optimization type float32, casting types true
[2025-01-27 14:03:20] Ep. 4 : Up. 5450 : Sen. 57,593 : Cost 0.69126970 * 921,566 @ 25,911 after 98,507,941 : Time 40.64s : 22676.10 words/s : gNorm 0.6406 : L.r. 2.5000e-06
[2025-01-27 14:03:48] Ep. 4 : Up. 5500 : Sen. 100,288 : Cost 0.67408943 * 885,373 @ 16,279 after 99,393,314 : Time 28.13s : 31471.62 words/s : gNorm 0.6485 : L.r. 2.5000e-06
[2025-01-27 14:03:48] [valid] Ep. 4 : Up. 5500 : ce-mean-words : 0.813664 : stalled 2 times (last best: 0.813412)
[2025-01-27 14:03:50] [valid] Ep. 4 : Up. 5500 : chrf : 66.2683 : stalled 2 times (last best: 66.3262)
[2025-01-27 14:03:52] [valid] Ep. 4 : Up. 5500 : bleu-detok : 41.3057 : stalled 2 times (last best: 41.4621)
[2025-01-27 14:03:52] Decaying learning rate to 1.25e-06 after having stalled 2 time(s)
```
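For reference, the decay arithmetic expected from `--lr-decay 0.5` can be sketched as follows. This is only an illustrative Python sketch of the intended behavior, not Marian's actual C++ scheduler code; `decay_lr` is a hypothetical helper name. It matches the step the `stalled` strategy produces in the log above (2.5000e-06 halved to 1.25e-06), which is the step missing under `epoch+stalled`:

```python
def decay_lr(lr, decay_factor, stalled, decay_start):
    """Return the decayed learning rate once validation has stalled
    at least `decay_start` times; otherwise return `lr` unchanged."""
    if stalled >= decay_start:
        return lr * decay_factor
    return lr

# With --lr-decay 0.5 and --lr-decay-start 1, two stalled validations
# should halve the rate, as the working 'stalled' run shows:
print(decay_lr(2.5e-06, 0.5, stalled=2, decay_start=1))  # → 1.25e-06
```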
@ZJaume ZJaume added the bug label Jan 27, 2025