forked from CGCL-codes/naturalcc
python.log
nohup: ignoring input
[2021-03-22 15:51:32] INFO >> Load arguments in /home/wanyao/yang/naturalcc-dev/run/completion/gpt2/config/raw_py150/python.yml (train.py:302, cli_main())
[2021-03-22 15:51:32] INFO >> {'criterion': 'completion_cross_entropy', 'optimizer': 'torch_adam', 'lr_scheduler': 'fixed', 'tokenizer': None, 'bpe': None, 'common': {'no_progress_bar': 0, 'log_interval': 500, 'log_format': 'simple', 'tensorboard_logdir': '', 'memory_efficient_fp16': 0, 'fp16_no_flatten_grads': 0, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'empty_cache_freq': 0, 'task': 'completion', 'seed': 666, 'cpu': 0, 'fp16': 0, 'fp16_opt_level': '01', 'server_ip': '', 'server_port': ''}, 'dataset': {'num_workers': 3, 'skip_invalid_size_inputs_valid_test': 0, 'max_tokens': 100000.0, 'max_sentences': 32, 'required_batch_size_multiple': 1, 'dataset_impl': 'mmap', 'train_subset': 'train', 'valid_subset': 'test', 'validate_interval': 1, 'fixed_validation_seed': None, 'disable_validation': 0, 'max_tokens_valid': None, 'max_sentences_valid': 64, 'curriculum': 5, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'distributed_training': {'distributed_world_size': 4, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': 0, 'ddp_backend': 'no_c10d', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': None, 'find_unused_parameters': 0, 'fast_stat_sync': 0, 'broadcast_buffers': 0, 'global_sync_iter': 50, 'warmup_iterations': 500, 'local_rank': -1}, 'task': {'data': '/home/wanyao/.ncc/raw_py150/completion/data-mmap', 'target_lang': 'code_tokens', 'ext': 'code_tokens.ext', 'code_types': ['attr', 'num', 'name', 'param'], 'max_target_positions': 500, 'add_bos_token': 0, 'eval_bleu': 0, 'eval_bleu_detok': 'space', 'eval_bleu_detok_args': None, 'eval_tokenized_bleu': 0, 'eval_bleu_remove_bpe': None, 'eval_bleu_args': None, 'eval_bleu_print_samples': 0, 'eval_mrr': 1}, 'model': {'arch': 'completion_gpt2', 'dropout': 0.5, 'decoder_embed_dim': 300, 'decoder_hidden_size': 300, 'decoder_layers': 6, 
'decoder_attention_heads': 6, 'max_target_positions': 500, 'activation_fn': 'gelu', 'decoder_ffn_embed_dim': 1200}, 'optimization': {'max_epoch': 100, 'max_update': 0, 'clip_norm': 25, 'update_freq': [1], 'lrs': [0.001], 'min_lr': -1, 'use_bmuf': 0, 'force_anneal': None, 'warmup_updates': 0, 'end_learning_rate': 0.0, 'power': 1.0, 'total_num_update': 1000000, 'sentence_avg': None, 'adam': {'adam_betas': '(0.9, 0.999)', 'adam_eps': 1e-08, 'weight_decay': 0.0, 'use_old_adam': 0}, 'weight_decay': 0.0, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 5, 'max_steps': -1, 'warmup_steps': 0, 'gradient_accumulation_steps': 1}, 'checkpoint': {'save_dir': '/home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints', 'restore_file': 'checkpoint_best.pt', 'reset_dataloader': None, 'reset_lr_scheduler': None, 'reset_meters': None, 'reset_optimizer': None, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': 0, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': 0, 'no_epoch_checkpoints': 1, 'no_last_checkpoints': 0, 'no_save_optimizer_state': None, 'best_checkpoint_metric': 'mrr', 'maximize_best_checkpoint_metric': 1, 'patience': 5, 'should_continue': 0, 'model_name_or_path': None, 'cache_dir': None, 'logging_steps': 500, 'save_steps': 2000, 'save_total_limit': 2, 'overwrite_output_dir': 0, 'overwrite_cache': 0}, 'eval': {'path': '/home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_last.pt', 'model_overrides': '{}', 'checkpoint_suffix': '', 'max_sentences_eval': 64}} (train.py:304, cli_main())
[2021-03-22 15:51:34] INFO >> distributed init (rank 2): tcp://localhost:16310 (distributed_utils.py:89, distributed_init())
[2021-03-22 15:51:34] INFO >> distributed init (rank 3): tcp://localhost:16310 (distributed_utils.py:89, distributed_init())
[2021-03-22 15:51:34] INFO >> distributed init (rank 0): tcp://localhost:16310 (distributed_utils.py:89, distributed_init())
[2021-03-22 15:51:34] INFO >> distributed init (rank 1): tcp://localhost:16310 (distributed_utils.py:89, distributed_init())
[2021-03-22 15:51:40] INFO >> initialized host node13 as rank 2 (distributed_utils.py:98, distributed_init())
[2021-03-22 15:51:40] INFO >> initialized host node13 as rank 3 (distributed_utils.py:98, distributed_init())
[2021-03-22 15:51:40] INFO >> initialized host node13 as rank 1 (distributed_utils.py:98, distributed_init())
[2021-03-22 15:51:40] INFO >> initialized host node13 as rank 0 (distributed_utils.py:98, distributed_init())
[2021-03-22 15:51:40] INFO >> {'criterion': 'completion_cross_entropy', 'optimizer': 'torch_adam', 'lr_scheduler': 'fixed', 'tokenizer': None, 'bpe': None, 'common': {'no_progress_bar': 0, 'log_interval': 500, 'log_format': 'simple', 'tensorboard_logdir': '', 'memory_efficient_fp16': 0, 'fp16_no_flatten_grads': 0, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'empty_cache_freq': 0, 'task': 'completion', 'seed': 666, 'cpu': 0, 'fp16': 0, 'fp16_opt_level': '01', 'server_ip': '', 'server_port': ''}, 'dataset': {'num_workers': 3, 'skip_invalid_size_inputs_valid_test': 0, 'max_tokens': 100000.0, 'max_sentences': 32, 'required_batch_size_multiple': 1, 'dataset_impl': 'mmap', 'train_subset': 'train', 'valid_subset': 'test', 'validate_interval': 1, 'fixed_validation_seed': None, 'disable_validation': 0, 'max_tokens_valid': None, 'max_sentences_valid': 64, 'curriculum': 5, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'distributed_training': {'distributed_world_size': 4, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': 'tcp://localhost:16310', 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': 0, 'ddp_backend': 'no_c10d', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': None, 'find_unused_parameters': 0, 'fast_stat_sync': 0, 'broadcast_buffers': 0, 'global_sync_iter': 50, 'warmup_iterations': 500, 'local_rank': -1}, 'task': {'data': '/home/wanyao/.ncc/raw_py150/completion/data-mmap', 'target_lang': 'code_tokens', 'ext': 'code_tokens.ext', 'code_types': ['attr', 'num', 'name', 'param'], 'max_target_positions': 500, 'add_bos_token': 0, 'eval_bleu': 0, 'eval_bleu_detok': 'space', 'eval_bleu_detok_args': None, 'eval_tokenized_bleu': 0, 'eval_bleu_remove_bpe': None, 'eval_bleu_args': None, 'eval_bleu_print_samples': 0, 'eval_mrr': 1}, 'model': {'arch': 'completion_gpt2', 'dropout': 0.5, 'decoder_embed_dim': 300, 'decoder_hidden_size': 300, 
'decoder_layers': 6, 'decoder_attention_heads': 6, 'max_target_positions': 500, 'activation_fn': 'gelu', 'decoder_ffn_embed_dim': 1200}, 'optimization': {'max_epoch': 100, 'max_update': 0, 'clip_norm': 25, 'update_freq': [1], 'lrs': [0.001], 'min_lr': -1, 'use_bmuf': 0, 'force_anneal': None, 'warmup_updates': 0, 'end_learning_rate': 0.0, 'power': 1.0, 'total_num_update': 1000000, 'sentence_avg': None, 'adam': {'adam_betas': '(0.9, 0.999)', 'adam_eps': 1e-08, 'weight_decay': 0.0, 'use_old_adam': 0}, 'weight_decay': 0.0, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 5, 'max_steps': -1, 'warmup_steps': 0, 'gradient_accumulation_steps': 1}, 'checkpoint': {'save_dir': '/home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints', 'restore_file': 'checkpoint_best.pt', 'reset_dataloader': None, 'reset_lr_scheduler': None, 'reset_meters': None, 'reset_optimizer': None, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': 0, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': 0, 'no_epoch_checkpoints': 1, 'no_last_checkpoints': 0, 'no_save_optimizer_state': None, 'best_checkpoint_metric': 'mrr', 'maximize_best_checkpoint_metric': 1, 'patience': 5, 'should_continue': 0, 'model_name_or_path': None, 'cache_dir': None, 'logging_steps': 500, 'save_steps': 2000, 'save_total_limit': 2, 'overwrite_output_dir': 0, 'overwrite_cache': 0}, 'eval': {'path': '/home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_last.pt', 'model_overrides': '{}', 'checkpoint_suffix': '', 'max_sentences_eval': 64}} (train.py:212, single_main())
[2021-03-22 15:51:40] INFO >> [code_tokens] dictionary: 50000 types (completion.py:97, setup_task())
[2021-03-22 15:51:40] INFO >> [code_tokens] dictionary: 19 types (completion.py:101, setup_task())
[2021-03-22 15:51:40] INFO >> loaded 122521 examples from: /home/wanyao/.ncc/raw_py150/completion/data-mmap/test.code_tokens (completion.py:44, load_token_dataset())
[2021-03-22 15:51:40] INFO >> loaded 122521 examples from: /home/wanyao/.ncc/raw_py150/completion/data-mmap/test.code_types (completion.py:58, load_token_dataset())
[2021-03-22 15:51:41] INFO >> GPT2(
  (decoder): TransformerDecoder(
    (embed_tokens): Embedding(50000, 300, padding_idx=0)
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (in_layer_norm): LayerNorm()
        (attention): MultiheadAttention(
          (k_proj): Linear(in_features=300, out_features=300, bias=True)
          (v_proj): Linear(in_features=300, out_features=300, bias=True)
          (q_proj): Linear(in_features=300, out_features=300, bias=True)
          (out_proj): Linear(in_features=300, out_features=300, bias=True)
        )
        (ff_layer_norm): LayerNorm()
        (fc1): Linear(in_features=300, out_features=1200, bias=True)
        (fc2): Linear(in_features=1200, out_features=300, bias=True)
      )
      (1): TransformerDecoderLayer(
        (in_layer_norm): LayerNorm()
        (attention): MultiheadAttention(
          (k_proj): Linear(in_features=300, out_features=300, bias=True)
          (v_proj): Linear(in_features=300, out_features=300, bias=True)
          (q_proj): Linear(in_features=300, out_features=300, bias=True)
          (out_proj): Linear(in_features=300, out_features=300, bias=True)
        )
        (ff_layer_norm): LayerNorm()
        (fc1): Linear(in_features=300, out_features=1200, bias=True)
        (fc2): Linear(in_features=1200, out_features=300, bias=True)
      )
      (2): TransformerDecoderLayer(
        (in_layer_norm): LayerNorm()
        (attention): MultiheadAttention(
          (k_proj): Linear(in_features=300, out_features=300, bias=True)
          (v_proj): Linear(in_features=300, out_features=300, bias=True)
          (q_proj): Linear(in_features=300, out_features=300, bias=True)
          (out_proj): Linear(in_features=300, out_features=300, bias=True)
        )
        (ff_layer_norm): LayerNorm()
        (fc1): Linear(in_features=300, out_features=1200, bias=True)
        (fc2): Linear(in_features=1200, out_features=300, bias=True)
      )
      (3): TransformerDecoderLayer(
        (in_layer_norm): LayerNorm()
        (attention): MultiheadAttention(
          (k_proj): Linear(in_features=300, out_features=300, bias=True)
          (v_proj): Linear(in_features=300, out_features=300, bias=True)
          (q_proj): Linear(in_features=300, out_features=300, bias=True)
          (out_proj): Linear(in_features=300, out_features=300, bias=True)
        )
        (ff_layer_norm): LayerNorm()
        (fc1): Linear(in_features=300, out_features=1200, bias=True)
        (fc2): Linear(in_features=1200, out_features=300, bias=True)
      )
      (4): TransformerDecoderLayer(
        (in_layer_norm): LayerNorm()
        (attention): MultiheadAttention(
          (k_proj): Linear(in_features=300, out_features=300, bias=True)
          (v_proj): Linear(in_features=300, out_features=300, bias=True)
          (q_proj): Linear(in_features=300, out_features=300, bias=True)
          (out_proj): Linear(in_features=300, out_features=300, bias=True)
        )
        (ff_layer_norm): LayerNorm()
        (fc1): Linear(in_features=300, out_features=1200, bias=True)
        (fc2): Linear(in_features=1200, out_features=300, bias=True)
      )
      (5): TransformerDecoderLayer(
        (in_layer_norm): LayerNorm()
        (attention): MultiheadAttention(
          (k_proj): Linear(in_features=300, out_features=300, bias=True)
          (v_proj): Linear(in_features=300, out_features=300, bias=True)
          (q_proj): Linear(in_features=300, out_features=300, bias=True)
          (out_proj): Linear(in_features=300, out_features=300, bias=True)
        )
        (ff_layer_norm): LayerNorm()
        (fc1): Linear(in_features=300, out_features=1200, bias=True)
        (fc2): Linear(in_features=1200, out_features=300, bias=True)
      )
    )
    (out_layer_norm): LayerNorm()
  )
) (train.py:223, single_main())
[2021-03-22 15:51:41] INFO >> model completion_gpt2, criterion CompletionCrossEntropyCriterion (train.py:224, single_main())
[2021-03-22 15:51:41] INFO >> num. model params: 22998006 (num. trained: 22998006) (train.py:227, single_main())
[2021-03-22 15:51:41] INFO >> training on 4 GPUs (train.py:232, single_main())
[2021-03-22 15:51:41] INFO >> max tokens per GPU = 100000.0 and max sentences per GPU = 32 (train.py:235, single_main())
[2021-03-22 15:51:41] INFO >> no existing checkpoint found checkpoint_best.pt (ncc_trainer.py:269, load_checkpoint())
[2021-03-22 15:51:41] INFO >> loading train data for epoch 1 (ncc_trainer.py:283, get_train_iterator())
[2021-03-22 15:51:41] INFO >> loaded 253013 examples from: /home/wanyao/.ncc/raw_py150/completion/data-mmap/train.code_tokens (completion.py:44, load_token_dataset())
[2021-03-22 15:51:42] INFO >> loaded 253013 examples from: /home/wanyao/.ncc/raw_py150/completion/data-mmap/train.code_types (completion.py:58, load_token_dataset())
[2021-03-22 15:51:43] INFO >> NOTE: your device may support faster training with fp16 (ncc_trainer.py:154, _setup_optimizer())
/home/wanyao/yang/naturalcc-dev/ncc/utils/utils.py:575: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
[2021-03-22 15:55:23] INFO >> epoch 001: 500 / 1973 loss=4.76, accuracy=0, mrr=0, ppl=27.1, wps=73158.4, ups=2.36, wpb=31045.4, bsz=128, num_updates=500, lr=0.001, gnorm=0.734, clip=0, train_wall=212, wall=222 (progress_bar.py:262, log())
[2021-03-22 15:58:57] INFO >> epoch 001: 1000 / 1973 loss=3.543, accuracy=0, mrr=0, ppl=11.66, wps=72877.1, ups=2.35, wpb=31055.9, bsz=128, num_updates=1000, lr=0.001, gnorm=0.813, clip=0, train_wall=211, wall=435 (progress_bar.py:262, log())
[2021-03-22 16:02:30] INFO >> epoch 001: 1500 / 1973 loss=2.474, accuracy=0, mrr=0, ppl=5.56, wps=72611.1, ups=2.34, wpb=30979.9, bsz=128, num_updates=1500, lr=0.001, gnorm=0.795, clip=0, train_wall=212, wall=649 (progress_bar.py:262, log())
[2021-03-22 16:05:53] INFO >> epoch 001 | loss 3.286 | accuracy 0 | mrr 0 | ppl 9.75 | wps 72724.9 | ups 2.34 | wpb 31027.9 | bsz 128 | num_updates 1973 | lr 0.001 | gnorm 0.801 | clip 0 | train_wall 836 | wall 852 (progress_bar.py:269, print())
/home/wanyao/yang/naturalcc-dev/ncc/utils/utils.py:575: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
/home/wanyao/yang/naturalcc-dev/ncc/utils/utils.py:575: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
/home/wanyao/yang/naturalcc-dev/ncc/utils/utils.py:575: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
[2021-03-22 16:09:37] INFO >> epoch 001 | valid on 'test' subset | loss 1.933 | accuracy 0.622153 | mrr 0.692209 | ppl 3.82 | wps 136749 | wpb 61759.1 | bsz 255.8 | num_updates 1973 (progress_bar.py:269, print())
[2021-03-22 16:09:39] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 1 @ 1973 updates, score 0.692209) (writing took 1.737648 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 16:09:58] INFO >> epoch 002: 27 / 1973 loss=2.291, accuracy=0, mrr=0, ppl=4.89, wps=34623.9, ups=1.12, wpb=31027.8, bsz=127.9, num_updates=2000, lr=0.001, gnorm=0.853, clip=0, train_wall=213, wall=1097 (progress_bar.py:262, log())
[2021-03-22 16:13:32] INFO >> epoch 002: 527 / 1973 loss=1.829, accuracy=0, mrr=0, ppl=3.55, wps=72346.1, ups=2.33, wpb=31033.1, bsz=128, num_updates=2500, lr=0.001, gnorm=0.583, clip=0, train_wall=213, wall=1311 (progress_bar.py:262, log())
[2021-03-22 16:17:07] INFO >> epoch 002: 1027 / 1973 loss=1.68, accuracy=0, mrr=0, ppl=3.2, wps=72294.6, ups=2.33, wpb=31058.5, bsz=128, num_updates=3000, lr=0.001, gnorm=0.541, clip=0, train_wall=213, wall=1526 (progress_bar.py:262, log())
[2021-03-22 16:20:42] INFO >> epoch 002: 1527 / 1973 loss=1.584, accuracy=0, mrr=0, ppl=3, wps=72101.5, ups=2.33, wpb=31005, bsz=128, num_updates=3500, lr=0.001, gnorm=0.516, clip=0, train_wall=213, wall=1741 (progress_bar.py:262, log())
[2021-03-22 16:23:54] INFO >> epoch 002 | loss 1.664 | accuracy 0 | mrr 0 | ppl 3.17 | wps 56649.1 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 3946 | lr 0.001 | gnorm 0.529 | clip 0 | train_wall 841 | wall 1932 (progress_bar.py:269, print())
[2021-03-22 16:27:33] INFO >> epoch 002 | valid on 'test' subset | loss 1.55 | accuracy 0.661614 | mrr 0.726299 | ppl 2.93 | wps 139349 | wpb 61759.1 | bsz 255.8 | num_updates 3946 | best_mrr 0.726299 (progress_bar.py:269, print())
[2021-03-22 16:27:36] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 2 @ 3946 updates, score 0.726299) (writing took 3.545247 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 16:28:06] INFO >> epoch 003: 54 / 1973 loss=1.532, accuracy=0, mrr=0, ppl=2.89, wps=34923.5, ups=1.13, wpb=31024, bsz=127.9, num_updates=4000, lr=0.001, gnorm=0.466, clip=0, train_wall=213, wall=2185 (progress_bar.py:262, log())
[2021-03-22 16:31:42] INFO >> epoch 003: 554 / 1973 loss=1.476, accuracy=0, mrr=0, ppl=2.78, wps=72051.6, ups=2.32, wpb=31024.6, bsz=128, num_updates=4500, lr=0.001, gnorm=0.446, clip=0, train_wall=214, wall=2400 (progress_bar.py:262, log())
[2021-03-22 16:35:19] INFO >> epoch 003: 1054 / 1973 loss=1.429, accuracy=0, mrr=0, ppl=2.69, wps=71586.9, ups=2.3, wpb=31063.1, bsz=128, num_updates=5000, lr=0.001, gnorm=0.426, clip=0, train_wall=215, wall=2617 (progress_bar.py:262, log())
[2021-03-22 16:38:55] INFO >> epoch 003: 1554 / 1973 loss=1.392, accuracy=0, mrr=0, ppl=2.62, wps=71674.3, ups=2.31, wpb=31007, bsz=128, num_updates=5500, lr=0.001, gnorm=0.411, clip=0, train_wall=215, wall=2834 (progress_bar.py:262, log())
[2021-03-22 16:41:57] INFO >> epoch 003 | loss 1.422 | accuracy 0 | mrr 0 | ppl 2.68 | wps 56496.1 | ups 1.82 | wpb 31027.9 | bsz 128 | num_updates 5919 | lr 0.001 | gnorm 0.421 | clip 0 | train_wall 848 | wall 3016 (progress_bar.py:269, print())
[2021-03-22 16:45:36] INFO >> epoch 003 | valid on 'test' subset | loss 1.448 | accuracy 0.67296 | mrr 0.736016 | ppl 2.73 | wps 139300 | wpb 61759.1 | bsz 255.8 | num_updates 5919 | best_mrr 0.736016 (progress_bar.py:269, print())
[2021-03-22 16:45:40] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 3 @ 5919 updates, score 0.736016) (writing took 3.376208 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 16:46:22] INFO >> epoch 004: 81 / 1973 loss=1.371, accuracy=0, mrr=0, ppl=2.59, wps=34721.2, ups=1.12, wpb=31012, bsz=127.9, num_updates=6000, lr=0.001, gnorm=0.395, clip=0, train_wall=216, wall=3280 (progress_bar.py:262, log())
[2021-03-22 16:49:57] INFO >> epoch 004: 581 / 1973 loss=1.345, accuracy=0, mrr=0, ppl=2.54, wps=72159.9, ups=2.33, wpb=31032.9, bsz=128, num_updates=6500, lr=0.001, gnorm=0.389, clip=0, train_wall=214, wall=3495 (progress_bar.py:262, log())
[2021-03-22 16:53:32] INFO >> epoch 004: 1081 / 1973 loss=1.314, accuracy=0, mrr=0, ppl=2.49, wps=71980.5, ups=2.32, wpb=31043.8, bsz=128, num_updates=7000, lr=0.001, gnorm=0.367, clip=0, train_wall=214, wall=3711 (progress_bar.py:262, log())
[2021-03-22 16:57:08] INFO >> epoch 004: 1581 / 1973 loss=1.29, accuracy=0, mrr=0, ppl=2.45, wps=71750.3, ups=2.31, wpb=31015.2, bsz=128, num_updates=7500, lr=0.001, gnorm=0.369, clip=0, train_wall=215, wall=3927 (progress_bar.py:262, log())
[2021-03-22 16:59:58] INFO >> epoch 004 | loss 1.311 | accuracy 0 | mrr 0 | ppl 2.48 | wps 56641.5 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 7892 | lr 0.001 | gnorm 0.372 | clip 0 | train_wall 845 | wall 4097 (progress_bar.py:269, print())
[2021-03-22 17:03:37] INFO >> epoch 004 | valid on 'test' subset | loss 1.389 | accuracy 0.680564 | mrr 0.742045 | ppl 2.62 | wps 139162 | wpb 61759.1 | bsz 255.8 | num_updates 7892 | best_mrr 0.742045 (progress_bar.py:269, print())
[2021-03-22 17:03:41] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 4 @ 7892 updates, score 0.742045) (writing took 3.564409 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 17:04:34] INFO >> epoch 005: 108 / 1973 loss=1.28, accuracy=0, mrr=0, ppl=2.43, wps=34884.8, ups=1.12, wpb=31059.7, bsz=127.9, num_updates=8000, lr=0.001, gnorm=0.357, clip=0, train_wall=214, wall=4372 (progress_bar.py:262, log())
[2021-03-22 17:08:08] INFO >> epoch 005: 608 / 1973 loss=1.257, accuracy=0, mrr=0, ppl=2.39, wps=72393.5, ups=2.33, wpb=31011.3, bsz=128, num_updates=8500, lr=0.001, gnorm=0.344, clip=0, train_wall=213, wall=4586 (progress_bar.py:262, log())
[2021-03-22 17:11:42] INFO >> epoch 005: 1108 / 1973 loss=1.24, accuracy=0, mrr=0, ppl=2.36, wps=72198.7, ups=2.33, wpb=31006.4, bsz=128, num_updates=9000, lr=0.001, gnorm=0.335, clip=0, train_wall=213, wall=4801 (progress_bar.py:262, log())
[2021-03-22 17:15:19] INFO >> epoch 005: 1608 / 1973 loss=1.219, accuracy=0, mrr=0, ppl=2.33, wps=71538.6, ups=2.31, wpb=31027.6, bsz=128, num_updates=9500, lr=0.001, gnorm=0.341, clip=0, train_wall=215, wall=5018 (progress_bar.py:262, log())
[2021-03-22 17:17:59] INFO >> epoch 005 | loss 1.236 | accuracy 0 | mrr 0 | ppl 2.36 | wps 56653.6 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 9865 | lr 0.001 | gnorm 0.339 | clip 0 | train_wall 845 | wall 5177 (progress_bar.py:269, print())
[2021-03-22 17:21:38] INFO >> epoch 005 | valid on 'test' subset | loss 1.363 | accuracy 0.683468 | mrr 0.744463 | ppl 2.57 | wps 139364 | wpb 61759.1 | bsz 255.8 | num_updates 9865 | best_mrr 0.744463 (progress_bar.py:269, print())
[2021-03-22 17:21:41] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 5 @ 9865 updates, score 0.744463) (writing took 3.490740 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 17:22:46] INFO >> epoch 006: 135 / 1973 loss=1.203, accuracy=0, mrr=0, ppl=2.3, wps=34737.1, ups=1.12, wpb=31063.5, bsz=127.9, num_updates=10000, lr=0.001, gnorm=0.315, clip=0, train_wall=216, wall=5465 (progress_bar.py:262, log())
[2021-03-22 17:26:22] INFO >> epoch 006: 635 / 1973 loss=1.182, accuracy=0, mrr=0, ppl=2.27, wps=71869.8, ups=2.32, wpb=30967.3, bsz=128, num_updates=10500, lr=0.001, gnorm=0.284, clip=0, train_wall=214, wall=5681 (progress_bar.py:262, log())
[2021-03-22 17:29:57] INFO >> epoch 006: 1135 / 1973 loss=1.183, accuracy=0, mrr=0, ppl=2.27, wps=72004.2, ups=2.32, wpb=31016.4, bsz=128, num_updates=11000, lr=0.001, gnorm=0.27, clip=0, train_wall=214, wall=5896 (progress_bar.py:262, log())
[2021-03-22 17:33:33] INFO >> epoch 006: 1635 / 1973 loss=1.183, accuracy=0, mrr=0, ppl=2.27, wps=71818, ups=2.32, wpb=31016.4, bsz=128, num_updates=11500, lr=0.001, gnorm=0.269, clip=0, train_wall=214, wall=6112 (progress_bar.py:262, log())
[2021-03-22 17:35:59] INFO >> epoch 006 | loss 1.183 | accuracy 0 | mrr 0 | ppl 2.27 | wps 56692.7 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 11838 | lr 0.001 | gnorm 0.275 | clip 0 | train_wall 844 | wall 6257 (progress_bar.py:269, print())
[2021-03-22 17:39:37] INFO >> epoch 006 | valid on 'test' subset | loss 1.313 | accuracy 0.689804 | mrr 0.749534 | ppl 2.49 | wps 139236 | wpb 61759.1 | bsz 255.8 | num_updates 11838 | best_mrr 0.749534 (progress_bar.py:269, print())
[2021-03-22 17:39:41] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 6 @ 11838 updates, score 0.749534) (writing took 3.340980 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 17:40:57] INFO >> epoch 007: 162 / 1973 loss=1.162, accuracy=0, mrr=0, ppl=2.24, wps=35042.9, ups=1.13, wpb=31118.4, bsz=127.9, num_updates=12000, lr=0.001, gnorm=0.272, clip=0, train_wall=213, wall=6556 (progress_bar.py:262, log())
[2021-03-22 17:44:31] INFO >> epoch 007: 662 / 1973 loss=1.123, accuracy=0, mrr=0, ppl=2.18, wps=72512.6, ups=2.34, wpb=31041.1, bsz=128, num_updates=12500, lr=0.001, gnorm=0.267, clip=0, train_wall=213, wall=6770 (progress_bar.py:262, log())
[2021-03-22 17:48:07] INFO >> epoch 007: 1162 / 1973 loss=1.14, accuracy=0, mrr=0, ppl=2.2, wps=71942.4, ups=2.32, wpb=31063, bsz=128, num_updates=13000, lr=0.001, gnorm=0.265, clip=0, train_wall=214, wall=6986 (progress_bar.py:262, log())
[2021-03-22 17:51:42] INFO >> epoch 007: 1662 / 1973 loss=1.14, accuracy=0, mrr=0, ppl=2.2, wps=72088.2, ups=2.32, wpb=31028.8, bsz=128, num_updates=13500, lr=0.001, gnorm=0.259, clip=0, train_wall=214, wall=7201 (progress_bar.py:262, log())
[2021-03-22 17:53:57] INFO >> epoch 007 | loss 1.134 | accuracy 0 | mrr 0 | ppl 2.19 | wps 56783.4 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 13811 | lr 0.001 | gnorm 0.263 | clip 0 | train_wall 843 | wall 7335 (progress_bar.py:269, print())
[2021-03-22 17:57:35] INFO >> epoch 007 | valid on 'test' subset | loss 1.3 | accuracy 0.691996 | mrr 0.751224 | ppl 2.46 | wps 139411 | wpb 61759.1 | bsz 255.8 | num_updates 13811 | best_mrr 0.751224 (progress_bar.py:269, print())
[2021-03-22 17:57:39] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 7 @ 13811 updates, score 0.751224) (writing took 3.497918 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 17:59:07] INFO >> epoch 008: 189 / 1973 loss=1.118, accuracy=0, mrr=0, ppl=2.17, wps=34871.8, ups=1.13, wpb=30996.4, bsz=127.9, num_updates=14000, lr=0.001, gnorm=0.264, clip=0, train_wall=213, wall=7646 (progress_bar.py:262, log())
[2021-03-22 18:02:41] INFO >> epoch 008: 689 / 1973 loss=1.084, accuracy=0, mrr=0, ppl=2.12, wps=72388.6, ups=2.33, wpb=31031.8, bsz=128, num_updates=14500, lr=0.001, gnorm=0.257, clip=0, train_wall=213, wall=7860 (progress_bar.py:262, log())
[2021-03-22 18:06:19] INFO >> epoch 008: 1189 / 1973 loss=1.1, accuracy=0, mrr=0, ppl=2.14, wps=71453, ups=2.3, wpb=31075.3, bsz=128, num_updates=15000, lr=0.001, gnorm=0.254, clip=0, train_wall=216, wall=8077 (progress_bar.py:262, log())
[2021-03-22 18:09:56] INFO >> epoch 008: 1689 / 1973 loss=1.106, accuracy=0, mrr=0, ppl=2.15, wps=71329.3, ups=2.3, wpb=30953.2, bsz=128, num_updates=15500, lr=0.001, gnorm=0.254, clip=0, train_wall=215, wall=8294 (progress_bar.py:262, log())
[2021-03-22 18:11:58] INFO >> epoch 008 | loss 1.095 | accuracy 0 | mrr 0 | ppl 2.14 | wps 56625.1 | ups 1.82 | wpb 31027.9 | bsz 128 | num_updates 15784 | lr 0.001 | gnorm 0.256 | clip 0 | train_wall 846 | wall 8416 (progress_bar.py:269, print())
[2021-03-22 18:15:37] INFO >> epoch 008 | valid on 'test' subset | loss 1.286 | accuracy 0.694309 | mrr 0.75301 | ppl 2.44 | wps 139066 | wpb 61759.1 | bsz 255.8 | num_updates 15784 | best_mrr 0.75301 (progress_bar.py:269, print())
[2021-03-22 18:15:40] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 8 @ 15784 updates, score 0.75301) (writing took 3.558886 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 18:17:19] INFO >> epoch 009: 216 / 1973 loss=1.074, accuracy=0, mrr=0, ppl=2.11, wps=34907.5, ups=1.13, wpb=30992, bsz=127.9, num_updates=16000, lr=0.001, gnorm=0.248, clip=0, train_wall=213, wall=8738 (progress_bar.py:262, log())
[2021-03-22 18:20:55] INFO >> epoch 009: 716 / 1973 loss=1.052, accuracy=0, mrr=0, ppl=2.07, wps=71789.5, ups=2.32, wpb=30987.2, bsz=128, num_updates=16500, lr=0.001, gnorm=0.254, clip=0, train_wall=214, wall=8954 (progress_bar.py:262, log())
[2021-03-22 18:24:30] INFO >> epoch 009: 1216 / 1973 loss=1.069, accuracy=0, mrr=0, ppl=2.1, wps=72457.6, ups=2.33, wpb=31122.5, bsz=128, num_updates=17000, lr=0.001, gnorm=0.247, clip=0, train_wall=213, wall=9169 (progress_bar.py:262, log())
[2021-03-22 18:28:05] INFO >> epoch 009: 1716 / 1973 loss=1.072, accuracy=0, mrr=0, ppl=2.1, wps=72277.1, ups=2.33, wpb=31033.8, bsz=128, num_updates=17500, lr=0.001, gnorm=0.245, clip=0, train_wall=213, wall=9383 (progress_bar.py:262, log())
[2021-03-22 18:29:56] INFO >> epoch 009 | loss 1.062 | accuracy 0 | mrr 0 | ppl 2.09 | wps 56772.9 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 17757 | lr 0.001 | gnorm 0.246 | clip 0 | train_wall 842 | wall 9495 (progress_bar.py:269, print())
[2021-03-22 18:33:35] INFO >> epoch 009 | valid on 'test' subset | loss 1.278 | accuracy 0.695777 | mrr 0.754083 | ppl 2.42 | wps 139027 | wpb 61759.1 | bsz 255.8 | num_updates 17757 | best_mrr 0.754083 (progress_bar.py:269, print())
[2021-03-22 18:33:39] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 9 @ 17757 updates, score 0.754083) (writing took 3.363997 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 18:35:30] INFO >> epoch 010: 243 / 1973 loss=1.04, accuracy=0, mrr=0, ppl=2.06, wps=34770.3, ups=1.12, wpb=30936.6, bsz=127.9, num_updates=18000, lr=0.001, gnorm=0.239, clip=0, train_wall=213, wall=9828 (progress_bar.py:262, log())
[2021-03-22 18:39:05] INFO >> epoch 010: 743 / 1973 loss=1.026, accuracy=0, mrr=0, ppl=2.04, wps=72233.4, ups=2.33, wpb=31057.3, bsz=128, num_updates=18500, lr=0.001, gnorm=0.247, clip=0, train_wall=213, wall=10043 (progress_bar.py:262, log())
[2021-03-22 18:42:40] INFO >> epoch 010: 1243 / 1973 loss=1.037, accuracy=0, mrr=0, ppl=2.05, wps=71882.6, ups=2.32, wpb=30993.2, bsz=128, num_updates=19000, lr=0.001, gnorm=0.241, clip=0, train_wall=214, wall=10259 (progress_bar.py:262, log())
[2021-03-22 18:46:16] INFO >> epoch 010: 1743 / 1973 loss=1.045, accuracy=0, mrr=0, ppl=2.06, wps=71937, ups=2.32, wpb=31029.1, bsz=128, num_updates=19500, lr=0.001, gnorm=0.234, clip=0, train_wall=214, wall=10475 (progress_bar.py:262, log())
[2021-03-22 18:47:56] INFO >> epoch 010 | loss 1.034 | accuracy 0 | mrr 0 | ppl 2.05 | wps 56685.7 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 19730 | lr 0.001 | gnorm 0.241 | clip 0 | train_wall 844 | wall 10575 (progress_bar.py:269, print())
[2021-03-22 18:51:35] INFO >> epoch 010 | valid on 'test' subset | loss 1.273 | accuracy 0.696917 | mrr 0.754943 | ppl 2.42 | wps 139346 | wpb 61759.1 | bsz 255.8 | num_updates 19730 | best_mrr 0.754943 (progress_bar.py:269, print())
[2021-03-22 18:51:38] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 10 @ 19730 updates, score 0.754943) (writing took 3.278995 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 18:53:42] INFO >> epoch 011: 270 / 1973 loss=1.011, accuracy=0, mrr=0, ppl=2.02, wps=34847, ups=1.12, wpb=31110.9, bsz=127.9, num_updates=20000, lr=0.001, gnorm=0.23, clip=0, train_wall=215, wall=10921 (progress_bar.py:262, log())
[2021-03-22 18:57:17] INFO >> epoch 011: 770 / 1973 loss=0.998, accuracy=0, mrr=0, ppl=2, wps=72406.7, ups=2.33, wpb=31142.6, bsz=128, num_updates=20500, lr=0.001, gnorm=0.231, clip=0, train_wall=214, wall=11136 (progress_bar.py:262, log())
[2021-03-22 19:00:52] INFO >> epoch 011: 1270 / 1973 loss=1.013, accuracy=0, mrr=0, ppl=2.02, wps=72141, ups=2.33, wpb=31012.3, bsz=128, num_updates=21000, lr=0.001, gnorm=0.237, clip=0, train_wall=213, wall=11351 (progress_bar.py:262, log())
[2021-03-22 19:04:28] INFO >> epoch 011: 1770 / 1973 loss=1.025, accuracy=0, mrr=0, ppl=2.04, wps=71776.2, ups=2.32, wpb=30997.8, bsz=128, num_updates=21500, lr=0.001, gnorm=0.23, clip=0, train_wall=214, wall=11567 (progress_bar.py:262, log())
[2021-03-22 19:05:56] INFO >> epoch 011 | loss 1.009 | accuracy 0 | mrr 0 | ppl 2.01 | wps 56656.3 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 21703 | lr 0.001 | gnorm 0.232 | clip 0 | train_wall 845 | wall 11655 (progress_bar.py:269, print())
[2021-03-22 19:09:36] INFO >> epoch 011 | valid on 'test' subset | loss 1.272 | accuracy 0.697991 | mrr 0.755574 | ppl 2.41 | wps 138895 | wpb 61759.1 | bsz 255.8 | num_updates 21703 | best_mrr 0.755574 (progress_bar.py:269, print())
[2021-03-22 19:09:39] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 11 @ 21703 updates, score 0.755574) (writing took 3.402816 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 19:11:54] INFO >> epoch 012: 297 / 1973 loss=0.983, accuracy=0, mrr=0, ppl=1.98, wps=34699.4, ups=1.12, wpb=30914.9, bsz=127.9, num_updates=22000, lr=0.001, gnorm=0.236, clip=0, train_wall=214, wall=12012 (progress_bar.py:262, log())
[2021-03-22 19:15:28] INFO >> epoch 012: 797 / 1973 loss=0.98, accuracy=0, mrr=0, ppl=1.97, wps=72242, ups=2.33, wpb=31000.6, bsz=128, num_updates=22500, lr=0.001, gnorm=0.24, clip=0, train_wall=213, wall=12227 (progress_bar.py:262, log())
[2021-03-22 19:19:04] INFO >> epoch 012: 1297 / 1973 loss=0.992, accuracy=0, mrr=0, ppl=1.99, wps=71935, ups=2.31, wpb=31116.2, bsz=128, num_updates=23000, lr=0.001, gnorm=0.226, clip=0, train_wall=215, wall=12443 (progress_bar.py:262, log())
[2021-03-22 19:22:40] INFO >> epoch 012: 1797 / 1973 loss=0.999, accuracy=0, mrr=0, ppl=2, wps=72129.2, ups=2.33, wpb=31022.2, bsz=128, num_updates=23500, lr=0.001, gnorm=0.226, clip=0, train_wall=214, wall=12658 (progress_bar.py:262, log())
[2021-03-22 19:23:56] INFO >> epoch 012 | loss 0.986 | accuracy 0 | mrr 0 | ppl 1.98 | wps 56729.8 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 23676 | lr 0.001 | gnorm 0.232 | clip 0 | train_wall 843 | wall 12734 (progress_bar.py:269, print())
[2021-03-22 19:27:35] INFO >> epoch 012 | valid on 'test' subset | loss 1.278 | accuracy 0.698114 | mrr 0.755618 | ppl 2.42 | wps 139107 | wpb 61759.1 | bsz 255.8 | num_updates 23676 | best_mrr 0.755618 (progress_bar.py:269, print())
[2021-03-22 19:27:38] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 12 @ 23676 updates, score 0.755618) (writing took 3.290560 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 19:30:05] INFO >> epoch 013: 324 / 1973 loss=0.958, accuracy=0, mrr=0, ppl=1.94, wps=34800.2, ups=1.12, wpb=30969.7, bsz=127.9, num_updates=24000, lr=0.001, gnorm=0.229, clip=0, train_wall=213, wall=13103 (progress_bar.py:262, log())
[2021-03-22 19:33:40] INFO >> epoch 013: 824 / 1973 loss=0.958, accuracy=0, mrr=0, ppl=1.94, wps=72187, ups=2.32, wpb=31060.3, bsz=128, num_updates=24500, lr=0.001, gnorm=0.234, clip=0, train_wall=214, wall=13318 (progress_bar.py:262, log())
[2021-03-22 19:37:16] INFO >> epoch 013: 1324 / 1973 loss=0.975, accuracy=0, mrr=0, ppl=1.97, wps=71593.5, ups=2.31, wpb=31030.1, bsz=128, num_updates=25000, lr=0.001, gnorm=0.229, clip=0, train_wall=215, wall=13535 (progress_bar.py:262, log())
[2021-03-22 19:40:51] INFO >> epoch 013: 1824 / 1973 loss=0.979, accuracy=0, mrr=0, ppl=1.97, wps=72363.8, ups=2.33, wpb=31012.9, bsz=128, num_updates=25500, lr=0.001, gnorm=0.226, clip=0, train_wall=213, wall=13749 (progress_bar.py:262, log())
[2021-03-22 19:41:55] INFO >> epoch 013 | loss 0.966 | accuracy 0 | mrr 0 | ppl 1.95 | wps 56714.4 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 25649 | lr 0.001 | gnorm 0.229 | clip 0 | train_wall 844 | wall 13814 (progress_bar.py:269, print())
[2021-03-22 19:45:35] INFO >> epoch 013 | valid on 'test' subset | loss 1.285 | accuracy 0.698369 | mrr 0.755624 | ppl 2.44 | wps 138907 | wpb 61759.1 | bsz 255.8 | num_updates 25649 | best_mrr 0.755624 (progress_bar.py:269, print())
[2021-03-22 19:45:38] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_best.pt (epoch 13 @ 25649 updates, score 0.755624) (writing took 3.380713 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 19:48:16] INFO >> epoch 014: 351 / 1973 loss=0.938, accuracy=0, mrr=0, ppl=1.92, wps=34814.4, ups=1.12, wpb=30998.8, bsz=127.9, num_updates=26000, lr=0.001, gnorm=0.233, clip=0, train_wall=214, wall=14195 (progress_bar.py:262, log())
[2021-03-22 19:51:51] INFO >> epoch 014: 851 / 1973 loss=0.939, accuracy=0, mrr=0, ppl=1.92, wps=71923.1, ups=2.32, wpb=30958.9, bsz=128, num_updates=26500, lr=0.001, gnorm=0.237, clip=0, train_wall=214, wall=14410 (progress_bar.py:262, log())
[2021-03-22 19:55:26] INFO >> epoch 014: 1351 / 1973 loss=0.953, accuracy=0, mrr=0, ppl=1.94, wps=72264.4, ups=2.32, wpb=31118.2, bsz=128, num_updates=27000, lr=0.001, gnorm=0.223, clip=0, train_wall=214, wall=14625 (progress_bar.py:262, log())
[2021-03-22 19:59:00] INFO >> epoch 014: 1851 / 1973 loss=0.964, accuracy=0, mrr=0, ppl=1.95, wps=72574.6, ups=2.34, wpb=31003.1, bsz=128, num_updates=27500, lr=0.001, gnorm=0.218, clip=0, train_wall=212, wall=14839 (progress_bar.py:262, log())
[2021-03-22 19:59:53] INFO >> epoch 014 | loss 0.947 | accuracy 0 | mrr 0 | ppl 1.93 | wps 56796.7 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 27622 | lr 0.001 | gnorm 0.229 | clip 0 | train_wall 842 | wall 14892 (progress_bar.py:269, print())
[2021-03-22 20:03:32] INFO >> epoch 014 | valid on 'test' subset | loss 1.283 | accuracy 0.697562 | mrr 0.755183 | ppl 2.43 | wps 139235 | wpb 61759.1 | bsz 255.8 | num_updates 27622 | best_mrr 0.755624 (progress_bar.py:269, print())
[2021-03-22 20:03:34] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_last.pt (epoch 14 @ 27622 updates, score 0.755183) (writing took 2.259267 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 20:06:25] INFO >> epoch 015: 378 / 1973 loss=0.918, accuracy=0, mrr=0, ppl=1.89, wps=34840.9, ups=1.12, wpb=31042.5, bsz=127.9, num_updates=28000, lr=0.001, gnorm=0.232, clip=0, train_wall=215, wall=15284 (progress_bar.py:262, log())
[2021-03-22 20:10:01] INFO >> epoch 015: 878 / 1973 loss=0.922, accuracy=0, mrr=0, ppl=1.89, wps=71804.9, ups=2.32, wpb=30995.8, bsz=128, num_updates=28500, lr=0.001, gnorm=0.235, clip=0, train_wall=214, wall=15500 (progress_bar.py:262, log())
[2021-03-22 20:13:36] INFO >> epoch 015: 1378 / 1973 loss=0.938, accuracy=0, mrr=0, ppl=1.92, wps=72570.8, ups=2.33, wpb=31104, bsz=128, num_updates=29000, lr=0.001, gnorm=0.227, clip=0, train_wall=213, wall=15714 (progress_bar.py:262, log())
[2021-03-22 20:17:11] INFO >> epoch 015: 1878 / 1973 loss=0.949, accuracy=0, mrr=0, ppl=1.93, wps=71793, ups=2.32, wpb=30988.5, bsz=128, num_updates=29500, lr=0.001, gnorm=0.232, clip=0, train_wall=214, wall=15930 (progress_bar.py:262, log())
[2021-03-22 20:17:53] INFO >> epoch 015 | loss 0.93 | accuracy 0 | mrr 0 | ppl 1.91 | wps 56695.6 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 29595 | lr 0.001 | gnorm 0.232 | clip 0 | train_wall 845 | wall 15971 (progress_bar.py:269, print())
[2021-03-22 20:21:31] INFO >> epoch 015 | valid on 'test' subset | loss 1.283 | accuracy 0.698105 | mrr 0.755453 | ppl 2.43 | wps 139176 | wpb 61759.1 | bsz 255.8 | num_updates 29595 | best_mrr 0.755624 (progress_bar.py:269, print())
[2021-03-22 20:21:34] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_last.pt (epoch 15 @ 29595 updates, score 0.755453) (writing took 2.210821 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 20:24:35] INFO >> epoch 016: 405 / 1973 loss=0.898, accuracy=0, mrr=0, ppl=1.86, wps=35068.9, ups=1.13, wpb=31116.3, bsz=127.9, num_updates=30000, lr=0.001, gnorm=0.231, clip=0, train_wall=214, wall=16374 (progress_bar.py:262, log())
[2021-03-22 20:28:10] INFO >> epoch 016: 905 / 1973 loss=0.909, accuracy=0, mrr=0, ppl=1.88, wps=72239.7, ups=2.33, wpb=30987.6, bsz=128, num_updates=30500, lr=0.001, gnorm=0.239, clip=0, train_wall=213, wall=16588 (progress_bar.py:262, log())
[2021-03-22 20:31:44] INFO >> epoch 016: 1405 / 1973 loss=0.922, accuracy=0, mrr=0, ppl=1.89, wps=72225.2, ups=2.33, wpb=31032.7, bsz=128, num_updates=31000, lr=0.001, gnorm=0.23, clip=0, train_wall=213, wall=16803 (progress_bar.py:262, log())
[2021-03-22 20:35:19] INFO >> epoch 016: 1905 / 1973 loss=0.933, accuracy=0, mrr=0, ppl=1.91, wps=72250.1, ups=2.33, wpb=31017.1, bsz=128, num_updates=31500, lr=0.001, gnorm=0.225, clip=0, train_wall=213, wall=17018 (progress_bar.py:262, log())
[2021-03-22 20:35:49] INFO >> epoch 016 | loss 0.915 | accuracy 0 | mrr 0 | ppl 1.88 | wps 56883.4 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 31568 | lr 0.001 | gnorm 0.231 | clip 0 | train_wall 842 | wall 17048 (progress_bar.py:269, print())
[2021-03-22 20:39:28] INFO >> epoch 016 | valid on 'test' subset | loss 1.287 | accuracy 0.698082 | mrr 0.755451 | ppl 2.44 | wps 139213 | wpb 61759.1 | bsz 255.8 | num_updates 31568 | best_mrr 0.755624 (progress_bar.py:269, print())
[2021-03-22 20:39:30] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_last.pt (epoch 16 @ 31568 updates, score 0.755451) (writing took 2.100866 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 20:42:43] INFO >> epoch 017: 432 / 1973 loss=0.884, accuracy=0, mrr=0, ppl=1.85, wps=34943.7, ups=1.13, wpb=31034.3, bsz=127.9, num_updates=32000, lr=0.001, gnorm=0.232, clip=0, train_wall=214, wall=17462 (progress_bar.py:262, log())
[2021-03-22 20:46:18] INFO >> epoch 017: 932 / 1973 loss=0.893, accuracy=0, mrr=0, ppl=1.86, wps=72246.8, ups=2.33, wpb=31040, bsz=128, num_updates=32500, lr=0.001, gnorm=0.232, clip=0, train_wall=213, wall=17677 (progress_bar.py:262, log())
[2021-03-22 20:49:53] INFO >> epoch 017: 1432 / 1973 loss=0.908, accuracy=0, mrr=0, ppl=1.88, wps=71963.4, ups=2.32, wpb=31007.9, bsz=128, num_updates=33000, lr=0.001, gnorm=0.228, clip=0, train_wall=214, wall=17892 (progress_bar.py:262, log())
[2021-03-22 20:53:28] INFO >> epoch 017: 1932 / 1973 loss=0.918, accuracy=0, mrr=0, ppl=1.89, wps=72101.3, ups=2.33, wpb=30993.8, bsz=128, num_updates=33500, lr=0.001, gnorm=0.233, clip=0, train_wall=213, wall=18107 (progress_bar.py:262, log())
[2021-03-22 20:53:46] INFO >> epoch 017 | loss 0.9 | accuracy 0 | mrr 0 | ppl 1.87 | wps 56806.7 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 33541 | lr 0.001 | gnorm 0.231 | clip 0 | train_wall 843 | wall 18125 (progress_bar.py:269, print())
[2021-03-22 20:57:26] INFO >> epoch 017 | valid on 'test' subset | loss 1.303 | accuracy 0.697507 | mrr 0.75486 | ppl 2.47 | wps 139000 | wpb 61759.1 | bsz 255.8 | num_updates 33541 | best_mrr 0.755624 (progress_bar.py:269, print())
[2021-03-22 20:57:28] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_last.pt (epoch 17 @ 33541 updates, score 0.75486) (writing took 2.209759 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 21:00:54] INFO >> epoch 018: 459 / 1973 loss=0.864, accuracy=0, mrr=0, ppl=1.82, wps=34737.3, ups=1.12, wpb=30963.7, bsz=127.9, num_updates=34000, lr=0.001, gnorm=0.235, clip=0, train_wall=215, wall=18553 (progress_bar.py:262, log())
[2021-03-22 21:04:29] INFO >> epoch 018: 959 / 1973 loss=0.88, accuracy=0, mrr=0, ppl=1.84, wps=72447.2, ups=2.33, wpb=31098.5, bsz=128, num_updates=34500, lr=0.001, gnorm=0.231, clip=0, train_wall=213, wall=18767 (progress_bar.py:262, log())
[2021-03-22 21:08:04] INFO >> epoch 018: 1459 / 1973 loss=0.898, accuracy=0, mrr=0, ppl=1.86, wps=71935.2, ups=2.32, wpb=30998.7, bsz=128, num_updates=35000, lr=0.001, gnorm=0.231, clip=0, train_wall=214, wall=18983 (progress_bar.py:262, log())
[2021-03-22 21:11:41] INFO >> epoch 018: 1959 / 1973 loss=0.904, accuracy=0, mrr=0, ppl=1.87, wps=71527.9, ups=2.3, wpb=31038.6, bsz=128, num_updates=35500, lr=0.001, gnorm=0.229, clip=0, train_wall=215, wall=19200 (progress_bar.py:262, log())
[2021-03-22 21:11:47] INFO >> epoch 018 | loss 0.886 | accuracy 0 | mrr 0 | ppl 1.85 | wps 56631.1 | ups 1.83 | wpb 31027.9 | bsz 128 | num_updates 35514 | lr 0.001 | gnorm 0.232 | clip 0 | train_wall 846 | wall 19206 (progress_bar.py:269, print())
[2021-03-22 21:15:26] INFO >> epoch 018 | valid on 'test' subset | loss 1.304 | accuracy 0.698116 | mrr 0.755306 | ppl 2.47 | wps 139267 | wpb 61759.1 | bsz 255.8 | num_updates 35514 | best_mrr 0.755624 (progress_bar.py:269, print())
[2021-03-22 21:15:29] INFO >> saved checkpoint /home/wanyao/.ncc/raw_py150/completion/data-mmap/gpt2/checkpoints/checkpoint_last.pt (epoch 18 @ 35514 updates, score 0.755306) (writing took 2.134230 seconds) (checkpoint_utils.py:81, save_checkpoint())
[2021-03-22 21:15:29] INFO >> early stop since valid performance hasn't improved for last 5 runs (train.py:185, should_stop_early())
[2021-03-22 21:15:29] INFO >> early stop since valid performance hasn't improved for last 5 runs (train.py:271, single_main())
[2021-03-22 21:15:29] INFO >> done training in 19425.9 seconds (train.py:282, single_main())