9708 lines (9708 loc) · 606 KB
training_log.txt
Epoch [1/3], Step [1/3236], Loss: 9.2210, Perplexity: 10107.3275
Epoch [1/3], Step [2/3236], Loss: 8.9678, Perplexity: 7845.9926
Epoch [1/3], Step [3/3236], Loss: 8.7267, Perplexity: 6165.2949
Epoch [1/3], Step [4/3236], Loss: 8.3413, Perplexity: 4193.3459
Epoch [1/3], Step [5/3236], Loss: 7.9210, Perplexity: 2754.5360
Epoch [1/3], Step [6/3236], Loss: 7.3273, Perplexity: 1521.2232
Epoch [1/3], Step [7/3236], Loss: 6.6220, Perplexity: 751.4295
Epoch [1/3], Step [8/3236], Loss: 6.0439, Perplexity: 421.5457
Epoch [1/3], Step [9/3236], Loss: 5.4463, Perplexity: 231.8892
Epoch [1/3], Step [10/3236], Loss: 5.1307, Perplexity: 169.1432
Epoch [1/3], Step [11/3236], Loss: 4.8741, Perplexity: 130.8585
Epoch [1/3], Step [12/3236], Loss: 4.7995, Perplexity: 121.4497
Epoch [1/3], Step [13/3236], Loss: 4.7192, Perplexity: 112.0818
Epoch [1/3], Step [14/3236], Loss: 4.6477, Perplexity: 104.3406
Epoch [1/3], Step [15/3236], Loss: 4.7182, Perplexity: 111.9695
Epoch [1/3], Step [16/3236], Loss: 4.7434, Perplexity: 114.8269
Epoch [1/3], Step [17/3236], Loss: 4.6268, Perplexity: 102.1823
Epoch [1/3], Step [18/3236], Loss: 4.5979, Perplexity: 99.2724
Epoch [1/3], Step [19/3236], Loss: 4.6455, Perplexity: 104.1190
Epoch [1/3], Step [20/3236], Loss: 4.9828, Perplexity: 145.8846
Epoch [1/3], Step [21/3236], Loss: 4.7044, Perplexity: 110.4341
Epoch [1/3], Step [22/3236], Loss: 4.4547, Perplexity: 86.0262
Epoch [1/3], Step [23/3236], Loss: 4.5448, Perplexity: 94.1414
Epoch [1/3], Step [24/3236], Loss: 4.7700, Perplexity: 117.9170
Epoch [1/3], Step [25/3236], Loss: 4.4713, Perplexity: 87.4717
Epoch [1/3], Step [26/3236], Loss: 4.4556, Perplexity: 86.1064
Epoch [1/3], Step [27/3236], Loss: 4.5689, Perplexity: 96.4407
Epoch [1/3], Step [28/3236], Loss: 4.4791, Perplexity: 88.1589
Epoch [1/3], Step [29/3236], Loss: 4.4881, Perplexity: 88.9563
Epoch [1/3], Step [30/3236], Loss: 4.4632, Perplexity: 86.7639
Epoch [1/3], Step [31/3236], Loss: 4.2412, Perplexity: 69.4922
Epoch [1/3], Step [32/3236], Loss: 4.4337, Perplexity: 84.2410
Epoch [1/3], Step [33/3236], Loss: 4.3634, Perplexity: 78.5237
Epoch [1/3], Step [34/3236], Loss: 4.3723, Perplexity: 79.2253
Epoch [1/3], Step [35/3236], Loss: 4.3800, Perplexity: 79.8379
Epoch [1/3], Step [36/3236], Loss: 4.5235, Perplexity: 92.1541
Epoch [1/3], Step [37/3236], Loss: 4.3002, Perplexity: 73.7131
Epoch [1/3], Step [38/3236], Loss: 4.2319, Perplexity: 68.8449
Epoch [1/3], Step [39/3236], Loss: 4.4919, Perplexity: 89.2922
Epoch [1/3], Step [40/3236], Loss: 4.3306, Perplexity: 75.9914
Epoch [1/3], Step [41/3236], Loss: 4.1876, Perplexity: 65.8621
Epoch [1/3], Step [42/3236], Loss: 4.3210, Perplexity: 75.2640
Epoch [1/3], Step [43/3236], Loss: 4.1217, Perplexity: 61.6629
Epoch [1/3], Step [44/3236], Loss: 4.1848, Perplexity: 65.6798
Epoch [1/3], Step [45/3236], Loss: 4.1226, Perplexity: 61.7203
Epoch [1/3], Step [46/3236], Loss: 4.2208, Perplexity: 68.0871
Epoch [1/3], Step [47/3236], Loss: 4.0849, Perplexity: 59.4387
Epoch [1/3], Step [48/3236], Loss: 4.3299, Perplexity: 75.9367
Epoch [1/3], Step [49/3236], Loss: 4.0933, Perplexity: 59.9367
Epoch [1/3], Step [50/3236], Loss: 4.1116, Perplexity: 61.0466
Epoch [1/3], Step [51/3236], Loss: 4.1649, Perplexity: 64.3868
Epoch [1/3], Step [52/3236], Loss: 4.2018, Perplexity: 66.8098
Epoch [1/3], Step [53/3236], Loss: 4.0778, Perplexity: 59.0159
Epoch [1/3], Step [54/3236], Loss: 4.2000, Perplexity: 66.6868
Epoch [1/3], Step [55/3236], Loss: 3.9134, Perplexity: 50.0684
Epoch [1/3], Step [56/3236], Loss: 4.0148, Perplexity: 55.4096
Epoch [1/3], Step [57/3236], Loss: 3.9654, Perplexity: 52.7423
Epoch [1/3], Step [58/3236], Loss: 4.0311, Perplexity: 56.3250
Epoch [1/3], Step [59/3236], Loss: 3.9569, Perplexity: 52.2928
Epoch [1/3], Step [60/3236], Loss: 4.0463, Perplexity: 57.1832
Epoch [1/3], Step [61/3236], Loss: 3.9615, Perplexity: 52.5380
Epoch [1/3], Step [62/3236], Loss: 3.8693, Perplexity: 47.9093
Epoch [1/3], Step [63/3236], Loss: 3.7686, Perplexity: 43.3178
Epoch [1/3], Step [64/3236], Loss: 3.7518, Perplexity: 42.5977
Epoch [1/3], Step [65/3236], Loss: 3.8213, Perplexity: 45.6650
Epoch [1/3], Step [66/3236], Loss: 3.8113, Perplexity: 45.2080
Epoch [1/3], Step [67/3236], Loss: 3.8018, Perplexity: 44.7799
Epoch [1/3], Step [68/3236], Loss: 4.3632, Perplexity: 78.5109
Epoch [1/3], Step [69/3236], Loss: 3.7526, Perplexity: 42.6308
Epoch [1/3], Step [70/3236], Loss: 3.8523, Perplexity: 47.1021
Epoch [1/3], Step [71/3236], Loss: 3.7027, Perplexity: 40.5549
Epoch [1/3], Step [72/3236], Loss: 4.0737, Perplexity: 58.7751
Epoch [1/3], Step [73/3236], Loss: 3.9439, Perplexity: 51.6202
Epoch [1/3], Step [74/3236], Loss: 3.7156, Perplexity: 41.0850
Epoch [1/3], Step [75/3236], Loss: 3.7377, Perplexity: 42.0012
Epoch [1/3], Step [76/3236], Loss: 3.8249, Perplexity: 45.8270
Epoch [1/3], Step [77/3236], Loss: 3.6394, Perplexity: 38.0686
Epoch [1/3], Step [78/3236], Loss: 3.8668, Perplexity: 47.7881
Epoch [1/3], Step [79/3236], Loss: 3.8431, Perplexity: 46.6716
Epoch [1/3], Step [80/3236], Loss: 3.7322, Perplexity: 41.7709
Epoch [1/3], Step [81/3236], Loss: 3.8512, Perplexity: 47.0485
Epoch [1/3], Step [82/3236], Loss: 3.8421, Perplexity: 46.6214
Epoch [1/3], Step [83/3236], Loss: 3.8569, Perplexity: 47.3183
Epoch [1/3], Step [84/3236], Loss: 3.7417, Perplexity: 42.1705
Epoch [1/3], Step [85/3236], Loss: 3.7524, Perplexity: 42.6233
Epoch [1/3], Step [86/3236], Loss: 3.7090, Perplexity: 40.8128
Epoch [1/3], Step [87/3236], Loss: 3.7608, Perplexity: 42.9831
Epoch [1/3], Step [88/3236], Loss: 3.6762, Perplexity: 39.4943
Epoch [1/3], Step [89/3236], Loss: 3.7383, Perplexity: 42.0268
Epoch [1/3], Step [90/3236], Loss: 3.7280, Perplexity: 41.5962
Epoch [1/3], Step [91/3236], Loss: 3.5101, Perplexity: 33.4517
Epoch [1/3], Step [92/3236], Loss: 3.7594, Perplexity: 42.9244
Epoch [1/3], Step [93/3236], Loss: 3.5640, Perplexity: 35.3040
Epoch [1/3], Step [94/3236], Loss: 4.1132, Perplexity: 61.1394
Epoch [1/3], Step [95/3236], Loss: 3.7849, Perplexity: 44.0321
Epoch [1/3], Step [96/3236], Loss: 3.6265, Perplexity: 37.5796
Epoch [1/3], Step [97/3236], Loss: 3.5118, Perplexity: 33.5096
Epoch [1/3], Step [98/3236], Loss: 3.6898, Perplexity: 40.0378
Epoch [1/3], Step [99/3236], Loss: 3.4818, Perplexity: 32.5194
Epoch [1/3], Step [100/3236], Loss: 3.8979, Perplexity: 49.2979
Epoch [1/3], Step [101/3236], Loss: 3.5721, Perplexity: 35.5926
Epoch [1/3], Step [102/3236], Loss: 3.7119, Perplexity: 40.9296
Epoch [1/3], Step [103/3236], Loss: 3.7364, Perplexity: 41.9483
Epoch [1/3], Step [104/3236], Loss: 3.6620, Perplexity: 38.9410
Epoch [1/3], Step [105/3236], Loss: 4.4920, Perplexity: 89.2983
Epoch [1/3], Step [106/3236], Loss: 3.5353, Perplexity: 34.3049
Epoch [1/3], Step [107/3236], Loss: 3.6171, Perplexity: 37.2307
Epoch [1/3], Step [108/3236], Loss: 3.5858, Perplexity: 36.0804
Epoch [1/3], Step [109/3236], Loss: 3.6836, Perplexity: 39.7878
Epoch [1/3], Step [110/3236], Loss: 3.5677, Perplexity: 35.4338
Epoch [1/3], Step [111/3236], Loss: 3.6519, Perplexity: 38.5489
Epoch [1/3], Step [112/3236], Loss: 3.6182, Perplexity: 37.2698
Epoch [1/3], Step [113/3236], Loss: 3.5907, Perplexity: 36.2583
Epoch [1/3], Step [114/3236], Loss: 3.3359, Perplexity: 28.1040
Epoch [1/3], Step [115/3236], Loss: 3.4753, Perplexity: 32.3065
Epoch [1/3], Step [116/3236], Loss: 3.9931, Perplexity: 54.2205
Epoch [1/3], Step [117/3236], Loss: 3.6238, Perplexity: 37.4814
Epoch [1/3], Step [118/3236], Loss: 3.5514, Perplexity: 34.8628
Epoch [1/3], Step [119/3236], Loss: 3.2791, Perplexity: 26.5522
Epoch [1/3], Step [120/3236], Loss: 3.3879, Perplexity: 29.6031
Epoch [1/3], Step [121/3236], Loss: 3.4878, Perplexity: 32.7139
Epoch [1/3], Step [122/3236], Loss: 3.5223, Perplexity: 33.8629
Epoch [1/3], Step [123/3236], Loss: 3.4198, Perplexity: 30.5634
Epoch [1/3], Step [124/3236], Loss: 3.4743, Perplexity: 32.2752
Epoch [1/3], Step [125/3236], Loss: 3.5683, Perplexity: 35.4567
Epoch [1/3], Step [126/3236], Loss: 3.6040, Perplexity: 36.7464
Epoch [1/3], Step [127/3236], Loss: 3.5523, Perplexity: 34.8943
Epoch [1/3], Step [128/3236], Loss: 3.3661, Perplexity: 28.9647
Epoch [1/3], Step [129/3236], Loss: 3.6254, Perplexity: 37.5412
Epoch [1/3], Step [130/3236], Loss: 3.3960, Perplexity: 29.8453
Epoch [1/3], Step [131/3236], Loss: 3.7935, Perplexity: 44.4131
Epoch [1/3], Step [132/3236], Loss: 4.1419, Perplexity: 62.9244
Epoch [1/3], Step [133/3236], Loss: 3.6739, Perplexity: 39.4072
Epoch [1/3], Step [134/3236], Loss: 3.4088, Perplexity: 30.2292
Epoch [1/3], Step [135/3236], Loss: 3.5209, Perplexity: 33.8150
Epoch [1/3], Step [136/3236], Loss: 3.5430, Perplexity: 34.5704
Epoch [1/3], Step [137/3236], Loss: 3.6121, Perplexity: 37.0443
Epoch [1/3], Step [138/3236], Loss: 3.3680, Perplexity: 29.0192
Epoch [1/3], Step [139/3236], Loss: 3.5405, Perplexity: 34.4837
Epoch [1/3], Step [140/3236], Loss: 3.4420, Perplexity: 31.2495
Epoch [1/3], Step [141/3236], Loss: 3.4783, Perplexity: 32.4047
Epoch [1/3], Step [142/3236], Loss: 3.4360, Perplexity: 31.0623
Epoch [1/3], Step [143/3236], Loss: 3.4550, Perplexity: 31.6567
Epoch [1/3], Step [144/3236], Loss: 3.2979, Perplexity: 27.0552
Epoch [1/3], Step [145/3236], Loss: 3.2451, Perplexity: 25.6636
Epoch [1/3], Step [146/3236], Loss: 3.6531, Perplexity: 38.5949
Epoch [1/3], Step [147/3236], Loss: 3.4664, Perplexity: 32.0227
Epoch [1/3], Step [148/3236], Loss: 3.6055, Perplexity: 36.8015
Epoch [1/3], Step [149/3236], Loss: 3.5149, Perplexity: 33.6121
Epoch [1/3], Step [150/3236], Loss: 4.0118, Perplexity: 55.2468
Epoch [1/3], Step [151/3236], Loss: 3.8836, Perplexity: 48.5984
Epoch [1/3], Step [152/3236], Loss: 3.7363, Perplexity: 41.9432
Epoch [1/3], Step [153/3236], Loss: 3.5699, Perplexity: 35.5114
Epoch [1/3], Step [154/3236], Loss: 3.3863, Perplexity: 29.5554
Epoch [1/3], Step [155/3236], Loss: 3.5694, Perplexity: 35.4956
Epoch [1/3], Step [156/3236], Loss: 3.3101, Perplexity: 27.3868
Epoch [1/3], Step [157/3236], Loss: 3.5876, Perplexity: 36.1482
Epoch [1/3], Step [158/3236], Loss: 3.5319, Perplexity: 34.1882
Epoch [1/3], Step [159/3236], Loss: 3.2654, Perplexity: 26.1906
Epoch [1/3], Step [160/3236], Loss: 3.4972, Perplexity: 33.0215
Epoch [1/3], Step [161/3236], Loss: 3.4138, Perplexity: 30.3819
Epoch [1/3], Step [162/3236], Loss: 3.6734, Perplexity: 39.3865
Epoch [1/3], Step [163/3236], Loss: 3.3474, Perplexity: 28.4279
Epoch [1/3], Step [164/3236], Loss: 3.7371, Perplexity: 41.9765
Epoch [1/3], Step [165/3236], Loss: 3.4907, Perplexity: 32.8083
Epoch [1/3], Step [166/3236], Loss: 3.4906, Perplexity: 32.8065
Epoch [1/3], Step [167/3236], Loss: 3.4086, Perplexity: 30.2220
Epoch [1/3], Step [168/3236], Loss: 3.4886, Perplexity: 32.7411
Epoch [1/3], Step [169/3236], Loss: 3.3808, Perplexity: 29.3954
Epoch [1/3], Step [170/3236], Loss: 3.3055, Perplexity: 27.2611
Epoch [1/3], Step [171/3236], Loss: 3.2514, Perplexity: 25.8255
Epoch [1/3], Step [172/3236], Loss: 3.3501, Perplexity: 28.5058
Epoch [1/3], Step [173/3236], Loss: 3.3822, Perplexity: 29.4348
Epoch [1/3], Step [174/3236], Loss: 4.1159, Perplexity: 61.3099
Epoch [1/3], Step [175/3236], Loss: 3.4125, Perplexity: 30.3399
Epoch [1/3], Step [176/3236], Loss: 3.3490, Perplexity: 28.4742
Epoch [1/3], Step [177/3236], Loss: 3.3540, Perplexity: 28.6159
Epoch [1/3], Step [178/3236], Loss: 3.6407, Perplexity: 38.1185
Epoch [1/3], Step [179/3236], Loss: 3.3179, Perplexity: 27.6011
Epoch [1/3], Step [180/3236], Loss: 3.3157, Perplexity: 27.5413
Epoch [1/3], Step [181/3236], Loss: 3.2996, Perplexity: 27.1029
Epoch [1/3], Step [182/3236], Loss: 3.4796, Perplexity: 32.4470
Epoch [1/3], Step [183/3236], Loss: 3.3324, Perplexity: 28.0042
Epoch [1/3], Step [184/3236], Loss: 3.4960, Perplexity: 32.9818
Epoch [1/3], Step [185/3236], Loss: 3.3090, Perplexity: 27.3591
Epoch [1/3], Step [186/3236], Loss: 3.4468, Perplexity: 31.4003
Epoch [1/3], Step [187/3236], Loss: 3.2841, Perplexity: 26.6845
Epoch [1/3], Step [188/3236], Loss: 3.4174, Perplexity: 30.4901
Epoch [1/3], Step [189/3236], Loss: 3.4838, Perplexity: 32.5832
Epoch [1/3], Step [190/3236], Loss: 3.3666, Perplexity: 28.9794
Epoch [1/3], Step [191/3236], Loss: 3.2978, Perplexity: 27.0543
Epoch [1/3], Step [192/3236], Loss: 3.2233, Perplexity: 25.1102
Epoch [1/3], Step [193/3236], Loss: 3.0727, Perplexity: 21.6001
Epoch [1/3], Step [194/3236], Loss: 3.3099, Perplexity: 27.3828
Epoch [1/3], Step [195/3236], Loss: 3.0943, Perplexity: 22.0720
Epoch [1/3], Step [196/3236], Loss: 3.1997, Perplexity: 24.5248
Epoch [1/3], Step [197/3236], Loss: 3.4410, Perplexity: 31.2185
Epoch [1/3], Step [198/3236], Loss: 3.3911, Perplexity: 29.6996
Epoch [1/3], Step [199/3236], Loss: 3.1392, Perplexity: 23.0844
Epoch [1/3], Step [200/3236], Loss: 3.7585, Perplexity: 42.8825
Epoch [1/3], Step [201/3236], Loss: 3.3610, Perplexity: 28.8173
Epoch [1/3], Step [202/3236], Loss: 3.2951, Perplexity: 26.9812
Epoch [1/3], Step [203/3236], Loss: 3.3879, Perplexity: 29.6038
Epoch [1/3], Step [204/3236], Loss: 3.3319, Perplexity: 27.9911
Epoch [1/3], Step [205/3236], Loss: 3.1655, Perplexity: 23.7009
Epoch [1/3], Step [206/3236], Loss: 3.2551, Perplexity: 25.9230
Epoch [1/3], Step [207/3236], Loss: 3.1428, Perplexity: 23.1677
Epoch [1/3], Step [208/3236], Loss: 3.3312, Perplexity: 27.9710
Epoch [1/3], Step [209/3236], Loss: 3.4937, Perplexity: 32.9085
Epoch [1/3], Step [210/3236], Loss: 3.4944, Perplexity: 32.9306
Epoch [1/3], Step [211/3236], Loss: 3.2989, Perplexity: 27.0830
Epoch [1/3], Step [212/3236], Loss: 3.4179, Perplexity: 30.5057
Epoch [1/3], Step [213/3236], Loss: 3.6781, Perplexity: 39.5724
Epoch [1/3], Step [214/3236], Loss: 3.3263, Perplexity: 27.8342
Epoch [1/3], Step [215/3236], Loss: 3.2905, Perplexity: 26.8554
Epoch [1/3], Step [216/3236], Loss: 3.4089, Perplexity: 30.2317
Epoch [1/3], Step [217/3236], Loss: 3.3395, Perplexity: 28.2039
Epoch [1/3], Step [218/3236], Loss: 3.2313, Perplexity: 25.3116
Epoch [1/3], Step [219/3236], Loss: 3.1777, Perplexity: 23.9925
Epoch [1/3], Step [220/3236], Loss: 3.1632, Perplexity: 23.6465
Epoch [1/3], Step [221/3236], Loss: 3.0751, Perplexity: 21.6528
Epoch [1/3], Step [222/3236], Loss: 3.7242, Perplexity: 41.4368
Epoch [1/3], Step [223/3236], Loss: 4.1508, Perplexity: 63.4824
Epoch [1/3], Step [224/3236], Loss: 3.2952, Perplexity: 26.9832
Epoch [1/3], Step [225/3236], Loss: 3.5485, Perplexity: 34.7595
Epoch [1/3], Step [226/3236], Loss: 3.2490, Perplexity: 25.7658
Epoch [1/3], Step [227/3236], Loss: 3.5428, Perplexity: 34.5641
Epoch [1/3], Step [228/3236], Loss: 3.3783, Perplexity: 29.3221
Epoch [1/3], Step [229/3236], Loss: 3.2614, Perplexity: 26.0868
Epoch [1/3], Step [230/3236], Loss: 3.4698, Perplexity: 32.1302
Epoch [1/3], Step [231/3236], Loss: 3.6791, Perplexity: 39.6110
Epoch [1/3], Step [232/3236], Loss: 3.1473, Perplexity: 23.2721
Epoch [1/3], Step [233/3236], Loss: 3.1748, Perplexity: 23.9219
Epoch [1/3], Step [234/3236], Loss: 3.5870, Perplexity: 36.1245
Epoch [1/3], Step [235/3236], Loss: 3.5142, Perplexity: 33.5900
Epoch [1/3], Step [236/3236], Loss: 3.2949, Perplexity: 26.9739
Epoch [1/3], Step [237/3236], Loss: 3.2351, Perplexity: 25.4078
Epoch [1/3], Step [238/3236], Loss: 3.3649, Perplexity: 28.9310
Epoch [1/3], Step [239/3236], Loss: 3.0343, Perplexity: 20.7874
Epoch [1/3], Step [240/3236], Loss: 3.2069, Perplexity: 24.7023
Epoch [1/3], Step [241/3236], Loss: 3.0801, Perplexity: 21.7598
Epoch [1/3], Step [242/3236], Loss: 3.2382, Perplexity: 25.4879
Epoch [1/3], Step [243/3236], Loss: 3.2178, Perplexity: 24.9728
Epoch [1/3], Step [244/3236], Loss: 3.2462, Perplexity: 25.6936
Epoch [1/3], Step [245/3236], Loss: 3.2456, Perplexity: 25.6769
Epoch [1/3], Step [246/3236], Loss: 3.3593, Perplexity: 28.7680
Epoch [1/3], Step [247/3236], Loss: 3.1375, Perplexity: 23.0461
Epoch [1/3], Step [248/3236], Loss: 3.3912, Perplexity: 29.7009
Epoch [1/3], Step [249/3236], Loss: 3.1979, Perplexity: 24.4809
Epoch [1/3], Step [250/3236], Loss: 3.2226, Perplexity: 25.0943
Epoch [1/3], Step [251/3236], Loss: 3.5747, Perplexity: 35.6832
Epoch [1/3], Step [252/3236], Loss: 3.7123, Perplexity: 40.9463
Epoch [1/3], Step [253/3236], Loss: 3.4739, Perplexity: 32.2639
Epoch [1/3], Step [254/3236], Loss: 3.7615, Perplexity: 43.0120
Epoch [1/3], Step [255/3236], Loss: 3.3393, Perplexity: 28.1986
Epoch [1/3], Step [256/3236], Loss: 3.5749, Perplexity: 35.6906
Epoch [1/3], Step [257/3236], Loss: 3.2459, Perplexity: 25.6855
Epoch [1/3], Step [258/3236], Loss: 3.2952, Perplexity: 26.9833
Epoch [1/3], Step [259/3236], Loss: 3.7229, Perplexity: 41.3863
Epoch [1/3], Step [260/3236], Loss: 3.2529, Perplexity: 25.8652
Epoch [1/3], Step [261/3236], Loss: 3.5747, Perplexity: 35.6831
Epoch [1/3], Step [262/3236], Loss: 3.1688, Perplexity: 23.7788
Epoch [1/3], Step [263/3236], Loss: 3.1143, Perplexity: 22.5186
Epoch [1/3], Step [264/3236], Loss: 3.2080, Perplexity: 24.7284
Epoch [1/3], Step [265/3236], Loss: 3.4318, Perplexity: 30.9321
Epoch [1/3], Step [266/3236], Loss: 3.5708, Perplexity: 35.5452
Epoch [1/3], Step [267/3236], Loss: 3.2859, Perplexity: 26.7322
Epoch [1/3], Step [268/3236], Loss: 3.2517, Perplexity: 25.8332
Epoch [1/3], Step [269/3236], Loss: 3.6493, Perplexity: 38.4495
Epoch [1/3], Step [270/3236], Loss: 3.5678, Perplexity: 35.4371
Epoch [1/3], Step [271/3236], Loss: 3.2996, Perplexity: 27.1027
Epoch [1/3], Step [272/3236], Loss: 3.1371, Perplexity: 23.0372
Epoch [1/3], Step [273/3236], Loss: 3.1461, Perplexity: 23.2461
Epoch [1/3], Step [274/3236], Loss: 3.4051, Perplexity: 30.1183
Epoch [1/3], Step [275/3236], Loss: 3.5045, Perplexity: 33.2651
Epoch [1/3], Step [276/3236], Loss: 3.0551, Perplexity: 21.2237
Epoch [1/3], Step [277/3236], Loss: 3.3436, Perplexity: 28.3195
Epoch [1/3], Step [278/3236], Loss: 3.2918, Perplexity: 26.8913
Epoch [1/3], Step [279/3236], Loss: 3.3862, Perplexity: 29.5528
Epoch [1/3], Step [280/3236], Loss: 3.0112, Perplexity: 20.3108
Epoch [1/3], Step [281/3236], Loss: 3.0217, Perplexity: 20.5272
Epoch [1/3], Step [282/3236], Loss: 3.1801, Perplexity: 24.0496
Epoch [1/3], Step [283/3236], Loss: 3.1250, Perplexity: 22.7606
Epoch [1/3], Step [284/3236], Loss: 3.0029, Perplexity: 20.1434
Epoch [1/3], Step [285/3236], Loss: 3.1478, Perplexity: 23.2841
Epoch [1/3], Step [286/3236], Loss: 3.2745, Perplexity: 26.4298
Epoch [1/3], Step [287/3236], Loss: 3.8731, Perplexity: 48.0901
Epoch [1/3], Step [288/3236], Loss: 3.1638, Perplexity: 23.6609
Epoch [1/3], Step [289/3236], Loss: 3.1463, Perplexity: 23.2499
Epoch [1/3], Step [290/3236], Loss: 3.6858, Perplexity: 39.8762
Epoch [1/3], Step [291/3236], Loss: 3.0368, Perplexity: 20.8394
Epoch [1/3], Step [292/3236], Loss: 2.9961, Perplexity: 20.0082
Epoch [1/3], Step [293/3236], Loss: 3.3210, Perplexity: 27.6878
Epoch [1/3], Step [294/3236], Loss: 2.9860, Perplexity: 19.8063
Epoch [1/3], Step [295/3236], Loss: 3.3013, Perplexity: 27.1488
Epoch [1/3], Step [296/3236], Loss: 3.5152, Perplexity: 33.6233
Epoch [1/3], Step [297/3236], Loss: 3.1960, Perplexity: 24.4352
Epoch [1/3], Step [298/3236], Loss: 3.1684, Perplexity: 23.7691
Epoch [1/3], Step [299/3236], Loss: 3.0265, Perplexity: 20.6249
Epoch [1/3], Step [300/3236], Loss: 3.1383, Perplexity: 23.0641
Epoch [1/3], Step [301/3236], Loss: 3.1901, Perplexity: 24.2900
Epoch [1/3], Step [302/3236], Loss: 3.2916, Perplexity: 26.8848
Epoch [1/3], Step [303/3236], Loss: 3.0142, Perplexity: 20.3720
Epoch [1/3], Step [304/3236], Loss: 3.2636, Perplexity: 26.1440
Epoch [1/3], Step [305/3236], Loss: 3.0584, Perplexity: 21.2928
Epoch [1/3], Step [306/3236], Loss: 3.1878, Perplexity: 24.2350
Epoch [1/3], Step [307/3236], Loss: 3.0352, Perplexity: 20.8052
Epoch [1/3], Step [308/3236], Loss: 3.0374, Perplexity: 20.8516
Epoch [1/3], Step [309/3236], Loss: 3.1858, Perplexity: 24.1857
Epoch [1/3], Step [310/3236], Loss: 3.2882, Perplexity: 26.7934
Epoch [1/3], Step [311/3236], Loss: 3.2831, Perplexity: 26.6584
Epoch [1/3], Step [312/3236], Loss: 3.3208, Perplexity: 27.6812
Epoch [1/3], Step [313/3236], Loss: 2.9979, Perplexity: 20.0425
Epoch [1/3], Step [314/3236], Loss: 3.0175, Perplexity: 20.4400
Epoch [1/3], Step [315/3236], Loss: 3.3333, Perplexity: 28.0299
Epoch [1/3], Step [316/3236], Loss: 3.0279, Perplexity: 20.6541
Epoch [1/3], Step [317/3236], Loss: 3.2319, Perplexity: 25.3287
Epoch [1/3], Step [318/3236], Loss: 3.1310, Perplexity: 22.8979
Epoch [1/3], Step [319/3236], Loss: 3.2080, Perplexity: 24.7296
Epoch [1/3], Step [320/3236], Loss: 3.4465, Perplexity: 31.3896
Epoch [1/3], Step [321/3236], Loss: 3.1032, Perplexity: 22.2690
Epoch [1/3], Step [322/3236], Loss: 3.0729, Perplexity: 21.6046
Epoch [1/3], Step [323/3236], Loss: 3.4212, Perplexity: 30.6051
Epoch [1/3], Step [324/3236], Loss: 3.0973, Perplexity: 22.1373
Epoch [1/3], Step [325/3236], Loss: 3.1472, Perplexity: 23.2717
Epoch [1/3], Step [326/3236], Loss: 3.3027, Perplexity: 27.1860
Epoch [1/3], Step [327/3236], Loss: 3.0766, Perplexity: 21.6845
Epoch [1/3], Step [328/3236], Loss: 3.0070, Perplexity: 20.2270
Epoch [1/3], Step [329/3236], Loss: 3.4480, Perplexity: 31.4373
Epoch [1/3], Step [330/3236], Loss: 3.0739, Perplexity: 21.6256
Epoch [1/3], Step [331/3236], Loss: 3.2253, Perplexity: 25.1619
Epoch [1/3], Step [332/3236], Loss: 3.0618, Perplexity: 21.3669
Epoch [1/3], Step [333/3236], Loss: 3.0310, Perplexity: 20.7178
Epoch [1/3], Step [334/3236], Loss: 2.9880, Perplexity: 19.8453
Epoch [1/3], Step [335/3236], Loss: 3.0508, Perplexity: 21.1332
Epoch [1/3], Step [336/3236], Loss: 3.1165, Perplexity: 22.5676
Epoch [1/3], Step [337/3236], Loss: 3.0337, Perplexity: 20.7744
Epoch [1/3], Step [338/3236], Loss: 3.2109, Perplexity: 24.8015
Epoch [1/3], Step [339/3236], Loss: 3.2828, Perplexity: 26.6491
Epoch [1/3], Step [340/3236], Loss: 3.0814, Perplexity: 21.7880
Epoch [1/3], Step [341/3236], Loss: 3.1557, Perplexity: 23.4689
Epoch [1/3], Step [342/3236], Loss: 3.2115, Perplexity: 24.8171
Epoch [1/3], Step [343/3236], Loss: 3.3854, Perplexity: 29.5295
Epoch [1/3], Step [344/3236], Loss: 3.0983, Perplexity: 22.1610
Epoch [1/3], Step [345/3236], Loss: 3.0879, Perplexity: 21.9300
Epoch [1/3], Step [346/3236], Loss: 3.1024, Perplexity: 22.2510
Epoch [1/3], Step [347/3236], Loss: 3.3569, Perplexity: 28.6996
Epoch [1/3], Step [348/3236], Loss: 3.1030, Perplexity: 22.2642
Epoch [1/3], Step [349/3236], Loss: 3.0741, Perplexity: 21.6295
Epoch [1/3], Step [350/3236], Loss: 3.0603, Perplexity: 21.3350
Epoch [1/3], Step [351/3236], Loss: 3.3460, Perplexity: 28.3899
Epoch [1/3], Step [352/3236], Loss: 3.2692, Perplexity: 26.2906
Epoch [1/3], Step [353/3236], Loss: 3.1419, Perplexity: 23.1489
Epoch [1/3], Step [354/3236], Loss: 2.9523, Perplexity: 19.1509
Epoch [1/3], Step [355/3236], Loss: 3.2393, Perplexity: 25.5169
Epoch [1/3], Step [356/3236], Loss: 3.1584, Perplexity: 23.5338
Epoch [1/3], Step [357/3236], Loss: 2.9010, Perplexity: 18.1924
Epoch [1/3], Step [358/3236], Loss: 3.3666, Perplexity: 28.9796
Epoch [1/3], Step [359/3236], Loss: 3.1765, Perplexity: 23.9634
Epoch [1/3], Step [360/3236], Loss: 2.9918, Perplexity: 19.9224
Epoch [1/3], Step [361/3236], Loss: 3.3553, Perplexity: 28.6535
Epoch [1/3], Step [362/3236], Loss: 3.0088, Perplexity: 20.2632
Epoch [1/3], Step [363/3236], Loss: 2.9551, Perplexity: 19.2042
Epoch [1/3], Step [364/3236], Loss: 3.0139, Perplexity: 20.3672
Epoch [1/3], Step [365/3236], Loss: 3.1363, Perplexity: 23.0179
Epoch [1/3], Step [366/3236], Loss: 2.9792, Perplexity: 19.6730
Epoch [1/3], Step [367/3236], Loss: 3.1436, Perplexity: 23.1882
Epoch [1/3], Step [368/3236], Loss: 3.0488, Perplexity: 21.0893
Epoch [1/3], Step [369/3236], Loss: 3.8481, Perplexity: 46.9023
Epoch [1/3], Step [370/3236], Loss: 3.3264, Perplexity: 27.8381
Epoch [1/3], Step [371/3236], Loss: 2.9632, Perplexity: 19.3596
Epoch [1/3], Step [372/3236], Loss: 3.0369, Perplexity: 20.8399
Epoch [1/3], Step [373/3236], Loss: 2.9389, Perplexity: 18.8945
Epoch [1/3], Step [374/3236], Loss: 2.9014, Perplexity: 18.1992
Epoch [1/3], Step [375/3236], Loss: 3.0159, Perplexity: 20.4079
Epoch [1/3], Step [376/3236], Loss: 2.9600, Perplexity: 19.2981
Epoch [1/3], Step [377/3236], Loss: 3.3868, Perplexity: 29.5717
Epoch [1/3], Step [378/3236], Loss: 3.1582, Perplexity: 23.5292
Epoch [1/3], Step [379/3236], Loss: 3.0747, Perplexity: 21.6431
Epoch [1/3], Step [380/3236], Loss: 2.9812, Perplexity: 19.7109
Epoch [1/3], Step [381/3236], Loss: 3.2470, Perplexity: 25.7132
Epoch [1/3], Step [382/3236], Loss: 3.3611, Perplexity: 28.8206
Epoch [1/3], Step [383/3236], Loss: 3.0355, Perplexity: 20.8120
Epoch [1/3], Step [384/3236], Loss: 2.8746, Perplexity: 17.7188
Epoch [1/3], Step [385/3236], Loss: 3.2220, Perplexity: 25.0786
Epoch [1/3], Step [386/3236], Loss: 3.0340, Perplexity: 20.7794
Epoch [1/3], Step [387/3236], Loss: 3.0522, Perplexity: 21.1627
Epoch [1/3], Step [388/3236], Loss: 2.9192, Perplexity: 18.5262
Epoch [1/3], Step [389/3236], Loss: 3.1155, Perplexity: 22.5450
Epoch [1/3], Step [390/3236], Loss: 3.0453, Perplexity: 21.0156
Epoch [1/3], Step [391/3236], Loss: 3.0065, Perplexity: 20.2171
Epoch [1/3], Step [392/3236], Loss: 3.0652, Perplexity: 21.4398
Epoch [1/3], Step [393/3236], Loss: 3.0920, Perplexity: 22.0200
Epoch [1/3], Step [394/3236], Loss: 2.9819, Perplexity: 19.7248
Epoch [1/3], Step [395/3236], Loss: 3.1263, Perplexity: 22.7901
Epoch [1/3], Step [396/3236], Loss: 3.0850, Perplexity: 21.8685
Epoch [1/3], Step [397/3236], Loss: 2.8682, Perplexity: 17.6058
Epoch [1/3], Step [398/3236], Loss: 3.4055, Perplexity: 30.1292
Epoch [1/3], Step [399/3236], Loss: 2.9360, Perplexity: 18.8407
Epoch [1/3], Step [400/3236], Loss: 3.1200, Perplexity: 22.6454
Epoch [1/3], Step [401/3236], Loss: 3.1899, Perplexity: 24.2852
Epoch [1/3], Step [402/3236], Loss: 3.4290, Perplexity: 30.8468
Epoch [1/3], Step [403/3236], Loss: 3.3198, Perplexity: 27.6553
Epoch [1/3], Step [404/3236], Loss: 2.9428, Perplexity: 18.9686
Epoch [1/3], Step [405/3236], Loss: 2.9419, Perplexity: 18.9517
Epoch [1/3], Step [406/3236], Loss: 3.0897, Perplexity: 21.9699
Epoch [1/3], Step [407/3236], Loss: 3.0343, Perplexity: 20.7873
Epoch [1/3], Step [408/3236], Loss: 3.2243, Perplexity: 25.1357
Epoch [1/3], Step [409/3236], Loss: 2.9230, Perplexity: 18.5975
Epoch [1/3], Step [410/3236], Loss: 3.1991, Perplexity: 24.5097
Epoch [1/3], Step [411/3236], Loss: 3.2012, Perplexity: 24.5625
Epoch [1/3], Step [412/3236], Loss: 3.1947, Perplexity: 24.4041
Epoch [1/3], Step [413/3236], Loss: 2.7936, Perplexity: 16.3398
Epoch [1/3], Step [414/3236], Loss: 2.9769, Perplexity: 19.6271
Epoch [1/3], Step [415/3236], Loss: 2.9432, Perplexity: 18.9757
Epoch [1/3], Step [416/3236], Loss: 3.1930, Perplexity: 24.3619
Epoch [1/3], Step [417/3236], Loss: 2.9422, Perplexity: 18.9578
Epoch [1/3], Step [418/3236], Loss: 3.1084, Perplexity: 22.3858
Epoch [1/3], Step [419/3236], Loss: 3.2239, Perplexity: 25.1266
Epoch [1/3], Step [420/3236], Loss: 3.2174, Perplexity: 24.9631
Epoch [1/3], Step [421/3236], Loss: 2.9427, Perplexity: 18.9667
Epoch [1/3], Step [422/3236], Loss: 3.0553, Perplexity: 21.2277
Epoch [1/3], Step [423/3236], Loss: 2.9832, Perplexity: 19.7518
Epoch [1/3], Step [424/3236], Loss: 3.2282, Perplexity: 25.2339
Epoch [1/3], Step [425/3236], Loss: 3.0786, Perplexity: 21.7282
Epoch [1/3], Step [426/3236], Loss: 2.9504, Perplexity: 19.1127
Epoch [1/3], Step [427/3236], Loss: 4.1422, Perplexity: 62.9430
Epoch [1/3], Step [428/3236], Loss: 2.9400, Perplexity: 18.9158
Epoch [1/3], Step [429/3236], Loss: 2.8457, Perplexity: 17.2144
Epoch [1/3], Step [430/3236], Loss: 3.0712, Perplexity: 21.5668
Epoch [1/3], Step [431/3236], Loss: 3.8067, Perplexity: 45.0033
Epoch [1/3], Step [432/3236], Loss: 3.1168, Perplexity: 22.5749
Epoch [1/3], Step [433/3236], Loss: 3.1241, Perplexity: 22.7391
Epoch [1/3], Step [434/3236], Loss: 2.9198, Perplexity: 18.5367
Epoch [1/3], Step [435/3236], Loss: 3.0852, Perplexity: 21.8728
Epoch [1/3], Step [436/3236], Loss: 3.2144, Perplexity: 24.8874
Epoch [1/3], Step [437/3236], Loss: 2.9680, Perplexity: 19.4522
Epoch [1/3], Step [438/3236], Loss: 2.9491, Perplexity: 19.0885
Epoch [1/3], Step [439/3236], Loss: 2.9125, Perplexity: 18.4037
Epoch [1/3], Step [440/3236], Loss: 2.8369, Perplexity: 17.0630
Epoch [1/3], Step [441/3236], Loss: 2.9157, Perplexity: 18.4613
Epoch [1/3], Step [442/3236], Loss: 2.9462, Perplexity: 19.0331
Epoch [1/3], Step [443/3236], Loss: 2.9600, Perplexity: 19.2989
Epoch [1/3], Step [444/3236], Loss: 2.9339, Perplexity: 18.8001
Epoch [1/3], Step [445/3236], Loss: 3.1580, Perplexity: 23.5232
Epoch [1/3], Step [446/3236], Loss: 3.0188, Perplexity: 20.4660
Epoch [1/3], Step [447/3236], Loss: 3.1000, Perplexity: 22.1977
Epoch [1/3], Step [448/3236], Loss: 2.8962, Perplexity: 18.1059
Epoch [1/3], Step [449/3236], Loss: 2.9717, Perplexity: 19.5246
Epoch [1/3], Step [450/3236], Loss: 2.9626, Perplexity: 19.3485
Epoch [1/3], Step [451/3236], Loss: 3.2430, Perplexity: 25.6108
Epoch [1/3], Step [452/3236], Loss: 2.9856, Perplexity: 19.7989
Epoch [1/3], Step [453/3236], Loss: 2.8583, Perplexity: 17.4322
Epoch [1/3], Step [454/3236], Loss: 3.2909, Perplexity: 26.8679
Epoch [1/3], Step [455/3236], Loss: 2.7695, Perplexity: 15.9513
Epoch [1/3], Step [456/3236], Loss: 3.0111, Perplexity: 20.3096
Epoch [1/3], Step [457/3236], Loss: 3.0631, Perplexity: 21.3941
Epoch [1/3], Step [458/3236], Loss: 3.1881, Perplexity: 24.2429
Epoch [1/3], Step [459/3236], Loss: 2.8546, Perplexity: 17.3667
Epoch [1/3], Step [460/3236], Loss: 3.0431, Perplexity: 20.9708
Epoch [1/3], Step [461/3236], Loss: 2.9529, Perplexity: 19.1621
Epoch [1/3], Step [462/3236], Loss: 3.6106, Perplexity: 36.9877
Epoch [1/3], Step [463/3236], Loss: 3.1658, Perplexity: 23.7083
Epoch [1/3], Step [464/3236], Loss: 3.1325, Perplexity: 22.9302
Epoch [1/3], Step [465/3236], Loss: 3.8297, Perplexity: 46.0481
Epoch [1/3], Step [466/3236], Loss: 3.0591, Perplexity: 21.3079
Epoch [1/3], Step [467/3236], Loss: 3.0940, Perplexity: 22.0644
Epoch [1/3], Step [468/3236], Loss: 2.9678, Perplexity: 19.4496
Epoch [1/3], Step [469/3236], Loss: 3.3762, Perplexity: 29.2586
Epoch [1/3], Step [470/3236], Loss: 2.9715, Perplexity: 19.5204
Epoch [1/3], Step [471/3236], Loss: 2.9595, Perplexity: 19.2886
Epoch [1/3], Step [472/3236], Loss: 2.8852, Perplexity: 17.9075
Epoch [1/3], Step [473/3236], Loss: 2.8206, Perplexity: 16.7866
Epoch [1/3], Step [474/3236], Loss: 2.9862, Perplexity: 19.8096
Epoch [1/3], Step [475/3236], Loss: 2.9346, Perplexity: 18.8140
Epoch [1/3], Step [476/3236], Loss: 3.0224, Perplexity: 20.5407
Epoch [1/3], Step [477/3236], Loss: 2.9017, Perplexity: 18.2054
Epoch [1/3], Step [478/3236], Loss: 3.0333, Perplexity: 20.7658
Epoch [1/3], Step [479/3236], Loss: 3.0258, Perplexity: 20.6113
Epoch [1/3], Step [480/3236], Loss: 3.1514, Perplexity: 23.3680
Epoch [1/3], Step [481/3236], Loss: 3.1423, Perplexity: 23.1576
Epoch [1/3], Step [482/3236], Loss: 2.9372, Perplexity: 18.8626
Epoch [1/3], Step [483/3236], Loss: 3.1674, Perplexity: 23.7462
Epoch [1/3], Step [484/3236], Loss: 2.9438, Perplexity: 18.9880
Epoch [1/3], Step [485/3236], Loss: 2.9119, Perplexity: 18.3909
Epoch [1/3], Step [486/3236], Loss: 2.7630, Perplexity: 15.8470
Epoch [1/3], Step [487/3236], Loss: 2.8559, Perplexity: 17.3895
Epoch [1/3], Step [488/3236], Loss: 2.9929, Perplexity: 19.9439
Epoch [1/3], Step [489/3236], Loss: 2.9376, Perplexity: 18.8714
Epoch [1/3], Step [490/3236], Loss: 2.8809, Perplexity: 17.8308
Epoch [1/3], Step [491/3236], Loss: 2.8634, Perplexity: 17.5214
Epoch [1/3], Step [492/3236], Loss: 3.0322, Perplexity: 20.7436
Epoch [1/3], Step [493/3236], Loss: 2.9449, Perplexity: 19.0092
Epoch [1/3], Step [494/3236], Loss: 3.1134, Perplexity: 22.4973
Epoch [1/3], Step [495/3236], Loss: 2.9713, Perplexity: 19.5180
Epoch [1/3], Step [496/3236], Loss: 3.0986, Perplexity: 22.1669
Epoch [1/3], Step [497/3236], Loss: 3.4056, Perplexity: 30.1313
Epoch [1/3], Step [498/3236], Loss: 2.9049, Perplexity: 18.2636
Epoch [1/3], Step [499/3236], Loss: 2.9014, Perplexity: 18.1992
Epoch [1/3], Step [500/3236], Loss: 2.9789, Perplexity: 19.6669
Epoch [1/3], Step [501/3236], Loss: 2.9451, Perplexity: 19.0123
Epoch [1/3], Step [502/3236], Loss: 3.4770, Perplexity: 32.3639
Epoch [1/3], Step [503/3236], Loss: 2.9588, Perplexity: 19.2752
Epoch [1/3], Step [504/3236], Loss: 3.0168, Perplexity: 20.4262
Epoch [1/3], Step [505/3236], Loss: 3.0647, Perplexity: 21.4286
Epoch [1/3], Step [506/3236], Loss: 2.9125, Perplexity: 18.4019
Epoch [1/3], Step [507/3236], Loss: 2.8754, Perplexity: 17.7333
Epoch [1/3], Step [508/3236], Loss: 2.8242, Perplexity: 16.8478
Epoch [1/3], Step [509/3236], Loss: 2.9515, Perplexity: 19.1354
Epoch [1/3], Step [510/3236], Loss: 2.9760, Perplexity: 19.6099
Epoch [1/3], Step [511/3236], Loss: 3.6127, Perplexity: 37.0665
Epoch [1/3], Step [512/3236], Loss: 2.8398, Perplexity: 17.1132
Epoch [1/3], Step [513/3236], Loss: 2.8128, Perplexity: 16.6567
Epoch [1/3], Step [514/3236], Loss: 2.7788, Perplexity: 16.0993
Epoch [1/3], Step [515/3236], Loss: 3.1133, Perplexity: 22.4944
Epoch [1/3], Step [516/3236], Loss: 2.7861, Perplexity: 16.2182
Epoch [1/3], Step [517/3236], Loss: 3.2022, Perplexity: 24.5859
Epoch [1/3], Step [518/3236], Loss: 2.9263, Perplexity: 18.6582
Epoch [1/3], Step [519/3236], Loss: 2.9552, Perplexity: 19.2065
Epoch [1/3], Step [520/3236], Loss: 2.9828, Perplexity: 19.7431
Epoch [1/3], Step [521/3236], Loss: 2.9790, Perplexity: 19.6679
Epoch [1/3], Step [522/3236], Loss: 2.9532, Perplexity: 19.1679
Epoch [1/3], Step [523/3236], Loss: 3.1885, Perplexity: 24.2515
Epoch [1/3], Step [524/3236], Loss: 3.1374, Perplexity: 23.0444
Epoch [1/3], Step [525/3236], Loss: 3.0430, Perplexity: 20.9689
Epoch [1/3], Step [526/3236], Loss: 3.0256, Perplexity: 20.6055
Epoch [1/3], Step [527/3236], Loss: 2.9626, Perplexity: 19.3473
Epoch [1/3], Step [528/3236], Loss: 3.1408, Perplexity: 23.1217
Epoch [1/3], Step [529/3236], Loss: 2.8169, Perplexity: 16.7256
Epoch [1/3], Step [530/3236], Loss: 2.8507, Perplexity: 17.3007
Epoch [1/3], Step [531/3236], Loss: 2.9760, Perplexity: 19.6092
Epoch [1/3], Step [532/3236], Loss: 2.8950, Perplexity: 18.0838
Epoch [1/3], Step [533/3236], Loss: 2.9679, Perplexity: 19.4505
Epoch [1/3], Step [534/3236], Loss: 2.7712, Perplexity: 15.9773
Epoch [1/3], Step [535/3236], Loss: 2.9022, Perplexity: 18.2148
Epoch [1/3], Step [536/3236], Loss: 2.7049, Perplexity: 14.9527
Epoch [1/3], Step [537/3236], Loss: 2.9829, Perplexity: 19.7441
Epoch [1/3], Step [538/3236], Loss: 2.8685, Perplexity: 17.6108
Epoch [1/3], Step [539/3236], Loss: 2.8579, Perplexity: 17.4255
Epoch [1/3], Step [540/3236], Loss: 3.1253, Perplexity: 22.7678
Epoch [1/3], Step [541/3236], Loss: 2.9660, Perplexity: 19.4148
Epoch [1/3], Step [542/3236], Loss: 2.7985, Perplexity: 16.4196
Epoch [1/3], Step [543/3236], Loss: 2.7745, Perplexity: 16.0311
Epoch [1/3], Step [544/3236], Loss: 2.8539, Perplexity: 17.3561
Epoch [1/3], Step [545/3236], Loss: 2.9255, Perplexity: 18.6443
Epoch [1/3], Step [546/3236], Loss: 2.8478, Perplexity: 17.2499
Epoch [1/3], Step [547/3236], Loss: 3.0729, Perplexity: 21.6049
Epoch [1/3], Step [548/3236], Loss: 2.9013, Perplexity: 18.1979
Epoch [1/3], Step [549/3236], Loss: 2.9804, Perplexity: 19.6965
Epoch [1/3], Step [550/3236], Loss: 3.0011, Perplexity: 20.1072
Epoch [1/3], Step [551/3236], Loss: 2.8549, Perplexity: 17.3727
Epoch [1/3], Step [552/3236], Loss: 2.9694, Perplexity: 19.4798
Epoch [1/3], Step [553/3236], Loss: 3.0777, Perplexity: 21.7093
Epoch [1/3], Step [554/3236], Loss: 3.5820, Perplexity: 35.9457
Epoch [1/3], Step [555/3236], Loss: 2.8246, Perplexity: 16.8534
Epoch [1/3], Step [556/3236], Loss: 2.9011, Perplexity: 18.1945
Epoch [1/3], Step [557/3236], Loss: 3.4687, Perplexity: 32.0964
Epoch [1/3], Step [558/3236], Loss: 2.8417, Perplexity: 17.1453
Epoch [1/3], Step [559/3236], Loss: 2.9530, Perplexity: 19.1643
Epoch [1/3], Step [560/3236], Loss: 2.8810, Perplexity: 17.8321
Epoch [1/3], Step [561/3236], Loss: 3.0613, Perplexity: 21.3547
Epoch [1/3], Step [562/3236], Loss: 2.7838, Perplexity: 16.1811
Epoch [1/3], Step [563/3236], Loss: 2.7290, Perplexity: 15.3183
Epoch [1/3], Step [564/3236], Loss: 2.8346, Perplexity: 17.0235
Epoch [1/3], Step [565/3236], Loss: 2.7870, Perplexity: 16.2315
Epoch [1/3], Step [566/3236], Loss: 2.7986, Perplexity: 16.4215
Epoch [1/3], Step [567/3236], Loss: 2.8193, Perplexity: 16.7651
Epoch [1/3], Step [568/3236], Loss: 2.9488, Perplexity: 19.0821
Epoch [1/3], Step [569/3236], Loss: 3.0015, Perplexity: 20.1161
Epoch [1/3], Step [570/3236], Loss: 2.8741, Perplexity: 17.7098
Epoch [1/3], Step [571/3236], Loss: 3.0036, Perplexity: 20.1576
Epoch [1/3], Step [572/3236], Loss: 2.9182, Perplexity: 18.5084
Epoch [1/3], Step [573/3236], Loss: 2.8490, Perplexity: 17.2711
Epoch [1/3], Step [574/3236], Loss: 3.0111, Perplexity: 20.3101
Epoch [1/3], Step [575/3236], Loss: 2.8168, Perplexity: 16.7226
Epoch [1/3], Step [576/3236], Loss: 2.8296, Perplexity: 16.9385
Epoch [1/3], Step [577/3236], Loss: 2.9570, Perplexity: 19.2409
Epoch [1/3], Step [578/3236], Loss: 3.2687, Perplexity: 26.2780
Epoch [1/3], Step [579/3236], Loss: 2.8329, Perplexity: 16.9949
Epoch [1/3], Step [580/3236], Loss: 2.9180, Perplexity: 18.5040
Epoch [1/3], Step [581/3236], Loss: 2.8452, Perplexity: 17.2054
Epoch [1/3], Step [582/3236], Loss: 2.7558, Perplexity: 15.7339
Epoch [1/3], Step [583/3236], Loss: 3.0102, Perplexity: 20.2921
Epoch [1/3], Step [584/3236], Loss: 2.9599, Perplexity: 19.2966
Epoch [1/3], Step [585/3236], Loss: 2.8363, Perplexity: 17.0522
Epoch [1/3], Step [586/3236], Loss: 3.3182, Perplexity: 27.6093
Epoch [1/3], Step [587/3236], Loss: 3.0249, Perplexity: 20.5919
Epoch [1/3], Step [588/3236], Loss: 2.7856, Perplexity: 16.2096
Epoch [1/3], Step [589/3236], Loss: 2.8343, Perplexity: 17.0181
Epoch [1/3], Step [590/3236], Loss: 3.0210, Perplexity: 20.5113
Epoch [1/3], Step [591/3236], Loss: 2.8469, Perplexity: 17.2337
Epoch [1/3], Step [592/3236], Loss: 2.8568, Perplexity: 17.4054
Epoch [1/3], Step [593/3236], Loss: 2.6991, Perplexity: 14.8663
Epoch [1/3], Step [594/3236], Loss: 2.7145, Perplexity: 15.0967
Epoch [1/3], Step [595/3236], Loss: 2.8115, Perplexity: 16.6347
Epoch [1/3], Step [596/3236], Loss: 2.9363, Perplexity: 18.8461
Epoch [1/3], Step [597/3236], Loss: 2.8978, Perplexity: 18.1338
Epoch [1/3], Step [598/3236], Loss: 2.8895, Perplexity: 17.9844
Epoch [1/3], Step [599/3236], Loss: 3.0631, Perplexity: 21.3932
Epoch [1/3], Step [600/3236], Loss: 2.8088, Perplexity: 16.5893
Epoch [1/3], Step [601/3236], Loss: 2.9092, Perplexity: 18.3414
Epoch [1/3], Step [602/3236], Loss: 2.7935, Perplexity: 16.3376
Epoch [1/3], Step [603/3236], Loss: 2.9425, Perplexity: 18.9636
Epoch [1/3], Step [604/3236], Loss: 3.3132, Perplexity: 27.4739
Epoch [1/3], Step [605/3236], Loss: 2.8864, Perplexity: 17.9279
Epoch [1/3], Step [606/3236], Loss: 2.7282, Perplexity: 15.3054
Epoch [1/3], Step [607/3236], Loss: 3.0291, Perplexity: 20.6793
Epoch [1/3], Step [608/3236], Loss: 2.7539, Perplexity: 15.7037
Epoch [1/3], Step [609/3236], Loss: 2.8209, Perplexity: 16.7917
Epoch [1/3], Step [610/3236], Loss: 2.8723, Perplexity: 17.6782
Epoch [1/3], Step [611/3236], Loss: 2.5926, Perplexity: 13.3641
Epoch [1/3], Step [612/3236], Loss: 2.9300, Perplexity: 18.7279
Epoch [1/3], Step [613/3236], Loss: 2.7753, Perplexity: 16.0439
Epoch [1/3], Step [614/3236], Loss: 3.0175, Perplexity: 20.4395
Epoch [1/3], Step [615/3236], Loss: 2.8931, Perplexity: 18.0491
Epoch [1/3], Step [616/3236], Loss: 3.5258, Perplexity: 33.9810
Epoch [1/3], Step [617/3236], Loss: 3.1581, Perplexity: 23.5262
Epoch [1/3], Step [618/3236], Loss: 3.1008, Perplexity: 22.2154
Epoch [1/3], Step [619/3236], Loss: 3.1647, Perplexity: 23.6820
Epoch [1/3], Step [620/3236], Loss: 2.8706, Perplexity: 17.6469
Epoch [1/3], Step [621/3236], Loss: 2.7760, Perplexity: 16.0549
Epoch [1/3], Step [622/3236], Loss: 2.6809, Perplexity: 14.5987
Epoch [1/3], Step [623/3236], Loss: 2.7458, Perplexity: 15.5767
Epoch [1/3], Step [624/3236], Loss: 2.7121, Perplexity: 15.0613
Epoch [1/3], Step [625/3236], Loss: 2.9957, Perplexity: 19.9999
Epoch [1/3], Step [626/3236], Loss: 2.9383, Perplexity: 18.8838
Epoch [1/3], Step [627/3236], Loss: 2.7867, Perplexity: 16.2275
Epoch [1/3], Step [628/3236], Loss: 3.2274, Perplexity: 25.2136
Epoch [1/3], Step [629/3236], Loss: 3.0471, Perplexity: 21.0552
Epoch [1/3], Step [630/3236], Loss: 2.7613, Perplexity: 15.8196
Epoch [1/3], Step [631/3236], Loss: 2.7945, Perplexity: 16.3537
Epoch [1/3], Step [632/3236], Loss: 2.9038, Perplexity: 18.2438
Epoch [1/3], Step [633/3236], Loss: 3.0814, Perplexity: 21.7885
Epoch [1/3], Step [634/3236], Loss: 2.8479, Perplexity: 17.2516
Epoch [1/3], Step [635/3236], Loss: 3.2048, Perplexity: 24.6499
Epoch [1/3], Step [636/3236], Loss: 2.9006, Perplexity: 18.1842
Epoch [1/3], Step [637/3236], Loss: 2.7985, Perplexity: 16.4203
Epoch [1/3], Step [638/3236], Loss: 3.0528, Perplexity: 21.1748
Epoch [1/3], Step [639/3236], Loss: 2.7948, Perplexity: 16.3592
Epoch [1/3], Step [640/3236], Loss: 2.7721, Perplexity: 15.9928
Epoch [1/3], Step [641/3236], Loss: 2.9219, Perplexity: 18.5756
Epoch [1/3], Step [642/3236], Loss: 2.8823, Perplexity: 17.8548
Epoch [1/3], Step [643/3236], Loss: 2.7951, Perplexity: 16.3647
Epoch [1/3], Step [644/3236], Loss: 2.8647, Perplexity: 17.5430
Epoch [1/3], Step [645/3236], Loss: 3.1640, Perplexity: 23.6655
Epoch [1/3], Step [646/3236], Loss: 2.7374, Perplexity: 15.4475
Epoch [1/3], Step [647/3236], Loss: 2.7194, Perplexity: 15.1705
Epoch [1/3], Step [648/3236], Loss: 3.0964, Perplexity: 22.1191
Epoch [1/3], Step [649/3236], Loss: 3.1859, Perplexity: 24.1883
Epoch [1/3], Step [650/3236], Loss: 2.8195, Perplexity: 16.7686
Epoch [1/3], Step [651/3236], Loss: 3.8035, Perplexity: 44.8560
Epoch [1/3], Step [652/3236], Loss: 2.9608, Perplexity: 19.3132
Epoch [1/3], Step [653/3236], Loss: 3.0882, Perplexity: 21.9366
Epoch [1/3], Step [654/3236], Loss: 2.8498, Perplexity: 17.2849
Epoch [1/3], Step [655/3236], Loss: 2.8403, Perplexity: 17.1216
Epoch [1/3], Step [656/3236], Loss: 3.0842, Perplexity: 21.8507
Epoch [1/3], Step [657/3236], Loss: 2.9683, Perplexity: 19.4586
Epoch [1/3], Step [658/3236], Loss: 2.8254, Perplexity: 16.8678
Epoch [1/3], Step [659/3236], Loss: 3.3747, Perplexity: 29.2158
Epoch [1/3], Step [660/3236], Loss: 2.8502, Perplexity: 17.2918
Epoch [1/3], Step [661/3236], Loss: 3.0981, Perplexity: 22.1563
Epoch [1/3], Step [662/3236], Loss: 2.8683, Perplexity: 17.6065
Epoch [1/3], Step [663/3236], Loss: 2.7228, Perplexity: 15.2235
Epoch [1/3], Step [664/3236], Loss: 3.0303, Perplexity: 20.7038
Epoch [1/3], Step [665/3236], Loss: 2.6558, Perplexity: 14.2365
Epoch [1/3], Step [666/3236], Loss: 3.1737, Perplexity: 23.8954
Epoch [1/3], Step [667/3236], Loss: 2.8337, Perplexity: 17.0077
Epoch [1/3], Step [668/3236], Loss: 2.8549, Perplexity: 17.3725
Epoch [1/3], Step [669/3236], Loss: 3.1476, Perplexity: 23.2800
Epoch [1/3], Step [670/3236], Loss: 2.7755, Perplexity: 16.0472
Epoch [1/3], Step [671/3236], Loss: 2.8339, Perplexity: 17.0125
Epoch [1/3], Step [672/3236], Loss: 2.8629, Perplexity: 17.5126
Epoch [1/3], Step [673/3236], Loss: 3.0349, Perplexity: 20.7987
Epoch [1/3], Step [674/3236], Loss: 2.9784, Perplexity: 19.6557
Epoch [1/3], Step [675/3236], Loss: 2.6848, Perplexity: 14.6547
Epoch [1/3], Step [676/3236], Loss: 2.7009, Perplexity: 14.8924
Epoch [1/3], Step [677/3236], Loss: 2.5986, Perplexity: 13.4456
Epoch [1/3], Step [678/3236], Loss: 2.6744, Perplexity: 14.5033
Epoch [1/3], Step [679/3236], Loss: 3.0705, Perplexity: 21.5528
Epoch [1/3], Step [680/3236], Loss: 2.8483, Perplexity: 17.2590
Epoch [1/3], Step [681/3236], Loss: 2.7863, Perplexity: 16.2207
Epoch [1/3], Step [682/3236], Loss: 3.1123, Perplexity: 22.4720
Epoch [1/3], Step [683/3236], Loss: 2.8819, Perplexity: 17.8489
Epoch [1/3], Step [684/3236], Loss: 3.0457, Perplexity: 21.0240
Epoch [1/3], Step [685/3236], Loss: 2.8384, Perplexity: 17.0884
Epoch [1/3], Step [686/3236], Loss: 3.6130, Perplexity: 37.0754
Epoch [1/3], Step [687/3236], Loss: 2.7960, Perplexity: 16.3789
Epoch [1/3], Step [688/3236], Loss: 2.7598, Perplexity: 15.7973
Epoch [1/3], Step [689/3236], Loss: 3.7557, Perplexity: 42.7658
Epoch [1/3], Step [690/3236], Loss: 2.7458, Perplexity: 15.5778
Epoch [1/3], Step [691/3236], Loss: 2.9440, Perplexity: 18.9921
Epoch [1/3], Step [692/3236], Loss: 3.6975, Perplexity: 40.3452
Epoch [1/3], Step [693/3236], Loss: 3.0652, Perplexity: 21.4386
Epoch [1/3], Step [694/3236], Loss: 2.8006, Perplexity: 16.4537
Epoch [1/3], Step [695/3236], Loss: 2.8008, Perplexity: 16.4578
Epoch [1/3], Step [696/3236], Loss: 2.8663, Perplexity: 17.5722
Epoch [1/3], Step [697/3236], Loss: 3.2754, Perplexity: 26.4543
Epoch [1/3], Step [698/3236], Loss: 2.9092, Perplexity: 18.3422
Epoch [1/3], Step [699/3236], Loss: 2.9488, Perplexity: 19.0824
Epoch [1/3], Step [700/3236], Loss: 2.8319, Perplexity: 16.9780
Epoch [1/3], Step [701/3236], Loss: 2.8094, Perplexity: 16.6003
Epoch [1/3], Step [702/3236], Loss: 2.7903, Perplexity: 16.2855
Epoch [1/3], Step [703/3236], Loss: 2.9839, Perplexity: 19.7648
Epoch [1/3], Step [704/3236], Loss: 2.7359, Perplexity: 15.4236
Epoch [1/3], Step [705/3236], Loss: 2.6292, Perplexity: 13.8626
Epoch [1/3], Step [706/3236], Loss: 2.7998, Perplexity: 16.4419
Epoch [1/3], Step [707/3236], Loss: 2.8055, Perplexity: 16.5357
Epoch [1/3], Step [708/3236], Loss: 2.6815, Perplexity: 14.6065
Epoch [1/3], Step [709/3236], Loss: 2.8676, Perplexity: 17.5942
Epoch [1/3], Step [710/3236], Loss: 3.0634, Perplexity: 21.3998
Epoch [1/3], Step [711/3236], Loss: 2.6399, Perplexity: 14.0114
Epoch [1/3], Step [712/3236], Loss: 2.7952, Perplexity: 16.3653
Epoch [1/3], Step [713/3236], Loss: 2.6623, Perplexity: 14.3290
Epoch [1/3], Step [714/3236], Loss: 2.8081, Perplexity: 16.5779
Epoch [1/3], Step [715/3236], Loss: 2.7150, Perplexity: 15.1044
Epoch [1/3], Step [716/3236], Loss: 2.8442, Perplexity: 17.1881
Epoch [1/3], Step [717/3236], Loss: 3.2265, Perplexity: 25.1903
Epoch [1/3], Step [718/3236], Loss: 2.8589, Perplexity: 17.4424
Epoch [1/3], Step [719/3236], Loss: 2.8629, Perplexity: 17.5114
Epoch [1/3], Step [720/3236], Loss: 2.6361, Perplexity: 13.9581
Epoch [1/3], Step [721/3236], Loss: 2.6704, Perplexity: 14.4464
Epoch [1/3], Step [722/3236], Loss: 2.6764, Perplexity: 14.5331
Epoch [1/3], Step [723/3236], Loss: 2.6740, Perplexity: 14.4973
Epoch [1/3], Step [724/3236], Loss: 2.9372, Perplexity: 18.8631
Epoch [1/3], Step [725/3236], Loss: 2.7698, Perplexity: 15.9562
Epoch [1/3], Step [726/3236], Loss: 2.8876, Perplexity: 17.9505
Epoch [1/3], Step [727/3236], Loss: 2.6712, Perplexity: 14.4573
Epoch [1/3], Step [728/3236], Loss: 2.9856, Perplexity: 19.7985
Epoch [1/3], Step [729/3236], Loss: 3.1125, Perplexity: 22.4772
Epoch [1/3], Step [730/3236], Loss: 3.0878, Perplexity: 21.9298
Epoch [1/3], Step [731/3236], Loss: 2.6643, Perplexity: 14.3576
Epoch [1/3], Step [732/3236], Loss: 2.8798, Perplexity: 17.8101
Epoch [1/3], Step [733/3236], Loss: 2.6530, Perplexity: 14.1966
Epoch [1/3], Step [734/3236], Loss: 2.6841, Perplexity: 14.6451
Epoch [1/3], Step [735/3236], Loss: 2.7100, Perplexity: 15.0288
Epoch [1/3], Step [736/3236], Loss: 2.6483, Perplexity: 14.1304
Epoch [1/3], Step [737/3236], Loss: 2.6995, Perplexity: 14.8728
Epoch [1/3], Step [738/3236], Loss: 2.8524, Perplexity: 17.3288
Epoch [1/3], Step [739/3236], Loss: 2.6196, Perplexity: 13.7298
Epoch [1/3], Step [740/3236], Loss: 2.8064, Perplexity: 16.5502
Epoch [1/3], Step [741/3236], Loss: 2.6753, Perplexity: 14.5161
Epoch [1/3], Step [742/3236], Loss: 2.7358, Perplexity: 15.4214
Epoch [1/3], Step [743/3236], Loss: 2.7377, Perplexity: 15.4517
Epoch [1/3], Step [744/3236], Loss: 2.7279, Perplexity: 15.3014
Epoch [1/3], Step [745/3236], Loss: 3.2244, Perplexity: 25.1376
Epoch [1/3], Step [746/3236], Loss: 2.7108, Perplexity: 15.0409
Epoch [1/3], Step [747/3236], Loss: 2.8000, Perplexity: 16.4455
Epoch [1/3], Step [748/3236], Loss: 2.9305, Perplexity: 18.7363
Epoch [1/3], Step [749/3236], Loss: 3.0205, Perplexity: 20.5012
Epoch [1/3], Step [750/3236], Loss: 2.7130, Perplexity: 15.0746
Epoch [1/3], Step [751/3236], Loss: 2.7067, Perplexity: 14.9791
Epoch [1/3], Step [752/3236], Loss: 2.7584, Perplexity: 15.7750
Epoch [1/3], Step [753/3236], Loss: 2.7998, Perplexity: 16.4409
Epoch [1/3], Step [754/3236], Loss: 2.6293, Perplexity: 13.8641
Epoch [1/3], Step [755/3236], Loss: 2.6430, Perplexity: 14.0557
Epoch [1/3], Step [756/3236], Loss: 2.7176, Perplexity: 15.1442
Epoch [1/3], Step [757/3236], Loss: 2.7741, Perplexity: 16.0249
Epoch [1/3], Step [758/3236], Loss: 2.8740, Perplexity: 17.7078
Epoch [1/3], Step [759/3236], Loss: 2.7617, Perplexity: 15.8264
Epoch [1/3], Step [760/3236], Loss: 2.6880, Perplexity: 14.7016
Epoch [1/3], Step [761/3236], Loss: 2.6048, Perplexity: 13.5288
Epoch [1/3], Step [762/3236], Loss: 2.9351, Perplexity: 18.8232
Epoch [1/3], Step [763/3236], Loss: 3.0418, Perplexity: 20.9429
Epoch [1/3], Step [764/3236], Loss: 2.8039, Perplexity: 16.5082
Epoch [1/3], Step [765/3236], Loss: 2.9811, Perplexity: 19.7091
Epoch [1/3], Step [766/3236], Loss: 2.8521, Perplexity: 17.3233
Epoch [1/3], Step [767/3236], Loss: 2.8033, Perplexity: 16.4989
Epoch [1/3], Step [768/3236], Loss: 2.9822, Perplexity: 19.7311
Epoch [1/3], Step [769/3236], Loss: 2.8430, Perplexity: 17.1673
Epoch [1/3], Step [770/3236], Loss: 2.6314, Perplexity: 13.8928
Epoch [1/3], Step [771/3236], Loss: 2.7606, Perplexity: 15.8091
Epoch [1/3], Step [772/3236], Loss: 2.9436, Perplexity: 18.9834
Epoch [1/3], Step [773/3236], Loss: 2.5135, Perplexity: 12.3482
Epoch [1/3], Step [774/3236], Loss: 2.6893, Perplexity: 14.7216
Epoch [1/3], Step [775/3236], Loss: 2.7653, Perplexity: 15.8844
Epoch [1/3], Step [776/3236], Loss: 2.7784, Perplexity: 16.0924
Epoch [1/3], Step [777/3236], Loss: 2.7800, Perplexity: 16.1183
Epoch [1/3], Step [778/3236], Loss: 2.9706, Perplexity: 19.5044
Epoch [1/3], Step [779/3236], Loss: 2.6539, Perplexity: 14.2087
Epoch [1/3], Step [780/3236], Loss: 2.6740, Perplexity: 14.4971
Epoch [1/3], Step [781/3236], Loss: 2.6309, Perplexity: 13.8857
Epoch [1/3], Step [782/3236], Loss: 2.6533, Perplexity: 14.2003
Epoch [1/3], Step [783/3236], Loss: 2.7401, Perplexity: 15.4893
Epoch [1/3], Step [784/3236], Loss: 2.5753, Perplexity: 13.1358
Epoch [1/3], Step [785/3236], Loss: 3.2708, Perplexity: 26.3322
Epoch [1/3], Step [786/3236], Loss: 2.8596, Perplexity: 17.4545
Epoch [1/3], Step [787/3236], Loss: 2.6916, Perplexity: 14.7549
Epoch [1/3], Step [788/3236], Loss: 3.5163, Perplexity: 33.6597
Epoch [1/3], Step [789/3236], Loss: 3.4073, Perplexity: 30.1851
Epoch [1/3], Step [790/3236], Loss: 2.8319, Perplexity: 16.9783
Epoch [1/3], Step [791/3236], Loss: 2.8588, Perplexity: 17.4405
Epoch [1/3], Step [792/3236], Loss: 2.7791, Perplexity: 16.1039
Epoch [1/3], Step [793/3236], Loss: 2.6375, Perplexity: 13.9782
Epoch [1/3], Step [794/3236], Loss: 2.7163, Perplexity: 15.1237
Epoch [1/3], Step [795/3236], Loss: 2.7238, Perplexity: 15.2384
Epoch [1/3], Step [796/3236], Loss: 2.8695, Perplexity: 17.6275
Epoch [1/3], Step [797/3236], Loss: 2.7595, Perplexity: 15.7918
Epoch [1/3], Step [798/3236], Loss: 2.6296, Perplexity: 13.8684
Epoch [1/3], Step [799/3236], Loss: 2.9756, Perplexity: 19.6009
Epoch [1/3], Step [800/3236], Loss: 3.0867, Perplexity: 21.9041
Epoch [1/3], Step [801/3236], Loss: 3.0120, Perplexity: 20.3273
Epoch [1/3], Step [802/3236], Loss: 2.8154, Perplexity: 16.7002
Epoch [1/3], Step [803/3236], Loss: 2.6870, Perplexity: 14.6880
Epoch [1/3], Step [804/3236], Loss: 2.8269, Perplexity: 16.8935
Epoch [1/3], Step [805/3236], Loss: 2.7646, Perplexity: 15.8725
Epoch [1/3], Step [806/3236], Loss: 2.9812, Perplexity: 19.7117
Epoch [1/3], Step [807/3236], Loss: 2.9490, Perplexity: 19.0876
Epoch [1/3], Step [808/3236], Loss: 2.7525, Perplexity: 15.6821
Epoch [1/3], Step [809/3236], Loss: 2.6926, Perplexity: 14.7701
Epoch [1/3], Step [810/3236], Loss: 2.8027, Perplexity: 16.4886
Epoch [1/3], Step [811/3236], Loss: 2.7878, Perplexity: 16.2450
Epoch [1/3], Step [812/3236], Loss: 3.2797, Perplexity: 26.5666
Epoch [1/3], Step [813/3236], Loss: 2.6856, Perplexity: 14.6669
Epoch [1/3], Step [814/3236], Loss: 2.9428, Perplexity: 18.9684
Epoch [1/3], Step [815/3236], Loss: 2.8920, Perplexity: 18.0289
Epoch [1/3], Step [816/3236], Loss: 2.6177, Perplexity: 13.7036
Epoch [1/3], Step [817/3236], Loss: 2.7602, Perplexity: 15.8037
Epoch [1/3], Step [818/3236], Loss: 2.5622, Perplexity: 12.9649
Epoch [1/3], Step [819/3236], Loss: 2.7967, Perplexity: 16.3899
Epoch [1/3], Step [820/3236], Loss: 2.6713, Perplexity: 14.4585
Epoch [1/3], Step [821/3236], Loss: 2.6488, Perplexity: 14.1370
Epoch [1/3], Step [822/3236], Loss: 2.7672, Perplexity: 15.9135
Epoch [1/3], Step [823/3236], Loss: 3.4272, Perplexity: 30.7906
Epoch [1/3], Step [824/3236], Loss: 2.7149, Perplexity: 15.1025
Epoch [1/3], Step [825/3236], Loss: 2.6141, Perplexity: 13.6545
Epoch [1/3], Step [826/3236], Loss: 2.8645, Perplexity: 17.5399
Epoch [1/3], Step [827/3236], Loss: 3.6248, Perplexity: 37.5183
Epoch [1/3], Step [828/3236], Loss: 2.5331, Perplexity: 12.5925
Epoch [1/3], Step [829/3236], Loss: 2.7871, Perplexity: 16.2335
Epoch [1/3], Step [830/3236], Loss: 2.6166, Perplexity: 13.6887
Epoch [1/3], Step [831/3236], Loss: 2.6708, Perplexity: 14.4522
Epoch [1/3], Step [832/3236], Loss: 2.6772, Perplexity: 14.5449
Epoch [1/3], Step [833/3236], Loss: 2.5676, Perplexity: 13.0345
Epoch [1/3], Step [834/3236], Loss: 2.6805, Perplexity: 14.5928
Epoch [1/3], Step [835/3236], Loss: 2.6425, Perplexity: 14.0479
Epoch [1/3], Step [836/3236], Loss: 3.1372, Perplexity: 23.0393
Epoch [1/3], Step [837/3236], Loss: 2.8804, Perplexity: 17.8212
Epoch [1/3], Step [838/3236], Loss: 2.6568, Perplexity: 14.2503
Epoch [1/3], Step [839/3236], Loss: 2.8072, Perplexity: 16.5632
Epoch [1/3], Step [840/3236], Loss: 3.0952, Perplexity: 22.0922
Epoch [1/3], Step [841/3236], Loss: 2.5694, Perplexity: 13.0574
Epoch [1/3], Step [842/3236], Loss: 2.6995, Perplexity: 14.8720
Epoch [1/3], Step [843/3236], Loss: 2.8877, Perplexity: 17.9515
Epoch [1/3], Step [844/3236], Loss: 2.8036, Perplexity: 16.5033
Epoch [1/3], Step [845/3236], Loss: 2.6773, Perplexity: 14.5464
Epoch [1/3], Step [846/3236], Loss: 2.6460, Perplexity: 14.0973
Epoch [1/3], Step [847/3236], Loss: 2.7828, Perplexity: 16.1641
Epoch [1/3], Step [848/3236], Loss: 3.1128, Perplexity: 22.4849
Epoch [1/3], Step [849/3236], Loss: 2.6202, Perplexity: 13.7391
Epoch [1/3], Step [850/3236], Loss: 2.6435, Perplexity: 14.0620
Epoch [1/3], Step [851/3236], Loss: 2.5102, Perplexity: 12.3076
Epoch [1/3], Step [852/3236], Loss: 2.7084, Perplexity: 15.0052
Epoch [1/3], Step [853/3236], Loss: 2.9110, Perplexity: 18.3748
Epoch [1/3], Step [854/3236], Loss: 2.6897, Perplexity: 14.7268
Epoch [1/3], Step [855/3236], Loss: 2.7845, Perplexity: 16.1910
Epoch [1/3], Step [856/3236], Loss: 2.6494, Perplexity: 14.1449
Epoch [1/3], Step [857/3236], Loss: 4.2534, Perplexity: 70.3475
Epoch [1/3], Step [858/3236], Loss: 2.6196, Perplexity: 13.7301
Epoch [1/3], Step [859/3236], Loss: 2.7354, Perplexity: 15.4153
Epoch [1/3], Step [860/3236], Loss: 2.7063, Perplexity: 14.9737
Epoch [1/3], Step [861/3236], Loss: 3.0417, Perplexity: 20.9400
Epoch [1/3], Step [862/3236], Loss: 2.5798, Perplexity: 13.1952
Epoch [1/3], Step [863/3236], Loss: 2.7712, Perplexity: 15.9782
Epoch [1/3], Step [864/3236], Loss: 2.7798, Perplexity: 16.1160
Epoch [1/3], Step [865/3236], Loss: 2.6600, Perplexity: 14.2957
Epoch [1/3], Step [866/3236], Loss: 2.8120, Perplexity: 16.6435
Epoch [1/3], Step [867/3236], Loss: 2.6697, Perplexity: 14.4360
Epoch [1/3], Step [868/3236], Loss: 2.7127, Perplexity: 15.0703
Epoch [1/3], Step [869/3236], Loss: 2.7015, Perplexity: 14.9026
Epoch [1/3], Step [870/3236], Loss: 2.9758, Perplexity: 19.6054
Epoch [1/3], Step [871/3236], Loss: 2.7245, Perplexity: 15.2486
Epoch [1/3], Step [872/3236], Loss: 2.8224, Perplexity: 16.8177
Epoch [1/3], Step [873/3236], Loss: 3.4611, Perplexity: 31.8528
Epoch [1/3], Step [874/3236], Loss: 2.6123, Perplexity: 13.6308
Epoch [1/3], Step [875/3236], Loss: 2.6127, Perplexity: 13.6354
Epoch [1/3], Step [876/3236], Loss: 2.7421, Perplexity: 15.5199
Epoch [1/3], Step [877/3236], Loss: 2.5618, Perplexity: 12.9596
Epoch [1/3], Step [878/3236], Loss: 2.6357, Perplexity: 13.9534
Epoch [1/3], Step [879/3236], Loss: 2.8494, Perplexity: 17.2783
Epoch [1/3], Step [880/3236], Loss: 2.7357, Perplexity: 15.4210
Epoch [1/3], Step [881/3236], Loss: 2.6925, Perplexity: 14.7681
Epoch [1/3], Step [882/3236], Loss: 2.7152, Perplexity: 15.1078
Epoch [1/3], Step [883/3236], Loss: 2.8660, Perplexity: 17.5666
Epoch [1/3], Step [884/3236], Loss: 2.5804, Perplexity: 13.2019
Epoch [1/3], Step [885/3236], Loss: 2.7189, Perplexity: 15.1638
Epoch [1/3], Step [886/3236], Loss: 2.4891, Perplexity: 12.0509
Epoch [1/3], Step [887/3236], Loss: 2.6763, Perplexity: 14.5310
Epoch [1/3], Step [888/3236], Loss: 2.7640, Perplexity: 15.8635
Epoch [1/3], Step [889/3236], Loss: 2.8030, Perplexity: 16.4944
Epoch [1/3], Step [890/3236], Loss: 2.6291, Perplexity: 13.8606
Epoch [1/3], Step [891/3236], Loss: 2.5312, Perplexity: 12.5681
Epoch [1/3], Step [892/3236], Loss: 2.5816, Perplexity: 13.2189
Epoch [1/3], Step [893/3236], Loss: 2.6436, Perplexity: 14.0636
Epoch [1/3], Step [894/3236], Loss: 2.8553, Perplexity: 17.3790
Epoch [1/3], Step [895/3236], Loss: 2.8309, Perplexity: 16.9605
Epoch [1/3], Step [896/3236], Loss: 2.6367, Perplexity: 13.9677
Epoch [1/3], Step [897/3236], Loss: 2.5365, Perplexity: 12.6358
Epoch [1/3], Step [898/3236], Loss: 2.6578, Perplexity: 14.2653
Epoch [1/3], Step [899/3236], Loss: 2.7126, Perplexity: 15.0679
Epoch [1/3], Step [900/3236], Loss: 2.5786, Perplexity: 13.1793
Epoch [1/3], Step [901/3236], Loss: 2.5834, Perplexity: 13.2415
Epoch [1/3], Step [902/3236], Loss: 2.7772, Perplexity: 16.0739
Epoch [1/3], Step [903/3236], Loss: 2.7165, Perplexity: 15.1280
Epoch [1/3], Step [904/3236], Loss: 3.0084, Perplexity: 20.2545
Epoch [1/3], Step [905/3236], Loss: 2.4979, Perplexity: 12.1565
Epoch [1/3], Step [906/3236], Loss: 2.4983, Perplexity: 12.1621
Epoch [1/3], Step [907/3236], Loss: 2.6707, Perplexity: 14.4500
Epoch [1/3], Step [908/3236], Loss: 2.6238, Perplexity: 13.7882
Epoch [1/3], Step [909/3236], Loss: 2.7642, Perplexity: 15.8658
Epoch [1/3], Step [910/3236], Loss: 2.6896, Perplexity: 14.7259
Epoch [1/3], Step [911/3236], Loss: 2.6918, Perplexity: 14.7577
Epoch [1/3], Step [912/3236], Loss: 2.7411, Perplexity: 15.5041
Epoch [1/3], Step [913/3236], Loss: 2.5641, Perplexity: 12.9889
Epoch [1/3], Step [914/3236], Loss: 2.4623, Perplexity: 11.7314
Epoch [1/3], Step [915/3236], Loss: 2.5432, Perplexity: 12.7200
Epoch [1/3], Step [916/3236], Loss: 2.4892, Perplexity: 12.0520
Epoch [1/3], Step [917/3236], Loss: 3.1820, Perplexity: 24.0951
Epoch [1/3], Step [918/3236], Loss: 2.7516, Perplexity: 15.6680
Epoch [1/3], Step [919/3236], Loss: 2.5046, Perplexity: 12.2392
Epoch [1/3], Step [920/3236], Loss: 2.6563, Perplexity: 14.2430
Epoch [1/3], Step [921/3236], Loss: 2.7757, Perplexity: 16.0495
Epoch [1/3], Step [922/3236], Loss: 2.6226, Perplexity: 13.7709
Epoch [1/3], Step [923/3236], Loss: 2.8360, Perplexity: 17.0476
Epoch [1/3], Step [924/3236], Loss: 2.5397, Perplexity: 12.6756
Epoch [1/3], Step [925/3236], Loss: 3.2545, Perplexity: 25.9069
Epoch [1/3], Step [926/3236], Loss: 2.4773, Perplexity: 11.9092
Epoch [1/3], Step [927/3236], Loss: 2.5353, Perplexity: 12.6208
Epoch [1/3], Step [928/3236], Loss: 2.5585, Perplexity: 12.9165
Epoch [1/3], Step [929/3236], Loss: 2.6971, Perplexity: 14.8371
Epoch [1/3], Step [930/3236], Loss: 2.5723, Perplexity: 13.0962
Epoch [1/3], Step [931/3236], Loss: 2.4838, Perplexity: 11.9871
Epoch [1/3], Step [932/3236], Loss: 2.7583, Perplexity: 15.7727
Epoch [1/3], Step [933/3236], Loss: 2.4783, Perplexity: 11.9210
Epoch [1/3], Step [934/3236], Loss: 2.6096, Perplexity: 13.5935
Epoch [1/3], Step [935/3236], Loss: 2.5806, Perplexity: 13.2050
Epoch [1/3], Step [936/3236], Loss: 2.6484, Perplexity: 14.1313
Epoch [1/3], Step [937/3236], Loss: 2.5868, Perplexity: 13.2877
Epoch [1/3], Step [938/3236], Loss: 2.9021, Perplexity: 18.2125
Epoch [1/3], Step [939/3236], Loss: 2.6164, Perplexity: 13.6868
Epoch [1/3], Step [940/3236], Loss: 2.6926, Perplexity: 14.7701
Epoch [1/3], Step [941/3236], Loss: 2.6723, Perplexity: 14.4738
Epoch [1/3], Step [942/3236], Loss: 2.6112, Perplexity: 13.6157
Epoch [1/3], Step [943/3236], Loss: 3.6076, Perplexity: 36.8768
Epoch [1/3], Step [944/3236], Loss: 2.7620, Perplexity: 15.8314
Epoch [1/3], Step [945/3236], Loss: 2.6871, Perplexity: 14.6892
Epoch [1/3], Step [946/3236], Loss: 2.4746, Perplexity: 11.8769
Epoch [1/3], Step [947/3236], Loss: 3.1021, Perplexity: 22.2441
Epoch [1/3], Step [948/3236], Loss: 3.0072, Perplexity: 20.2315
Epoch [1/3], Step [949/3236], Loss: 2.6546, Perplexity: 14.2187
Epoch [1/3], Step [950/3236], Loss: 2.7447, Perplexity: 15.5602
Epoch [1/3], Step [951/3236], Loss: 2.6294, Perplexity: 13.8661
Epoch [1/3], Step [952/3236], Loss: 2.4874, Perplexity: 12.0303
Epoch [1/3], Step [953/3236], Loss: 2.6685, Perplexity: 14.4177
Epoch [1/3], Step [954/3236], Loss: 2.6417, Perplexity: 14.0365
Epoch [1/3], Step [955/3236], Loss: 2.6538, Perplexity: 14.2077
Epoch [1/3], Step [956/3236], Loss: 2.5127, Perplexity: 12.3379
Epoch [1/3], Step [957/3236], Loss: 2.5930, Perplexity: 13.3692
Epoch [1/3], Step [958/3236], Loss: 2.7884, Perplexity: 16.2549
Epoch [1/3], Step [959/3236], Loss: 2.5947, Perplexity: 13.3928
Epoch [1/3], Step [960/3236], Loss: 2.5367, Perplexity: 12.6385
Epoch [1/3], Step [961/3236], Loss: 2.6201, Perplexity: 13.7366
Epoch [1/3], Step [962/3236], Loss: 2.7524, Perplexity: 15.6809
Epoch [1/3], Step [963/3236], Loss: 2.7903, Perplexity: 16.2857
Epoch [1/3], Step [964/3236], Loss: 2.7739, Perplexity: 16.0205
Epoch [1/3], Step [965/3236], Loss: 2.5834, Perplexity: 13.2416
Epoch [1/3], Step [966/3236], Loss: 2.7944, Perplexity: 16.3525
Epoch [1/3], Step [967/3236], Loss: 2.5063, Perplexity: 12.2592
Epoch [1/3], Step [968/3236], Loss: 2.6544, Perplexity: 14.2162
Epoch [1/3], Step [969/3236], Loss: 2.7640, Perplexity: 15.8625
Epoch [1/3], Step [970/3236], Loss: 3.0142, Perplexity: 20.3733
Epoch [1/3], Step [971/3236], Loss: 2.5126, Perplexity: 12.3365
Epoch [1/3], Step [972/3236], Loss: 3.6549, Perplexity: 38.6643
Epoch [1/3], Step [973/3236], Loss: 2.4867, Perplexity: 12.0212
Epoch [1/3], Step [974/3236], Loss: 2.7372, Perplexity: 15.4436
Epoch [1/3], Step [975/3236], Loss: 2.8769, Perplexity: 17.7596
Epoch [1/3], Step [976/3236], Loss: 2.6506, Perplexity: 14.1630
Epoch [1/3], Step [977/3236], Loss: 2.6885, Perplexity: 14.7089
Epoch [1/3], Step [978/3236], Loss: 2.7411, Perplexity: 15.5043
Epoch [1/3], Step [979/3236], Loss: 2.8848, Perplexity: 17.8999
Epoch [1/3], Step [980/3236], Loss: 2.6760, Perplexity: 14.5273
Epoch [1/3], Step [981/3236], Loss: 2.5989, Perplexity: 13.4487
Epoch [1/3], Step [982/3236], Loss: 2.6876, Perplexity: 14.6958
Epoch [1/3], Step [983/3236], Loss: 2.9745, Perplexity: 19.5790
Epoch [1/3], Step [984/3236], Loss: 2.5594, Perplexity: 12.9276
Epoch [1/3], Step [985/3236], Loss: 2.6548, Perplexity: 14.2217
Epoch [1/3], Step [986/3236], Loss: 2.6130, Perplexity: 13.6393
Epoch [1/3], Step [987/3236], Loss: 2.6091, Perplexity: 13.5873
Epoch [1/3], Step [988/3236], Loss: 2.9394, Perplexity: 18.9037
Epoch [1/3], Step [989/3236], Loss: 2.6068, Perplexity: 13.5561
Epoch [1/3], Step [990/3236], Loss: 2.4920, Perplexity: 12.0859
Epoch [1/3], Step [991/3236], Loss: 2.5967, Perplexity: 13.4194
Epoch [1/3], Step [992/3236], Loss: 2.7294, Perplexity: 15.3234
Epoch [1/3], Step [993/3236], Loss: 2.5036, Perplexity: 12.2262
Epoch [1/3], Step [994/3236], Loss: 2.5019, Perplexity: 12.2057
Epoch [1/3], Step [995/3236], Loss: 2.5965, Perplexity: 13.4170
Epoch [1/3], Step [996/3236], Loss: 2.6098, Perplexity: 13.5963
Epoch [1/3], Step [997/3236], Loss: 2.5288, Perplexity: 12.5384
Epoch [1/3], Step [998/3236], Loss: 2.6285, Perplexity: 13.8535
Epoch [1/3], Step [999/3236], Loss: 2.7284, Perplexity: 15.3086
Epoch [1/3], Step [1000/3236], Loss: 2.5413, Perplexity: 12.6958