Hi,
It was not clear to me from the article what your final PPL results are for each model.
Can you share them too?
At first glance I thought you achieved the same or comparable PPL results, but I am not sure about that now. Can you clarify?
Do you have a baseline model with PPL comparable to the original base model?
Can someone use what you did as a baseline for smaller-scale research (on 4-8 "commodity" GPUs, for example)?
Extra detail on total training time:
I noticed that you count in tokens instead of steps,
where `tokens_per_global_batch = global_batch_size * seq_len`.
Using the parameters in the script, a simple calculation yields, in steps (see the sketch after the table):
| config | num gpus | max tokens | seq len | base batch | global batch size | tokens per batch | required steps | PPL |
|---|---|---|---|---|---|---|---|---|
| single machine | 1 | 1.8B | 128 | 32 | 32 | 4096 | 439453.125 | ? |
| single machine | 2 | 1.8B | 128 | 32 | 64 | 8192 | 219726.5625 | ? |
| single machine | 4 | 1.8B | 128 | 32 | 128 | 16384 | 109863.2813 | ? |
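For completeness, here is a minimal sketch of that tokens-to-steps arithmetic (the function and variable names are my own, not taken from the repo's scripts):

```python
# Sketch of the tokens-to-steps conversion used in the table above.
# Names are illustrative only, not taken from the training scripts.
def required_steps(max_tokens: float, global_batch_size: int, seq_len: int) -> float:
    tokens_per_global_batch = global_batch_size * seq_len
    return max_tokens / tokens_per_global_batch

for num_gpus in (1, 2, 4):
    global_batch_size = 32 * num_gpus  # base batch of 32 per GPU
    print(num_gpus, required_steps(1.8e9, global_batch_size, 128))
# -> 439453.125, 219726.5625, 109863.28125 steps respectively
```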
Comparing with the base_wiki103 config from the original repo
(they used only data parallel), we get:
| config | num gpus | tokens | seq len | base batch | global batch size | tokens per batch | steps | PPL |
|---|---|---|---|---|---|---|---|---|
| original-base-wt103 | don't care | 1.92B | 150 | don't care | 64 | 9600 | 200000 | 24 |
=> They trained on many more tokens.
If your results are really comparable, the model you present here is worth using as a baseline for future Transformer-XL experiments because it is faster. Right?
I don't have easily accessible perplexity results right now; I will update once I get logging infrastructure in place. But generally, switching from DP to DDP gave about a 30% increase in throughput, so even with the same hyper-parameters you get a speed-up compared to the original version.
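For context, here is a minimal sketch of what that DP-to-DDP switch looks like in PyTorch. This is not the repo's actual training script, just an illustration: the model builder is a placeholder, and the launch command assumes one process per GPU via `torchrun`.

```python
# Minimal DP -> DDP sketch (illustrative only; not this repo's training code).
# Launch with: torchrun --nproc_per_node=<num_gpus> train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def build_model() -> torch.nn.Module:
    # Placeholder for the real Transformer-XL model construction.
    return torch.nn.Linear(512, 512)

def main() -> None:
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    dist.init_process_group(backend="nccl")      # one process per GPU
    torch.cuda.set_device(local_rank)

    model = build_model().cuda(local_rank)
    # Old single-process approach, with per-step scatter/gather overhead:
    #   model = torch.nn.DataParallel(model)
    # DDP instead overlaps the gradient all-reduce with the backward pass:
    model = DDP(model, device_ids=[local_rank])

    # ... optimizer, DistributedSampler-based data loading, training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```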