
Commit

fix preprocessing command
stas00 authored Dec 8, 2023
1 parent e52bdab commit 8387ae1
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -93,7 +93,7 @@ An example script to prepare data for GPT training is:
 python tools/preprocess_data.py \
     --input my-corpus.json \
     --output-prefix my-gpt2 \
-    --vocab gpt2-vocab.json \
+    --vocab-file gpt2-vocab.json \
     --dataset-impl mmap \
     --tokenizer-type GPT2BPETokenizer \
     --merge-file gpt2-merges.txt \
@@ -132,7 +132,7 @@ xz -d oscar-1GB.jsonl.xz
 python tools/preprocess_data.py \
     --input oscar-1GB.jsonl \
     --output-prefix my-gpt2 \
-    --vocab gpt2-vocab.json \
+    --vocab-file gpt2-vocab.json \
     --dataset-impl mmap \
     --tokenizer-type GPT2BPETokenizer \
     --merge-file gpt2-merges.txt \
@@ -192,13 +192,13 @@ DATA_ARGS=" \
     --data-path $DATA_PATH \
     "
 CMD="pretrain_gpt.py $GPT_ARGS $OUTPUT_ARGS $DATA_ARGS"
 N_GPUS=1
 LAUNCHER="deepspeed --num_gpus $N_GPUS"
 $LAUNCHER $CMD
 ```

Note that we replaced `python` with `deepspeed --num_gpus 1`. For multi-GPU training, update `--num_gpus` to the number of GPUs you have.
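The hunk above builds the final training command from shell variables before handing it to the launcher. A minimal, runnable sketch of that composition pattern (the argument values below are placeholders, not real training settings, and `echo` stands in for actually invoking `deepspeed`):

```shell
#!/bin/sh
# Sketch of the CMD/LAUNCHER composition shown in the README snippet.
# Placeholder arguments only; a real run needs the full GPT_ARGS/OUTPUT_ARGS.
GPT_ARGS="--num-layers 2"
OUTPUT_ARGS="--log-interval 10"
DATA_ARGS="--data-path my-gpt2_text_document"

CMD="pretrain_gpt.py $GPT_ARGS $OUTPUT_ARGS $DATA_ARGS"

N_GPUS=1
LAUNCHER="deepspeed --num_gpus $N_GPUS"

# In a real run this line would execute: $LAUNCHER $CMD
echo "$LAUNCHER $CMD"
```

Because the launcher and the script arguments are kept in separate variables, switching from single-GPU to multi-GPU only requires changing `N_GPUS`; the command string itself is untouched.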
