A few questions for the maintainers:
1. Does the released code support multi-node, multi-GPU incremental pretraining? How would I set that up? I don't see anywhere to configure multiple machines. This is the command I'm currently running:
```bash
deepspeed \
  --include="localhost:0,1,2,3" \
  ./train_clm.py \
  --deepspeed ./ds_config/ds_config_zero3.json \
  --model_name_or_path TigerResearch/tigerbot-7b-base \
  --dataset_name TigerResearch/dev_pretrain \
  --do_train \
  --output_dir ./ckpt-clm \
  --overwrite_output_dir \
  --preprocess_num_workers 8 \
  --num_train_epochs 5 \
  --learning_rate 1e-5 \
  --evaluation_strategy steps \
  --eval_steps 10 \
  --bf16 True \
  --save_strategy steps \
  --save_steps 10 \
  --save_total_limit 2 \
  --logging_steps 10 \
  --tf32 True \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 2
```
2. For continued incremental pretraining of the 70B model, what is the minimum number of machines required?
3. Is there a tutorial for multi-node, multi-GPU training?
Thanks in advance for your reply.
Our code supports single-node, multi-GPU training. For multi-node, multi-GPU training, please refer to the DeepSpeed documentation: https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node. Compared with single-node training, multi-node training mainly requires the following steps:
1. Write a hostfile that lists each machine (e.g. node0, node1, node2) together with the number of GPU slots it provides (see the sketch below).
2. Set up passwordless SSH between the machines, so that ssh nodeX works from the launch node without a password prompt.
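As a minimal sketch of those two steps: the hostnames (node0/node1/node2), slot counts, and file paths below are placeholder assumptions, not something shipped with this repo, and the training arguments simply reuse the single-node command from the question.

```bash
# Hypothetical hostfile listing every node and its GPU slots
cat > hostfile <<'EOF'
node0 slots=8
node1 slots=8
node2 slots=8
EOF

# Passwordless SSH from the launch node to every worker must already work,
# e.g. `ssh node1` should log in without a password prompt.

# Launch across all nodes: replace --include="localhost:..." with --hostfile,
# keeping the rest of the training arguments unchanged.
deepspeed --hostfile=./hostfile \
  ./train_clm.py \
  --deepspeed ./ds_config/ds_config_zero3.json \
  --model_name_or_path TigerResearch/tigerbot-7b-base \
  --dataset_name TigerResearch/dev_pretrain \
  --do_train \
  --output_dir ./ckpt-clm
```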
In our experience, a single machine with 8×40GB GPUs and 1TB of RAM is enough to fine-tune the 70B model with DeepSpeed ZeRO stage 3 plus CPU offload.
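For reference, a ZeRO-3 + CPU-offload configuration along those lines might look like the following sketch. This is an assumed example written here for illustration, not the ds_config_zero3.json shipped with the repo, and the exact fields should be checked against the DeepSpeed documentation.

```bash
# Hypothetical ZeRO-3 + CPU offload config (illustrative only, not the repo's file)
cat > ./ds_config/ds_config_zero3_offload.json <<'EOF'
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}
EOF
```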
For multi-node, multi-GPU training you can also search with the keywords "deepspeed multinode"; there are plenty of worked examples.