add deepspeed example
kuizhiqing committed Dec 29, 2023
1 parent d7a392e commit b8c6b9a
Showing 2 changed files with 40 additions and 0 deletions.
8 changes: 8 additions & 0 deletions examples/v2beta1/deepspeed/README.MD
@@ -0,0 +1,8 @@
# DeepSpeed Example

This example shows the basic usage of DeepSpeed with mpi-operator.

## References

* https://github.com/microsoft/DeepSpeedExamples/blob/master/training/HelloDeepSpeed/README.md
* https://www.alibabacloud.com/help/en/ack/cloud-native-ai-suite/user-guide/deepspeed-distributed-training
32 changes: 32 additions & 0 deletions examples/v2beta1/deepspeed/deepspeed-helloworld.yaml
@@ -0,0 +1,32 @@
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: deepspeed-helloworld
spec:
  slotsPerWorker: 1
  runPolicy:
    cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - image: registry.cn-beijing.aliyuncs.com/acs/deepspeed:hello-deepspeed
            name: deepspeed-helloworld
            command:
            - deepspeed
            args:
            - /workspace/DeepSpeedExamples/HelloDeepSpeed/train_bert_ds.py
            - --checkpoint_dir
            - /workspace
    Worker:
      replicas: 2
      template:
        spec:
          containers:
          - image: registry.cn-beijing.aliyuncs.com/acs/deepspeed:hello-deepspeed
            name: deepspeed-helloworld
            resources:
              limits:
                nvidia.com/gpu: 8
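
Note on the manifest above: the number of MPI slots the job advertises is the Worker replica count multiplied by slotsPerWorker. The following is a minimal sketch of that arithmetic, not part of the commit. The helper name total_slots and the inline manifest dictionary are hypothetical and copy only the fields needed for the calculation; the sketch does not model how many GPUs each process actually uses.

```python
# Hypothetical helper, for illustration only: it is not part of mpi-operator.
def total_slots(mpijob: dict) -> int:
    """Return Worker replicas multiplied by slotsPerWorker."""
    spec = mpijob["spec"]
    workers = spec["mpiReplicaSpecs"]["Worker"]["replicas"]
    # Assumption: slotsPerWorker is treated as 1 when the field is omitted.
    return workers * spec.get("slotsPerWorker", 1)


# Only the fields relevant to the calculation, copied from
# deepspeed-helloworld.yaml.
manifest = {
    "spec": {
        "slotsPerWorker": 1,
        "mpiReplicaSpecs": {
            "Launcher": {"replicas": 1},
            "Worker": {"replicas": 2},
        },
    }
}

print(total_slots(manifest))
```

With the values in this manifest (2 workers, 1 slot each) the sketch prints 2, even though each worker's resource limit is 8 GPUs.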
