Fix inconsistent device config in finetuning and serving yaml #25

Merged 1 commit on Jan 4, 2024
.github/workflows/config/mpt_deltatuner.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: true
   precision: bf16
```
.github/workflows/config/mpt_deltatuner_deepspeed.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: true
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: false
   precision: bf16
```
docs/serve.md (3 additions, 3 deletions)

````diff
@@ -12,17 +12,17 @@ We provide preconfigured yaml files in [inference/models](../inference/models)
 To deploy on CPU, please make sure `device` is set to CPU and `cpus_per_worker` is set to a correct number.
 ```
 cpus_per_worker: 24
-device: "cpu"
+device: CPU
 ```
 To deploy on GPU, please make sure `device` is set to GPU and `gpus_per_worker` is set to 1.
 ```
 gpus_per_worker: 1
-device: "gpu"
+device: GPU
 ```
 To deploy on Gaudi, please make sure `device` is set to hpu and `hpus_per_worker` is set to 1.
 ```
 hpus_per_worker: 1
-device: "hpu"
+device: HPU
 ```
 LLM-on-Ray also supports serving with [Deepspeed](serve_deepspeed.md) for AutoTP and [BigDL-LLM](serve_bigdl.md) for INT4/FP4/INT8/FP8 to reduce latency. You can follow the corresponding documents to enable them.
````
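With this change, the uppercase `device: CPU` spelling used in the snippets above is still accepted, because the device validator (see the `inference/inference_config.py` diff below) lowercases the value before checking it. A minimal sketch of what that means when loading a serving config, assuming PyYAML is installed and that `InferenceConfig` is the pydantic model defined in `inference/inference_config.py` (the exact model name and accepted keys are assumptions here):

```python
# Sketch only: assumes InferenceConfig is the pydantic model from
# inference/inference_config.py and that it accepts these YAML keys.
import yaml

from inference.inference_config import InferenceConfig

with open("inference/models/gpt2.yaml") as f:
    raw = yaml.safe_load(f)  # e.g. {'device': 'CPU', 'cpus_per_worker': 24, ...}

config = InferenceConfig(**raw)
print(config.device)  # prints "cpu": the validator lowercases the YAML value
```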
inference/inference_config.py (2 additions, 2 deletions)

```diff
@@ -106,8 +106,8 @@ def _check_port(cls, v: int):
     @validator('device')
     def _check_device(cls, v: str):
         if v:
-            assert v in [DEVICE_CPU, DEVICE_XPU, DEVICE_CUDA, DEVICE_HPU]
-        return v
+            assert v.lower() in [DEVICE_CPU, DEVICE_XPU, DEVICE_CUDA, DEVICE_HPU]
+        return v.lower()
 
     @validator('workers_per_group')
     def _check_workers_per_group(cls, v: int):
```
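Taken on its own, the new validator normalizes any casing of a known device name and rejects unknown ones. A minimal standalone sketch (pydantic v1 style, matching the `@validator` decorator above; the stand-in `DeviceConfig` model and the literal `DEVICE_*` values are assumptions based on the lowercase membership check):

```python
# Minimal sketch of the case-insensitive device check. The DEVICE_* values
# are assumed to be the lowercase strings implied by the v.lower() check.
from pydantic import BaseModel, validator

DEVICE_CPU = "cpu"
DEVICE_XPU = "xpu"
DEVICE_CUDA = "cuda"
DEVICE_HPU = "hpu"

class DeviceConfig(BaseModel):
    device: str = DEVICE_CPU

    @validator('device')
    def _check_device(cls, v: str):
        if v:
            # Accept "CPU", "cpu", "Cpu", ... but reject unknown names.
            assert v.lower() in [DEVICE_CPU, DEVICE_XPU, DEVICE_CUDA, DEVICE_HPU]
        # Store lowercase so downstream comparisons see one spelling.
        return v.lower()

assert DeviceConfig(device="CPU").device == "cpu"
```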
inference/models/bigdl/mistral-7b-v0.1-bigdl.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: false
   precision: bf16
```
inference/models/bigdl/mpt-7b-bigdl.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: false
   precision: bf16
```
inference/models/bloom-560m.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 10
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: true
   precision: bf16
```
inference/models/gpt2.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: true
   precision: bf16
```
inference/models/mistral-7b-v0.1.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: true
   precision: bf16
```
inference/models/mpt-7b.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: true
   precision: bf16
```
inference/models/neural-chat-7b-v3-1.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: false
   precision: bf16
```
inference/models/opt-125m.yaml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: "cpu"
+device: CPU
 ipex:
   enabled: false
   precision: bf16
```
inference/models/template/inference_config_template.yaml (1 addition, 1 deletion)

```diff
@@ -6,7 +6,7 @@ gpus_per_worker: 0
 hpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: cpu
+device: CPU
 ipex:
   enabled: true
   precision: bf16
```