forked from intel/llm-on-ray
[Inference] Enable vllm on HPU (intel#232)
* enable vllm gaudi
* fix ci
* fix ci
* enforce eager
Showing 6 changed files with 83 additions and 13 deletions.
@@ -0,0 +1,38 @@
FROM vault.habana.ai/gaudi-docker/1.15.1/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest

ENV LANG=en_US.UTF-8

WORKDIR /root/llm-on-ray

COPY ./pyproject.toml .
COPY ./MANIFEST.in .

# Create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN pip install -e . && \
    pip install --upgrade-strategy eager optimum[habana] && \
    pip install git+https://github.com/HabanaAI/[email protected]

# Install vllm habana env
RUN pip install -v git+https://github.com/HabanaAI/vllm-fork.git@habana_main
# Reinstall ray because vllm downgrades the ray version
RUN pip install "ray>=2.10" "ray[serve,tune]>=2.10"

# Optional. Comment out if you are not using UI
COPY ./dev/scripts/install-ui.sh /tmp

RUN /tmp/install-ui.sh

RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    service ssh restart

ENV no_proxy=localhost,127.0.0.1

# Required by DeepSpeed
ENV RAY_EXPERIMENTAL_NOSET_HABANA_VISIBLE_MODULES=1

ENV PT_HPU_LAZY_ACC_PAR_MODE=0

ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=true
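The final pip step re-installs ray because the HabanaAI vllm-fork pulls in (and can downgrade to) an older ray release during its own installation. A quick way to confirm the built image still satisfies llm-on-ray's requirement is a check along these lines; this snippet is an illustrative sketch added here, not part of the commit:

```python
# Illustrative sanity check (not part of this commit): run inside the built
# image to confirm the ray reinstall undid any downgrade caused by vllm-fork.
import importlib.metadata as metadata

ray_version = metadata.version("ray")                       # e.g. "2.10.0"
major, minor = (int(x) for x in ray_version.split(".")[:2])

assert (major, minor) >= (2, 10), f"ray was downgraded to {ray_version}"
print(f"ray {ray_version} satisfies the >=2.10 requirement")
```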
llm_on_ray/inference/models/hpu/llama-2-7b-chat-hf-vllm-hpu.yaml (22 additions & 0 deletions)
@@ -0,0 +1,22 @@
port: 8000
name: llama-2-7b-chat-hf-vllm
route_prefix: /llama-2-7b-chat-hf-vllm
num_replicas: 1
cpus_per_worker: 8
gpus_per_worker: 0
deepspeed: false
vllm:
  enabled: true
  precision: bf16
  enforce_eager: true
workers_per_group: 2
device: hpu
hpus_per_worker: 1
ipex:
  enabled: false
  precision: bf16
model_description:
  model_id_or_path: meta-llama/Llama-2-7b-chat-hf
  tokenizer_name_or_path: meta-llama/Llama-2-7b-chat-hf
  config:
    use_auth_token: ''
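The vllm block above configures the vLLM engine for this deployment: precision: bf16 maps to a bfloat16 dtype and enforce_eager: true disables graph capture (the "enforce eager" item in the commit message). As a rough standalone illustration of those options, and assuming the HabanaAI vllm-fork from the Dockerfile is installed, an equivalent direct vLLM call would look something like the sketch below; the actual llm-on-ray predictor wiring may differ.

```python
# Rough standalone equivalent of the vllm options above, for illustration only;
# llm-on-ray's own serving code is the real consumer of this YAML.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # model_id_or_path from the config
    dtype="bfloat16",                       # precision: bf16
    enforce_eager=True,                     # enforce_eager: true
)

outputs = llm.generate(
    ["What is Ray Serve?"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

enforce_eager trades some steady-state throughput for skipping graph compilation and warm-up, which keeps startup predictable on HPU; that is presumably why the commit enables it for this config.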