Build docker files for both CI and User #219

Merged · 47 commits · Jul 18, 2024
Changes from 25 commits

Commits
813c8ef
add docker start for user
yutianchen666 May 13, 2024
17e6a24
add docker start for user
yutianchen666 May 13, 2024
b6b5f10
Merge branch 'intel:main' into docker_user
yutianchen666 May 14, 2024
84e1136
merge docker file
yutianchen666 May 14, 2024
8bb9659
fix
yutianchen666 May 17, 2024
37538ba
fix deepspeed
yutianchen666 May 17, 2024
c52372c
fix
yutianchen666 May 17, 2024
0d23dd4
fix
yutianchen666 May 17, 2024
ff0f0d9
fix
yutianchen666 May 17, 2024
d690eff
Merge branch 'intel:main' into docker_user
yutianchen666 May 20, 2024
879b0c6
add ray user
yutianchen666 May 21, 2024
9692a59
Merge branch 'main' into docker_user
yutianchen666 May 27, 2024
c804587
add git
yutianchen666 May 28, 2024
5b62d3c
add git
yutianchen666 May 28, 2024
bb0d36d
fix
yutianchen666 May 28, 2024
4bc41ce
fix
yutianchen666 May 28, 2024
39c9259
fix
yutianchen666 May 28, 2024
7d79a70
fix
yutianchen666 May 28, 2024
79ef0fa
Merge branch 'intel:main' into docker_user
yutianchen666 Jun 3, 2024
d234f53
fix
yutianchen666 Jun 3, 2024
064056d
Merge branch 'docker_user' of https://github.com/yutianchen666/llm-on…
yutianchen666 Jun 3, 2024
6d31487
fix
yutianchen666 Jun 3, 2024
36259b0
fix reademe
yutianchen666 Jun 4, 2024
18ed87c
Merge branch 'intel:main' into docker_user
yutianchen666 Jun 4, 2024
a2816fc
fix dockerfile path
yutianchen666 Jun 4, 2024
e266865
Merge branch 'intel:main' into docker_user
yutianchen666 Jun 11, 2024
1e2db9a
fix re
yutianchen666 Jun 11, 2024
acd43aa
Update README.md
xwu99 Jun 19, 2024
0c5cc08
Update README.md
xwu99 Jun 19, 2024
bd6a905
Merge branch 'intel:main' into docker_user
yutianchen666 Jun 21, 2024
7be7d5a
fix
yutianchen666 Jun 21, 2024
4f921c3
fix docker file
yutianchen666 Jun 25, 2024
fdf6769
fix docker file
yutianchen666 Jun 25, 2024
52ecb41
Merge branch 'intel:main' into docker_user
yutianchen666 Jun 28, 2024
780b275
fix
yutianchen666 Jun 28, 2024
52b1a60
fix
yutianchen666 Jun 28, 2024
1b984da
fix review
yutianchen666 Jul 5, 2024
1c9f96b
fix md
yutianchen666 Jul 5, 2024
3f88633
fix md
yutianchen666 Jul 5, 2024
8ed1728
fix md
yutianchen666 Jul 5, 2024
d65d085
fix md
yutianchen666 Jul 5, 2024
77bb907
fix rebase
yutianchen666 Jul 18, 2024
02826aa
fix rebase
yutianchen666 Jul 18, 2024
5ccb4e4
fix rebase
yutianchen666 Jul 18, 2024
98f518e
fix rebase
yutianchen666 Jul 18, 2024
e0f5fad
Merge branch 'intel:main' into docker_user
yutianchen666 Jul 18, 2024
c1018bb
fix
yutianchen666 Jul 18, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/workflows/workflow_inference_gaudi2.yml
@@ -81,9 +81,9 @@ jobs:
DF_SUFFIX=".gaudi2"
TARGET=${{steps.target.outputs.target}}
if [[ ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
-dockerfile="dev/docker/Dockerfile.habana_vllm"
+dockerfile="dev/docker/ci/Dockerfile.habana_vllm"
else
-dockerfile="dev/docker/Dockerfile.habana"
+dockerfile="dev/docker/ci/Dockerfile.habana"
fi
docker build --build-arg CACHEBUST=1 -f ${dockerfile} -t ${TARGET}:habana .
docker container prune -f
2 changes: 1 addition & 1 deletion .github/workflows/workflow_test_benchmark.yml
@@ -68,7 +68,7 @@ jobs:
run: |
DF_SUFFIX=".vllm"
TARGET=${{steps.target.outputs.target}}
-docker build ./ --build-arg CACHEBUST=1 --build-arg http_proxy=${{ inputs.http_proxy }} --build-arg https_proxy=${{ inputs.https_proxy }} -f dev/docker/Dockerfile${DF_SUFFIX} -t ${TARGET}:latest
+docker build ./ --build-arg CACHEBUST=1 --build-arg http_proxy=${{ inputs.http_proxy }} --build-arg https_proxy=${{ inputs.https_proxy }} -f dev/docker/ci/Dockerfile${DF_SUFFIX} -t ${TARGET}:latest
docker container prune -f
docker image prune -f

57 changes: 56 additions & 1 deletion README.md
@@ -33,7 +33,7 @@ LLM-on-Ray's modular workflow structure is designed to comprehensively cater to
![llm-on-ray](./docs/assets/solution_technical_overview.png)


-## Getting Started
+## Getting Started Locally With Source Code
This guide will assist you in setting up LLM-on-Ray on Intel CPU locally, covering the initial setup, finetuning models, and deploying them for serving.
### Setup

@@ -102,6 +102,61 @@ After deploying the model endpoint, you can access and test it by using the scri
python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/gpt2
```

## Getting Started With Docker
This guide will assist you in setting up LLM-on-Ray with Docker.
```bash
git clone https://github.com/intel/llm-on-ray.git
cd llm-on-ray
```
The user-facing Dockerfile is dev/docker/Dockerfile.user.
Detailed Docker parameters can be configured in dev/scripts/start_with_docker.sh.
```bash
##Set Your proxy and cache path here
HTTP_PROXY='Your proxy'
HTTPS_PROXY='Your proxy'
HF_TOKEN='Your hf_token'
code_checkout_path='If you need to use the modified llm-on-ray repository, define your path here'
model_cache_path='If you need to use huggingface model cache, define your path here'
```

#### 1. Build Docker Image
Software requirements: Ubuntu and Docker
```bash
## If you need to use a proxy, change the corresponding settings in 'dev/scripts/start_with_docker.sh'
source dev/scripts/start_with_docker.sh
## The Dockerfile path is 'dev/docker/Dockerfile.user'.
build_docker ## Use the default cpu and deepspeed environment for LLM serving
```

Change the build_docker function's arguments for a different environment:
```bash
build_docker vllm ## use vLLM for LLM serving
build_docker ipex-llm ## use ipex-llm for LLM serving
```
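For reference, a minimal sketch of roughly what `build_docker vllm` expands to, based on the `build_docker` function in dev/scripts/start_with_docker.sh shown later in this PR (proxy build args omitted):
```bash
docker build ./ \
  --build-arg=CACHEBUST=1 \
  --build-arg=DOCKER_NAME=".vllm" \
  --build-arg=PYPJ="vllm" \
  -f dev/docker/Dockerfile.user -t serving:latest
```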

#### 2. Start Docker
```bash
## If you need to use a modified llm-on-ray repository or a model cache path,
## please change the corresponding settings in 'dev/scripts/start_with_docker.sh'

start_docker ## Run the container and serve the default model (gpt2)
start_docker {a supported model, e.g. gpt-j-6b/llama-2-7b-chat-hf/gemma-2b} ## Run the container and serve another model

## You can mount your own repositories and modify the model config file to support more models
```
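Under the hood, `start_docker llama-2-7b-chat-hf` issues roughly the following `docker run` command, a sketch based on the `start_docker` function in dev/scripts/start_with_docker.sh (proxy and volume-mount options omitted):
```bash
docker run -tid --name=serving \
  -e=hf_token=${HF_TOKEN} \
  -e=model_name=llama-2-7b-chat-hf \
  serving:latest
```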

#### 3. Start LLM-on-Ray
```bash
## Access and test the model, same as when starting from source code
# using requests library
docker exec serving bash -c "python examples/inference/api_server_openai/query_http_requests.py"
# using OpenAI SDK (run in one shell so the exported variables are visible to the query script)
docker exec serving bash -c "pip install 'openai>=1.0' && export OPENAI_BASE_URL=http://localhost:8000/v1 && export OPENAI_API_KEY='not_a_real_key' && python examples/inference/api_server_openai/query_openai_sdk.py"
```
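A sketch of a direct query with curl from inside the container; the /v1/chat/completions route and the gpt2 model id follow standard OpenAI API conventions and the default config, and are assumptions rather than something spelled out in this PR:
```bash
docker exec serving curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt2", "messages": [{"role": "user", "content": "Hello!"}]}'
```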

## Documents
The following are detailed guidelines for pretraining, finetuning and serving LLMs in various computing environment.

52 changes: 52 additions & 0 deletions dev/docker/Dockerfile.user
@@ -0,0 +1,52 @@
# syntax=docker/dockerfile:1
FROM ubuntu:22.04

# Define build arguments
ARG DOCKER_NAME=default
ARG PYPJ=default
ENV LANG C.UTF-8

WORKDIR /root/

RUN --mount=type=cache,target=/var/cache/apt apt-get update -y \
&& apt-get install -y build-essential cmake wget curl git vim htop ssh net-tools \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

ENV CONDA_DIR /opt/conda
RUN wget --quiet https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Miniforge3-Linux-x86_64.sh -O ~/miniforge.sh && \
/bin/bash ~/miniforge.sh -b -p /opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

# setup env
SHELL ["/bin/bash", "--login", "-c"]

RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
unset -f conda && \
export PATH=$CONDA_DIR/bin/:${PATH} && \
mamba config --add channels intel && \
mamba install -y -c conda-forge python==3.9 gxx=12.3 gxx_linux-64=12.3 libxcrypt

# Used to invalidate docker build cache with --build-arg CACHEBUST=$(date +%s)
ARG CACHEBUST=1

RUN git clone https://github.com/intel/llm-on-ray.git
RUN if [ -d "llm-on-ray" ]; then echo "Clone successful"; else echo "Clone failed" && exit 1; fi
WORKDIR /root/llm-on-ray

RUN git fetch origin pull/219/head:pr-219 && \
git checkout pr-219

RUN ls -la

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[${PYPJ}] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

# Use shell scripting to conditionally install packages
RUN if [ "${DOCKER_NAME}" = ".cpu_and_deepspeed" ]; then ds_report && ./dev/scripts/install-oneapi.sh;fi
RUN if [ "${DOCKER_NAME}" = ".ipex-llm" ]; then ./dev/scripts/install-oneapi.sh; fi
RUN if [ "${DOCKER_NAME}" = ".vllm" ]; then ./dev/scripts/install-vllm-cpu.sh; fi


RUN chmod +x ./dev/scripts/entrypoint_user.sh
ENTRYPOINT ["./dev/scripts/entrypoint_user.sh"]
9 changes: 8 additions & 1 deletion dev/docker/README.md
@@ -1 +1,8 @@
-Dockerfiles for CI tests. There could be one Dockerfile with ARG declared to distinguish different pip extras. However, ARG will bust cache of 'pip install', which usually takes long time, when build docker image. Instead, we have two almost identical Dockerfiles here to improve CI efficiency.
+Dockerfiles for users to conveniently build containers:
+1. Dockerfile.user is for users to build llm-on-ray with Docker on Intel CPU.
+2. Dockerfile.habana is for users to build llm-on-ray with Docker on Intel Gaudi (Habana) accelerators.
+
+Dockerfiles for CI tests are under 'ci/*'.
+In CI, the environments required by different models are kept separate, and the Dockerfiles for different purposes are distinguished by suffixes.
+
+There could be one Dockerfile with ARG declared to distinguish different pip extras. However, ARG would bust the cache of 'pip install', which usually takes a long time when building a Docker image. Instead, we have two almost identical Dockerfiles here to improve CI efficiency.
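For example, the CI workflows select a Dockerfile by suffix roughly like this (a sketch mirroring the `build_and_prune` function in dev/scripts/ci-functions.sh and the workflow changes above):
```bash
DF_SUFFIX=".vllm"   # e.g. picks dev/docker/ci/Dockerfile.vllm
docker build ./ --build-arg CACHEBUST=1 -f dev/docker/ci/Dockerfile${DF_SUFFIX} -t ${TARGET}:latest
```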
32 changes: 32 additions & 0 deletions dev/docker/ci/Dockerfile.habana
@@ -0,0 +1,32 @@
FROM vault.habana.ai/gaudi-docker/1.15.1/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest

ENV LANG=en_US.UTF-8

WORKDIR /root/llm-on-ray

COPY ./pyproject.toml .
COPY ./MANIFEST.in .

# create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN pip install -e . && \
pip install --upgrade-strategy eager optimum[habana] && \
pip install git+https://github.com/HabanaAI/[email protected]

# Optional. Comment out if you are not using the UI
COPY ./dev/scripts/install-ui.sh /tmp

RUN /tmp/install-ui.sh

RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
service ssh restart

ENV no_proxy=localhost,127.0.0.1

# Required by DeepSpeed
ENV RAY_EXPERIMENTAL_NOSET_HABANA_VISIBLE_MODULES=1

ENV PT_HPU_LAZY_ACC_PAR_MODE=0

ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=true
4 files renamed without changes.
4 changes: 2 additions & 2 deletions dev/scripts/ci-functions.sh
@@ -26,10 +26,10 @@ build_and_prune() {
fi

echo "Build Docker image and perform cleaning operation"
-echo "docker build ./ ${docker_args[@]} -f dev/docker/Dockerfile${DF_SUFFIX} -t ${TARGET}:latest && yes | docker container prune && yes | docker image prune -f"
+echo "docker build ./ ${docker_args[@]} -f dev/docker/ci/Dockerfile${DF_SUFFIX} -t ${TARGET}:latest && yes | docker container prune && yes | docker image prune -f"

# Build Docker image and perform cleaning operation
-docker build ./ "${docker_args[@]}" -f dev/docker/Dockerfile${DF_SUFFIX} -t ${TARGET}:latest && yes | docker container prune && yes
+docker build ./ "${docker_args[@]}" -f dev/docker/ci/Dockerfile${DF_SUFFIX} -t ${TARGET}:latest && yes | docker container prune && yes
docker image prune -f

}
32 changes: 32 additions & 0 deletions dev/scripts/entrypoint_user.sh
@@ -0,0 +1,32 @@
#!/bin/bash
set -e

# Check if an environment variable exists and print its value
if [ -n "$hf_token" ]; then
echo "The hf_token environment variable is: $hf_token"
# Execute Hugging Face CLI login command
huggingface-cli login --token "${hf_token}"
else
echo "Environment variable 'hf_token' is not set."
fi

# Default serve cmd
if ! pgrep -f 'ray'; then
echo "Ray is not running. Starting Ray..."
# start Ray
ray start --head
echo "Ray started."
else
echo "Ray is already running."
fi

if [ -n "$model_name" ]; then
echo "Using User Model: $model_name"
llm_on_ray-serve --models $model_name
else
echo "Using Default Model: gpt2"
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml
fi

# Keep the container running so that the service does not exit
tail -f /dev/null
66 changes: 66 additions & 0 deletions dev/scripts/start_with_docker.sh
@@ -0,0 +1,66 @@
#!/usr/bin/env bash
set -eo pipefail

##Set Your proxy and cache path here
HTTP_PROXY='Your proxy'
HTTPS_PROXY='Your proxy'
HF_TOKEN='Your hf_token'
code_checkout_path='If you need to use the modified llm-on-ray repository, define your path here'
model_cache_path='If you need to use huggingface model cache, define your path here'
MODEL_CACHE_PATH_LOACL='/root/.cache/huggingface/hub'
CODE_CHECKOUT_PATH_LOCAL='/root/llm-on-ray'


build_docker() {
local DOCKER_NAME=$1

docker_args=()
docker_args+=("--build-arg=CACHEBUST=1")
if [ "$DOCKER_NAME" == "vllm" ]; then
docker_args+=("--build-arg=DOCKER_NAME=".vllm"")
docker_args+=("--build-arg=PYPJ="vllm"")
elif [ "$DOCKER_NAME" == "ipex-llm" ]; then
docker_args+=("--build-arg=DOCKER_NAME=".ipex-llm"")
docker_args+=("--build-arg=PYPJ="ipex-llm"")
else
docker_args+=("--build-arg=DOCKER_NAME=".cpu_and_deepspeed"")
docker_args+=("--build-arg=PYPJ="cpu,deepspeed"")
fi

# # If you need to use proxy,activate the following two lines
# docker_args+=("--build-arg=http_proxy=${HTTP_PROXY}")
# docker_args+=("--build-arg=https_proxy=${HTTPS_PROXY}")


echo "Build Docker image and perform cleaning operation"
echo "docker build ./ ${docker_args[@]} -f dev/docker/Dockerfile.user -t serving:latest"

# Build Docker image and perform cleaning operation
docker build ./ "${docker_args[@]}" -f dev/docker/Dockerfile.user -t serving:latest

}

start_docker() {
local MODEL_NAME=$1

docker_args=()
docker_args+=("--name=serving" )
docker_args+=("-e=hf_token=${HF_TOKEN}")
if [ -z "$MODEL_NAME" ]; then
echo "use default model"
else
docker_args+=("-e=model_name=${MODEL_NAME}")
fi

# # If you need to use proxy,activate the following two lines
# docker_args+=("-e=http_proxy=${HTTP_PROXY}")
# docker_args+=("-e=https_proxy=${HTTPS_PROXY}")

# # If you need to use the modified llm-on-ray repository or huggingface model cache, activate the corresponding row
# docker_args+=("-v=${code_checkout_path}:${CODE_CHECKOUT_PATH_LOCAL}")
# docker_args+=("-v=${model_cache_path}:${MODEL_CACHE_PATH_LOACL}")

echo "docker run -tid "${docker_args[@]}" "serving:latest""
docker run -tid "${docker_args[@]}" "serving:latest"

}
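A typical end-to-end use of this script, matching the "Getting Started With Docker" section of the README above (the model name is just one example of a supported model):
```bash
source dev/scripts/start_with_docker.sh
build_docker vllm                 # build the serving image with vLLM
start_docker llama-2-7b-chat-hf   # run the container and serve the model
```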
67 changes: 67 additions & 0 deletions dev/scripts/start_with_docker_test.sh
@@ -0,0 +1,67 @@
#!/usr/bin/env bash
set -eo pipefail

##Set Your proxy and cache path here
HTTP_PROXY='http://10.24.221.169:911'
HTTPS_PROXY='http://10.24.221.169:911'
HF_TOKEN='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'
code_checkout_path='/home/yutianchen/Project/pr_lib/llm-on-ray'
model_cache_path='/home/yutianchen/.cache/huggingface/hub'
MODEL_CACHE_PATH_LOACL='/root/.cache/huggingface/hub'
CODE_CHECKOUT_PATH_LOCAL='/root/llm-on-ray'


build_docker() {
local DOCKER_NAME=$1

docker_args=()
docker_args+=("--build-arg=CACHEBUST=1")
if [ "$DOCKER_NAME" == "vllm" ]; then
docker_args+=("--build-arg=DOCKER_NAME=".vllm"")
docker_args+=("--build-arg=PYPJ="vllm"")
elif [ "$DOCKER_NAME" == "ipex-llm" ]; then
docker_args+=("--build-arg=DOCKER_NAME=".ipex-llm"")
docker_args+=("--build-arg=PYPJ="ipex-llm"")
else
docker_args+=("--build-arg=DOCKER_NAME=".cpu_and_deepspeed"")
docker_args+=("--build-arg=PYPJ="cpu,deepspeed"")
fi

# # If you need to use proxy,activate the following two lines
docker_args+=("--build-arg=http_proxy=${HTTP_PROXY}")
docker_args+=("--build-arg=https_proxy=${HTTPS_PROXY}")


echo "Build Docker image and perform cleaning operation"
echo "docker build ./ ${docker_args[@]} -f dev/docker/Dockerfile.user -t serving:latest"

# Build Docker image and perform cleaning operation
# docker build ./ "${docker_args[@]}" -f dev/docker/Dockerfile.user -t serving:latest

}

start_docker() {
local MODEL_NAME=$1

docker_args=()
docker_args+=("--name=serving" )

docker_args+=("-e=hf_token=${HF_TOKEN}")
if [ -z "$MODEL_NAME" ]; then
echo "use default model"
else
docker_args+=("-e=model_name=${MODEL_NAME}")
fi

# # If you need to use proxy,activate the following two lines
docker_args+=("-e=http_proxy=${HTTP_PROXY}")
docker_args+=("-e=https_proxy=${HTTPS_PROXY}")

# # If you need to use the modified llm-on-ray repository or huggingface model cache, activate the corresponding row
docker_args+=("-v=${code_checkout_path}:${CODE_CHECKOUT_PATH_LOCAL}")
docker_args+=("-v=${model_cache_path}:${MODEL_CACHE_PATH_LOACL}")

echo "docker run -tid "${docker_args[@]}" "serving:latest""
docker run -tid "${docker_args[@]}" "serving:latest"

}