Integrate vllm/ns into llm-on-ray #267

Open: wants to merge 86 commits into base: main
Commits (86), showing changes from all commits
d2d1f20
add benchmark run script, visualize script
KepingYan Apr 17, 2024
88cc01e
upd
KepingYan Apr 26, 2024
083ae60
update multi replicas
KepingYan May 7, 2024
4c6fa74
use --result-dir to parse results
KepingYan May 8, 2024
1b3b13a
fix ci proxy
KepingYan May 8, 2024
184e00e
add test ci
KepingYan May 9, 2024
bd85b7d
add license
KepingYan May 9, 2024
38c52ed
fix
KepingYan May 9, 2024
78dc091
fix
KepingYan May 9, 2024
7cc0de0
add autoscaling config
KepingYan May 10, 2024
e241b25
fix ci
KepingYan May 10, 2024
3eb1c08
fix ci
KepingYan May 10, 2024
882ff4d
add package matplotlib
KepingYan May 10, 2024
21994cd
verify CI test
KepingYan May 10, 2024
d688804
verify CI test
KepingYan May 11, 2024
c8eabbc
create assets folder to place pictures
KepingYan May 13, 2024
3905082
verify CI test
KepingYan May 13, 2024
97ec06a
support openai autoscaling
KepingYan May 13, 2024
606f286
remove
KepingYan May 13, 2024
55c1dd1
integrate vllm and ns
jiafuzha May 16, 2024
e709010
update config file
KepingYan May 17, 2024
5b1bd85
integrate vllm and ns
jiafuzha May 17, 2024
eb71ace
integrate vllm and ns
jiafuzha May 17, 2024
a969f7f
remove .eggs
jiafuzha May 17, 2024
1b6aba3
integration adjustment
jiafuzha May 17, 2024
ce3ac61
llm on ray deployed
jiafuzha May 20, 2024
213ad89
llm on ray deployed
jiafuzha May 20, 2024
9b4884f
llm on ray deployed
jiafuzha May 21, 2024
3cb6f64
more doc
jiafuzha May 21, 2024
3f9ba62
merge with master
jiafuzha May 21, 2024
f6d60be
more doc for installing vllm ext
jiafuzha May 21, 2024
04cddcf
Merge remote-tracking branch 'keping/test_benchmark_script' into vllm…
jiafuzha May 21, 2024
d0d40dd
Merge remote-tracking branch 'keping/autoscaling_config' into vllm-ns…
jiafuzha May 21, 2024
24cc480
bug fix
jiafuzha May 24, 2024
295186e
save
jiafuzha May 27, 2024
875aa89
add vllm-ext/requirements.txt
jiafuzha May 27, 2024
2a462ea
add CMakeLists.txt
jiafuzha May 27, 2024
a105321
changed benchmarks
jiafuzha May 27, 2024
6aa0540
tuned graph build
jiafuzha May 30, 2024
7d6d3b4
graph build time reduced
jiafuzha May 31, 2024
473671e
graph build time reduced
jiafuzha May 31, 2024
1a88edd
configurable perf stats and copy quant config automatically
jiafuzha Jun 4, 2024
dfd26b0
save test script
jiafuzha Jun 5, 2024
65c816f
add max_batched_tokens parameter
jiafuzha Jun 6, 2024
89936d3
adjustment and ray-vllm-examples
jiafuzha Jun 12, 2024
4f088e2
perf tuned and improved by disable mmap for multiple instances
jiafuzha Jun 17, 2024
597d83d
remove unnecessary thread sync in kernels
jiafuzha Jun 19, 2024
e06e53b
merged ns PR 209 7d49516
jiafuzha Jun 24, 2024
b093d3f
change order of loop, batch size first, then iteration
jiafuzha Jun 25, 2024
423fc10
modified some examples
jiafuzha Jun 26, 2024
96657ce
add more parameters for vllm-ns test
JoshuaL3000 Jun 26, 2024
2336966
Merge remote-tracking branch 'refs/remotes/origin/vllm-ns-merged-209-…
jiafuzha Jun 26, 2024
f1d06d9
add more parameters for vllm-ns test
JoshuaL3000 Jun 26, 2024
34664ed
add more parameters for vllm-ns test
JoshuaL3000 Jun 26, 2024
4782617
merged with master
jiafuzha Jun 27, 2024
04b7582
prevent quantization being messed-up with multiple processes
jiafuzha Jun 27, 2024
b791a1d
fix merge error
jiafuzha Jun 27, 2024
79e5daf
rename py to sh
jiafuzha Jun 27, 2024
2c9b287
fix formatting issue
jiafuzha Jun 27, 2024
5ac7907
fix formatting issue
jiafuzha Jun 27, 2024
19fc069
fix merge error
JoshuaL3000 Jun 27, 2024
76fe811
Merge remote-tracking branch 'refs/remotes/origin/vllm-ns-perf-test' …
jiafuzha Jun 27, 2024
5760c65
add vllm-ns ci
jiafuzha Jun 28, 2024
30efd3f
remove unnecessary logs
jiafuzha Jun 28, 2024
1d9b4e3
remove some debug code
jiafuzha Jun 28, 2024
a14a146
add '--privileged' to docker run
jiafuzha Jun 28, 2024
4f59cb8
set unlimited max lock memory for neural speed engine
jiafuzha Jun 28, 2024
4df4f85
merged with master
jiafuzha Jun 28, 2024
e781d0b
llama-3-8B support
jiafuzha Jul 2, 2024
af7730a
extend token length limit to 8192 for mha
jiafuzha Jul 5, 2024
a92f019
extend token length limit to 8192 for mha
jiafuzha Jul 5, 2024
77ee207
extend token length limit to 8192 for mha (fix) and support different…
jiafuzha Jul 5, 2024
5154887
extend token length limit to 8192 for mha (fix) and support different…
jiafuzha Jul 5, 2024
f8e51a2
add llama3 for plain cpu
jiafuzha Jul 9, 2024
4ab3b0a
benchmark idc simple/medium/complex/verycomplex prompts
jiafuzha Jul 11, 2024
5476705
benchmark idc simple/medium/complex/verycomplex prompts
jiafuzha Jul 12, 2024
ea02ef3
benchmark idc simple/medium/complex/verycomplex prompts
jiafuzha Jul 12, 2024
7952602
add inference_engine resource and app_router resource to distinct eng…
jiafuzha Jul 16, 2024
c5c6a12
Merge remote-tracking branch 'refs/remotes/origin/vllm-ns-merged-209-…
jiafuzha Jul 16, 2024
58ad614
enhanced benchmark script to support IDC test data
jiafuzha Jul 16, 2024
724eced
updated ray startup script to add resources for app_router and infere…
jiafuzha Jul 16, 2024
7a4d7fd
fix first token latency and next token latency issue in open-ai mode …
jiafuzha Jul 16, 2024
5a59427
updated ray startup script to add resources for app_router and infere…
jiafuzha Jul 16, 2024
d5694c2
addressed some review comments
jiafuzha Jul 16, 2024
10f0f7c
fix lint issue
jiafuzha Jul 16, 2024
52c3451
address review comment by getting number of threads from ray num-cpus…
jiafuzha Jul 17, 2024
Files changed
1 change: 1 addition & 0 deletions .github/license/header_exclude_files.txt
@@ -0,0 +1 @@
+vllm-ext/vllm/extension/ns/__init__.py
8 changes: 6 additions & 2 deletions .github/workflows/workflow_inference.yml
@@ -34,7 +34,7 @@ jobs:
     name: inference
     strategy:
       matrix:
-        model: [ gpt-j-6b, gpt2, bloom-560m, opt-125m, mpt-7b, mistral-7b-v0.1, mpt-7b-ipex-llm, neural-chat-7b-v3-1, CodeLlama-7b-hf, falcon-7b, starcoder, llama-2-7b-chat-hf, llama-2-7b-chat-hf-vllm, gemma-2b, deepseek-coder-33b-instruct]
+        model: [ gpt-j-6b, gpt2, bloom-560m, opt-125m, mpt-7b, mistral-7b-v0.1, mpt-7b-ipex-llm, neural-chat-7b-v3-1, CodeLlama-7b-hf, falcon-7b, starcoder, llama-2-7b-chat-hf, llama-2-7b-chat-hf-vllm, llama-2-7b-chat-hf-vllm-ns, gemma-2b, deepseek-coder-33b-instruct]
       isPR:
         - ${{inputs.ci_type == 'pr'}}
@@ -97,7 +97,11 @@ jobs:
         run: |
           TARGET=${{steps.target.outputs.target}}
           source dev/scripts/ci-functions.sh
-          strat_ray ${TARGET}
+          if [[ "$TARGET" == *ns ]]; then
+            start_ray ${TARGET} 1
+          else
+            start_ray ${TARGET}
+          fi

      - name: Run Inference Test
        run: |
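The branch above relies on a plain bash glob match against the CI target name: only targets ending in "ns" (such as the newly added llama-2-7b-chat-hf-vllm-ns matrix entry) pass the extra argument to start_ray. A minimal standalone sketch of that test is below; note that the meaning of the extra `1` is defined in dev/scripts/ci-functions.sh, which is not part of this diff.

```bash
# Standalone sketch of the glob test used in the workflow step above.
# Targets ending in "ns" take the vllm-ns branch; the effect of the extra "1"
# argument to start_ray is defined in dev/scripts/ci-functions.sh (not shown here).
for TARGET in llama-2-7b-chat-hf-vllm llama-2-7b-chat-hf-vllm-ns; do
  if [[ "$TARGET" == *ns ]]; then
    echo "$TARGET -> start_ray \$TARGET 1"
  else
    echo "$TARGET -> start_ray \$TARGET"
  fi
done
```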
2 changes: 1 addition & 1 deletion .github/workflows/workflow_inference_gaudi2.yml
@@ -94,7 +94,7 @@ jobs:
           # check and remove exited container
           cid=$(docker ps -a -q --filter "name=${TARGET}")
           if [[ ! -z "$cid" ]]; then docker rm $cid; fi
-          docker run -tid --name="${TARGET}" --hostname="${TARGET}-container" --runtime=habana -v /home/yizhong/Model-References:/root/Model-References -v ${{ inputs.code_checkout_path }}:/root/llm-on-ray -v ${{ inputs.model_cache_path }}:/root/.cache/huggingface/hub/ -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --cap-add sys_ptrace --net=host --ipc=host ${TARGET}:habana
+          docker run -tid --privileged --name="${TARGET}" --hostname="${TARGET}-container" --runtime=habana -v /home/yizhong/Model-References:/root/Model-References -v ${{ inputs.code_checkout_path }}:/root/llm-on-ray -v ${{ inputs.model_cache_path }}:/root/.cache/huggingface/hub/ -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --cap-add sys_ptrace --net=host --ipc=host ${TARGET}:habana
       - name: Start Ray Cluster
         run: |
           TARGET=${{steps.target.outputs.target}}
2 changes: 1 addition & 1 deletion .github/workflows/workflow_test_benchmark.yml
@@ -80,7 +80,7 @@ jobs:
           # check and remove exited container
           cid=$(docker ps -a -q --filter "name=${TARGET}")
           if [[ ! -z "$cid" ]]; then docker rm $cid; fi
-          docker run -tid -v ${{ inputs.model_cache_path }}:/root/.cache/huggingface/hub -v ${{ inputs.code_checkout_path }}:/root/llm-on-ray -e http_proxy=${{ inputs.http_proxy }} -e https_proxy=${{ inputs.https_proxy }} --name="${TARGET}" --hostname="${TARGET}-container" ${TARGET}:latest
+          docker run -tid --privileged -v ${{ inputs.model_cache_path }}:/root/.cache/huggingface/hub -v ${{ inputs.code_checkout_path }}:/root/llm-on-ray -e http_proxy=${{ inputs.http_proxy }} -e https_proxy=${{ inputs.https_proxy }} --name="${TARGET}" --hostname="${TARGET}-container" ${TARGET}:latest

      - name: Start Ray Cluster
        run: |
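Both CI containers above gain the --privileged flag. Going by the "add '--privileged' to docker run" and "set unlimited max lock memory for neural speed engine" commits in this PR, the likely motivation is allowing the neural-speed engine to lock model memory. The sketch below shows, under that assumption, a narrower way to get the same effect with an explicit memlock ulimit; it is an illustration, not what the workflows actually do.

```bash
# Sketch (assumption): --privileged is used so the container can raise RLIMIT_MEMLOCK
# for the neural-speed engine. A narrower alternative is an explicit ulimit on the
# container rather than full privileges:
docker run -tid --ulimit memlock=-1:-1 --name="${TARGET}" "${TARGET}:latest"

# Inside a privileged container the limit can also be lifted at runtime before
# starting Ray and the inference engine:
ulimit -l unlimited
```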
2 changes: 1 addition & 1 deletion .github/workflows/workflow_tests.yml
@@ -176,7 +176,7 @@ jobs:
         run: |
           TARGET=${{steps.target.outputs.target}}
           source dev/scripts/ci-functions.sh
-          strat_ray ${TARGET}
+          start_ray ${TARGET}

      - name: Run Tests
        run: |
6 changes: 6 additions & 0 deletions .gitignore
@@ -5,3 +5,9 @@ build/lib/
 *.json
 *.txt
 *.egg-info
+.eggs
+*.log
+*.so
+*.ninja_log
+build/
+runtime_outs/
19 changes: 18 additions & 1 deletion .pre-commit-config.yaml
@@ -7,6 +7,12 @@ repos:
     hooks:
       - id: ruff
         args: [ --fix, --exit-non-zero-on-fix, --ignore=E402, --ignore=E501, --ignore=E731, --ignore=F401]
+        exclude: |
+          (?x)^(
+              examples/inference/vllm/ray-vllm-examples/llm.py|
+              vllm-ext/vllm/extension/ns/__init__.py|
+          )$


   # Black needs to be ran after ruff with --fix
   - repo: https://github.com/psf/black
@@ -18,7 +24,18 @@ repos:
     rev: "v0.981"
     hooks:
       - id: mypy
-        exclude: tests
+        exclude: |
+          (?x)^(
+              tests|
+              vllm-ext/vllm/extension/ns/model/ns_loader.py|
+              vllm-ext/vllm/extension/ns/kv_cache/ns_cache.py|
+              vllm-ext/inference_engine/python/inference_engine/|
+              vllm-ext/setup.py|
+              examples/inference/vllm/ray-vllm-examples/llm.py|
+              llm_on_ray/inference/inference_config.py|
+              vllm-ext/vllm/extension/ns/
+          )

         additional_dependencies:
           - mypy-extensions
           - pydantic==1.10.0
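To confirm that the new ruff and mypy exclusions behave as intended before pushing, the hooks can be exercised locally. The commands below are a generic pre-commit invocation, not something added by this PR.

```bash
# Generic local check of the updated hooks (not part of this PR's changes):
pip install pre-commit
pre-commit run --all-files        # run every configured hook against the whole repo
pre-commit run mypy --all-files   # or re-run only the mypy hook to verify its excludes
```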