
Add Splitwise implementation to vLLM #1

Open · wants to merge 5 commits into main
Conversation

aashaka (Owner) commented on Feb 3, 2024

No description provided.

Initialize the MSCCL++ communication group if `sep_prompt_token` is set in `ParallelConfig`. Also add documentation for MSCCL++ installation.
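A minimal sketch of how that gating could look, assuming the `sep_prompt_token` flag name from this PR and a hypothetical `setup_mscclpp_group` helper (the actual MSCCL++ connection setup lives in the worker code, not shown here):

```python
from dataclasses import dataclass

@dataclass
class ParallelConfig:
    # existing fields elided
    sep_prompt_token: bool = False  # assumption: flag added by this PR to split prompt/token workers

def setup_mscclpp_group(config: ParallelConfig) -> None:
    # Hypothetical placeholder: the real MSCCL++ bootstrap/connection code
    # belongs to worker initialization and is not reproduced here.
    raise NotImplementedError

def initialize_communication(config: ParallelConfig) -> None:
    # The usual torch.distributed / NCCL process-group setup happens first (elided).
    if config.sep_prompt_token:
        # Only prompt/token-separated deployments need the extra MSCCL++
        # group used for KV-cache transfers between the two worker pools.
        setup_mscclpp_group(config)
```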
- Add `worker_type` to differentiate prompt, token, and mixed workers (a small sketch of the role tag follows this list)
- Designate a driver worker for the prompt machines and for the token machines
- Allow broadcasts to take a group
- Set up KV Cache communication using MSCCL++
- Add test for KV Cache communication
- Obtain `blocks_to_nw` when creating batches in the scheduler. Coalesce blocks where possible for fast network transfers (see the coalescing sketch after this list).
- Use a sequence-to-semaphore mapper to allow fine-grained waiting on the KV-cache transfer of each individual sequence (sketched after this list)
- Separately run prompt and token workers using the `_run_stage_workers` helper
- Populate `KVCacheCommunicator` for all PagedAttention modules, which allows layer-wise sends to be issued from within `attention.py` (an interface sketch follows this list)
- Populate the destination rank for the Sampler, which will be used as the root in `gather` operations
- Fix `tensor_model_parallel_gather`: use the global rank instead of the group-local rank (see the last sketch below)
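The `worker_type` role tag referenced above could be as simple as an enum; the member names come from the bullet list, everything else is an assumption:

```python
from enum import Enum

class WorkerType(str, Enum):
    """Sketch of the per-worker role tag (names assumed from the PR description)."""
    PROMPT = "prompt"  # runs the prefill phase only
    TOKEN = "token"    # runs the decode phase only
    MIXED = "mixed"    # original vLLM behaviour: both phases on one worker
```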
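For `blocks_to_nw`, the idea behind coalescing is that physically contiguous KV-cache blocks can travel as one large network transfer instead of many small ones. A self-contained sketch, assuming a `(start, num_blocks)` run format that is not necessarily the scheduler's actual data structure:

```python
from typing import List, Tuple

def coalesce_blocks(block_ids: List[int]) -> List[Tuple[int, int]]:
    """Collapse physical block numbers into (start, num_blocks) runs."""
    runs: List[Tuple[int, int]] = []
    for block in sorted(block_ids):
        if runs and block == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)   # extend the current contiguous run
        else:
            runs.append((block, 1))          # start a new run
    return runs

# Blocks 3, 4, 5 and 9 become two transfers instead of four.
assert coalesce_blocks([4, 3, 9, 5]) == [(3, 3), (9, 1)]
```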
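The sequence-to-semaphore mapper lets a token worker block only on the sequences it is about to decode rather than on a whole batch transfer. The real implementation drives MSCCL++ semaphores; the sketch below substitutes `threading.Semaphore` so only the waiting pattern is visible (class and method names are assumptions):

```python
import threading
from typing import Dict

class SequenceSemaphoreMapper:
    """Per-sequence wait/signal for KV-cache transfers (illustrative only)."""

    def __init__(self) -> None:
        self._sems: Dict[int, threading.Semaphore] = {}

    def register(self, seq_id: int) -> None:
        self._sems[seq_id] = threading.Semaphore(value=0)

    def signal(self, seq_id: int) -> None:
        # Sender side: the KV cache for this sequence has fully landed.
        self._sems[seq_id].release()

    def wait(self, seq_id: int) -> None:
        # Receiver side: block until exactly this sequence is ready.
        self._sems[seq_id].acquire()
```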
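One way to picture the `KVCacheCommunicator` handed to every PagedAttention module is as a small send/signal interface the layer can call as soon as its KV blocks are written, which is what makes layer-wise sends from within `attention.py` possible. The method names and signatures below are illustrative assumptions, not the PR's actual API:

```python
from abc import ABC, abstractmethod
from typing import List, Tuple
import torch

class KVCacheCommunicator(ABC):
    """Illustrative interface for layer-wise KV-cache sends (not the real API)."""

    @abstractmethod
    def send_blocks(self, layer_id: int, key_cache: torch.Tensor,
                    value_cache: torch.Tensor,
                    block_runs: List[Tuple[int, int]]) -> None:
        """Asynchronously push the given (start, num_blocks) runs of this layer."""

    @abstractmethod
    def signal_done(self, seq_id: int) -> None:
        """Mark a sequence's transfer complete so the receiver can stop waiting."""
```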
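The `tensor_model_parallel_gather` fix rests on a torch.distributed detail: `dist.gather` interprets `dst` as a global rank even when a process group is supplied, so a group-local root has to be translated first. A hedged sketch of that translation (the function shape is illustrative; `torch.distributed.get_global_rank` is available in recent PyTorch releases):

```python
import torch
import torch.distributed as dist

def tensor_model_parallel_gather(input_: torch.Tensor, dst_group_rank: int = 0,
                                 group=None):
    """Gather `input_` onto the given group-local rank (illustrative sketch)."""
    group = group if group is not None else dist.group.WORLD
    # Translate the group-local root into the global rank that dist.gather expects.
    global_dst = dist.get_global_rank(group, dst_group_rank)
    gather_list = None
    if dist.get_rank() == global_dst:
        gather_list = [torch.empty_like(input_)
                       for _ in range(dist.get_world_size(group))]
    dist.gather(input_, gather_list, dst=global_dst, group=group)
    return gather_list
```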
@@ -0,0 +1,41 @@
"""Test the KV cache communication operators.

Run `python test_kvcache_comm.py`.
Review comment:
I'm guessing there's no CI infra or test running?

aashaka (Owner, Author) replied:

Actually, they do have one. I will have to figure out how to add this test to the CI infra. Multiple things to figure out: how to use pytest with argparse for LLMEngine, and how to set up the MSCCL++ environment in the CI infra.

Resolved review threads on:
- vllm/engine/llm_engine.py (1, outdated)
- vllm/worker/comm_utils.py (3, outdated)
- vllm/worker/worker.py (5, of which 4 outdated)