Add Splitwise implementation to vLLM #1
base: main
Initialize the MSCCL++ communication group if `sep_prompt_token` is set in `ParallelConfig`. Also add documentation for MSCCL++ installation.
- Add `worker_type` to differentiate prompt, token, and mixed workers
- Set a driver for each of the prompt machines and the token machines
- Allow broadcasts to take a group
- Set up KV cache communication using MSCCL++
- Add a test for KV cache communication
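A minimal sketch of how worker roles and per-role drivers might be represented, assuming a flat list of global ranks where each rank is tagged with one role. `WorkerType`, `assign_drivers`, and the "lowest rank of each role is its driver" rule are hypothetical illustrations, not this PR's code.

```python
from enum import Enum
from typing import Dict, List


class WorkerType(Enum):
    # Hypothetical roles mirroring the commit message: prompt-only,
    # token-only, and mixed (both phases on the same worker).
    PROMPT = "prompt"
    TOKEN = "token"
    MIXED = "mixed"


def assign_drivers(roles: List[WorkerType]) -> Dict[WorkerType, int]:
    """Pick one driver rank per worker type.

    Assumption: roles[i] is the role of global rank i, and the lowest
    global rank of each role acts as that group's driver.
    """
    drivers: Dict[WorkerType, int] = {}
    for rank, role in enumerate(roles):
        # setdefault keeps the first (lowest) rank seen for each role.
        drivers.setdefault(role, rank)
    return drivers


roles = [WorkerType.PROMPT, WorkerType.PROMPT, WorkerType.TOKEN, WorkerType.TOKEN]
print(assign_drivers(roles))
```

With two prompt ranks followed by two token ranks, rank 0 drives the prompt machines and rank 2 drives the token machines.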
- Obtain `blocks_to_nw` when creating batches in the scheduler. Coalesce blocks where possible for fast network transfers.
- Use a sequence-to-semaphore mapper to allow fine-grained waiting on KV cache transfers per sequence.
- Run prompt and token workers separately using the `_run_stage_workers` helper.
- Populate `KVCacheCommunicator` for all `PagedAttention` modules, which allows layer-wise sends to be implemented from within `attention.py`.
- Populate the destination rank for `Sampler`, which will be used as the root in `gather` operations.
- Fix `tensor_model_parallel_gather`: use the global rank instead of the group-local rank.
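Coalescing blocks for network transfers can be thought of as merging runs of consecutive block numbers into ranges, so that each run is sent as one transfer instead of many small ones. A minimal sketch, assuming `blocks_to_nw` can be viewed as a list of (source block, destination block) pairs; the pair format and the `coalesce_blocks` name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: each entry maps a source KV cache block
# number to a destination block number on the receiving worker.
BlockPair = Tuple[int, int]


def coalesce_blocks(pairs: List[BlockPair]) -> List[Tuple[int, int, int]]:
    """Merge block pairs into (src_start, dst_start, num_blocks) runs.

    Two pairs join the same run only when both the source and the
    destination block numbers advance by exactly one, so each run is
    contiguous on both sides and can be sent as a single transfer.
    """
    runs: List[Tuple[int, int, int]] = []
    for src, dst in sorted(pairs):
        if runs:
            src_start, dst_start, n = runs[-1]
            if src == src_start + n and dst == dst_start + n:
                # Extend the current run by one block.
                runs[-1] = (src_start, dst_start, n + 1)
                continue
        # Start a new run of length one.
        runs.append((src, dst, 1))
    return runs


print(coalesce_blocks([(3, 10), (4, 11), (5, 12), (8, 20), (9, 22)]))
```

Here blocks 3 to 5 map to 10 to 12 and collapse into one run, while (9, 22) stays separate because its destination is not adjacent to block 20.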
@@ -0,0 +1,41 @@
"""Test the KV cache communication operators.

Run `python test_kvcache_comm.py`.
I'm guessing there's no CI infra or test running?
Actually, they do have one. I will have to figure out how to add this test to the CI infra. There are multiple things to figure out: how to use pytest with argparse for `LLMEngine`, and how to set up the MSCCL++ environment in the CI infra.