[V1] Optimize handling of sampling metadata and req_ids list #13244
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default; instead, it would only run … Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
This pull request has merge conflicts that must be resolved before it can be merged.
- Move SamplingMetadata to a field in the persistent batch, updated only when the batch changes rather than constructed every step.
- Keep input_batch.req_ids sized to the number of requests in the batch, so that anywhere that iterates over it doesn't need to slice (copy) the list or keep track of a separate request count. It is still updated in-place.

Signed-off-by: Nick Hill <[email protected]>
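For illustration, a minimal sketch of the caching pattern described above; the class and field names here are hypothetical and do not mirror the actual vLLM implementation:

```python
# Minimal sketch: cache sampling metadata on the persistent batch and rebuild it
# only when the batch composition changes. Names are illustrative, not vLLM's.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingMetadata:
    temperatures: list  # per-request sampling parameters, one entry per request


class PersistentBatch:
    def __init__(self) -> None:
        self.req_ids: list = []  # kept sized to the live requests, updated in place
        self.temperatures: list = []
        self._sampling_metadata: Optional[SamplingMetadata] = None

    def add_request(self, req_id: str, temperature: float) -> None:
        self.req_ids.append(req_id)
        self.temperatures.append(temperature)
        self._sampling_metadata = None  # invalidate the cached metadata

    def remove_request(self, req_id: str) -> None:
        idx = self.req_ids.index(req_id)
        del self.req_ids[idx]
        del self.temperatures[idx]
        self._sampling_metadata = None  # invalidate the cached metadata

    @property
    def sampling_metadata(self) -> SamplingMetadata:
        # Rebuilt lazily, only after the batch changed, not on every model step.
        if self._sampling_metadata is None:
            self._sampling_metadata = SamplingMetadata(
                temperatures=list(self.temperatures))
        return self._sampling_metadata
```

Steady-state decode steps then reuse the cached object, and req_ids never needs to be sliced because it only ever contains live requests.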
Force-pushed from 2bcf20f to 7d6ee8f.
@WoosukKwon this is the first step; I am working on follow-on simplification for the penalty parameters, etc.
@WoosukKwon apologies, I am looking into the test failure.
Signed-off-by: Nick Hill <[email protected]>
@WoosukKwon the test failure should be fixed now... the shared apply-penalties code was doing in-place unsqueezes on the sampling penalty tensors, which I think is a bad thing to do but didn't cause a problem before because we were passing new slices every step.
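As background, a standalone sketch of the underlying tensor behavior (plain PyTorch, not the vLLM apply-penalties code): an in-place `unsqueeze_` mutates the shared tensor's shape for every later user of that tensor, while the out-of-place `unsqueeze` returns a view and leaves the original untouched.

```python
# Illustration of in-place vs. out-of-place unsqueeze in PyTorch; this is not
# the vLLM penalties code, just the tensor behavior being discussed.
import torch

penalties = torch.ones(4)            # pretend this is a persistent per-batch tensor

# Out-of-place: returns a new view; the original shape is unchanged.
view = penalties.unsqueeze(1)
print(view.shape, penalties.shape)   # torch.Size([4, 1]) torch.Size([4])

# In-place: mutates the shared tensor, so every later step sees the new shape.
penalties.unsqueeze_(1)
print(penalties.shape)               # torch.Size([4, 1])

# Calling it again keeps stacking dimensions onto the same tensor.
penalties.unsqueeze_(1)
print(penalties.shape)               # torch.Size([4, 1, 1])
```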
This pull request has merge conflicts that must be resolved before it can be merged.
# Conflicts:
#   vllm/v1/worker/gpu_input_batch.py
@WoosukKwon that's fine with me.
Signed-off-by: Nick Hill <[email protected]>
…streamline
Signed-off-by: Nick Hill <[email protected]>
# Conflicts:
#   tests/v1/worker/test_gpu_input_batch.py
#   vllm/v1/sample/sampler.py
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
@njhill Sorry for the delay. I will review this PR once it's rebased.
Signed-off-by: Nick Hill <[email protected]>
# Conflicts:
#   tests/v1/sample/test_sampler.py
#   tests/v1/worker/test_gpu_input_batch.py
#   vllm/v1/worker/gpu_input_batch.py
#   vllm/v1/worker/gpu_model_runner.py
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
@WoosukKwon I have now rebased. #13360 partially overlaps with this (e.g. I simplified some of the min_tokens handling in this one but have refactored it completely in the other one based on the new abstraction). But I think it would be fine to get this in first, and I can rebase the other one if you're ok with that.
Signed-off-by: Nick Hill <[email protected]>
@njhill I'm not sure it's worthwhile to change from list to tuple here. A quick micro-benchmark:

```python
import time

N = 1024

# List
x = []
start = time.perf_counter()
for i in range(N):
    x.append([])
end = time.perf_counter()
print(f"list: {(end - start) * 1000:.3f} ms")

# Tuple
y = []
start = time.perf_counter()
for i in range(N):
    y.append(())
end = time.perf_counter()
print(f"tuple: {(end - start) * 1000:.3f} ms")
```

I find that adding 1024 (the maximum number of requests in the batch) empty lists only takes 80-90 us. While using tuples reduces this to 30-40 us, I don't think the 50 us gap (in the worst case) justifies the extra complexity here. When the batch size is 32, the gap becomes even smaller (7 us vs 2 us). WDYT?
Signed-off-by: Nick Hill <[email protected]>
@WoosukKwon I agree it's not worth any extra complexity. Just might as well use …
@njhill I think changing …
Signed-off-by: Nick Hill <[email protected]>
@WoosukKwon sure, let me revert those too. I think mostly we don't need to consider the tuple/list difference because these are args or fields that would be considered read-only.
Signed-off-by: Nick Hill <[email protected]>
@WoosukKwon I need to fix up some of the gpu_model_runner tests, but I'll wait for your first review to make sure you are good with the changes overall before spending time on that.
Amazing. Looks much cleaner! 😄
```diff
+    del request.spec_token_ids[num_scheduled_spec_tokens:]
     scheduled_spec_decode_tokens[request.request_id] = (
-        request.spec_token_ids[:num_scheduled_spec_tokens])
+        request.spec_token_ids)
```
What is this change for?
It avoids creating a new list, just trims the existing one down to num_scheduled_spec_tokens, since any later spec token ids are essentially discarded anyhow.
Got it! Maybe worth a comment.
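As a side note on the mechanics here, `del lst[k:]` truncates the existing list object in place, whereas `lst[:k]` allocates a new list; a minimal plain-Python sketch with hypothetical values:

```python
# `del` trims the same list object in place; slicing creates a copy.
spec_token_ids = [11, 22, 33, 44, 55]
num_scheduled_spec_tokens = 3

sliced = spec_token_ids[:num_scheduled_spec_tokens]   # new list (extra allocation)
assert sliced == [11, 22, 33]
assert len(spec_token_ids) == 5                       # original list unchanged

del spec_token_ids[num_scheduled_spec_tokens:]        # trims in place, no new list
assert spec_token_ids == [11, 22, 33]
```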
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
LGTM! Very nice simplification!
Signed-off-by: Nick Hill <[email protected]>