
Fix bug in llama decode and add tests for direct/paged KVCache #143

Merged
aviator19941 merged 12 commits into main from llama_cpp_comparison on Aug 26, 2024

Conversation

@aviator19941 (Collaborator) commented Aug 22, 2024

Fixes the decode bug for paged kv cache and adds a couple of tests to compare direct vs. paged kv cache results.
TODO: Fix the skipped test for decode.
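
For context, the comparison in the new tests boils down to running the same step against both cache implementations and asserting numerical closeness. A minimal sketch follows; the function and argument names here are hypothetical placeholders, not the actual identifiers in kv_cache_test.py:

import torch

# Hypothetical sketch: run one step against each cache implementation and
# require the attention outputs to match within torch's default tolerances.
def check_paged_matches_direct(decode_step, direct_cache, paged_cache, token, start_positions):
    direct_out = decode_step(token, start_positions, cache=direct_cache)
    paged_out = decode_step(token, start_positions, cache=paged_cache)
    torch.testing.assert_close(paged_out, direct_out)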

@ScottTodd (Member) left a comment:

Yay! Tests!

Review thread on sharktank/tests/models/llama/kv_cache_test.py (resolved):
self.rope_dimension_count = 128
self.max_seq_len = 4096
self.start_positions = torch.tensor([8])
self.bs = 1
Member:

We still have a TODO to implement batch size 1 in the llama model I think: #40

I wonder if it would help to run this test parameterized (https://stackoverflow.com/a/34094) over bs == 1, bs == 2, and bs == 4?
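
For illustration, that suggestion would look roughly like this with pytest (a sketch only; the test name and body are placeholders, not code from this PR):

import pytest

# Sketch of the suggested parameterization over batch sizes; the body is
# elided because it depends on the fixtures in kv_cache_test.py.
@pytest.mark.parametrize("bs", [1, 2, 4])
def test_direct_vs_paged_kv_cache(bs):
    ...  # build model inputs with batch size `bs` and compare cache outputs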

@aviator19941 (Collaborator, Author):

I see. I'm not sure how that would work yet; I can take a look later. For now I just want to get the fix and a couple of basic tests in.

Member:

I think it's OK to get this in as is for now, but file an issue for yourself so you can circle back to it later.

@aviator19941 requested a review from ScottTodd on August 26, 2024 at 15:59
@aviator19941 changed the title from "Fix bug in llama decode and add tests for direct/paged 7b" to "Fix bug in llama decode and add tests for direct/paged KVCache" on Aug 26, 2024
@ScottTodd (Member) left a comment:

LGTM. Might want someone more comfortable with Python/modeling to also review though.

@dan-garvey (Member) left a comment:

Minor nits, but great work.

I don't see a way to debug the Windows failure other than trying it yourself. Do you have access to a Windows machine?

@ScottTodd (Member):
Good catch on the Windows failure: https://github.com/nod-ai/sharktank/actions/runs/10563521551/job/29263925200?pr=143#step:6:118

>       torch.testing.assert_close(paged_decode_attn_output, direct_decode_attn_output)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 4096 / 4096 (100.0%)
E       Greatest absolute difference: 325089.3125 at index (0, 0, 2384) (up to 1e-05 allowed)
E       Greatest relative difference: 8845.42578125 at index (0, 0, 1120) (up to 1.3e-06 allowed)

I develop primarily on Windows so I could help as needed... but I can't spare the time for that right now. Would be nice to at least mark as XFAIL or skip on the bot.

@aviator19941 (Collaborator, Author):
> Would be nice to at least mark as XFAIL or skip on the bot.

I can mark it as XFAIL for now and circle back to it later.
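
For reference, one way to express that with pytest (a sketch; the actual marker placement and reason string used in the PR may differ):

import sys

import pytest

# Hypothetical sketch: expect the decode comparison to fail on Windows only,
# so the bot stays green while the mismatch is investigated.
@pytest.mark.xfail(
    sys.platform == "win32",
    reason="paged vs. direct decode outputs mismatch on Windows",
)
def test_paged_vs_direct_decode():
    ...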

Commit: "TODO: Fix decode test for Windows"
Signed-off-by: aviator19941 <[email protected]>
@aviator19941 merged commit bade2ab into main on Aug 26, 2024
5 of 6 checks passed
@aviator19941 deleted the llama_cpp_comparison branch on August 26, 2024 at 20:18