
Fix bug in llama decode and add tests for direct/paged KVCache #143

Merged
aviator19941 merged 12 commits into main from llama_cpp_comparison on Aug 26, 2024

Conversation

@aviator19941 (Collaborator) commented Aug 22, 2024

Fixes the decode bug for paged kv cache and adds a couple of tests to compare direct vs. paged kv cache results.
TODO: Fix the skipped test for decode.
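
For context, the comparison in the new tests boils down to running the same step against both cache implementations and asserting numerical closeness. A minimal sketch follows; the function and argument names here are hypothetical placeholders, not the actual identifiers in kv_cache_test.py:

import torch

# Hypothetical sketch: run one step against each cache implementation and
# require the attention outputs to match within torch's default tolerances.
def check_paged_matches_direct(decode_step, direct_cache, paged_cache, token, start_positions):
    direct_out = decode_step(token, start_positions, cache=direct_cache)
    paged_out = decode_step(token, start_positions, cache=paged_cache)
    torch.testing.assert_close(paged_out, direct_out)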

@ScottTodd (Member) left a comment:

Yay! Tests!

Review thread on sharktank/tests/models/llama/kv_cache_test.py (resolved):
self.rope_dimension_count = 128
self.max_seq_len = 4096
self.start_positions = torch.tensor([8])
self.bs = 1
Member:

We still have a TODO to implement batch size 1 in the llama model I think: #40

I wonder if it would help to run this test parameterized (https://stackoverflow.com/a/34094) over bs == 1, bs == 2, and bs == 4?
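
For illustration, that suggestion would look roughly like this with pytest (a sketch only; the test name and body are placeholders, not code from this PR):

import pytest

# Sketch of the suggested parameterization over batch sizes; the body is
# elided because it depends on the fixtures in kv_cache_test.py.
@pytest.mark.parametrize("bs", [1, 2, 4])
def test_direct_vs_paged_kv_cache(bs):
    ...  # build model inputs with batch size `bs` and compare cache outputs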

@aviator19941 (Collaborator, Author):

I see. I'm not sure how that would work yet; I can take a look later. For now I just want to get the fix and a couple of basic tests in.

Member:

I think it's OK to get this in as is for now, but file an issue for yourself so you can circle back to it later.

@aviator19941 requested a review from ScottTodd on August 26, 2024 at 15:59
@aviator19941 changed the title from "Fix bug in llama decode and add tests for direct/paged 7b" to "Fix bug in llama decode and add tests for direct/paged KVCache" on Aug 26, 2024
@ScottTodd (Member) left a comment:

LGTM. Might want someone more comfortable with Python/modeling to also review though.

@dan-garvey (Member) left a comment:

Minor nits, but great work.

I don't see a way to debug the Windows failure other than trying it yourself. Do you have access to a Windows machine?

@ScottTodd (Member):
Good catch on the Windows failure: https://github.com/nod-ai/sharktank/actions/runs/10563521551/job/29263925200?pr=143#step:6:118

>       torch.testing.assert_close(paged_decode_attn_output, direct_decode_attn_output)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 4096 / 4096 (100.0%)
E       Greatest absolute difference: 325089.3125 at index (0, 0, 2384) (up to 1e-05 allowed)
E       Greatest relative difference: 8845.42578125 at index (0, 0, 1120) (up to 1.3e-06 allowed)

I develop primarily on Windows so I could help as needed... but I can't spare the time for that right now. Would be nice to at least mark as XFAIL or skip on the bot.

@aviator19941 (Collaborator, Author):
> Would be nice to at least mark as XFAIL or skip on the bot.

I can mark it as XFAIL for now and circle back to it later.
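
For reference, one way to express that with pytest (a sketch; the actual marker placement and reason string used in the PR may differ):

import sys

import pytest

# Hypothetical sketch: expect the decode comparison to fail on Windows only,
# so the bot stays green while the mismatch is investigated.
@pytest.mark.xfail(
    sys.platform == "win32",
    reason="paged vs. direct decode outputs mismatch on Windows",
)
def test_paged_vs_direct_decode():
    ...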

Commit: "TODO: Fix decode test for Windows"
Signed-off-by: aviator19941 <[email protected]>
@aviator19941 merged commit bade2ab into main on Aug 26, 2024
5 of 6 checks passed
@aviator19941 deleted the llama_cpp_comparison branch on August 26, 2024 at 20:18