[Wave] Fixes to extend attention #480

Merged: 2 commits, Feb 10, 2025

@harsh-nod harsh-nod commented Feb 8, 2025

This PR adds the following fixes to extend attention:

  1. Fixes how the block indices are read for the V matrix in the first loop. Because the first and second MMA use different layouts, the block indices must be read with a different layout, and with a correspondingly different number of elements per thread, when loading from the KV cache.
  2. Moves the QK-transpose scaling inside the kernel for accuracy reasons.
  3. Moves the reuse-shared-allocs pass so that it runs after the minimize-global-loads pass.
  4. Minor compiler fix regarding the use of Sequence vs. list.
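
For readers wondering why item 2 can affect accuracy, here is a minimal standalone sketch. It is not taken from this PR's diff, and it rests on an assumption: that the scale was previously applied to an fp16 Q tensor before the kernel ran, so each scaled element was rounded to fp16, whereas applying it inside the kernel scales the fp32 accumulator once. The helper names (`to_f16`, `to_f32`, `dot_f32`) and the head dimension of 128 are hypothetical and chosen only for the demonstration.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_f32(a, b):
    """Dot product with every product and partial sum rounded to fp32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(x * y))
    return acc


HEAD_DIM = 128  # hypothetical head dimension; 1/sqrt(128) is not a power of two
SCALE = 1.0 / math.sqrt(HEAD_DIM)

# Toy inputs that are exactly representable in fp16.
q = [1.0] * HEAD_DIM
k = [1.0] * HEAD_DIM

# Double-precision reference for the scaled score.
reference = sum(x * y for x, y in zip(q, k)) * SCALE

# Assumed "before" behavior: scale Q outside the kernel, rounding to fp16.
q_prescaled = [to_f16(x * SCALE) for x in q]
score_outside = dot_f32(q_prescaled, k)

# Assumed "after" behavior: accumulate Q.K in fp32, then scale once.
score_inside = to_f32(dot_f32(q, k) * to_f32(SCALE))

err_outside = abs(score_outside - reference)
err_inside = abs(score_inside - reference)
print(f"scale outside kernel: error = {err_outside:.3e}")
print(f"scale inside kernel:  error = {err_inside:.3e}")
```

In this toy case the pre-scaled path carries the fp16 rounding error of the scale factor into every element of the sum, while the in-kernel path rounds only at fp32 precision. Whether this is the exact effect the PR addresses is not stated in the description.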

@harsh-nod harsh-nod force-pushed the gather_idx branch 7 times, most recently from acff8b9 to ec03886 Compare February 10, 2025 02:16
@harsh-nod harsh-nod changed the title from "Load more elements per thread and use dynamic mappings" to "[Wave] Fixes to extend attention" Feb 10, 2025
@harsh-nod harsh-nod force-pushed the gather_idx branch 3 times, most recently from 9a117f6 to 84cc202 Compare February 10, 2025 02:50

@raikonenfnu raikonenfnu left a comment

Overall looks good, just that one quick question

Signed-off-by: Harsh Menon <[email protected]>
@raikonenfnu raikonenfnu merged commit f0b35e0 into iree-org:main Feb 10, 2025
9 of 10 checks passed