
Paged attention new #1522

Closed
wants to merge 52 commits into from

Conversation

Bob-Chen222
Contributor

@Bob-Chen222 Bob-Chen222 commented Oct 10, 2024

Description of changes:

I have added paged attention for the spec scheduler. I will clean up the print statements and add more documentation tomorrow. Let me know if anything needs to be changed!

One thing to note is that both the specscheduler branch and this branch suffer from an "invalid argument" error in CUDA, but I think a small fix would solve the problem on both branches.


@Bob-Chen222 Bob-Chen222 marked this pull request as draft October 10, 2024 20:09
@Bob-Chen222 Bob-Chen222 marked this pull request as ready for review October 12, 2024 06:59
@chenzhuofu chenzhuofu self-requested a review October 31, 2024 17:07
@Bob-Chen222
Contributor Author

Some more updates:

  1. Added max-kv-cache-size as a flag. At initialization, the page manager receives the number of hidden layers as input, so it knows how much KV cache has to be allocated per transformer layer. Then, when we initialize the metadata of the inference operators, we call the page manager to get the KV cache size needed and allocate the slots accordingly.
  2. Added paged attention support for incr_decoding.
  3. Cleaned up the comments and reorganized the formatting.
  4. After all these changes, performance is nearly the same as before.
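The allocation flow described in item 1 can be sketched roughly as follows. This is a minimal illustrative Python sketch, not the actual FlexFlow implementation (which is C++/CUDA); the names `PageManager`, `kv_cache_slots`, and `allocate_page` are hypothetical, as is the assumption that the budget is split evenly across layers.

```python
# Hypothetical sketch of per-layer paged KV cache bookkeeping.
# All names and the even-split policy are assumptions for illustration.

class PageManager:
    def __init__(self, num_hidden_layers, max_kv_cache_size, page_size):
        # max_kv_cache_size: total KV cache budget in tokens, as would be
        # supplied via a max-kv-cache-size flag.
        self.num_hidden_layers = num_hidden_layers
        self.page_size = page_size
        # Split the budget across transformer layers, then carve each
        # layer's share into fixed-size pages.
        self.tokens_per_layer = max_kv_cache_size // num_hidden_layers
        self.pages_per_layer = self.tokens_per_layer // page_size
        # Each layer tracks which of its pages are currently free.
        self.free_pages = {
            layer: list(range(self.pages_per_layer))
            for layer in range(num_hidden_layers)
        }

    def kv_cache_slots(self, layer):
        # Called when initializing an inference operator's metadata:
        # how many KV slots to allocate for this layer.
        return self.pages_per_layer * self.page_size

    def allocate_page(self, layer):
        # Hand out a free KV cache page for the given layer, or fail
        # if the layer's budget is exhausted.
        if not self.free_pages[layer]:
            raise RuntimeError(f"KV cache exhausted for layer {layer}")
        return self.free_pages[layer].pop()


pm = PageManager(num_hidden_layers=2, max_kv_cache_size=1024, page_size=16)
print(pm.kv_cache_slots(0))  # 512 slots per layer (1024 tokens / 2 layers)
page = pm.allocate_page(0)   # an index of one of layer 0's 32 pages
```

The point of the sketch is only the division of responsibility: the manager learns the layer count once at initialization, and operators query it later for their per-layer allocation instead of computing sizes themselves.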

@lockshaw
Copy link
Collaborator

lockshaw commented Jan 9, 2025

Moved to flexflow/flexflow-serve#82

@lockshaw lockshaw closed this Jan 9, 2025
3 participants