
Paged attention new #1522

Closed
wants to merge 52 commits into from

Conversation

Bob-Chen222
Contributor

@Bob-Chen222 Bob-Chen222 commented Oct 10, 2024

Description of changes:

I have added paged attention for the spec scheduler. I will clean up the print statements and add more documentation tomorrow. Let me know if anything needs to be changed!

One thing to note is that both the specscheduler branch and this branch suffer from an "invalid argument" error in CUDA, but I think a small fix would solve the problem on both branches.


@Bob-Chen222 Bob-Chen222 marked this pull request as draft October 10, 2024 20:09
@Bob-Chen222 Bob-Chen222 marked this pull request as ready for review October 12, 2024 06:59
@chenzhuofu chenzhuofu self-requested a review October 31, 2024 17:07
@Bob-Chen222
Contributor Author

Some more updates:

  1. Added max-kv-cache-size as a flag. At initialization, the page manager receives the number of hidden layers as input, so it knows how much KV cache has to be allocated per transformer layer. Then, when we initialize the metadata of the inference operators, we call the page manager to get the KV cache size needed and allocate the slots accordingly.
  2. Added paged attention support for incr_decoding.
  3. Cleaned up the comments and reorganized the formatting.
  4. After all these changes, performance is nearly the same as before.
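The allocation flow described in item 1 can be sketched roughly as follows. This is a minimal illustrative Python sketch, not the actual FlexFlow implementation (which is C++/CUDA); the names `PageManager`, `kv_cache_slots`, and `allocate_page` are hypothetical, as is the assumption that the budget is split evenly across layers.

```python
# Hypothetical sketch of per-layer paged KV cache bookkeeping.
# All names and the even-split policy are assumptions for illustration.

class PageManager:
    def __init__(self, num_hidden_layers, max_kv_cache_size, page_size):
        # max_kv_cache_size: total KV cache budget in tokens, as would be
        # supplied via a max-kv-cache-size flag.
        self.num_hidden_layers = num_hidden_layers
        self.page_size = page_size
        # Split the budget across transformer layers, then carve each
        # layer's share into fixed-size pages.
        self.tokens_per_layer = max_kv_cache_size // num_hidden_layers
        self.pages_per_layer = self.tokens_per_layer // page_size
        # Each layer tracks which of its pages are currently free.
        self.free_pages = {
            layer: list(range(self.pages_per_layer))
            for layer in range(num_hidden_layers)
        }

    def kv_cache_slots(self, layer):
        # Called when initializing an inference operator's metadata:
        # how many KV slots to allocate for this layer.
        return self.pages_per_layer * self.page_size

    def allocate_page(self, layer):
        # Hand out a free KV cache page for the given layer, or fail
        # if the layer's budget is exhausted.
        if not self.free_pages[layer]:
            raise RuntimeError(f"KV cache exhausted for layer {layer}")
        return self.free_pages[layer].pop()


pm = PageManager(num_hidden_layers=2, max_kv_cache_size=1024, page_size=16)
print(pm.kv_cache_slots(0))  # 512 slots per layer (1024 tokens / 2 layers)
page = pm.allocate_page(0)   # an index of one of layer 0's 32 pages
```

The point of the sketch is only the division of responsibility: the manager learns the layer count once at initialization, and operators query it later for their per-layer allocation instead of computing sizes themselves.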

@lockshaw
Copy link
Collaborator

lockshaw commented Jan 9, 2025

Moved to flexflow/flexflow-serve#82

@lockshaw lockshaw closed this Jan 9, 2025
3 participants