
Initial implementation of disaggregated attention and qkv projection #1433

Closed
wants to merge 226 commits into from

Conversation

@yingchen21 (Collaborator) commented Jul 9, 2024

Description of changes:
This PR moves the qkv projection (and output projection) from the attention operator into a separate dense layer, so we can apply LoRA to the attention projections.

The qkv projection (and output projection) is removed from the attention operator and performed by dense layers instead. The attention operator no longer has tensor weights: it takes in the qkv projections and outputs the raw attention result (without applying the output projection), with dimension vdim * num_kv_heads instead of the hidden dimension.
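
The following is a minimal PyTorch sketch of the layout described above, not the FlexFlow API: the q/k/v and output projections are ordinary dense layers (which a LoRA adapter could wrap like any other dense layer), while the attention operator itself holds no weights and returns the raw attention result before the output projection. Names such as `hidden_dim`, `num_heads`, `head_dim`, and `vdim` are illustrative assumptions, and for simplicity a single head count is used rather than distinguishing query heads from kv heads.

```python
# Illustrative sketch only; parameter names and module structure are assumptions,
# not the actual FlexFlow operators touched by this PR.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightlessAttention(nn.Module):
    """Attention core with no tensor weights: consumes pre-projected q/k/v and
    returns the raw attention result of size num_heads * vdim (no output projection)."""

    def __init__(self, num_heads: int, head_dim: int, vdim: int):
        super().__init__()
        self.num_heads, self.head_dim, self.vdim = num_heads, head_dim, vdim

    def forward(self, q, k, v):
        # q, k: [batch, seq, num_heads * head_dim]; v: [batch, seq, num_heads * vdim]
        b, s, _ = q.shape
        q = q.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, s, self.num_heads, self.vdim).transpose(1, 2)
        scores = q @ k.transpose(-1, -2) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1) @ v  # [b, heads, s, vdim]
        # Raw attention output: dimension num_heads * vdim, not hidden_dim.
        return attn.transpose(1, 2).reshape(b, s, self.num_heads * self.vdim)


class DisaggregatedAttentionBlock(nn.Module):
    """The projections are plain dense layers outside the attention operator,
    so LoRA can be applied to them directly."""

    def __init__(self, hidden_dim: int, num_heads: int, head_dim: int, vdim: int):
        super().__init__()
        self.q_proj = nn.Linear(hidden_dim, num_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_dim, num_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_dim, num_heads * vdim, bias=False)
        self.attn = WeightlessAttention(num_heads, head_dim, vdim)
        self.o_proj = nn.Linear(num_heads * vdim, hidden_dim, bias=False)

    def forward(self, x):
        raw = self.attn(self.q_proj(x), self.k_proj(x), self.v_proj(x))
        return self.o_proj(raw)  # output projection maps back to hidden_dim


if __name__ == "__main__":
    block = DisaggregatedAttentionBlock(hidden_dim=64, num_heads=4, head_dim=16, vdim=16)
    out = block(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 64])
```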

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

