
Rope Positions Require Higher Precision #7

Merged · 1 commit merged into main on Jul 18, 2024
Conversation

@fabianlim (Collaborator) commented on Jul 17, 2024

Previously, the rope positions were created like this:

t = torch.arange(
    self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype
)

Problem

It can happen that inv_freq ends up in a low-precision dtype:

  • Even if inv_freq was created in float32, the model's parameters may later be downcast.
  • If so, inferring the dtype from inv_freq is unreliable, since it may have changed since creation time.
  • If inv_freq was downcast to bfloat16, its limited mantissa cannot exactly represent integers as large as 3000+, and self.max_seq_len_cached can exceed that. A short demonstration is sketched after this list.
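For illustration only (not part of the PR), a minimal snippet showing how bfloat16 rounds large position indices, assuming a recent PyTorch:

import torch

# Consecutive positions near 3000 collapse in bfloat16: with only 8
# significant bits, the spacing between representable values at this
# magnitude is 16, so distinct integer positions round to the same value.
t_fp32 = torch.arange(2995, 3005, dtype=torch.float32)
t_bf16 = t_fp32.to(torch.bfloat16)
print(t_fp32)                      # 2995., 2996., ..., 3004.
print(t_bf16.float())              # entries rounded to 2992. or 3008.
print(t_bf16.unique().numel())     # far fewer unique values than in float32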

Fix

The fix is to always create the t vector in float32, instead of inferring the dtype from inv_freq.
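A minimal sketch of that change (assuming the same rotary-embedding attributes, self.max_seq_len_cached and self.inv_freq, from the snippet above):

# Always build the position vector in float32, regardless of the dtype
# inv_freq currently has after any framework-side downcasting.
t = torch.arange(
    self.max_seq_len_cached, device=device, dtype=torch.float32
)
# The subsequent frequency computation can then stay in float32, e.g.
# freqs = torch.outer(t, self.inv_freq.float())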

Loss Comparisons

[image: loss comparison plot]

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@fabianlim changed the title from "Rope Positions Require Higher Position" to "Rope Positions Require Higher Precision" on Jul 17, 2024

@mayank31398 left a comment


LGTM
This is not an issue since FSDP does the downcasting itself after reset_parameters is called, but it's better to merge this.

@fabianlim (Collaborator, Author)

@mayank31398 I see. I did observe the downcasting with DeepSpeed, but I didn't debug it carefully enough to ascertain exactly when the downcasting happens.

@mayank31398

I think it's fine; this still helps with inference.

@Maxusmusti (Collaborator) left a comment

LGTM

@fabianlim merged commit da678a5 into main on Jul 18, 2024
6 checks passed
@mayank31398 deleted the fix-rope branch on July 23, 2024