[muP] Rework #1087
Open: lintangsutawika wants to merge 109 commits into main from rework-mup
Changes from 58 commits
Commits
0d921f7
changed ordering for setting up norm_factor
lintangsutawika abee54d
Update NeoXArgs docs automatically
invalid-email-address a08c3ef
updated muP args to the minimum required
lintangsutawika c35e830
calculate m_width
lintangsutawika 2807e52
Merge branch 'main' of https://github.com/EleutherAI/gpt-neox into re…
lintangsutawika 2d127df
Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i…
lintangsutawika 81fdc4d
Update NeoXArgs docs automatically
invalid-email-address 7d6b246
changed ordering for setting up norm_factor
lintangsutawika a0d1929
updated muP args to the minimum required
lintangsutawika d63b3b8
calculate m_width
lintangsutawika 9be82fe
Update NeoXArgs docs automatically
invalid-email-address 66214d9
removed redundant line
lintangsutawika 17b7183
removed redundant lines
lintangsutawika a6bad07
Update NeoXArgs docs automatically
invalid-email-address 63984bd
removed redundant lines
lintangsutawika 02687a8
Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i…
lintangsutawika 11114e2
Update NeoXArgs docs automatically
invalid-email-address 05c4de3
modify init with mup
lintangsutawika 71a91e4
divide logits by the m_width
lintangsutawika 99c8ce0
moved position of mup parameters being processed
lintangsutawika b253ab6
add note
lintangsutawika 1919499
made param groups to hold flag for mup scaling
lintangsutawika 17678e0
lr scale
lintangsutawika 2bd5ae6
update config
lintangsutawika 6642291
adjust process of mup variables
lintangsutawika 8be6c66
remove calling save_base_shapes
lintangsutawika c9fb18b
lr adjustments is done in train_step to address lr being reset due to…
lintangsutawika 795371c
lr scaling for mup is moved here instead
lintangsutawika 087beee
removed mup usage for coord check
lintangsutawika 16d04b1
merged with main
lintangsutawika e7b7bf6
latest update on coord check implementation
lintangsutawika 8dea9ce
fix merge conflict
lintangsutawika 3664eba
changed `mup_m_width` to `mup_width_multiplier`
lintangsutawika 6a46247
fixed notations
lintangsutawika 7439f9a
correct scale
lintangsutawika 5b2d31c
m_emb * embed(X)
lintangsutawika 98caa82
removed mup rescale in the layers
lintangsutawika 5c99637
removed mup rescale in the layers
lintangsutawika a636f06
adjust mup_m_emb to mup_embedding_multiplier
lintangsutawika 39190c5
add multiplier mup_output_multiplier
lintangsutawika 2489cc0
reorder model loading
lintangsutawika 23b8776
removed comments
lintangsutawika 10e935e
removed comments
lintangsutawika a0aca99
implement full process
lintangsutawika 9472b35
set neox_args.iteration to 0 for coord_check mode
lintangsutawika 5c5f2df
move mup_width_multiplier init
lintangsutawika 7eca3e7
mup_coord_check returns 2 df
lintangsutawika c9a3a65
can run
lintangsutawika a7877d4
remove commehts
lintangsutawika bd9d399
add hooks
lintangsutawika fe180d3
remove comments
lintangsutawika b240c19
uncomment activation data
lintangsutawika 93b4241
plot coords
lintangsutawika d4899fc
removed variables, add way to plot only from rank 0
lintangsutawika f589e29
changed key name in dict
lintangsutawika 8261e0d
remove print
lintangsutawika 25aa786
fix how width_multiplier is applied
lintangsutawika 4d246a1
updated plot config
lintangsutawika 84c5380
update files
lintangsutawika b2f1101
Merge branch 'main' into rework-mup
lintangsutawika 42d4cde
Update NeoXArgs docs automatically
invalid-email-address 4c477d5
init function, add input embedding different initialization
lintangsutawika 64dc4c5
Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i…
lintangsutawika 65c103e
changeoutput layer to normal
lintangsutawika 08b5d40
change from mean to std
lintangsutawika 2ca94a8
double attention head for every hidden size doubled
lintangsutawika 7483246
Merge branch 'main' into rework-mup
lintangsutawika 497485c
Update NeoXArgs docs automatically
invalid-email-address 34fb7ca
added args
lintangsutawika 2d53f1f
simplify coordcheck
lintangsutawika 7897610
seperate sp and mup configs
lintangsutawika 4f39209
perform coordcheck for sp and mup seperately
lintangsutawika 5f84a3f
Update NeoXArgs docs automatically
invalid-email-address 479b854
update
lintangsutawika 21a7e32
update how params are sorted
lintangsutawika bb2e0c9
remove unused comments
lintangsutawika bf1ce06
adjust
lintangsutawika 50a3dba
simplify
lintangsutawika c4c1660
fix mup embedding multiplier
lintangsutawika 1c35911
embeddingpipe fix init
lintangsutawika 84be4d4
changed how manual seed is loaded
lintangsutawika fbb4daf
removed musgd and other changces
lintangsutawika fa142ff
update config
lintangsutawika ad2336f
fixed how params are sorted
lintangsutawika fe73bc3
update how seed is computed
lintangsutawika a3bd44c
update to follow pre-commit format
lintangsutawika 56b6c9b
update from main
lintangsutawika 2365fd5
update
lintangsutawika e8639a0
Update NeoXArgs docs automatically
invalid-email-address 47e1438
fix lr weighting
lintangsutawika a064f9b
hard set to 1.0 if neox_args.use_mup is false
lintangsutawika b0da27a
Merge branch 'main' into rework-mup
Quentin-Anthony 6fe55f4
Update NeoXArgs docs automatically
invalid-email-address 8bf8bcd
add new parameters
lintangsutawika 7f0b033
add parameter checks
lintangsutawika f802869
updates to argument processing for mup
lintangsutawika cc71104
add data save and descriptions being printed
lintangsutawika c8feb39
update mup
lintangsutawika b6b3a02
update seed
lintangsutawika 847e892
remove print text
lintangsutawika 1b0027c
fixed kv
lintangsutawika 055596f
update
lintangsutawika fabb45b
update dewcriptions being printed
lintangsutawika 5ccf693
removed unused lines
lintangsutawika 9dd583b
Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i…
lintangsutawika 6a8ad71
Merge branch 'main' into rework-mup
lintangsutawika 485cad4
Update NeoXArgs docs automatically
invalid-email-address c291906
Merge branch 'main' into rework-mup
Quentin-Anthony 1ac9add
Merge branch 'main' into rework-mup
Quentin-Anthony
During our call, we noted that this leads to a bug: because the width multiplier is applied to all layers, the embedding layer cannot be initialized differently from the transformer backbone layers. (Precisely: muP prescribes that layers whose input and output dimensions both scale with width need a sqrt(width) multiplying factor.)
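To make the concern concrete, here is a minimal sketch of the difference between applying the width scaling everywhere and applying it per layer type. It assumes the commonly cited muP rule that hidden weight matrices have their initialization standard deviation divided by sqrt(width_multiplier) while the input embedding keeps the base value. The function names and the `kind` labels are hypothetical and are not taken from this PR.

```python
import math


def uniform_std(base_std, width_multiplier, kind):
    # Problematic pattern: the width scaling is applied to every layer,
    # so `kind` is ignored and the embedding gets rescaled too.
    return base_std / math.sqrt(width_multiplier)


def per_layer_std(base_std, width_multiplier, kind):
    # Assumed rule: only layers whose input and output dimensions both
    # scale with width ("hidden") are rescaled.
    if kind == "hidden":
        return base_std / math.sqrt(width_multiplier)
    return base_std


if __name__ == "__main__":
    for kind in ("embedding", "hidden"):
        print(kind, uniform_std(0.02, 4.0, kind), per_layer_std(0.02, 4.0, kind))
```

With the uniform version, the embedding and the hidden layers always end up with the same standard deviation, which is the behaviour described above as a bug.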
Would recommend refactoring this code: remove the muP width multiplier completely from the initialization methods, and have them take only the initializer parameters (e.g., standard deviation). Then, wherever a layer uses an initializer, adjust it according to that particular layer's muP width-adjustment requirements.
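One possible shape for this refactoring is sketched below. It is an illustration only: `normal_initializer`, `HiddenLinear`, and `Embedding` are hypothetical names, plain Python lists stand in for tensors, and the 1/sqrt(width_multiplier) adjustment for hidden layers is an assumption rather than something confirmed by this PR.

```python
import math
import random


def normal_initializer(std):
    """Return an init function that knows only its std, nothing about muP."""

    def init_(weight):
        # `weight` is a list of rows (a stand-in for a tensor), filled in place.
        for row in weight:
            for j in range(len(row)):
                row[j] = random.gauss(0.0, std)
        return weight

    return init_


class HiddenLinear:
    """Layer whose input and output dimensions both scale with width."""

    def __init__(self, n_in, n_out, base_std, width_multiplier=1.0):
        # The layer, not the initializer, applies its own width adjustment
        # (assumed here to be 1/sqrt(width_multiplier)).
        self.init_std = base_std / math.sqrt(width_multiplier)
        init_ = normal_initializer(self.init_std)
        self.weight = init_([[0.0] * n_in for _ in range(n_out)])


class Embedding:
    """Input embedding; only its output dimension scales with width."""

    def __init__(self, vocab_size, hidden, base_std, width_multiplier=1.0):
        # Assumed: the embedding keeps the base std regardless of width, so
        # width_multiplier is accepted for a uniform signature but unused.
        self.init_std = base_std
        init_ = normal_initializer(self.init_std)
        self.weight = init_([[0.0] * hidden for _ in range(vocab_size)])
```

Because the initializer factory takes only the standard deviation, each layer type can choose its own scaling without the initialization code needing to know which kind of layer is calling it.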