[RFC] Support FSDP2 #3231
base: main
Conversation
Signed-off-by: Mehant Kammakomati <[email protected]>
@ByronHsu FYI - thoughts?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@kmehant thanks for starting this PR! I was looking at FSDP2 support in
Yes please! It looks like FSDP2 will be in a public API in the next torch release (2.6): pytorch/pytorch@d815efc, so maybe things are somewhat stable? But many of the older config parameters (like
Hmm, if the new API supported most of the V1 configurations, I would think having only a feature flag would be enough, i.e. something like
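A minimal sketch of that feature-flag idea, for illustration only (the fsdp_version attribute name and the prepare_fsdp helper are assumptions, not settled API):

import torch.nn as nn

def prepare_fsdp(model: nn.Module, fsdp_plugin, device_mesh=None) -> nn.Module:
    # hypothetical: dispatch on a single version flag instead of a separate FSDP2 config surface
    if getattr(fsdp_plugin, "fsdp_version", 1) == 2:
        # public in torch >= 2.6; earlier releases expose it under torch.distributed._composable.fsdp
        from torch.distributed.fsdp import fully_shard
        fully_shard(model, mesh=device_mesh)  # shards parameters in place and returns the same module
        return model
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    return FSDP(model, auto_wrap_policy=getattr(fsdp_plugin, "auto_wrap_policy", None))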
It looks like
cc @muellerzr, curious to know if this is already in the pipeline internally at HF!
# auto_wrap_policy is not yet supported by FSDP2
# therefore manual wrapping has to be done like below
#######
for layer in model.model.layers:
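For context, a rough sketch of what this manual wrapping looks like in full, assuming a Llama-style model whose decoder blocks sit under model.model.layers, with fsdp2_kwargs standing in for whatever mesh/mixed-precision options are built elsewhere (not the PR's exact code):

from torch.distributed.fsdp import fully_shard  # public in torch >= 2.6

# shard each transformer block individually, then the root module last
for layer in model.model.layers:
    fully_shard(layer, **fsdp2_kwargs)
fully_shard(model, **fsdp2_kwargs)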
This one doesn't seem to apply to the general use case.
Feels like it should be something like the below, which walks the module tree and applies fully_shard bottom-up.
stack = [model]
ordered_modules = []
while stack:
    current_module = stack.pop()
    # submodules are registered in _modules, so walk children() rather than __dict__
    for child in current_module.children():
        stack.append(child)
    ordered_modules.append(current_module)
# reverse the traversal so leaf modules are sharded before their parents
for each in ordered_modules[::-1]:
    fully_shard(each, **fsdp2_kwargs)
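(For what it's worth, this ordering also matches what FSDP2 expects: fully_shard is applied to submodules first and to the root model last, which the reversed traversal above preserves.)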
What does this PR do?
Prototype implementation for porting from FSDP V1 to FSDP V2. There are a couple of open questions in this PR that need comments and discussion.
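As a reference point, a minimal sketch (not the PR's actual code) of how a couple of the V1 options map onto the torch >= 2.6 fully_shard API; the dtypes are placeholders:

import torch
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard

def shard_v2(model: torch.nn.Module) -> torch.nn.Module:
    # FSDP1's MixedPrecision(param_dtype=..., reduce_dtype=...) roughly maps to MixedPrecisionPolicy
    mp = MixedPrecisionPolicy(param_dtype=torch.bfloat16, reduce_dtype=torch.float32)
    # FSDP1's ShardingStrategy.FULL_SHARD roughly corresponds to reshard_after_forward=True
    fully_shard(model, mp_policy=mp, reshard_after_forward=True)
    return model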
Preliminary run of this PR and results
The current version of the PR has been tested for basic functionality (full shard) and compared with the previous FSDP V1 implementation.
Memory
Loss Parity
Throughput
TODO
Fixes #2873
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@muellerzr