[AutoParallel]: intermediate api supports LoRA #70539
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
Force-pushed from 973cdbf to 3d0b643
"llama.layers.*.mlp.up_proj": dist.ColWiseParallel(), | ||
"llama.layers.*.mlp.down_proj": dist.RowWiseParallel(), | ||
"lm_head.weight": dist.ColWiseParallel(), | ||
f"{prefix}llama.layers.*.self_attn.o_proj.lora_B": dist.RowWiseParallel(), |
Because the input of o_proj is RowWiseParallel, shouldn't we configure f"{prefix}llama.layers.*.self_attn.o_proj.lora_A": dist.RowWiseParallel() rather than o_proj.lora_B?
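A quick NumPy check of this point (a minimal single-process sketch simulating two ranks, with made-up shapes; it is independent of the PR's actual code): when the base layer is RowWiseParallel, its input arrives already sharded along the feature axis, so only a row-sharded lora_A reproduces the full LoRA branch, with lora_B left replicated.

```python
import numpy as np

rng = np.random.default_rng(0)
in_f, out_f, r, ranks = 8, 6, 4, 2

x = rng.standard_normal((3, in_f))      # activation entering o_proj
W = rng.standard_normal((in_f, out_f))  # o_proj weight: RowWiseParallel -> split along in_f
A = rng.standard_normal((in_f, r))      # lora_A
B = rng.standard_normal((r, out_f))     # lora_B

# RowWiseParallel: x and W are both sharded along in_f; every rank produces a
# partial result and the partials are summed (the allreduce step).
x_shards = np.split(x, ranks, axis=1)
W_shards = np.split(W, ranks, axis=0)
assert np.allclose(sum(xs @ Ws for xs, Ws in zip(x_shards, W_shards)), x @ W)

# LoRA branch: x is already sharded along in_f, so lora_A must be sharded the
# same way (RowWiseParallel) while lora_B stays replicated; the partial LoRA
# outputs are summed by the same allreduce as the base layer.
A_shards = np.split(A, ranks, axis=0)
assert np.allclose(
    sum((xs @ As) @ B for xs, As in zip(x_shards, A_shards)), x @ A @ B
)
print("RowWiseParallel base: shard lora_A, replicate lora_B")
```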
gather_output=True
),
"llama.layers.*.self_attn.k_proj": dist.ColWiseParallel(
f"{prefix}llama.layers.*.self_attn.q_proj.lora_A": dist.ColWiseParallel(),
Because the result of the LoRA branch should be ColWiseParallel, shouldn't we configure f"{prefix}llama.layers.*.self_attn.q_proj.lora_B": dist.ColWiseParallel() rather than q_proj.lora_A?
Same issue for k_proj.lora_A and v_proj.lora_A.
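The same kind of check covers the column-wise layers discussed here (again a minimal NumPy sketch with made-up shapes, not the PR's code): with a ColWiseParallel base the activation is replicated and each rank owns a column shard of the output, so it is lora_B that must be column-sharded for the LoRA branch to add onto the base output shard by shard.

```python
import numpy as np

rng = np.random.default_rng(1)
in_f, out_f, r, ranks = 8, 6, 4, 2

x = rng.standard_normal((3, in_f))      # replicated activation entering q_proj
W = rng.standard_normal((in_f, out_f))  # q_proj weight: ColWiseParallel -> split along out_f
A = rng.standard_normal((in_f, r))      # lora_A, kept replicated
B = rng.standard_normal((r, out_f))     # lora_B: ColWiseParallel -> split along out_f

W_shards = np.split(W, ranks, axis=1)
B_shards = np.split(B, ranks, axis=1)

# Each rank holds one column shard of the base output and the matching column
# shard of the LoRA branch, so the two can be added before any gather.
per_rank = [x @ Ws + (x @ A) @ Bs for Ws, Bs in zip(W_shards, B_shards)]
assert np.allclose(np.concatenate(per_rank, axis=1), x @ W + x @ A @ B)
print("ColWiseParallel base: replicate lora_A, shard lora_B")
```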
done
Now lora_A is RowWiseParallel when the base layer is RowWiseParallel, and lora_B is ColWiseParallel when the base layer is ColWiseParallel.
Is lora_B ColWiseParallel when the base layer is RowWiseParallel, or only when the base layer is ColWiseParallel?
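For reference, a sketch of the LoRA entries this thread converges on, written with the same dist.ColWiseParallel / dist.RowWiseParallel markers as the diff above. It assumes the rule from the earlier reply (lora_B sharded only for ColWiseParallel layers; anything not listed stays replicated); the prefix variable and the exact key names mirror the diff snippet and are otherwise assumptions, not the PR's final code.

```python
import paddle.distributed as dist

prefix = ""  # placeholder for the f-string prefix used in the diff

lora_plan = {
    # ColWiseParallel base layers: shard lora_B column-wise, keep lora_A replicated.
    f"{prefix}llama.layers.*.self_attn.q_proj.lora_B": dist.ColWiseParallel(),
    f"{prefix}llama.layers.*.self_attn.k_proj.lora_B": dist.ColWiseParallel(),
    f"{prefix}llama.layers.*.self_attn.v_proj.lora_B": dist.ColWiseParallel(),
    f"{prefix}llama.layers.*.mlp.up_proj.lora_B": dist.ColWiseParallel(),
    # RowWiseParallel base layers: shard lora_A row-wise, keep lora_B replicated.
    f"{prefix}llama.layers.*.self_attn.o_proj.lora_A": dist.RowWiseParallel(),
    f"{prefix}llama.layers.*.mlp.down_proj.lora_A": dist.RowWiseParallel(),
}
```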
shard_param_list.add("weight") | ||
shard_param_list.add("bias") |
If shard_param_list is not empty, shouldn't we skip adding weight and bias? Consider the case where the plan for the entries in shard_param_list differs from the plan for weight and bias.
done
Now weight and bias are added to shard_param_list only when it is empty.
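A minimal sketch of the behaviour agreed on here, with hypothetical names (the PR's actual helper and variables may differ): weight and bias are used as the default shard targets only when no explicit parameter list was requested.

```python
def resolve_shard_params(shard_param_list=None):
    """Return the set of parameter names to shard (hypothetical helper)."""
    shard_param_list = set(shard_param_list or [])
    if not shard_param_list:
        # No explicit request: fall back to the default targets.
        shard_param_list.add("weight")
        shard_param_list.add("bias")
    return shard_param_list

assert resolve_shard_params() == {"weight", "bias"}
assert resolve_shard_params(["lora_A"]) == {"lora_A"}
```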
Force-pushed from 93229ac to 92b162d
LGTM
PR Category
Auto Parallel
PR Types
New features
Description
The intermediate API now supports LoRA, and LoRA unit tests are added.
Pcard-67164