
[Question] Why does the DeepseekV3Model class disable gradient checkpointing by default? #670

Open
ShaohonChen opened this issue Feb 16, 2025 · 1 comment

Comments

@ShaohonChen

ShaohonChen commented Feb 16, 2025

Thanks to the DeepSeek team for the excellent work!

While reading the DeepSeek-V3 model code on Hugging Face, I noticed that the DeepseekV3PreTrainedModel class in modeling_deepseek.py declares supports_gradient_checkpointing = True, yet the DeepseekV3Model class appears to turn gradient checkpointing off by default (line 1372). The relevant code snippet:

class DeepseekV3Model(DeepseekV3PreTrainedModel):
    """
    Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`DeepseekV3DecoderLayer`]

    Args:
        config: DeepseekV3Config
    """

    def __init__(self, config: DeepseekV3Config):
        super().__init__(config)
        self.padding_idx = config.pad_token_id
        self.vocab_size = config.vocab_size

        self.embed_tokens = nn.Embedding(
            config.vocab_size, config.hidden_size, self.padding_idx
        )
        self.layers = nn.ModuleList(
            [
                DeepseekV3DecoderLayer(config, layer_idx)
                for layer_idx in range(config.num_hidden_layers)
            ]
        )
        self._use_flash_attention_2 = config._attn_implementation == "flash_attention_2"
        self.norm = DeepseekV3RMSNorm(config.hidden_size, eps=config.rms_norm_eps)

        self.gradient_checkpointing = False       # gradient checkpointing appears to be off by default
        # Initialize weights and apply final processing
        self.post_init()

Could the developers explain the reasoning behind this setting? Looking forward to answers from the developers and community members.
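
If I understand correctly, in the standard Hugging Face Transformers API this attribute is just the initial state of a runtime switch rather than a permanent setting, and it is flipped on by calling gradient_checkpointing_enable(), which is inherited from PreTrainedModel. A minimal sketch of that usage, assuming the stock PreTrainedModel API (loading the full checkpoint of course requires substantial hardware):

from transformers import AutoModelForCausalLM

# Load the model together with its custom modeling code (modeling_deepseek.py).
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3", trust_remote_code=True
)

# The inherited PreTrainedModel method flips the flag at runtime; this is
# presumably why supports_gradient_checkpointing = True matters even though
# __init__ initializes gradient_checkpointing = False.
model.gradient_checkpointing_enable()
print(model.model.gradient_checkpointing)  # expected: True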

@ShaohonChen
Author

ShaohonChen commented Feb 16, 2025

I have also posted the same question in the R1 open-source repository: deepseek-ai/DeepSeek-R1#420
