[Quant] Molmo SupportsQuant #13336
Conversation
Signed-off-by: Kyle Sayers <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
@@ -633,7 +634,8 @@ def forward(
         return hidden_states, residual


-class MolmoVisionBackbone(nn.Module):
+class MolmoVisionBackbone(nn.Module, SupportsQuant):
+    packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}
Can we remove this from L1440 now that it's defined in the inner model?
I believe that line is still required by SupportsLoRA:
Lines 342 to 343 in ce77eb9

        self.packed_modules_mapping = copy.deepcopy(
            self.model.packed_modules_mapping)
Theoretically, the packed_modules_mapping attribute should be defined at the lowest level possible, i.e. on every Module with packed modules.

In practice, I think we should add SupportsQuant and packed_modules_mapping to all classes which define a load_weights method, as these seem to be the classes that are composable with other models.
I see, thanks for the explanation!

> I think in practice we should add the SupportsQuant and packed_modules_mapping to all classes which define a load_weights method, as it seems like these are the classes which are composable with other models

Sounds good to me!
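A minimal sketch of the pattern discussed above, assuming SupportsQuant is importable from vllm.model_executor.models.interfaces; the class names and module layout are hypothetical, not Molmo's actual structure:

```python
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsQuant


class ExampleVisionBackbone(nn.Module, SupportsQuant):
    # Declared on the class that owns the packed module and its loader,
    # so the mapping travels with this submodel wherever it is composed.
    packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}

    def load_weights(self, weights):
        ...  # load this submodel's weights


class ExampleForConditionalGeneration(nn.Module, SupportsQuant):
    # The composing model declares only its own packed modules; the
    # backbone's mapping stays attached to the backbone class itself.
    packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

    def load_weights(self, weights):
        ...  # delegate vision weights to ExampleVisionBackbone.load_weights
```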
Purpose

- Add the SupportsQuant interface to Molmo models which have a packed_modules_mapping
- Work towards removing the configure_quant_config function once SupportsQuant has been added to a sufficient number of models
  - configure_quant_config assumes that all packed_modules_mappings will be declared prior to initialization. In reality, some models are submodels of each other, so their mappings can only be determined at init time.

Prerequisites

- Add SupportsQuant to phi3 and clip #13104

Changes

- Add the SupportsQuant interface to MolmoVisionBackbone, MolmoModel, and MolmoForCausalLM
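A hedged illustration of the init-time issue described above; the class names and the merging shown are hypothetical and for illustration only. A wrapper model cannot know its complete packed_modules_mapping until its submodels have been constructed, so any scheme that reads the mapping before __init__ runs will miss the submodel entries.

```python
import copy

import torch.nn as nn


class InnerVisionModel(nn.Module):
    packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}


class OuterModel(nn.Module):
    packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

    def __init__(self):
        super().__init__()
        self.model = InnerVisionModel()
        # The complete mapping only exists after the submodel is built,
        # which is why a pre-init check cannot see these entries.
        merged = copy.deepcopy(type(self).packed_modules_mapping)
        merged.update(self.model.packed_modules_mapping)
        self.packed_modules_mapping = merged


print(OuterModel().packed_modules_mapping)
# {'qkv_proj': ['q_proj', 'k_proj', 'v_proj'],
#  'merged_linear': ['gate_proj', 'up_proj']}
```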