
[Quant] Molmo SupportsQuant #13336

Merged · 5 commits · Feb 18, 2025
vllm/model_executor/models/molmo.py (12 changes: 7 additions & 5 deletions)
@@ -52,7 +52,8 @@
 from vllm.sequence import IntermediateTensors
 from vllm.utils import JSONTree, json_map_leaves

-from .interfaces import SupportsLoRA, SupportsMultiModal, SupportsPP
+from .interfaces import (SupportsLoRA, SupportsMultiModal, SupportsPP,
+                         SupportsQuant)
 from .utils import (AutoWeightsLoader, WeightsMapper, flatten_bn,
                     is_pp_missing_parameter,
                     make_empty_intermediate_tensors_factory, make_layers,
@@ -633,7 +634,8 @@ def forward(
         return hidden_states, residual


-class MolmoVisionBackbone(nn.Module):
+class MolmoVisionBackbone(nn.Module, SupportsQuant):
+    packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}
Member:
Can we remove this from L1440 now that it's defined in the inner model?

Contributor Author:
I believe that line is still required by SupportsLoRA:

vllm/vllm/lora/models.py, lines 342 to 343 in ce77eb9:

self.packed_modules_mapping = copy.deepcopy(
    self.model.packed_modules_mapping)
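(Illustrative aside, not part of the original thread: a minimal, self-contained sketch with toy class names showing why the top-level mapping has to stay for now. The LoRA manager reads packed_modules_mapping from the outer model it wraps, so a mapping defined only on an inner submodule such as the vision backbone would not be visible to it.)

import copy

import torch.nn as nn


class ToyVisionBackbone(nn.Module):
    # mapping declared at the lowest level, next to the packed module
    packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}


class ToyCausalLM(nn.Module):
    # the LoRA manager reads the mapping from this outer class,
    # so this copy cannot be removed yet
    packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}

    def __init__(self):
        super().__init__()
        self.vision_backbone = ToyVisionBackbone()


class ToyLoRAModelManager:
    def __init__(self, model: nn.Module):
        self.model = model
        # mirrors vllm/lora/models.py lines 342 to 343 quoted above
        self.packed_modules_mapping = copy.deepcopy(
            self.model.packed_modules_mapping)


manager = ToyLoRAModelManager(ToyCausalLM())
print(manager.packed_modules_mapping)
# {'merged_linear': ['gate_proj', 'up_proj']}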

Contributor Author:
Theoretically, the packed_modules_mapping attribute should be defined at the lowest level possible, i.e. on every Module with packed modules.

I think in practice we should add SupportsQuant and packed_modules_mapping to all classes that define a load_weights method, since those seem to be the classes that are composable with other models.

Member:
I see, thanks for the explanation!

> I think in practice we should add SupportsQuant and packed_modules_mapping to all classes that define a load_weights method, since those seem to be the classes that are composable with other models.

Sounds good to me!
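
(Illustrative aside, not part of the original thread: a rough sketch of the convention agreed on above, assuming SupportsQuant is importable from vllm.model_executor.models.interfaces as in this diff; the class name and the load_weights body are hypothetical.)

from typing import Iterable, Set, Tuple

import torch
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsQuant


class ExampleBackbone(nn.Module, SupportsQuant):
    # the checkpoint's gate_proj and up_proj are fused into merged_linear
    packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}

    def load_weights(self, weights: Iterable[Tuple[str,
                                                   torch.Tensor]]) -> Set[str]:
        # a real implementation remaps packed checkpoint names here;
        # this stub only copies tensors whose names already match
        params = dict(self.named_parameters())
        loaded: Set[str] = set()
        for name, weight in weights:
            if name in params:
                params[name].data.copy_(weight)
                loaded.add(name)
        return loaded

The idea is that, because the mapping lives next to the module that actually packs gate_proj and up_proj into merged_linear, quantization can see it wherever the backbone is composed into a larger model, while the top-level copy remains only for the LoRA path shown earlier.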


     def __init__(
         self,
@@ -794,7 +796,7 @@ def load_weights(self, weights: Iterable[Tuple[str,


 @support_torch_compile
-class MolmoModel(nn.Module):
+class MolmoModel(nn.Module, SupportsQuant):

     def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
         super().__init__()
@@ -1402,8 +1404,8 @@ def get_replacement_molmo(item_idx: int):
 @MULTIMODAL_REGISTRY.register_processor(MolmoMultiModalProcessor,
                                         info=MolmoProcessingInfo,
                                         dummy_inputs=MolmoDummyInputsBuilder)
-class MolmoForCausalLM(nn.Module, SupportsMultiModal, SupportsPP,
-                       SupportsLoRA):
+class MolmoForCausalLM(nn.Module, SupportsMultiModal, SupportsPP, SupportsLoRA,
+                       SupportsQuant):
     hf_to_vllm_mapper = WeightsMapper(
         orig_to_new_substr={
             # vision backbone mapping