input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
pixel_values = inputs["pixel_values"]
image_grid_thw = inputs["image_grid_thw"]
inputs_embeds = model.model.embed_tokens(input_ids)
if pixel_values is not None:
    pixel_values = pixel_values.type(model.visual.get_dtype())
    image_embeds = model.visual(pixel_values, grid_thw=image_grid_thw)
    n_image_tokens = (input_ids == model.config.image_token_id).sum().item()
    n_image_features = image_embeds.shape[0]
    if n_image_tokens != n_image_features:
        raise ValueError(
            f"Image features and image tokens do not match: tokens: {n_image_tokens}, features {n_image_features}"
        )
    image_mask = (
        (input_ids == model.config.image_token_id)
        .unsqueeze(-1)
        .expand_as(inputs_embeds)
        .to(inputs_embeds.device)
    )
    image_embeds = image_embeds.to(inputs_embeds.device, inputs_embeds.dtype)
    inputs_embeds = inputs_embeds.masked_scatter(image_mask, image_embeds)
if attention_mask is not None:
    attention_mask = attention_mask.to(inputs_embeds.device)
generated_ids = model.generate(inputs_embeds=inputs_embeds, attention_mask=attention_mask, max_new_tokens=128)
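For readers unfamiliar with the merge step above, here is a minimal plain-Python sketch of what the `masked_scatter` call is doing: every position holding the image placeholder token gets its embedding row replaced by the next image feature row, in order. This is only an illustration, not the library's implementation. The `merge_image_features` helper, the placeholder id `99`, and the toy 2-dimensional embeddings are all made up for the example.

```python
# Hypothetical placeholder id for illustration; the real value comes from model.config.image_token_id.
IMAGE_TOKEN_ID = 99


def merge_image_features(input_ids, token_embeds, image_feats, image_token_id=IMAGE_TOKEN_ID):
    """Replace the embedding row of each image placeholder with the next image feature row."""
    n_image_tokens = sum(1 for t in input_ids if t == image_token_id)
    n_image_features = len(image_feats)
    # Same sanity check as in the snippet above: one feature row per placeholder token.
    if n_image_tokens != n_image_features:
        raise ValueError(
            f"Image features and image tokens do not match: "
            f"tokens: {n_image_tokens}, features {n_image_features}"
        )
    feats = iter(image_feats)
    # Placeholder positions consume feature rows in order; other rows are kept unchanged.
    return [
        next(feats) if t == image_token_id else row
        for t, row in zip(input_ids, token_embeds)
    ]


# Toy inputs: two text tokens surrounding two image placeholders.
input_ids = [1, 99, 99, 2]
token_embeds = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.2, 0.2]]
image_feats = [[5.0, 5.0], [6.0, 6.0]]

merged = merge_image_features(input_ids, token_embeds, image_feats)
print(merged)
```

Note that the merged sequence has the same length as `input_ids`, so the embeddings passed to `generate` should line up one-to-one with `attention_mask`.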
Expected behavior
The latter should work the same as the former.
The latter's error message example
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 578, in forward
    attn_weights = attn_weights + causal_mask
RuntimeError: The size of tensor a (2362) must match the size of tensor b (1182) at non-singleton dimension 3
minostauros changed the title from "Qwen2-VL used to work with input_embeds instead of input_ids, but no more" to "Qwen2-VL used to work with inputs_embeds instead of input_ids, but no more" (Dec 31, 2024)
minostauros changed the title from "Qwen2-VL used to work with inputs_embeds instead of input_ids, but no more" to "Qwen2-VL should work with inputs_embeds instead of input_ids" (Dec 31, 2024)
minostauros changed the title from "Qwen2-VL should work with inputs_embeds instead of input_ids" to "Qwen2-VL used to work with inputs_embeds instead of input_ids, but no more" (Dec 31, 2024)
System Info
transformers version: 4.47.1
Who can help?
@zucchini-nlp
Reproduction
Preparation
Working example
Used to work
Worked in 9470d65 but not in v4.47.1 [comparison]