AssertionError #11

Open
ZHANG-SH97 opened this issue Nov 13, 2024 · 9 comments

Comments

@ZHANG-SH97

I used the official weights and test code to test a video, but encountered the following error. What is the cause?
(screenshot of the AssertionError traceback)
Thanks for your reply!

@shuyansy
Collaborator

Hi, could you tell me which code you used?

@ZHANG-SH97
Author

The code on Hugging Face:
```python
from videoxl.model.builder import load_pretrained_model
from videoxl.mm_utils import tokenizer_image_token, process_images, transform_input_id
from videoxl.constants import IMAGE_TOKEN_INDEX, TOKEN_PERFRAME
from PIL import Image
from decord import VideoReader, cpu
import torch
import numpy as np

# fix seed
torch.manual_seed(0)

model_path = "/home/Video_XL"
video_path = "/home/test_video.mp4"

max_frames_num = 1200
gen_kwargs = {"do_sample": True, "temperature": 1, "top_p": None, "num_beams": 1, "use_cache": True, "max_new_tokens": 1024}
tokenizer, model, image_processor, _ = load_pretrained_model(model_path, None, "llava_qwen", device_map="cuda:0", trust_remote_code=True)

model.config.beacon_ratio = [8]  # delete this line to use a random compression ratio from {2, 4, 8}

# video input
prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n\ndescribe the first person in the video.<|im_end|>\n<|im_start|>assistant\n"
input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(model.device)
vr = VideoReader(video_path, ctx=cpu(0))
total_frame_num = len(vr)
uniform_sampled_frames = np.linspace(0, total_frame_num - 1, max_frames_num, dtype=int)
frame_idx = uniform_sampled_frames.tolist()
frames = vr.get_batch(frame_idx).asnumpy()
video_tensor = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"].to(model.device, dtype=torch.float16)

beacon_skip_first = (input_ids == IMAGE_TOKEN_INDEX).nonzero(as_tuple=True)[1].item()
num_tokens = TOKEN_PERFRAME * max_frames_num
beacon_skip_last = beacon_skip_first + num_tokens

with torch.inference_mode():
    output_ids = model.generate(input_ids, images=[video_tensor], modalities=["video"], beacon_skip_first=beacon_skip_first, beacon_skip_last=beacon_skip_last, **gen_kwargs)

if IMAGE_TOKEN_INDEX in input_ids:
    transform_input_ids = transform_input_id(input_ids, num_tokens, model.config.vocab_size - 1)

output_ids = output_ids[:, transform_input_ids.shape[1]:]
outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
print(outputs)
```
The weights are also on Hugging Face: https://huggingface.co/sy1998/Video_XL/blob/main/VideoXL_weight_8

@shuyansy
Collaborator

It seems like you forgot to add the image token to your input. The correct prompt is:

`prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\ndescribe the first person in the video.<|im_end|>\n<|im_start|>assistant\n"`

Moreover, our currently released weights only support max frames = 1024. We will probably release a model that supports 2048 frames within this month.
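For anyone else hitting this: a small pre-flight check along these lines can catch both problems before `generate()` is called. It assumes the image placeholder is the literal string `<image>` (as in LLaVA-style prompts) and that the frame cap is 1024 as stated above; the function name is hypothetical, not part of the Video-XL API:

```python
MAX_SUPPORTED_FRAMES = 1024  # cap of the currently released weights, per this thread

def validate_inputs(prompt: str, max_frames_num: int) -> int:
    """Fail fast on a missing <image> token and clamp the frame count."""
    if "<image>" not in prompt:
        raise ValueError("prompt must contain the <image> placeholder token")
    if max_frames_num > MAX_SUPPORTED_FRAMES:
        # silently clamping is a design choice; raising would also be reasonable
        return MAX_SUPPORTED_FRAMES
    return max_frames_num
```

Calling it with the values from the original script, `validate_inputs(prompt, 1200)` would clamp to 1024, and a prompt without `<image>` raises before any model call.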

@shuyansy
Collaborator

(screenshot: 2024-11-13, 3:30 PM)

@ZHANG-SH97
Author

I tried the prompt you suggested, but unfortunately I encountered the same error.
By the way, I've set max_frames_num=900; my test video only lasts 90 seconds.
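As an aside, asking for more sampled frames than the clip contains does not fail by itself: `np.linspace` simply returns repeated indices, so every frame is decoded at least once and many are decoded several times. A quick sketch (the frame count here is made up for illustration):

```python
import numpy as np

total_frame_num = 100  # hypothetical short clip
max_frames_num = 900   # requested sample count exceeds the frame count

# same sampling expression as in the demo script above
frame_idx = np.linspace(0, total_frame_num - 1, max_frames_num, dtype=int).tolist()

print(len(frame_idx))       # 900 indices are produced regardless
print(len(set(frame_idx)))  # but only 100 distinct frames are referenced
```

So oversampling wastes decode time and memory but is not itself the source of the AssertionError.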

@shuyansy
Collaborator

You can zip your demo.py code and video and send them to my email ([email protected]). I will try my best to solve it.

@ZHANG-SH97
Author

> You can zip your demo.py code and video to my email ([email protected]). I will try my best to solve it.

Already sent to your email. Looking forward to your reply!

@anilbatra2185

Hi @shuyansy, @ZHANG-SH97,

is there any update? I am facing the same issue.

@shuyansy
Collaborator

Sorry for the late reply. I found this is a bug with the Transformers version; you can use transformers==4.40.0.dev0 to avoid this issue.
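If pinning the dependency isn't convenient, a runtime guard along these lines can fail fast when an incompatible version is installed. This is a sketch, not part of Video-XL: the `4.40.0.dev0` string comes from the comment above, and the comparison deliberately looks only at the numeric release part (so pre-release suffixes like `.dev0` are ignored):

```python
def release_tuple(v: str) -> tuple:
    """Numeric release part of a version string, e.g. '4.40.0.dev0' -> (4, 40, 0)."""
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break  # stop at the first non-numeric component (dev/rc suffixes)
    return tuple(parts)

REQUIRED = release_tuple("4.40.0.dev0")  # version reported to work in this thread

def version_ok(installed: str) -> bool:
    """True when the installed transformers version is at least the required release."""
    return release_tuple(installed) >= REQUIRED
```

In a demo script you could then check `version_ok(transformers.__version__)` right after the imports and raise a clear error instead of hitting the AssertionError deep inside `generate()`.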
