I have a batch of videos I want to run inference on. I wrote a batch script following the demo pattern, but after a few iterations the model errors out:

```
return forward_call(*args, **kwargs)
  File "/mnt/VLM/Video_XL/videoxl/videoxl/model/language_model/llava_qwen.py", line 152, in forward
    q_embed = (q * q_cos) + (rotate_half(q) * q_sin)
RuntimeError: The size of tensor a (0) must match the size of tensor b (46454) at non-singleton dimension 2
```

I am running inference on a single A6000. I also noticed that even though the input videos differ, the output is identical every time, and after a few rounds the error above is raised.
I suspect the history is not being cleared after each inference. Could you provide some help?
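If per-call state really is accumulating, one thing to try is an explicit best-effort reset between videos. This is only a sketch: the `memory` attribute and its `reset()` method are assumptions about the model object, not a confirmed Video-XL API, so adapt the attribute name to whatever cache object your checkpoint actually exposes.

```python
import gc


def reset_inference_state(model):
    """Best-effort cleanup between independent video inferences.

    Assumes the model *may* expose a beacon/memory cache object with a
    ``reset()`` method (an assumption, not a confirmed API); if it does
    not, this is a no-op apart from garbage collection.
    """
    memory = getattr(model, "memory", None)
    if memory is not None and hasattr(memory, "reset"):
        memory.reset()  # drop accumulated compressed-activation history (assumed API)
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached CUDA allocations
    except ImportError:
        pass
```

Calling `reset_inference_state(model)` at the end of each loop iteration would then isolate the videos from one another, assuming the identical-output symptom really comes from carried-over state.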
My batch demo script is as follows (the import block is my reconstruction from the Video-XL demo; adjust the module paths to your local setup):

```python
import os
import numpy as np
import torch
from decord import VideoReader, cpu

from videoxl.model.builder import load_pretrained_model
from videoxl.mm_utils import tokenizer_image_token, transform_input_id
from videoxl.constants import IMAGE_TOKEN_INDEX, TOKEN_PERFRAME

if __name__ == '__main__':
    model_path = "/mnt/checkpoint/VideoXL/VideoXL_weight_8"
    video_folder = "/mnt/dataset/test_video_clip/1"
    video_list = [os.path.join(video_folder, i) for i in os.listdir(video_folder)]
    max_frames_num = 100  # you can raise this to several thousand, as long as your GPU memory can handle it :)
    gen_kwargs = {"do_sample": False, "temperature": 1, "top_p": None, "num_beams": 1,
                  "use_cache": False, "max_new_tokens": 1024}
    tokenizer, model, image_processor, _ = load_pretrained_model(model_path, None, "llava_qwen", device_map="cuda:0")
    model.config.beacon_ratio = [8]  # delete this line for random compression among the {2, 4, 8} ratios

    for video_path in video_list:
        print(video_path)
        # video input
        prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\nPlease describe the video.<|im_end|>\n<|im_start|>assistant\n"
        input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(model.device)
        vr = VideoReader(video_path, ctx=cpu(0))
        total_frame_num = len(vr)
        uniform_sampled_frames = np.linspace(0, total_frame_num - 1, max_frames_num, dtype=int)
        frame_idx = uniform_sampled_frames.tolist()
        frames = vr.get_batch(frame_idx).asnumpy()
        video_tensor = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"].to(model.device, dtype=torch.float16)
        beacon_skip_first = (input_ids == IMAGE_TOKEN_INDEX).nonzero(as_tuple=True)[1].item()
        num_tokens = TOKEN_PERFRAME * max_frames_num
        beacon_skip_last = beacon_skip_first + num_tokens
        with torch.inference_mode():
            output_ids = model.generate(input_ids, images=[video_tensor], modalities=["video"],
                                        beacon_skip_first=beacon_skip_first,
                                        beacon_skip_last=beacon_skip_last, **gen_kwargs)
        if IMAGE_TOKEN_INDEX in input_ids:
            transform_input_ids = transform_input_id(input_ids, num_tokens, model.config.vocab_size - 1)
            output_ids = output_ids[:, transform_input_ids.shape[1]:]
        outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
        print("#####################################")
        print(outputs)
        print("#####################################")
```
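For reference, the uniform frame-sampling step in the script can be isolated and checked on its own. This is a standalone sketch; `sample_frame_indices` is a hypothetical helper name, not part of the Video-XL codebase.

```python
import numpy as np


def sample_frame_indices(total_frames, max_frames):
    """Spread ``max_frames`` indices uniformly over [0, total_frames - 1],
    mirroring the np.linspace sampling used in the script above."""
    return np.linspace(0, total_frames - 1, max_frames, dtype=int).tolist()


# e.g. picking 100 frames from a hypothetical 1000-frame clip
idx = sample_frame_indices(total_frames=1000, max_frames=100)
```

The indices always include the first and last frame and stay in ascending order, so `vr.get_batch(idx)` receives a valid, temporally ordered selection.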