
How should batch inference be set up for the demo? #22

Open
ShbGao-ProMax opened this issue Dec 4, 2024 · 2 comments

@ShbGao-ProMax
I have a batch of videos I want to run inference on.

Following the demo, I wrote a batch script, but after a few rounds of output the model throws an error:

```
return forward_call(*args, **kwargs) File "/mnt/VLM/Video_XL/videoxl/videoxl/model/language_model/llava_qwen.py", line 152, in forward q_embed = (q * q_cos) + (rotate_half(q) * q_sin) RuntimeError: The size of tensor a (0) must match the size of tensor b (46454) at non-singleton dimension 2
```

I am running inference on a single A6000. I noticed that even though the input video differs on each iteration, the output is identical, and after a few rounds it errors out.

I suspect the history is not being cleared after each inference. Could you offer some help?

My batch demo script is as follows:

```python
if __name__ == '__main__':
    import os

    import numpy as np
    import torch
    from decord import VideoReader, cpu
    # load_pretrained_model, tokenizer_image_token, transform_input_id,
    # IMAGE_TOKEN_INDEX, and TOKEN_PERFRAME come from the videoxl package,
    # imported as in the original demo.

    model_path = "/mnt/checkpoint/VideoXL/VideoXL_weight_8"
    video_folder = "/mnt/dataset/test_video_clip/1"

    video_list = [os.path.join(video_folder, i) for i in os.listdir(video_folder)]

    max_frames_num = 100  # you can raise this to several thousand, as long as your GPU memory can handle it :)
    gen_kwargs = {"do_sample": False, "temperature": 1, "top_p": None,
                  "num_beams": 1, "use_cache": False, "max_new_tokens": 1024}
    tokenizer, model, image_processor, _ = load_pretrained_model(
        model_path, None, "llava_qwen", device_map="cuda:0")

    model.config.beacon_ratio = [8]  # delete this line to get random compression among {2, 4, 8} ratios

    for video_path in video_list:
        print(video_path)
        # video input
        prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\nPlease describe the video.<|im_end|>\n<|im_start|>assistant\n"
        input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(model.device)
        vr = VideoReader(video_path, ctx=cpu(0))
        total_frame_num = len(vr)
        uniform_sampled_frames = np.linspace(0, total_frame_num - 1, max_frames_num, dtype=int)
        frame_idx = uniform_sampled_frames.tolist()
        frames = vr.get_batch(frame_idx).asnumpy()
        video_tensor = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"].to(model.device, dtype=torch.float16)

        beacon_skip_first = (input_ids == IMAGE_TOKEN_INDEX).nonzero(as_tuple=True)[1].item()
        num_tokens = TOKEN_PERFRAME * max_frames_num
        beacon_skip_last = beacon_skip_first + num_tokens

        with torch.inference_mode():
            output_ids = model.generate(input_ids, images=[video_tensor], modalities=["video"],
                                        beacon_skip_first=beacon_skip_first,
                                        beacon_skip_last=beacon_skip_last, **gen_kwargs)

            if IMAGE_TOKEN_INDEX in input_ids:
                transform_input_ids = transform_input_id(input_ids, num_tokens, model.config.vocab_size - 1)
                output_ids = output_ids[:, transform_input_ids.shape[1]:]
                outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
        print("#####################################")
        print(outputs)
        print("#####################################")
```
@ShbGao-ProMax
Author

I tried reloading the model on every iteration, which seemed to go more smoothly, but after running inference on a dozen or so videos I hit:

```
TypeError: sequence item 109: expected str instance, NoneType found
```

Do you have any suggestions?

@shuyansy
Collaborator

shuyansy commented Dec 4, 2024

Hi, you can add this line inside the for loop:

```python
model.memory.reset()
```

Reloading the model every round should work in principle; the new error may be a problem with that particular data sample. Could you test the problematic sample on its own?
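The suggested fix matches the symptoms reported above: if the model's beacon memory accumulates state across `generate` calls, every new video is processed on top of the previous video's leftover cache, which can yield identical outputs and eventually a shape mismatch. A minimal, self-contained sketch of that failure pattern and the per-iteration reset (using a toy `Memory` class purely for illustration, not the actual videoxl implementation):

```python
class Memory:
    """Toy stand-in for a KV-cache-like accumulator (not the real videoxl class)."""

    def __init__(self):
        self.cache = []

    def append(self, tokens):
        self.cache.extend(tokens)

    def reset(self):
        self.cache = []


def generate(memory, video_tokens):
    # The output depends on everything currently in the cache, so stale
    # entries from earlier videos corrupt it (and keep growing it).
    memory.append(video_tokens)
    return list(memory.cache)


memory = Memory()
first = generate(memory, ["video1"])   # cache: ["video1"]
stale = generate(memory, ["video2"])   # cache: ["video1", "video2"] -- video1 leaked in

memory.reset()                         # per-iteration reset, as suggested above
fresh = generate(memory, ["video3"])   # cache: ["video3"] only
```

In the batch script, the equivalent call would go at the top of the `for video_path in video_list:` loop body, so each video starts from an empty memory.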
