About paper #25

Open
ApolloRay opened this issue Dec 17, 2024 · 1 comment

Comments

@ApolloRay

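The paper says, "For video processing, we sample one frame per second for videos shorter than 128 frames. For longer videos, we uniformly sample 128 frames." Doesn't that cap the number of frames per video at 128? I don't quite understand how the 2048 frames mentioned in the paper are processed. Thanks for your help.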

@shuyansy
Collaborator

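Hello! During training, we cap the maximum length at 128 frames for reasons of training efficiency and GPU memory (all MLLM parameters are unfrozen during training, which takes up a very large amount of GPU memory). At inference time, our LLM compression design saves most of that memory, with a compression ratio of up to 16x, so the model can process 2048 frames.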
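For readers following along, here is a minimal sketch of the sampling rule quoted in the question, with the frame cap as a parameter so the training (128) and inference (2048) settings can be compared. It is not the repository's code: the function name `sample_frame_indices`, its parameters, and the reading of the rule as "sample at 1 fps, then fall back to uniform sampling when that exceeds the cap" are all assumptions made for illustration.

```python
def sample_frame_indices(total_frames: int, fps: float, max_frames: int = 128) -> list[int]:
    """Pick frame indices at roughly 1 frame per second, capped at max_frames.

    Hypothetical helper: the name, signature, and exact rounding are
    illustrative assumptions, not taken from the paper or the repository.
    """
    if total_frames <= 0 or fps <= 0 or max_frames <= 0:
        return []

    # Roughly one frame per second of video.
    step = max(int(round(fps)), 1)
    one_per_second = list(range(0, total_frames, step))
    if len(one_per_second) <= max_frames:
        return one_per_second

    # Too many frames at 1 fps: spread max_frames indices evenly over the video.
    if max_frames == 1:
        return [0]
    last = total_frames - 1
    return [(i * last) // (max_frames - 1) for i in range(max_frames)]


TRAIN_MAX_FRAMES = 128    # cap during training, per the reply above
INFER_MAX_FRAMES = 2048   # cap at inference, per the reply above

# A 50-minute video at 30 fps has 3000 seconds, so both caps apply.
long_video_frames = 30 * 3000
train_idx = sample_frame_indices(long_video_frames, 30.0, TRAIN_MAX_FRAMES)
infer_idx = sample_frame_indices(long_video_frames, 30.0, INFER_MAX_FRAMES)
print(len(train_idx), len(infer_idx))
```

Note that 128 × 16 = 2048, which is consistent with the 16x compression figure in the reply, although the thread does not spell out that derivation.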
