Dear authors,
@shuyansy @UnableToUseGit
I believe you should, at the very least, discuss VoCo-LLaMA [1] in the Introduction of your paper.
I noticed that the citation of, and discussion related to, VoCo-LLaMA were removed in your latest arXiv version (v3).
The motivations and ideas behind Video-XL and VoCo-LLaMA are similar.
[1] Xubing Ye, et al. "VoCo-LLaMA: Towards Vision Compression with Large Language Models." arXiv preprint arXiv:2406.12275 (2024).
Dear VoCo-LLaMA team,
Thank you for your valuable feedback regarding the introduction of our paper and its comparison with VoCo-LLaMA. We greatly appreciate your thorough review and the opportunity to address these concerns.
Our work is currently presented as a concise technical report, so we have not yet included a detailed discussion of every related work. However, we recognize the importance of elaborating on these distinctions. VoCo-LLaMA is an excellent piece of work, and we have cited it appropriately in our new manuscript. We plan to expand the discussion in subsequent versions, explicitly addressing our differences from and contributions relative to VoCo-LLaMA.
Additionally, we acknowledge that other related works, particularly on KV-cache and prompt compression (e.g., Auto-Compressors, ICAE, Gist, Beacon), are relevant to the scope of our study. Our approach introduces several enhancements over these methods, which we will detail in future revisions to provide a more comprehensive comparison.
Thank you once again for your insightful comments. Please let us know if you have any other concerns.