[Feat] Add support for llava_hf video, better loading logic for llava_hf ckpt #260
This PR updates `llava_hf` to enable evaluation with `llava_hf` on the newer series of LLaVA models, such as llava-onevision and llava-next (stronger). This PR also enables video evaluation using `llava_hf`. Note that video evaluation is only supported with llava onevision hf and may fail if you use another version of llava. Since the newest transformers version has not been released yet, you have to install transformers from source if you want to use llava onevision hf.
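As a quick sanity check before running an evaluation, one could verify that the installed transformers build actually ships the LLaVA-OneVision code. The snippet below is only a sketch and not part of this PR: the helper `has_module` is hypothetical, and the module path `transformers.models.llava_onevision` is an assumption about where transformers places these classes.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if the module can be located in the current environment.

    Note: find_spec imports the parent packages of a dotted name,
    but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. transformers itself) is missing.
        return False


# Assumed module path for the LLaVA-OneVision classes in transformers.
ONEVISION_MODULE = "transformers.models.llava_onevision"

if __name__ == "__main__":
    if has_module(ONEVISION_MODULE):
        print("This transformers build includes llava_onevision.")
    else:
        print("llava_onevision not found; install transformers from source.")
```

If the check reports that the module is missing, the installed release most likely predates LLaVA-OneVision support, and video evaluation with `llava_hf` should not be expected to work.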
However, after some experiments, I think the performance still differs significantly from the original llava. Thus, using this model to obtain results for the original llava or llava-onevision is not recommended. This model is only recommended for those who want a quick baseline or who have finetuned their model using llava-hf.