mme-realworld #266
Hi, thanks for this PR!
And please fix the lint issue.
Hi @yfzhang114, thank you for your contribution!
I would like to ask a few questions after reviewing the code and kindly ask if some changes can be made before merging.
- The dataset split is currently hardcoded, which is not recommended. Could you split your dataset into two splits or two subsets on Hugging Face and specify them in the yaml?
- I noticed that the two utils are almost identical apart from doc_to_text. Could you merge them into one file? You can also specify which doc_to_text to use in the yaml config.
- This one is not required, but recommended. I noticed that you convert your dataset from base64 to PIL.Image manually after download. You can instead upload the dataset in the format
{"bytes" : <base64 image>, "path" : <a dummy path>}
and set the feature to Image() from Hugging Face datasets, so that the images also show up in your dataset viewer.
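For reference, the Image() suggestion could look roughly like the sketch below. It is only an illustration under assumptions: the column names ("question", "image_b64") and the dummy file names are hypothetical, and my understanding is that the Image feature expects raw encoded image bytes (e.g. PNG or JPEG), so the sketch decodes the base64 string first.

```python
import base64
from typing import Dict, List, Optional


def to_image_record(b64_image: str, path: Optional[str] = None) -> Dict[str, object]:
    """Decode a base64 string into the {"bytes", "path"} dict form."""
    return {"bytes": base64.b64decode(b64_image), "path": path}


def build_dataset(rows: List[Dict[str, str]]):
    """Build a dataset whose "image" column uses the Image feature.

    Assumes each row has the hypothetical keys "question" and "image_b64".
    """
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import Dataset, Features, Image, Value

    features = Features({"question": Value("string"), "image": Image()})
    data = {
        "question": [r["question"] for r in rows],
        # The path is only a dummy label; the bytes carry the actual image.
        "image": [
            to_image_record(r["image_b64"], path=f"{i}.png")
            for i, r in enumerate(rows)
        ],
    }
    return Dataset.from_dict(data, features=features)
```

With the column typed as Image(), loading the dataset should return PIL images directly, so the manual base64 conversion in the task utils would no longer be needed.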
Thank you again for your contribution, and please share your thoughts on whether these changes can be made.
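To illustrate the second point, a merged utils file might look something like the sketch below. The function names, the document fields ("question", "options"), and the prompt wording are all hypothetical and not taken from this PR; the yaml line in the comment assumes the `!function` syntax used by other task configs in this repository.

```python
from typing import Any, Dict


def _format_question(doc: Dict[str, Any]) -> str:
    """Shared formatting used by both prompt variants."""
    options = "\n".join(doc["options"])
    return f"{doc['question']}\n{options}"


def doc_to_text_a(doc: Dict[str, Any]) -> str:
    """First prompt variant (hypothetical wording)."""
    return _format_question(doc) + "\nAnswer with the option's letter."


def doc_to_text_b(doc: Dict[str, Any]) -> str:
    """Second prompt variant (hypothetical wording)."""
    return "Select the best answer.\n" + _format_question(doc)


# Each yaml config would then pick its variant, for example:
#   doc_to_text: !function utils.doc_to_text_a
```

Everything shared between the two tasks lives in one place, and only the prompt function differs per config.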
@yfzhang114, thank you for your contribution! LGTM
If possible, can you share a screenshot of the evaluation results on the mme-realworld dataset?
It also seems the lint check has failed again. Can you run an auto-fix using pre-commit?
pip install pre-commit
pre-commit install
pre-commit run --all-files
@kcz358 of course, here are the results for qwen0.5B_clip_vit. The numbers for all subtasks have been carefully checked.
MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest resolution and a targeted focus on real-world applications. https://mme-realworld.github.io/