mme-realworld #266

yfzhang114 · 2024-09-20T11:42:50Z

MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest image resolution and a targeted focus on real-world applications. Project page: https://mme-realworld.github.io/

Luodian · 2024-09-20T13:44:08Z

Hi, thanks for this PR!

lmms_eval/tasks/mme_realworld/utils.py

lmms_eval/api/task.py

lmms_eval/models/internvl2.py

Luodian · 2024-09-21T08:17:25Z

And please fix the lint issue.

kcz358

Hi @yfzhang114, thank you for your contribution!

After reviewing the code, I have a few questions and would kindly ask whether some changes can be made before merging.

The dataset split is currently hardcoded, which is not recommended. Could you split your dataset into two splits or two subsets on Hugging Face and specify which one to use in the YAML?
I noticed that the two utils files are almost identical except for doc_to_text. Could you merge them into one file? You can specify which doc_to_text to use in the YAML config.
This one is not required but recommended: I noticed that you convert your dataset from base64 to PIL.Image manually after downloading. Instead, you can upload the dataset in the format {"bytes": <base64 image>, "path": <a dummy path>} and set the feature type to Image() from Hugging Face datasets, so that the images also show up in your dataset viewer.
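A minimal sketch of the suggested conversion, assuming each record stores its image as a base64 string. As we understand the Hugging Face `datasets` `Image` feature, a `{"bytes": ..., "path": ...}` value holds the raw encoded file contents (PNG, JPEG, and so on), so the sketch decodes the base64 text first. The helper name `to_image_record` and the dummy paths are illustrative assumptions, not code from this PR.

```python
import base64
from typing import Dict


def to_image_record(b64_image: str, dummy_path: str = "image.png") -> Dict[str, object]:
    """Turn a base64-encoded image into a {"bytes", "path"} dict.

    `bytes` holds the decoded file contents (still PNG/JPEG-encoded, not pixels);
    `path` is only a placeholder name. Helper and default path are hypothetical.
    """
    return {"bytes": base64.b64decode(b64_image), "path": dummy_path}


# Example with just a PNG signature standing in for a real image file.
png_header = b"\x89PNG\r\n\x1a\n"
record = to_image_record(base64.b64encode(png_header).decode("ascii"), "sample_0001.png")
print(record["path"], record["bytes"][:4])
```

A column of such dicts can then be cast to `Image()` before pushing to the Hub, which, per the review, is what lets the dataset viewer render the images.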

Thank you again for your contribution, and please let me know whether these changes can be made.

lmms_eval/tasks/mme_realworld/utils.py

yfzhang114 · 2024-09-22T04:13:48Z

I separated the English and Chinese versions of MME-RealWorld into two Hugging Face datasets. However, because many images are very large, it is not feasible to upload them in the suggested format {"bytes": <base64 image>, "path": <a dummy path>}.
I merged the two utility files into a single file and resolved the lint errors.
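Merging the two utils files mostly comes down to sharing one prompt builder and exposing one `doc_to_text` per language, so each task YAML only selects its variant. The sketch below is hypothetical: the function names, the `question`/`options` fields, and the instruction strings are illustrative assumptions, not the code merged in this PR.

```python
from typing import Dict, List

# Hypothetical answer instructions; the real prompts in the PR may differ.
EN_INSTRUCTION = "Answer with the option's letter from the given choices directly."
CN_INSTRUCTION = "请直接回答选项字母。"  # "Please answer with the option letter directly."


def _build_prompt(doc: Dict, instruction: str) -> str:
    """Shared builder: question, then one option per line, then the instruction.

    Assumes hypothetical fields `question` (str) and `options` (list of str).
    """
    options: List[str] = doc["options"]
    return "\n".join([doc["question"], *options, instruction])


def doc_to_text_en(doc: Dict) -> str:
    """English variant; an English task YAML would point doc_to_text here."""
    return _build_prompt(doc, EN_INSTRUCTION)


def doc_to_text_cn(doc: Dict) -> str:
    """Chinese variant; a Chinese task YAML would point doc_to_text here."""
    return _build_prompt(doc, CN_INSTRUCTION)


doc = {"question": "What color is the car?", "options": ["(A) Red", "(B) Blue"]}
print(doc_to_text_en(doc))
```

With both functions in one module, the English and Chinese task configs can differ only in their dataset and their `doc_to_text` entry.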

kcz358

@yfzhang114, thank you for your contribution! LGTM.

If possible, could you share a screenshot of the evaluation results on the MME-RealWorld dataset?

It also seems the lint check has failed again. Could you run an auto-fix with pre-commit?

pip install pre-commit
pre-commit install
pre-commit run --all-files

yfzhang114 · 2024-09-22T08:10:52Z

@kcz358 Of course. Here are the results for qwen0.5B_clip_vit; the item counts for all subtasks have been carefully checked.

2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:149 - ********************************Reasoning (Task Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Monitoring (Subtask Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1867      Calculate (300 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4592      Intention (98 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.5000      Property (100 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.3032      E choice 0  Monitoring (498 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Autonomous_Driving (Subtask Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2434      Prediction_intention_ego (304 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.0583      Prediction_intention_pedestrian (103 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2367      Prediction_intention_vehicle (207 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.0547      Relation_interaction_other2other (201 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4608      Attention_trafficsignal (217 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2264      Relation_interaction_ego2pedestrain (106 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1238      Relation_interaction_ego2trafficsignal (105 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2376      Relation_interaction_ego2vehicle (101 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2240      E choice 0  Autonomous_Driving (1344 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++OCR with Complex Context (Subtask Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4440      Scene understanding (250 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3880      Character identification (250 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.4160      E choice 2  OCR with Complex Context (500 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Diagram and Table (Subtask Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2299      Diagram (174 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3067      Table (326 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2800      E choice 0  Diagram and Table (500 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Remote Sensing (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.0000      E choice 0  Remote Sensing (0 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:177 - ********************************Acc 0.2815      E choice 2   Reasoning (2842 items)

2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:149 - ********************************Perception (Task Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Monitoring (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1447      Vehicle/counting (608 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2470      Person/counting (992 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1985      Vehicle/location (136 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1562      Vehicle/attribute (352 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1481      Person/attribute (108 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.1963      E choice 0  Monitoring (2196 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Autonomous_Driving (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3206      Objects_identify (1101 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4937      Attribute_motion_vehicle/attribute (158 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2612      Attribute_motion_multivehicles/attribute (823 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3930      Attribute_visual_trafficsignal/attribute (201 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1951      Attribute_motion_pedestrain/attribute (164 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2042      Object_count (720 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1785      Attribute_motion_multipedestrians/attribute (493 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2710      E choice 0  Autonomous_Driving (3660 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++OCR with Complex Context (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3885      License (852 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3310      Phone_and_address (577 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4174      Text_recog (1198 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4204      Adver_and_product (1558 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4386      Book_map_poster (1555 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.4110      E choice 0  OCR with Complex Context (5740 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Diagram and Table (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2678      Diagram (1415 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2870      Table (4018 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2820      E choice 0  Diagram and Table (5433 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Remote Sensing (Subtask Start)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2745      Position (1257 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2677      Color (1255 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3157      Count (1226 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2857      E choice 2  Remote Sensing (3738 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:177 - ********************************Acc 0.3073      E choice 2   Perception (20767 items)

2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:178 - ********************************Overall Acc 0.3042
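The group-level scores in this log appear to be item-weighted (micro-averaged): averaging the three Reasoning/Monitoring subtasks without weights gives about 0.382, not the logged 0.3032. As a cross-check, the sketch below, which is our own reconstruction and not the `mme_realworld_aggregate_results` code from the PR, recovers the integer number of correct answers per subtask from the logged 4-decimal accuracies, sums correct and total counts upward, and reproduces the logged Reasoning (0.2815, 2842 items), Perception (0.3073, 20767 items), and Overall (0.3042) values.

```python
from typing import Dict, List, Tuple

# (logged accuracy, item count) per subtask, copied from the log above.
Subtasks = List[Tuple[float, int]]

LOG: Dict[str, Dict[str, Subtasks]] = {
    "Reasoning": {
        "Monitoring": [(0.1867, 300), (0.4592, 98), (0.5000, 100)],
        "Autonomous_Driving": [(0.2434, 304), (0.0583, 103), (0.2367, 207), (0.0547, 201),
                               (0.4608, 217), (0.2264, 106), (0.1238, 105), (0.2376, 101)],
        "OCR with Complex Context": [(0.4440, 250), (0.3880, 250)],
        "Diagram and Table": [(0.2299, 174), (0.3067, 326)],
        "Remote Sensing": [],  # logged with 0 items
    },
    "Perception": {
        "Monitoring": [(0.1447, 608), (0.2470, 992), (0.1985, 136), (0.1562, 352), (0.1481, 108)],
        "Autonomous_Driving": [(0.3206, 1101), (0.4937, 158), (0.2612, 823), (0.3930, 201),
                               (0.1951, 164), (0.2042, 720), (0.1785, 493)],
        "OCR with Complex Context": [(0.3885, 852), (0.3310, 577), (0.4174, 1198),
                                     (0.4204, 1558), (0.4386, 1555)],
        "Diagram and Table": [(0.2678, 1415), (0.2870, 4018)],
        "Remote Sensing": [(0.2745, 1257), (0.2677, 1255), (0.3157, 1226)],
    },
}


def recover_correct(accuracy: float, n: int) -> int:
    """Recover the integer count of correct items from a 4-decimal accuracy.

    The logged value is within 0.00005 of the true accuracy, so accuracy * n is
    within 0.00005 * n of the true count; for n < 10000 rounding is exact.
    """
    return round(accuracy * n)


def counts(subtasks: Subtasks) -> Tuple[int, int]:
    """Return (correct, total) summed over a list of subtasks."""
    return sum(recover_correct(a, n) for a, n in subtasks), sum(n for _, n in subtasks)


def acc(correct: int, total: int) -> float:
    """Accuracy, treating an empty group as 0.0 (as the log does)."""
    return correct / total if total else 0.0


overall_correct = overall_total = 0
for task, groups in LOG.items():
    task_correct = task_total = 0
    for name, subtasks in groups.items():
        c, t = counts(subtasks)
        task_correct, task_total = task_correct + c, task_total + t
        print(f"{task}/{name}: Acc {acc(c, t):.4f} ({t} items)")
    print(f"{task}: Acc {acc(task_correct, task_total):.4f} ({task_total} items)")
    overall_correct, overall_total = overall_correct + task_correct, overall_total + task_total
print(f"Overall Acc {acc(overall_correct, overall_total):.4f}")
```

Recovering integer counts first, rather than weighting the rounded accuracies directly, matters here: weighting the rounded Reasoning/Autonomous_Driving accuracies gives 0.2239 instead of the logged 0.2240.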

* mme-realworld * mme-realworld * mme-realworld

mme-realworld

e52dfe6

kcz358 reviewed Sep 21, 2024

lmms_eval/tasks/mme_realworld/utils.py (outdated, resolved)

kcz358 reviewed Sep 21, 2024

lmms_eval/api/task.py (outdated, resolved)

kcz358 reviewed Sep 21, 2024

lmms_eval/models/internvl2.py (outdated, resolved)

kcz358 requested changes Sep 21, 2024

kcz358 reviewed Sep 21, 2024

lmms_eval/tasks/mme_realworld/utils.py (outdated, resolved)

mme-realworld

03664b6

kcz358 approved these changes Sep 22, 2024

mme-realworld

e5c5a33

Luodian merged commit be9e46c into EvolvingLMMs-Lab:main Sep 22, 2024
1 check passed

KairuiHu pushed a commit that referenced this pull request Sep 23, 2024

mme-realworld (#266)

f92fd71

* mme-realworld * mme-realworld * mme-realworld

KairuiHu pushed a commit that referenced this pull request Oct 24, 2024

mme-realworld (#266)

fc8815f

* mme-realworld * mme-realworld * mme-realworld
