mme-realworld #266

Merged
merged 3 commits into from
Sep 22, 2024

yfzhang114
Contributor

MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest resolution and a targeted focus on real-world applications. https://mme-realworld.github.io/

@Luodian
Contributor

Luodian commented Sep 20, 2024

Hi, thanks for this PR!

@Luodian
Contributor

Luodian commented Sep 21, 2024

And please fix the lint issue.

Collaborator

@kcz358 kcz358 left a comment

Hi @yfzhang114, thank you for your contribution!

After reviewing the code I have a few questions, and I would kindly ask whether some changes can be made before merging.

  • The dataset split is currently hardcoded, which is not recommended. Could you split your dataset into two splits or two subsets on Hugging Face and specify which one to use in the yaml?
  • The two utils files are almost identical apart from doc_to_text. Could you merge them into one file? You can also specify which doc_to_text to use in the yaml config.
  • This one is optional but recommended. I noticed that you convert your dataset from base64 to PIL.Image manually after download. You can instead upload the dataset in the format {"bytes" : <base64 image>, "path" : <a dummy path>} and set the features to Image() from Hugging Face datasets, so that the images also show up in your dataset viewer.

Thank you again for your contribution, and please share your thoughts on whether these changes can be made.
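
For readers unfamiliar with the record shape mentioned in the third bullet, here is a minimal sketch of building one such record from a base64 string using only the standard library. The helper name `to_image_record` and the dummy path are hypothetical. As far as I know, the `Image()` feature in Hugging Face `datasets` expects raw encoded image bytes rather than base64 text, so the sketch decodes first; treat that as an assumption to verify against the library documentation.

```python
import base64


def to_image_record(b64_image: str, path: str = "dummy.png") -> dict:
    """Decode base64 text into a {"bytes", "path"} record (hypothetical helper)."""
    return {"bytes": base64.b64decode(b64_image), "path": path}


# Stand-in payload: the PNG signature plus filler bytes, not a real image.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
record = to_image_record(base64.b64encode(fake_png).decode("ascii"))
print(record["path"], len(record["bytes"]))
```

A column of such records could then be declared as an image column on upload, which is what lets the dataset viewer render it.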

@yfzhang114
Contributor Author

  1. I separated the English and Chinese versions of MME-RealWorld into two Hugging Face datasets. However, because many of the images are very large, it is not feasible to upload them in the suggested format {"bytes": <base64 image>, "path": <a dummy path>}.

  2. I merged the two utility files into a single file and resolved the lint errors.
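
As an illustration of what merging two near-identical utility files can look like, the sketch below keeps one shared prompt builder and two thin `doc_to_text` variants. It is not the code from this PR: the function names, the `question` and `options` field names, and the instruction strings are all hypothetical.

```python
def _format_prompt(doc: dict, instruction: str) -> str:
    """Shared prompt builder; only the trailing instruction differs per language."""
    options = "\n".join(doc["options"])  # "options" is an assumed field name
    return f"{doc['question']}\n{options}\n{instruction}"


def doc_to_text_en(doc: dict) -> str:
    """English variant (hypothetical name)."""
    return _format_prompt(doc, "Answer with the option letter.")


def doc_to_text_cn(doc: dict) -> str:
    """Chinese variant (hypothetical name), same layout with a Chinese instruction."""
    return _format_prompt(doc, "请直接回答选项字母。")


sample = {"question": "What color is the car?", "options": ["(A) Red", "(B) Blue"]}
print(doc_to_text_en(sample))
```

With this layout, each task's yaml config only has to name the variant it wants, as the reviewer suggested.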

Collaborator

@kcz358 kcz358 left a comment

@yfzhang114, thank you for your contribution! LGTM

If possible, could you share a screenshot of the evaluation results on the mme-realworld dataset?

It also seems the lint check has failed again. Could you run an auto-fix using pre-commit?

pip install pre-commit
pre-commit install
pre-commit run --all-files

@yfzhang114
Contributor Author

@kcz358 Of course. Here are the results for qwen0.5B_clip_vit; the item counts of all subtasks have been carefully checked.

2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:149 - ********************************Reasoning (Task Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Monitoring (Subtask Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1867      Calculate (300 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4592      Intention (98 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.5000      Property (100 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.3032      E choice 0  Monitoring (498 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Autonomous_Driving (Subtask Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2434      Prediction_intention_ego (304 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.0583      Prediction_intention_pedestrian (103 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2367      Prediction_intention_vehicle (207 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.0547      Relation_interaction_other2other (201 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4608      Attention_trafficsignal (217 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2264      Relation_interaction_ego2pedestrain (106 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1238      Relation_interaction_ego2trafficsignal (105 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2376      Relation_interaction_ego2vehicle (101 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2240      E choice 0  Autonomous_Driving (1344 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++OCR with Complex Context (Subtask Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4440      Scene understanding (250 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3880      Character identification (250 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.4160      E choice 2  OCR with Complex Context (500 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Diagram and Table (Subtask Start)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2299      Diagram (174 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3067      Table (326 items)
2024-09-22 03:56:23.257 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2800      E choice 0  Diagram and Table (500 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Remote Sensing (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.0000      E choice 0  Remote Sensing (0 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:177 - ********************************Acc 0.2815      E choice 2   Reasoning (2842 items)

2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:149 - ********************************Perception (Task Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Monitoring (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1447      Vehicle/counting (608 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2470      Person/counting (992 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1985      Vehicle/location (136 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1562      Vehicle/attribute (352 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1481      Person/attribute (108 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.1963      E choice 0  Monitoring (2196 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Autonomous_Driving (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3206      Objects_identify (1101 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4937      Attribute_motion_vehicle/attribute (158 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2612      Attribute_motion_multivehicles/attribute (823 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3930      Attribute_visual_trafficsignal/attribute (201 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1951      Attribute_motion_pedestrain/attribute (164 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2042      Object_count (720 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.1785      Attribute_motion_multipedestrians/attribute (493 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2710      E choice 0  Autonomous_Driving (3660 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++OCR with Complex Context (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3885      License (852 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3310      Phone_and_address (577 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4174      Text_recog (1198 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4204      Adver_and_product (1558 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.4386      Book_map_poster (1555 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.4110      E choice 0  OCR with Complex Context (5740 items)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Diagram and Table (Subtask Start)
2024-09-22 03:56:23.258 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2678      Diagram (1415 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2870      Table (4018 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2820      E choice 0  Diagram and Table (5433 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:152 - ++++++++++++++++Remote Sensing (Subtask Start)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2745      Position (1257 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.2677      Color (1255 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:159 - ----   Acc 0.3157      Count (1226 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:166 - ++++++++++++++++        Acc 0.2857      E choice 2  Remote Sensing (3738 items)
2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:177 - ********************************Acc 0.3073      E choice 2   Perception (20767 items)

2024-09-22 03:56:23.259 | INFO     | utils:mme_realworld_aggregate_results:178 - ********************************Overall Acc 0.3042
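
The logged numbers appear consistent with item-weighted averaging at each level: a subtask's accuracy is the item-weighted mean of its categories, and the overall accuracy is the item-weighted mean of the two tasks. The sketch below checks this against two values from the log. It is a sanity check on the reported figures, not the actual `mme_realworld_aggregate_results` implementation, and `weighted_acc` is a hypothetical helper.

```python
def weighted_acc(parts):
    """Item-weighted mean accuracy over (accuracy, n_items) pairs."""
    total = sum(n for _, n in parts)
    if total == 0:
        return 0.0  # matches the log's Acc 0.0000 for a subtask with 0 items
    return sum(acc * n for acc, n in parts) / total


# Reasoning / Monitoring categories, copied from the log above.
monitoring = [(0.1867, 300), (0.4592, 98), (0.5000, 100)]
# Task-level results: Reasoning and Perception.
overall = [(0.2815, 2842), (0.3073, 20767)]

print(f"{weighted_acc(monitoring):.4f}")  # 0.3032, as logged for Monitoring
print(f"{weighted_acc(overall):.4f}")     # 0.3042, as logged for Overall Acc
```

Because the inputs are already rounded to four decimals, small last-digit differences are possible for other rows.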

@Luodian Luodian merged commit be9e46c into EvolvingLMMs-Lab:main Sep 22, 2024
1 check passed
KairuiHu pushed a commit that referenced this pull request Sep 23, 2024
* mme-realworld

* mme-realworld

* mme-realworld
KairuiHu pushed a commit that referenced this pull request Oct 24, 2024
* mme-realworld

* mme-realworld

* mme-realworld
4 participants