Get N/A when evaluating lmms-lab/llava-onevision-qwen2-7b-ov on mathvista_testmini, docvqa_test, and infovqa_test #553

Open
viyjy opened this issue Feb 24, 2025 · 5 comments

Comments


viyjy commented Feb 24, 2025

  1. mathvista_testmini:
{
  "results": {
    "mathvista_testmini": {
      " ": " ",
      "alias": "mathvista_testmini"
    },
    "mathvista_testmini_cot": {
      "alias": " - mathvista_testmini_cot",
      "gpt_eval_score,none": 29.2,
      "gpt_eval_score_stderr,none": "N/A",
      "submission,none": [],
      "submission_stderr,none": []
    },
    "mathvista_testmini_format": {
      "alias": " - mathvista_testmini_format",
      "gpt_eval_score,none": 39.0,
      "gpt_eval_score_stderr,none": "N/A",
      "submission,none": [],
      "submission_stderr,none": []
    },
    "mathvista_testmini_solution": {
      "alias": " - mathvista_testmini_solution",
      "gpt_eval_score,none": 35.7,
      "gpt_eval_score_stderr,none": "N/A",
      "submission,none": [],
      "submission_stderr,none": []
    }
  },
  2. docvqa_test:
"results": {
    "docvqa_test": {
      "alias": "docvqa_test",
      "anls,none": [],
      "anls_stderr,none": [],
      "submission,none": null,
      "submission_stderr,none": "N/A"
    }
  },
  "group_subtasks": {
    "docvqa_test": []
  },
  3. infovqa_test:
"results": {
    "infovqa_test": {
      "alias": "infovqa_test",
      "submission,none": null,
      "submission_stderr,none": "N/A"
    }
  },
  "group_subtasks": {
    "infovqa_test": []
  },
Collaborator

kcz358 commented Feb 25, 2025

  1. For mathvista, you can see that only the stderr fields are N/A; the gpt_eval_score values themselves are reported.
  2. For infovqa and docvqa, you are evaluating the test split, which is scored only by submitting the generated submission file to the benchmark's website. If you evaluate the val split instead, you will get the score locally.
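
To see at a glance which tasks produced a local score, the `results` block can be filtered down to its numeric values. This is a minimal sketch, assuming the layout quoted in this issue (task name mapped to metric name mapped to value); the helper name `numeric_metrics` is hypothetical and not part of lmms-eval.

```python
from numbers import Number


def numeric_metrics(results):
    """Return {task: {metric: value}}, keeping only numeric metric values.

    Strings such as "N/A", empty lists, and None are dropped, so a task
    that only produces a submission file ends up with an empty dict.
    """
    filtered = {}
    for task, metrics in results.items():
        filtered[task] = {
            name: value
            for name, value in metrics.items()
            if isinstance(value, Number) and not isinstance(value, bool)
        }
    return filtered


# Entries copied (and trimmed) from the output quoted in this issue.
example = {
    "mathvista_testmini_cot": {
        "alias": " - mathvista_testmini_cot",
        "gpt_eval_score,none": 29.2,
        "gpt_eval_score_stderr,none": "N/A",
        "submission,none": [],
    },
    "docvqa_test": {
        "alias": "docvqa_test",
        "anls,none": [],
        "submission,none": None,
        "submission_stderr,none": "N/A",
    },
}

for task, metrics in numeric_metrics(example).items():
    print(task, metrics if metrics else "no local score")
```

With the real file, the same function could be applied to `json.load(f)["results"]`, assuming the top-level `results` key shown in the issue.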

Author

viyjy commented Feb 25, 2025

@kcz358 Thanks.
The results for mathvista_testmini were obtained by evaluating lmms-lab/llava-onevision-qwen2-7b-ov. In the LLaVA-OneVision paper, this model achieves 63.2% on mathvista_testmini. As you can see, none of those numbers is close to 63.2%. I thought there should be some numbers in:

"mathvista_testmini": {
      " ": " ",
      "alias": "mathvista_testmini"
    },

Collaborator

kcz358 commented Feb 26, 2025

Have you checked that the GPT extraction works well? This can sometimes be the cause, because the task relies on GPT to extract the answer from the model's response.

Author

viyjy commented Feb 26, 2025

I'm not sure where the GPT extraction happens. I get the following two folders:

  • lmms-lab__llava-onevision-qwen2-7b-ov/
    • 20250220_093829_results.json
    • 20250220_093829_samples_mathvista_testmini_cot.jsonl
    • 20250220_093829_samples_mathvista_testmini_format.jsonl
    • 20250220_093829_samples_mathvista_testmini_solution.jsonl
  • submissions
    • mathvista_testmini_scores.json

  1. The content of mathvista_testmini_scores.json looks something like this:
{
    "1": {
        "question_id": "1",
        "query": "Hint: Please answer the question requiring a floating-point number with one decimal place and provide the final value, e.g., 1.2, 1.3, 1.4, at the end.\nQuestion: When a spring does work on an object, we cannot find the work by simply multiplying the spring force by the object's displacement. The reason is that there is no one value for the force-it changes. However, we can split the displacement up into an infinite number of tiny parts and then approximate the force in each as being constant. Integration sums the work done in all those parts. Here we use the generic result of the integration.\r\n\r\nIn Figure, a cumin canister of mass $m=0.40 \\mathrm{~kg}$ slides across a horizontal frictionless counter with speed $v=0.50 \\mathrm{~m} / \\mathrm{s}$. It then runs into and compresses a spring of spring constant $k=750 \\mathrm{~N} / \\mathrm{m}$. When the canister is momentarily stopped by the spring, by what distance $d$ is the spring compressed?",
        "choices": null,
        "answer": "1.2",
        "extraction": "0.7",
        "prediction": null,
        "true_false": false,
        "question_type": "free_form",
        "answer_type": "float",
        "precision": 1.0,
        "category": "math-targeted-vqa",
        "context": "scientific figure",
        "grade": "college",
        "img_height": 720,
        "img_width": 1514,
        "language": "english",
        "skills": [
            "scientific reasoning"
        ],
        "source": "SciBench",
        "split": "testmini",
        "task": "textbook question answering"
    },
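
One way to sanity-check the extraction step is to tally the per-question records in the scores file. The sketch below assumes the file is a JSON object keyed by question id, as in the excerpt above, and that `true_false` records whether the extracted answer matched `answer`; the helper name `summarize_scores` and the second example record are hypothetical.

```python
import json


def summarize_scores(scores):
    """Count correct answers and empty extractions in a scores mapping."""
    # Keep only per-question records; ignore any non-dict top-level values.
    records = [entry for entry in scores.values() if isinstance(entry, dict)]
    total = len(records)
    correct = sum(1 for entry in records if entry.get("true_false") is True)
    # An empty or missing "extraction" suggests the extraction step failed.
    empty = sum(1 for entry in records if not entry.get("extraction"))
    return {
        "total": total,
        "correct": correct,
        "accuracy_pct": 100.0 * correct / total if total else 0.0,
        "empty_extraction": empty,
    }


example_scores = {
    # Trimmed from the record quoted in this issue.
    "1": {"question_id": "1", "answer": "1.2", "extraction": "0.7", "true_false": False},
    # Hypothetical record, added only so the example has a correct answer.
    "2": {"question_id": "2", "answer": "B", "extraction": "B", "true_false": True},
}
print(summarize_scores(example_scores))

# For the real file (path relative to the output directory is an assumption):
# with open("submissions/mathvista_testmini_scores.json") as f:
#     print(summarize_scores(json.load(f)))
```

A large `empty_extraction` count would point at the extraction step rather than at the model's answers.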

Collaborator

kcz358 commented Feb 27, 2025

It is done during postprocessing. You might need to check whether any errors occurred at that stage; they will be logged to the screen.
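
If the console output was saved to a file, a simple keyword scan can help locate such errors. This is a rough sketch under that assumption; the function name `find_error_lines`, the keyword list, and the sample lines are all hypothetical and are not actual lmms-eval output.

```python
def find_error_lines(lines, keywords=("error", "exception", "traceback")):
    """Return (line_number, text) pairs whose text mentions any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


# Invented placeholder lines, used only to exercise the function.
sample_log = [
    "step one finished\n",
    "ERROR: something failed\n",
    "step two finished\n",
]
for number, text in find_error_lines(sample_log):
    print(f"{number}: {text}")
```

For a saved log, the same function accepts an open file object, since iterating over a file yields its lines.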
