mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Anwen Hu, Haiyang Xu†, Liang Zhang, Jiabo Ye, Ming Yan†, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

† Corresponding Author

Data: MP-DocStruct1M 🤗

MP-DocReason51K 🤗

DocDownstream 2.0 🤗

DocGenome12K 🤗

Models: DocOwl2-stage1 🤗

DocOwl2-stage2 🤗

DocOwl2 🤗

Spotlights

Support Multi-page Text Lookup and Multi-page Text Parsing.
Support Multi-page Question Answering using simple phrases or detailed explanations with evidence pages.
Support Text-rich Video Understanding.
Open Source
- ✅ Training Data: MP-DocStruct1M, MP-DocReason51K, DocDownstream-2.0, DocGenome12K
- ✅ Model: DocOwl2
- ✅ Source code of model inference and evaluation.
- Model: DocOwl2-stage1, DocOwl2-stage2
- Online Demo on ModelScope and HuggingFace.
- Source code of launching a local demo.
- Training code.

Training and Evaluation Datasets

Dataset	Download Link
MP-DocStruct1M	HuggingFace: mPLUG/MP-DocStruct1M ModelScope: iic/MP-DocStruct1M
DocDownstream-2.0	HuggingFace: mPLUG/DocDownstream-2.0 ModelScope: iic/DocDownstream-2.0
MP-DocReason51K	HuggingFace: mPLUG/MP-DocReason51K ModelScope: iic/MP-DocReason51K
DocGenome12K	HuggingFace: mPLUG/DocGenome12K ModelScope: iic/DocGenome12K
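
As a rough sketch of fetching one of these datasets from HuggingFace, the snippet below maps each dataset name to the repository IDs listed in the table and calls `huggingface_hub.snapshot_download`. The helper name `download_dataset`, the `./data` layout, and the assumption that these repositories are hosted as HuggingFace dataset repos are illustrative and not taken from the project.

```python
from pathlib import Path

# Repository IDs copied from the table above.
DATASET_REPOS = {
    "MP-DocStruct1M": {"huggingface": "mPLUG/MP-DocStruct1M", "modelscope": "iic/MP-DocStruct1M"},
    "DocDownstream-2.0": {"huggingface": "mPLUG/DocDownstream-2.0", "modelscope": "iic/DocDownstream-2.0"},
    "MP-DocReason51K": {"huggingface": "mPLUG/MP-DocReason51K", "modelscope": "iic/MP-DocReason51K"},
    "DocGenome12K": {"huggingface": "mPLUG/DocGenome12K", "modelscope": "iic/DocGenome12K"},
}


def download_dataset(name, root="./data"):
    """Download one dataset from HuggingFace into root/name (hypothetical helper)."""
    # Imported lazily so the mapping is usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id = DATASET_REPOS[name]["huggingface"]
    local_dir = Path(root) / name
    # Assumption: the repositories are dataset repos, hence repo_type="dataset".
    snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=str(local_dir))
    return local_dir
```

Calling `download_dataset("DocDownstream-2.0")` would then place the files under `./data/DocDownstream-2.0`, which can be passed to the evaluation script as the downstream directory.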

Models

Model Card

Model	Download Link	Abilities
DocOwl2	HuggingFace: mPLUG/DocOwl2 ModelScope: iic/DocOwl2	Multi-page VQA with detailed explanations; Multi-page VQA with concise answers

Model Inference

import torch
from transformers import AutoTokenizer, AutoModel


class DocOwlInfer():
    def __init__(self, ckpt_path):
        # use_fast=False selects the slow (Python) tokenizer.
        self.tokenizer = AutoTokenizer.from_pretrained(ckpt_path, use_fast=False)
        # trust_remote_code=True allows the custom model code shipped with the checkpoint to run.
        self.model = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True, low_cpu_mem_usage=True, torch_dtype=torch.float16, device_map='auto')
        self.model.init_processor(tokenizer=self.tokenizer, basic_image_size=504, crop_anchors='grid_12')

    def inference(self, images, query):
        # One '<|image|>' placeholder per page image, followed by the question.
        messages = [{'role': 'USER', 'content': '<|image|>'*len(images)+query}]
        answer = self.model.chat(messages=messages, images=images, tokenizer=self.tokenizer)
        return answer


docowl = DocOwlInfer(ckpt_path='mPLUG/DocOwl2')

# Page images of one document, in reading order.
images = [
    './examples/docowl2_page0.png',
    './examples/docowl2_page1.png',
    './examples/docowl2_page2.png',
    './examples/docowl2_page3.png',
    './examples/docowl2_page4.png',
    './examples/docowl2_page5.png',
]

# Question about the whole document.
answer = docowl.inference(images, query='what is this paper about? provide detailed information.')
print(answer)

# Question about a single page.
answer = docowl.inference(images, query='what is the third page about? provide detailed information.')
print(answer)
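
The prompt format used in `inference` can be checked without loading the model. The following is a minimal sketch that reproduces the message construction from the example: one `<|image|>` placeholder per page, followed by the query. The helper name `build_messages` and the check for an empty image list are assumptions added for illustration.

```python
IMAGE_PLACEHOLDER = "<|image|>"


def build_messages(images, query):
    """Build the single-turn message list used in the inference example (hypothetical helper)."""
    # Assumption: a query without any page image is treated as an error here.
    if not images:
        raise ValueError("at least one page image is required")
    # One placeholder per page image, then the question text.
    content = IMAGE_PLACEHOLDER * len(images) + query
    return [{"role": "USER", "content": content}]


pages = [f"./examples/docowl2_page{i}.png" for i in range(6)]
messages = build_messages(pages, "what is the third page about?")
print(messages[0]["content"].count(IMAGE_PLACEHOLDER))  # prints 6
```

The number of placeholders must match the number of images passed alongside the messages, which is why both are derived from the same list.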

Model Evaluation

Prepare the environment for evaluation as follows:

pip install textdistance
pip install editdistance
pip install pycocoevalcap
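
Before launching a long evaluation run, it can help to confirm that these packages are importable. The sketch below uses the standard library only. It assumes that each package's import name matches its pip name, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Assumption: import names are identical to the pip package names above.
REQUIRED = ["textdistance", "editdistance", "pycocoevalcap"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All evaluation dependencies are installed.")
```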

Evaluate DocOwl2 on 10 single-image tasks, 2 multi-page tasks and 1 video task:

python docowl_benchmark_evaluate.py --model_path $MODEL_PATH --dataset $DATASET --downstream_dir $DOWNSTREAM_DIR_PATH --save_dir $SAVE_DIR --split $split

Note: For single-image evaluation, $DATASET should be chosen from [DocVQA, InfographicsVQA, WikiTableQuestions, DeepForm, KleisterCharity, TabFact, ChartQA, TextVQA, TextCaps, VisualMRC]. $DOWNSTREAM_DIR_PATH is the local path of mPLUG/DocDownstream-1.0, and $split should be set to test.

For multi-page and video evaluation, $DATASET should be chosen from [MP-DocVQA, DUDE, NewsVideoQA]. $DOWNSTREAM_DIR_PATH is the local path of mPLUG/DocDownstream-2.0, and $split should be set to val. You can also set $split to test and submit the file whose name ends with _submission.json to the official evaluation website.
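
The pairing of dataset, downstream directory, and split described above can be sketched as a small command builder. The function name `build_eval_command` and the example paths are hypothetical. Only the script name, its flags, and the dataset lists come from this README.

```python
SINGLE_IMAGE = [
    "DocVQA", "InfographicsVQA", "WikiTableQuestions", "DeepForm", "KleisterCharity",
    "TabFact", "ChartQA", "TextVQA", "TextCaps", "VisualMRC",
]
MULTI_PAGE_AND_VIDEO = ["MP-DocVQA", "DUDE", "NewsVideoQA"]


def build_eval_command(dataset, model_path, downstream_v1, downstream_v2, save_dir):
    """Return the argument list for docowl_benchmark_evaluate.py (hypothetical helper)."""
    if dataset in SINGLE_IMAGE:
        # Single-image tasks read from DocDownstream-1.0 and use the test split.
        downstream_dir, split = downstream_v1, "test"
    elif dataset in MULTI_PAGE_AND_VIDEO:
        # Multi-page and video tasks read from DocDownstream-2.0 and use the val split.
        downstream_dir, split = downstream_v2, "val"
    else:
        raise ValueError(f"unknown dataset: {dataset}")
    return [
        "python", "docowl_benchmark_evaluate.py",
        "--model_path", model_path,
        "--dataset", dataset,
        "--downstream_dir", downstream_dir,
        "--save_dir", save_dir,
        "--split", split,
    ]


# Example paths are placeholders, not paths from the project.
cmd = build_eval_command("DUDE", "mPLUG/DocOwl2", "./DocDownstream-1.0", "./DocDownstream-2.0", "./eval_out")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run. To produce a submission file for a multi-page or video dataset, the split would need to be changed to test by hand, since this sketch always picks val for those datasets.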
