Add onnx export function for pix2struct model #1815

naormatania · 2024-04-14T07:35:16Z

What does this PR do?

This PR adds an ONNX export option for the pix2struct model (a document vision-language model) by optimizing its ViT encoder.

Before submitting

I tested the export locally and confirmed that the exported model outputs the same captions as the original model (exported https://huggingface.co/google/pix2struct-screen2words-base and tested on the https://github.com/google-research-datasets/screen2words dataset).
If I need to do more testing or document the change somewhere, please let me know.
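
For reference, the caption comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script used for this PR: the helper name `caption_mismatches` is hypothetical and the caption strings are placeholders, not real results. In practice, both lists would hold the decoded captions produced by the original and the exported model for the same screenshots.

```python
from typing import List, Tuple


def caption_mismatches(
    reference: List[str], exported: List[str]
) -> List[Tuple[int, str, str]]:
    """Return (index, reference_caption, exported_caption) for each differing pair."""
    if len(reference) != len(exported):
        raise ValueError("caption lists must have the same length")
    return [
        (i, ref, exp)
        for i, (ref, exp) in enumerate(zip(reference, exported))
        # Ignore leading/trailing whitespace, which decoding may add.
        if ref.strip() != exp.strip()
    ]


# Placeholder captions (assumed for illustration only); real lists would come
# from decoding each model's generated token ids for the same inputs.
original_captions = ["page showing a login form", "settings screen of an app"]
onnx_captions = ["page showing a login form", "settings screen of an app"]

mismatches = caption_mismatches(original_captions, onnx_captions)
print(f"{len(mismatches)} mismatching captions")
```

An empty result means the two models agreed on every sample; any returned tuple points to the sample index worth inspecting.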

Who can review?

@fxmarty, @echarlaix, @JingyaHuang, @michaelbenayoun

HuggingFaceDocBuilderDev · 2024-04-15T07:57:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Add onnx export for pix2struct

a3a2ff0

fxmarty approved these changes Apr 15, 2024

fxmarty merged commit 0b52e3a into huggingface:main Apr 15, 2024
40 of 46 checks passed

young-developer pushed a commit to young-developer/optimum that referenced this pull request May 10, 2024

Add onnx export function for pix2struct model (huggingface#1815)

f0dd97d
