Evaluation on Emu Edit benchmark #164
@Eureka-Maggie, we just use the original caption without any modification.
Thank you for your reply. So there are some unpaired output captions and generated images when calculating the CLIP-T metric.
Yes, there is some noise in Emu Edit.
Hello, I am reproducing the results on the Emu Edit dataset, but the result I measured is much higher than yours. Can you provide the specific details, e.g., seed, resolution, and CLIP model?
@zc1023, the CLIP model is
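For anyone else trying to reproduce the numbers: CLIP-T is conventionally the cosine similarity between the CLIP text embedding of the target (output) caption and the CLIP image embedding of the generated image. Below is a minimal sketch using the Hugging Face `transformers` CLIP API; the checkpoint name is an assumption, since the reply above is cut off before naming the model.

```python
# Minimal CLIP-T sketch: cosine similarity between the target caption
# and the generated image in CLIP embedding space.
# NOTE: the checkpoint below is an assumption; this thread does not
# confirm which CLIP model produced the reported numbers.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-large-patch14"  # assumed checkpoint
model = CLIPModel.from_pretrained(model_name).eval()
processor = CLIPProcessor.from_pretrained(model_name)

@torch.no_grad()
def clip_t(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    # Normalize both embeddings so the dot product is cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb * txt_emb).sum().item()

# Usage: score = clip_t(Image.open("generated.png"), output_caption)
```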
Hi,
I noticed that there are known issues with the Emu Edit benchmark: some image-caption pairs seem incorrect (e.g., 'a train station in city') or have identical source and target captions. So I was wondering how to calculate the CLIP-T metric. How did you process the benchmark dataset?
Looking forward to your reply.
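Regarding the noisy pairs: the maintainers say above that they used the original captions unmodified, but if one wanted to screen out the degenerate entries before scoring, a filter like the following would be a starting point. The dataset id, split name, and field names here are assumptions for illustration, not confirmed by this thread.

```python
# Sketch of one way to handle the noisy pairs flagged in this issue:
# drop entries whose source and target captions are identical before
# computing CLIP-T. The dataset id, split, and field names below
# ("input_caption", "output_caption") are assumptions.
from datasets import load_dataset

ds = load_dataset("facebook/emu_edit_test_set", split="validation")

def is_clean(example) -> bool:
    src = example["input_caption"].strip().lower()
    tgt = example["output_caption"].strip().lower()
    return src != tgt  # identical captions make CLIP-T uninformative

clean = ds.filter(is_clean)
print(f"kept {len(clean)} / {len(ds)} examples")
```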