Evaluation on Emu Edit benchmark #164
@Eureka-Maggie, we just use the original caption without any modification.
Thank you for your reply. So there are some unpaired output captions and generated images when calculating the CLIP-T metric.
Yes, there is some noise in Emu Edit.
Hello, I am reproducing the results on the Emu Edit dataset, but the result I measured is much higher than yours. Can you provide the specific details, e.g., seed, resolution, and CLIP model?
@zc1023, the CLIP model is
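For anyone else trying to reproduce the numbers: CLIP-T is conventionally the cosine similarity between the CLIP text embedding of the target (output) caption and the CLIP image embedding of the generated image. Below is a minimal sketch using the Hugging Face `transformers` CLIP API; the checkpoint name is an assumption, since the reply above is cut off before naming the model.

```python
# Minimal CLIP-T sketch: cosine similarity between the target caption
# and the generated image in CLIP embedding space.
# NOTE: the checkpoint below is an assumption; this thread does not
# confirm which CLIP model produced the reported numbers.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-large-patch14"  # assumed checkpoint
model = CLIPModel.from_pretrained(model_name).eval()
processor = CLIPProcessor.from_pretrained(model_name)

@torch.no_grad()
def clip_t(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    # Normalize both embeddings so the dot product is cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb * txt_emb).sum().item()

# Usage: score = clip_t(Image.open("generated.png"), output_caption)
```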
Hi,
I noticed that there are known issues with the Emu Edit benchmark: some image-caption pairs seem incorrect (e.g., 'a train station in city') or have identical source and target captions. So I was wondering how to calculate the CLIP-T metric. How did you process the benchmark dataset?
Looking forward to your reply.
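Regarding the noisy pairs: the maintainers say above that they used the original captions unmodified, but if one wanted to screen out the degenerate entries before scoring, a filter like the following would be a starting point. The dataset id, split name, and field names here are assumptions for illustration, not confirmed by this thread.

```python
# Sketch of one way to handle the noisy pairs flagged in this issue:
# drop entries whose source and target captions are identical before
# computing CLIP-T. The dataset id, split, and field names below
# ("input_caption", "output_caption") are assumptions.
from datasets import load_dataset

ds = load_dataset("facebook/emu_edit_test_set", split="validation")

def is_clean(example) -> bool:
    src = example["input_caption"].strip().lower()
    tgt = example["output_caption"].strip().lower()
    return src != tgt  # identical captions make CLIP-T uninformative

clean = ds.filter(is_clean)
print(f"kept {len(clean)} / {len(ds)} examples")
```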