diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..c618aee
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+imagen_museum
\ No newline at end of file
diff --git a/index.html b/index.html
index 7a14206..8854e07 100644
--- a/index.html
+++ b/index.html
@@ -209,7 +209,7 @@ How effective is VIEScore with current state-of-the-art M
 But how well can multimodal large language models (MLLMs) assess different tasks of conditional image generation? We report that the best model, GPT-4v, performs significantly better than the open-source models; most open-source MLLMs failed to adapt to our VIEScore, except LLaVA.
-MY ALT TEXT
+MY ALT TEXT
 Table 1: Correlations across all tasks with different backbone models. We highlight the highest correlation numbers in green. Visit our paper for insights and challenges in VIEScore.

@@ -256,10 +256,22 @@ How is Traditional Metrics correlating with human compare
 -0.0114
 -0.0881
+
+VIEScore(GPT-4o0shot)
+0.4989
+0.2495
+0.3928
+
+
+VIEScore(GPT-4o1shot)
+0.5124
+0.0336
+0.4042
+
 VIEScore(GPT-4v0shot)
-0.4885
-0.2379
+0.4885
+0.2379
 0.4614
@@ -296,10 +308,22 @@ How is Traditional Metrics correlating with human compare
 -0.0694
-VIEScore(GPT-4v0shot)
-0.4508
-0.2859
-0.4069
+VIEScore(GPT-4o0shot)
+0.5421
+0.3469
+0.4769
+
+
+VIEScore(GPT-4o1shot)
+0.5246
+0.1272
+0.4432
+
+
+VIEScore(GPT-4v0shot)
+0.4508
+0.2859
+0.4069
 VIEScore(GPT-4v1shot)
@@ -335,10 +359,22 @@ How is Traditional Metrics correlating with human compare
 0.1142
-VIEScore(GPT-4v0shot)
-0.2610
-0.4274
-0.2456
+VIEScore(GPT-4o0shot)
+0.4062
+0.4863
+0.3821
+
+
+VIEScore(GPT-4o1shot)
+0.3684
+0.1939
+0.3438
+
+
+VIEScore(GPT-4v0shot)
+0.2610
+0.4274
+0.2456
 VIEScore(GPT-4v1shot)
@@ -368,10 +404,10 @@ How is Traditional Metrics correlating with human compare
 0.4653
-DINO
-0.4160
+DINO
+0.4160
 0.1206
-0.4246
+0.4246
 CLIP-I
@@ -379,6 +415,18 @@ How is Traditional Metrics correlating with human compare
 0.1694
 0.3058
+
+VIEScore(GPT-4o0shot)
+0.4806
+0.2576
+0.4637
+
+
+VIEScore(GPT-4o1shot)
+0.4685
+-0.0171
+0.4292
+
 VIEScore(GPT-4v0shot)
 0.3979
@@ -388,7 +436,7 @@ How is Traditional Metrics correlating with human compare
 VIEScore(GPT-4v1shot)
 0.2757
-0.2261
+0.2261
 0.2753
@@ -413,10 +461,10 @@ How is Traditional Metrics correlating with human compare
 0.4747
-DINO
+DINO
 0.3022
 -0.0381
-0.3005
+0.3005
 CLIP-I
@@ -424,10 +472,22 @@ How is Traditional Metrics correlating with human compare
 0.1248
 0.2813
+
+VIEScore(GPT-4o0shot)
+0.4800
+0.3734
+0.3268
+
+
+VIEScore(GPT-4o1shot)
+0.3862
+0.1273
+0.3268
+
 VIEScore(GPT-4v0shot)
-0.3274
-0.2960
+0.3274
+0.2960
 0.1507
@@ -470,10 +530,22 @@ How is Traditional Metrics correlating with human compare
 0.1498
-VIEScore(GPT-4v0shot)
-0.3209
+VIEScore(GPT-4o0shot)
+0.4516
+0.2751
+0.4136
+
+
+VIEScore(GPT-4o1shot)
+0.4120
+-0.0141
+0.3523
+
+
+VIEScore(GPT-4v0shot)
+0.3209
 0.3025
-0.3346
+0.3346
 VIEScore(GPT-4v1shot)
@@ -508,17 +580,29 @@ How is Traditional Metrics correlating with human compare
 0.4204
 0.4133
+
+VIEScore(GPT-4o0shot)
+0.4972
+0.4892
+0.5439
+
+
+VIEScore(GPT-4o1shot)
+0.5544
+0.3699
+0.5238
+
 VIEScore(GPT-4v0shot)
-0.4360
-0.4975
+0.4360
+0.4975
 0.3999
-VIEScore(GPT-4v1shot)
+VIEScore(GPT-4v1shot)
 0.3892
 0.4132
-0.4237
+0.4237
 VIEScore(LLaVA0shot)
diff --git a/static/images/table_full1.png b/static/images/table_full1.png
deleted file mode 100644
index c2934d2..0000000
Binary files a/static/images/table_full1.png and /dev/null differ
diff --git a/static/images/table_full2.png b/static/images/table_full2.png
deleted file mode 100644
index 944ef6b..0000000
Binary files a/static/images/table_full2.png and /dev/null differ
diff --git a/static/images/table_overall_new.png b/static/images/table_overall_new.png
new file mode 100644
index 0000000..a90320f
Binary files /dev/null and b/static/images/table_overall_new.png differ