Below are the results obtained with:
Manticore 4.2.1 d039fba84@220407 release
Elasticsearch 7.17.0
Results with all metrics:
- For Manticore Search (MS):
a. Default settings:
python -m benchmark.manticore.evaluate data/trec-covid test trec_covid
| | metric | k=1 | k=2 | k=5 | k=10 |
|---|---|---|---|---|---|
0 | NDCG | 0 | 0 | 0 | 0.00127 |
1 | MAP | 0 | 0 | 0 | 0 |
2 | Recall | 0 | 0 | 0 | 2e-05 |
3 | P | 0 | 0 | 0 | 0.002 |
4 | MRR | 0 | 0 | 0 | 0.002 |
5 | R_cap | 0 | 0 | 0 | 0.002 |
6 | Hole | 1 | 1 | 0.992 | 0.888 |
7 | Accuracy | 0 | 0 | 0 | 0.02 |
python -m benchmark.manticore.evaluate data/nfcorpus test nfcorpus
| | metric | k=1 | k=2 | k=5 | k=10 |
|---|---|---|---|---|---|
0 | NDCG | 0.12752 | 0.12963 | 0.12311 | 0.11403 |
1 | MAP | 0.02175 | 0.03538 | 0.04675 | 0.0517 |
2 | Recall | 0.02175 | 0.0399 | 0.05719 | 0.06486 |
3 | P | 0.12752 | 0.12584 | 0.1047 | 0.08054 |
4 | MRR | 0.10836 | 0.12539 | 0.14123 | 0.14808 |
5 | R_cap | 0.10836 | 0.10836 | 0.09835 | 0.10131 |
6 | Hole | 0.17028 | 0.17957 | 0.1969 | 0.24923 |
7 | Accuracy | 0.10836 | 0.14241 | 0.19814 | 0.25077 |
b. ES-like settings:
python -m benchmark.manticore.evaluate data/trec-covid test trec_covid_es_like
| | metric | k=1 | k=2 | k=5 | k=10 |
|---|---|---|---|---|---|
0 | NDCG | 0.85 | 0.81905 | 0.76783 | 0.71441 |
1 | MAP | 0.00229 | 0.00438 | 0.01006 | 0.01836 |
2 | Recall | 0.00229 | 0.00442 | 0.01071 | 0.0206 |
3 | P | 0.88 | 0.85 | 0.808 | 0.766 |
4 | MRR | 0.88 | 0.9 | 0.92167 | 0.92167 |
5 | R_cap | 0.88 | 0.85 | 0.812 | 0.766 |
6 | Hole | 0 | 0.01 | 0.02 | 0.022 |
7 | Accuracy | 0.88 | 0.92 | 1 | 1 |
python -m benchmark.manticore.evaluate data/nfcorpus test nfcorpus_es_like
| | metric | k=1 | k=2 | k=5 | k=10 |
|---|---|---|---|---|---|
0 | NDCG | 0.45292 | 0.42067 | 0.38322 | 0.34537 |
1 | MAP | 0.0592 | 0.08632 | 0.11326 | 0.12953 |
2 | Recall | 0.0592 | 0.09091 | 0.13506 | 0.16388 |
3 | P | 0.47078 | 0.41558 | 0.33182 | 0.2513 |
4 | MRR | 0.4582 | 0.4969 | 0.52657 | 0.53319 |
5 | R_cap | 0.4582 | 0.40867 | 0.35516 | 0.30366 |
6 | Hole | 0.06192 | 0.07585 | 0.08111 | 0.08731 |
7 | Accuracy | 0.4582 | 0.5356 | 0.64087 | 0.6935 |
- For Elasticsearch (ES):
python -m benchmark.es.evaluate_bm25 data/trec-covid test trec_covid
| | metric | k=1 | k=2 | k=5 | k=10 |
|---|---|---|---|---|---|
0 | NDCG | 0.82 | 0.79679 | 0.72491 | 0.68803 |
1 | MAP | 0.00234 | 0.0044 | 0.00961 | 0.01698 |
2 | Recall | 0.00234 | 0.00443 | 0.01027 | 0.01907 |
3 | P | 0.88 | 0.84 | 0.768 | 0.734 |
4 | MRR | 0.88 | 0.9 | 0.92167 | 0.92167 |
5 | R_cap | 0.88 | 0.83 | 0.768 | 0.734 |
6 | Hole | 0.02 | 0.03 | 0.052 | 0.054 |
7 | Accuracy | 0.88 | 0.92 | 1 | 1 |
python -m benchmark.es.evaluate_bm25 data/nfcorpus test nfcorpus
| | metric | k=1 | k=2 | k=5 | k=10 |
|---|---|---|---|---|---|
0 | NDCG | 0.44968 | 0.4197 | 0.37705 | 0.34281 |
1 | MAP | 0.05936 | 0.08833 | 0.11329 | 0.12969 |
2 | Recall | 0.05936 | 0.09328 | 0.13313 | 0.16603 |
3 | P | 0.46753 | 0.41396 | 0.32273 | 0.24708 |
4 | MRR | 0.44892 | 0.49536 | 0.52023 | 0.52954 |
5 | R_cap | 0.44892 | 0.40712 | 0.34711 | 0.30188 |
6 | Hole | 0.06192 | 0.07276 | 0.07802 | 0.08359 |
7 | Accuracy | 0.44892 | 0.5418 | 0.62848 | 0.70279 |
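
To make the tables above easier to compare, here is a minimal sketch that puts the NDCG@10 values side by side and prints each Manticore configuration's difference from Elasticsearch. The numbers are copied from the tables. The dictionary keys and the `delta_vs_es` helper are names made up for this illustration and are not part of the benchmark code.

```python
# NDCG@10 values copied from the result tables above.
# The keys ("manticore_default", ...) are illustrative labels only.
ndcg_at_10 = {
    "trec-covid": {
        "manticore_default": 0.00127,
        "manticore_es_like": 0.71441,
        "elasticsearch": 0.68803,
    },
    "nfcorpus": {
        "manticore_default": 0.11403,
        "manticore_es_like": 0.34537,
        "elasticsearch": 0.34281,
    },
}


def delta_vs_es(scores):
    """Return each non-ES score minus the Elasticsearch score (hypothetical helper)."""
    es = scores["elasticsearch"]
    return {
        name: round(value - es, 5)
        for name, value in scores.items()
        if name != "elasticsearch"
    }


deltas = {dataset: delta_vs_es(scores) for dataset, scores in ndcg_at_10.items()}

for dataset, row in deltas.items():
    for name, delta in row.items():
        # A positive delta means the Manticore configuration scored higher than ES.
        print(f"{dataset:<11} {name:<18} {delta:+.5f}")
```

With the ES-like settings the NDCG@10 difference from Elasticsearch is +0.02638 on trec-covid and +0.00256 on nfcorpus. With the default settings it is -0.68676 and -0.22878 respectively.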