Search with * wildcard has an effect on the relevance ranking #700
Comments
Here are the Solr scoring explanations for the document https://test.swissbib.ch/Record/316493929. It is ranked 1st for quarteroni and 28th for quarteroni*.

quarteroni search:
{
"316493929": {
"match": true,
"value": 6917.0166,
"description": "sum of:",
"details": [
{
"match": true,
"value": 6916.366,
"description": "max of:",
"details": [
{
"match": true,
"value": 690.38696,
"description": "weight(author_additional_gnd_txt_mv:quarteroni in 1258316) [ClassicSimilarity], result of:",
"details": [
{
"match": true,
"value": 690.38696,
"description": "score(doc=1258316,freq=3.0), product of:",
"details": [
{
"match": true,
"value": 100,
"description": "boost"
},
{
"match": true,
"value": 6.9038696,
"description": "fieldWeight in 1258316, product of:",
"details": [
{
"match": true,
"value": 1.7320508,
"description": "tf(freq=3.0), with freq of:",
"details": [
{
"match": true,
"value": 3,
"description": "termFreq=3.0"
}
]
},
{
"match": true,
"value": 11.957853,
"description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
"details": [
{
"match": true,
"value": 28,
"description": "docFreq"
},
{
"match": true,
"value": 1664689,
"description": "docCount"
}
]
},
{
"match": true,
"value": 0.33333334,
"description": "fieldNorm(doc=1258316)"
}
]
}
]
}
]
},
{
"match": true,
"value": 6916.366,
"description": "weight(author:quarteroni in 1258316) [ClassicSimilarity], result of:",
"details": [
{
"match": true,
"value": 6916.366,
"description": "score(doc=1258316,freq=1.0), product of:",
"details": [
{
"match": true,
"value": 750,
"description": "boost"
},
{
"match": true,
"value": 9.221822,
"description": "fieldWeight in 1258316, product of:",
"details": [
{
"match": true,
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"match": true,
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"match": true,
"value": 13.041626,
"description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
"details": [
{
"match": true,
"value": 46,
"description": "docFreq"
},
{
"match": true,
"value": 7974611,
"description": "docCount"
}
]
},
{
"match": true,
"value": 0.70710677,
"description": "fieldNorm(doc=1258316)"
}
]
}
]
}
]
},
{
"match": true,
"value": 189.3192,
"description": "weight(addfields_txt_mv:quarteroni in 1258316) [ClassicSimilarity], result of:",
"details": [
{
"match": true,
"value": 189.3192,
"description": "score(doc=1258316,freq=1.0), product of:",
"details": [
{
"match": true,
"value": 50,
"description": "boost"
},
{
"match": true,
"value": 3.7863839,
"description": "fieldWeight in 1258316, product of:",
"details": [
{
"match": true,
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"match": true,
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"match": true,
"value": 13.116419,
"description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
"details": [
{
"match": true,
"value": 48,
"description": "docFreq"
},
{
"match": true,
"value": 8959630,
"description": "docCount"
}
]
},
{
"match": true,
"value": 0.28867513,
"description": "fieldNorm(doc=1258316)"
}
]
}
]
}
]
}
]
},
{
"match": true,
"value": 0.65048635,
"description": "FunctionQuery(100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness))))+100.0)), product of:",
"details": [
{
"match": true,
"value": 0.65048635,
"description": "100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness)=2014-01-01T00:00:00Z)))+100.0)"
},
{
"match": true,
"value": 1,
"description": "boost"
}
]
}
]
}
}

quarteroni* search:
{
"316493929": {
"match": true,
"value": 750.6505,
"description": "sum of:",
"details": [
{
"match": true,
"value": 750,
"description": "max of:",
"details": [
{
"match": true,
"value": 100,
"description": "author_additional_gnd_txt_mv:quarteroni*^100.0"
},
{
"match": true,
"value": 750,
"description": "author:quarteroni*^750.0"
},
{
"match": true,
"value": 50,
"description": "addfields_txt_mv:quarteroni*^50.0"
}
]
},
{
"match": true,
"value": 0.65048635,
"description": "FunctionQuery(100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness))))+100.0)), product of:",
"details": [
{
"match": true,
"value": 0.65048635,
"description": "100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness)=2014-01-01T00:00:00Z)))+100.0)"
},
{
"match": true,
"value": 1,
"description": "boost"
}
]
}
]
}
}
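The numbers in the quarteroni explanation can be recomputed by hand. The following is a minimal sketch in Python, using only the values printed above; the function names are made up for illustration, and `tf = sqrt(freq)` is the ClassicSimilarity term-frequency formula, which agrees with the `tf(freq=3.0)` value of 1.7320508 shown in the output.

```python
import math

def classic_idf(doc_freq, doc_count):
    """idf as printed in the explain output: log((docCount+1)/(docFreq+1)) + 1."""
    return math.log((doc_count + 1) / (doc_freq + 1)) + 1.0

def classic_tf(freq):
    """ClassicSimilarity term frequency: square root of the raw frequency."""
    return math.sqrt(freq)

def term_score(boost, freq, doc_freq, doc_count, field_norm):
    """boost * tf * idf * fieldNorm, the product shown for each field."""
    return boost * classic_tf(freq) * classic_idf(doc_freq, doc_count) * field_norm

# Inputs copied from the three "weight(...)" entries of the explain output.
gnd = term_score(100, 3.0, 28, 1664689, 0.33333334)
author = term_score(750, 1.0, 46, 7974611, 0.70710677)
addfields = term_score(50, 1.0, 48, 8959630, 0.28867513)

# "max of:" keeps the best field, "sum of:" adds the freshness function value.
total = max(gnd, author, addfields) + 0.65048635
print(gnd, author, addfields, total)
```

Small differences in the last digits are expected, since Solr computes these values in single precision.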
It looks like prefix queries ("a*") are constant-scoring: all matching documents get an equal score. The scoring factors TF, IDF, index boost, and "coord" are not used.
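To make the effect concrete, here is a small sketch of what the quarteroni* explanation boils down to: each matching field contributes only its boost, and the best field wins. The boosts are copied from the explain output; the helper name and the second document are hypothetical.

```python
# Field boosts as they appear in the quarteroni* explain output.
WILDCARD_BOOSTS = {
    "author_additional_gnd_txt_mv": 100.0,
    "author": 750.0,
    "addfields_txt_mv": 50.0,
}

def constant_text_score(matched_fields):
    """Text score of a constant-scoring wildcard match: the largest boost
    among the matching fields ("max of:" in the explain output)."""
    return max(WILDCARD_BOOSTS[field] for field in matched_fields)

# Record 316493929 matches in all three fields.
doc_a = constant_text_score(
    ["author_additional_gnd_txt_mv", "author", "addfields_txt_mv"]
)
# A hypothetical record that matches only in the author field.
doc_b = constant_text_score(["author"])

print(doc_a, doc_b)
```

Both records end up with the same text score of 750, so nothing in the text match itself separates them any more.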
It looks like VuFind suffers from the same problem: https://vufind.org/demo/Search/Results?lookfor=quarteroni*&type=AllFields&limit=20 This is not really bad for searches, but it is very bad for suggestions, as suggestions are based on wildcard queries. One more reason to use the Solr suggester: https://lucene.apache.org/solr/guide/7_3/suggester.html
I solved it by using "quarteroni OR quarteroni*" as the query. This is not a fully convincing solution, though, as it has some side effects (for example, using the pf Solr parameter will boost documents which contain the query word twice).
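A rough sketch of that rewrite for a single search term could look like the following. The function name `with_exact_term` is hypothetical and not part of swissbib or VuFind; it only handles a trailing `*` on one term and leaves everything else untouched.

```python
def with_exact_term(term):
    """Rewrite 'foo*' to 'foo OR foo*' so that the exact term is scored
    normally (TF/IDF) in addition to the constant-scoring prefix match."""
    stem = term[:-1]
    # Only rewrite a plain trailing wildcard with a non-empty, wildcard-free stem.
    if term.endswith("*") and stem and "*" not in stem and "?" not in stem:
        return f"{stem} OR {term}"
    return term

print(with_exact_term("quarteroni*"))
print(with_exact_term("quarteroni"))
```

Documents containing the exact word then get the regular relevance score on top of the constant one, while records that only match the prefix keep the constant score.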
Compare:
https://www.swissbib.ch/Search/Results?lookfor=quarteroni&type=AllFields&limit=20
https://www.swissbib.ch/Search/Results?lookfor=quarteroni*&type=AllFields&limit=20
or
https://www.swissbib.ch/Search/Results?lookfor=scrum&type=AllFields&limit=20
https://www.swissbib.ch/Search/Results?lookfor=scrum*&type=AllFields&limit=20
or
https://www.swissbib.ch/Search/Results?lookfor=pneumonia&type=AllFields&limit=20
https://www.swissbib.ch/Search/Results?lookfor=pneumonia*&type=AllFields&limit=20
When there is a * in the search query, the quality of the relevance ranking is worse. The year boosting factor seems to have far more influence in the wildcard search.
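The weight of the year boost can be illustrated with the freshness function from the explain output, `100.0/(3.16E-10*abs(ms(const(1558569600000),date(freshness)))+100.0)`. The sketch below recomputes it in Python; the 2019 record is a hypothetical example, and the two text scores are the ones reported for record 316493929.

```python
from datetime import datetime, timezone

REFERENCE_MS = 1558569600000  # const(...) from the FunctionQuery (2019-05-23 UTC)

def freshness_boost(freshness):
    """100.0 / (3.16E-10 * |reference - freshness| in ms + 100.0)."""
    doc_ms = freshness.timestamp() * 1000.0
    return 100.0 / (3.16e-10 * abs(REFERENCE_MS - doc_ms) + 100.0)

def utc_year(year):
    """January 1st of the given year, in UTC, like the freshness field above."""
    return datetime(year, 1, 1, tzinfo=timezone.utc)

boost_2014 = freshness_boost(utc_year(2014))  # record 316493929
boost_2019 = freshness_boost(utc_year(2019))  # hypothetical newer record

TERM_SCORE = 6916.366   # text score for quarteroni
WILDCARD_SCORE = 750.0  # text score for quarteroni*
for label, text_score in [("quarteroni", TERM_SCORE), ("quarteroni*", WILDCARD_SCORE)]:
    print(label, text_score + boost_2014, text_score + boost_2019)
```

The boost always stays between 0 and 1. With a plain term, text scores differ by hundreds from one record to the next, so the boost rarely changes the order. With a wildcard, all records matching in the same best field tie on the text score, so the boost alone decides their order.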