This repository has been archived by the owner on Jan 25, 2021. It is now read-only.

Search with * wildcard has an effect on the relevance ranking #700

Open
liowalter opened this issue May 23, 2019 · 4 comments


@liowalter

Here are the Solr scoring explanations for the document https://test.swissbib.ch/Record/316493929

It is ranked 1st for quarteroni but only 28th for quarteroni*.

quarteroni search

{
  "316493929": {
    "match": true,
    "value": 6917.0166,
    "description": "sum of:",
    "details": [
      {
        "match": true,
        "value": 6916.366,
        "description": "max of:",
        "details": [
          {
            "match": true,
            "value": 690.38696,
            "description": "weight(author_additional_gnd_txt_mv:quarteroni in 1258316) [ClassicSimilarity], result of:",
            "details": [
              {
                "match": true,
                "value": 690.38696,
                "description": "score(doc=1258316,freq=3.0), product of:",
                "details": [
                  {
                    "match": true,
                    "value": 100,
                    "description": "boost"
                  },
                  {
                    "match": true,
                    "value": 6.9038696,
                    "description": "fieldWeight in 1258316, product of:",
                    "details": [
                      {
                        "match": true,
                        "value": 1.7320508,
                        "description": "tf(freq=3.0), with freq of:",
                        "details": [
                          {
                            "match": true,
                            "value": 3,
                            "description": "termFreq=3.0"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 11.957853,
                        "description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
                        "details": [
                          {
                            "match": true,
                            "value": 28,
                            "description": "docFreq"
                          },
                          {
                            "match": true,
                            "value": 1664689,
                            "description": "docCount"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 0.33333334,
                        "description": "fieldNorm(doc=1258316)"
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "match": true,
            "value": 6916.366,
            "description": "weight(author:quarteroni in 1258316) [ClassicSimilarity], result of:",
            "details": [
              {
                "match": true,
                "value": 6916.366,
                "description": "score(doc=1258316,freq=1.0), product of:",
                "details": [
                  {
                    "match": true,
                    "value": 750,
                    "description": "boost"
                  },
                  {
                    "match": true,
                    "value": 9.221822,
                    "description": "fieldWeight in 1258316, product of:",
                    "details": [
                      {
                        "match": true,
                        "value": 1,
                        "description": "tf(freq=1.0), with freq of:",
                        "details": [
                          {
                            "match": true,
                            "value": 1,
                            "description": "termFreq=1.0"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 13.041626,
                        "description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
                        "details": [
                          {
                            "match": true,
                            "value": 46,
                            "description": "docFreq"
                          },
                          {
                            "match": true,
                            "value": 7974611,
                            "description": "docCount"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 0.70710677,
                        "description": "fieldNorm(doc=1258316)"
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "match": true,
            "value": 189.3192,
            "description": "weight(addfields_txt_mv:quarteroni in 1258316) [ClassicSimilarity], result of:",
            "details": [
              {
                "match": true,
                "value": 189.3192,
                "description": "score(doc=1258316,freq=1.0), product of:",
                "details": [
                  {
                    "match": true,
                    "value": 50,
                    "description": "boost"
                  },
                  {
                    "match": true,
                    "value": 3.7863839,
                    "description": "fieldWeight in 1258316, product of:",
                    "details": [
                      {
                        "match": true,
                        "value": 1,
                        "description": "tf(freq=1.0), with freq of:",
                        "details": [
                          {
                            "match": true,
                            "value": 1,
                            "description": "termFreq=1.0"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 13.116419,
                        "description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
                        "details": [
                          {
                            "match": true,
                            "value": 48,
                            "description": "docFreq"
                          },
                          {
                            "match": true,
                            "value": 8959630,
                            "description": "docCount"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 0.28867513,
                        "description": "fieldNorm(doc=1258316)"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "match": true,
        "value": 0.65048635,
        "description": "FunctionQuery(100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness))))+100.0)), product of:",
        "details": [
          {
            "match": true,
            "value": 0.65048635,
            "description": "100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness)=2014-01-01T00:00:00Z)))+100.0)"
          },
          {
            "match": true,
            "value": 1,
            "description": "boost"
          }
        ]
      }
    ]
  }
}
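
As a cross-check, the per-field scores in the explanation above can be recomputed by hand. The following is a minimal sketch, not Solr code: the function names are made up for illustration, tf is assumed to be the square root of the term frequency (which matches the tf values shown), and the fieldNorm values are copied from the output as-is. Python uses double precision, so the results agree with Solr's float output only approximately.

```python
import math

def classic_idf(doc_freq, doc_count):
    # idf as stated in the explain output: log((docCount+1)/(docFreq+1)) + 1
    return math.log((doc_count + 1) / (doc_freq + 1)) + 1

def classic_field_score(boost, term_freq, doc_freq, doc_count, field_norm):
    # score = boost * tf * idf * fieldNorm, with tf = sqrt(termFreq)
    tf = math.sqrt(term_freq)
    return boost * tf * classic_idf(doc_freq, doc_count) * field_norm

# Inputs copied from the explain output for document 316493929
gnd = classic_field_score(100, 3.0, 28, 1664689, 0.33333334)
author = classic_field_score(750, 1.0, 46, 7974611, 0.70710677)
addfields = classic_field_score(50, 1.0, 48, 8959630, 0.28867513)

# "max of" the field scores, plus the freshness function value ("sum of")
total = max(gnd, author, addfields) + 0.65048635

print(round(gnd, 2), round(author, 2), round(addfields, 2), round(total, 2))
```

The author field dominates because its boost of 750 is multiplied by a high idf (the term occurs in only 46 documents).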

quarteroni* search

{
  "316493929": {
    "match": true,
    "value": 750.6505,
    "description": "sum of:",
    "details": [
      {
        "match": true,
        "value": 750,
        "description": "max of:",
        "details": [
          {
            "match": true,
            "value": 100,
            "description": "author_additional_gnd_txt_mv:quarteroni*^100.0"
          },
          {
            "match": true,
            "value": 750,
            "description": "author:quarteroni*^750.0"
          },
          {
            "match": true,
            "value": 50,
            "description": "addfields_txt_mv:quarteroni*^50.0"
          }
        ]
      },
      {
        "match": true,
        "value": 0.65048635,
        "description": "FunctionQuery(100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness))))+100.0)), product of:",
        "details": [
          {
            "match": true,
            "value": 0.65048635,
            "description": "100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness)=2014-01-01T00:00:00Z)))+100.0)"
          },
          {
            "match": true,
            "value": 1,
            "description": "boost"
          }
        ]
      }
    ]
  }
}
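
For the wildcard query, the only document-specific part of the score is the freshness term. The sketch below recomputes it, assuming the FunctionQuery is a reciprocal of the form a / (m * x + b), as its description suggests; the function and parameter names are illustrative and not taken from the swissbib configuration.

```python
from datetime import datetime, timezone

def freshness_boost(now_ms, freshness, m=3.16e-10, a=100.0, b=100.0):
    # a / (m * |now - freshness| + b), with the time difference in milliseconds
    doc_ms = freshness.timestamp() * 1000
    return a / (m * abs(now_ms - doc_ms) + b)

# Values from the explain output: const(1558569600000) and freshness=2014-01-01T00:00:00Z
boost = freshness_boost(1558569600000, datetime(2014, 1, 1, tzinfo=timezone.utc))

# For quarteroni*, each field contributes only its boost; the max is the author boost
wildcard_total = max(100.0, 750.0, 50.0) + boost

print(round(boost, 6), round(wildcard_total, 4))
```

Every document matching quarteroni* in the author field gets the same 750, so the ordering among them is left to this freshness term, which is always below 1.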

@liowalter

It looks like prefix queries (e.g. "a*") are constant-scoring: all matching documents get an equal score. The scoring factors TF, IDF, index boost, and "coord" are not used.
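
To illustrate the effect on ranking, here is a toy comparison. The two documents and their fieldWeight values are hypothetical, made up only to show the mechanism, and the functions are simplified stand-ins rather than Solr's implementation.

```python
def term_score(boost, field_weight):
    # Term query: boost multiplied by the document-specific fieldWeight (tf * idf * fieldNorm)
    return boost * field_weight

def prefix_score(boost):
    # Prefix query: constant score, only the query boost survives
    return boost

# Hypothetical author-field fieldWeights for two matching documents
field_weights = {"doc_a": 9.2, "doc_b": 4.1}
boost = 750

term_scores = {doc: term_score(boost, w) for doc, w in field_weights.items()}
prefix_scores = {doc: prefix_score(boost) for doc in field_weights}

print(term_scores)    # documents are separated by their fieldWeight
print(prefix_scores)  # documents are tied
```

With the term query the two documents get clearly different scores; with the prefix query they tie, so any other additive component of the score decides the order.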

@liowalter

It looks like VuFind suffers from the same problem:

https://vufind.org/demo/Search/Results?lookfor=quarteroni*&type=AllFields&limit=20
https://vufind.org/demo/Search/Results?lookfor=quarteroni&type=AllFields&limit=20

This is not much of a problem for regular searches, but it is very bad for suggestions, since suggestions are based on wildcard queries. One more reason to use the Solr Suggester: https://lucene.apache.org/solr/guide/7_3/suggester.html

@liowalter

I worked around it by using "quarteroni OR quarteroni*" as the query. This is not a fully convincing solution, though, because it has side effects: for example, with the Solr pf parameter, documents that contain the query word twice get boosted.
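
The rewrite could be done on the client side before the query is sent to Solr. This is only a sketch under simple assumptions: the helper name is hypothetical, it handles a single word with one trailing `*`, and it does no escaping of other query syntax characters.

```python
def with_exact_term(user_input):
    """Turn 'word*' into 'word OR word*' so the exact term is scored normally."""
    term = user_input.strip()
    # Only rewrite a single word that ends with a wildcard
    if term.endswith("*") and len(term) > 1 and " " not in term:
        return f"{term[:-1]} OR {term}"
    return term

print(with_exact_term("quarteroni*"))
print(with_exact_term("quarteroni"))
```

Documents that contain the exact word then also match the plain term clause, which is scored with TF and IDF, while documents that match only the prefix keep the constant score.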
