REST API /search return result with HTML tags and formating #2612

Open
wy193777 opened this issue Jan 4, 2019 · 11 comments

Comments

@wy193777
Contributor

wy193777 commented Jan 4, 2019

The /search REST API returns results with unnecessary HTML tags and formatting. Is there a way to turn off HTML formatting for search results from the REST API?

@vladak
Member

vladak commented Jan 4, 2019

Could you give an example?

@wy193777
Contributor Author

wy193777 commented Jan 4, 2019

Below is an example from an instance running locally. You can see that many <b> tags have been added to wrap the search term. A client using the REST API already knows the term it searched for, so bolding it isn't necessary. You can also see that & has been replaced with &amp;.

{
  "time": 68,
  "resultCount": 4,
  "startDocument": 0,
  "endDocument": 3,
  "results": {
    "/Golden-Register-2.0-Backend/package.json": [
      {
        "line": "    \"start\": \"copyfiles sql/**/*.sql build/ &amp;&amp; tsc &amp;&amp; node --<b>max-old-space-size</b>=5120 --trace-warnings -r ts-node/register build/src/index.js --pretty\",",
        "lineNumber": "13"
      }
    ],
    "/diagram-visualization/node_modules/webpack/package.json": [
      {
        "line": "    \"appveyor:test\": \"node node_modules\\\\mocha\\\\bin\\\\mocha --<b>max-old-space-size</b>=4096 --harmony test/*.test.js\",",
        "lineNumber": "129"
      },
      {
        "line": "    \"benchmark\": \"mocha --<b>max-old-space-size</b>=4096 --harmony test/*.benchmark.js -R spec\",",
        "lineNumber": "131"
      },
      {
        "line": "    \"circleci:test\": \"node node_modules/mocha/bin/mocha --<b>max-old-space-size</b>=4096 --harmony test/*.test.js\",",
        "lineNumber": "134"
      },
      {
        "line": "    \"cover\": \"node --<b>max-old-space-size</b>=4096 --harmony ./node_modules/istanbul/lib/cli.js cover -x '**/*.runtime.js' node_modules/mocha/bin/_mocha -- test/*.test.js\",",
        "lineNumber": "135"
      },
      {
        "line": "    \"cover:min\": \"node --<b>max-old-space-size</b>=4096 --harmony ./node_modules/istanbul/lib/cli.js cover -x '**/*.runtime.js' --report lcovonly node_modules/mocha/bin/_mocha -- test/*.test.js\",",
        "lineNumber": "136"
      },
      {
        "line": "    \"test\": \"mocha test/*.test.js --<b>max-old-space-size</b>=4096 --harmony --check-leaks\",",
        "lineNumber": "143"
      }
    ],
    "/Patching-Tool-Client/node_modules/webpack/package.json": [
      {
        "line": "    \"benchmark\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.benchmark.js\\\" --runInBand\",",
        "lineNumber": "200"
      },
      {
        "line": "    \"cover:all\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --coverage\",",
        "lineNumber": "204"
      },
      {
        "line": "    \"cover:integration\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.test.js\\\" --coverage\",",
        "lineNumber": "206"
      },
      {
        "line": "    \"cover:unit\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.unittest.js\\\" --coverage\",",
        "lineNumber": "208"
      },
      {
        "line": "    \"schema-lint\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.lint.js\\\" --no-verbose\",",
        "lineNumber": "214"
      },
      {
        "line": "    \"test\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest\",",
        "lineNumber": "218"
      },
      {
        "line": "    \"test:basic\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/{TestCasesNormal,StatsTestCases,ConfigTestCases}.test.js\\\"\",",
        "lineNumber": "219"
      },
      {
        "line": "    \"test:integration\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.test.js\\\"\",",
        "lineNumber": "220"
      },
      {
        "line": "    \"test:unit\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.unittest.js\\\"\",",
        "lineNumber": "221"
      }
    ],
    "/Golden-Register-2.0-Frontend/node_modules/webpack/package.json": [
      {
        "line": "    \"benchmark\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.benchmark.js\\\" --runInBand\",",
        "lineNumber": "244"
      },
      {
        "line": "    \"cover:all\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --coverage\",",
        "lineNumber": "248"
      },
      {
        "line": "    \"cover:integration\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.test.js\\\" --coverage\",",
        "lineNumber": "250"
      },
      {
        "line": "    \"cover:unit\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.unittest.js\\\" --coverage\",",
        "lineNumber": "252"
      },
      {
        "line": "    \"schema-lint\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.lint.js\\\" --no-verbose\",",
        "lineNumber": "258"
      },
      {
        "line": "    \"test\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest\",",
        "lineNumber": "260"
      },
      {
        "line": "    \"test:basic\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/{TestCasesNormal,StatsTestCases,ConfigTestCases}.test.js\\\"\",",
        "lineNumber": "261"
      },
      {
        "line": "    \"test:integration\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.test.js\\\"\",",
        "lineNumber": "262"
      },
      {
        "line": "    \"test:unit\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.unittest.js\\\"\",",
        "lineNumber": "263"
      }
    ]
  }
}

OpenGrok's REST API documentation also shows this behavior.

{
  "time": 13,
  "resultCount": 35,
  "startDocument": 0,
  "endDocument": 0,
  "results": {
    "/opengrok/test/org/opensolaris/opengrok/history/hg-export-renamed.txt": [{
      "line": "# User Vladimir <b>Kotal</b> &lt;Vladimir.<b>Kotal</b>@oracle.com&gt;",
      "lineNumber": "19"
    },{
      "line": "# User Vladimir <b>Kotal</b> &lt;Vladimir.<b>Kotal</b>@oracle.com&gt;",
      "lineNumber":"29"
    }]
  }
}
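
Until the API can return plain text, a client can undo the markup itself. The following is a minimal Python sketch of such a workaround, not an OpenGrok feature. It assumes that the only tags the server injects are <b> and </b>, and that all text from the source file is HTML-escaped (as the responses above suggest). The helper names plain_line and plain_results are hypothetical.

```python
import html
import re

# Matches only the <b> / </b> highlight markers seen in the responses above.
# Assumption: the server injects no other tags into the "line" field.
_HIGHLIGHT_TAG = re.compile(r"</?b>")


def plain_line(line: str) -> str:
    """Return a result line with highlight tags removed and entities decoded."""
    # Strip the tags first. After unescaping, a literal "&lt;b&gt;" that came
    # from the source file would be indistinguishable from a highlight marker.
    without_tags = _HIGHLIGHT_TAG.sub("", line)
    return html.unescape(without_tags)


def plain_results(response: dict) -> dict:
    """Map each file path to its hits, with every "line" converted to plain text."""
    return {
        path: [{**hit, "line": plain_line(hit["line"])} for hit in hits]
        for path, hits in response.get("results", {}).items()
    }


# A shortened hit in the shape of the /search response shown above.
sample = {
    "results": {
        "/Golden-Register-2.0-Backend/package.json": [
            {
                "line": "tsc &amp;&amp; node --<b>max-old-space-size</b>=5120",
                "lineNumber": "13",
            }
        ]
    }
}
print(plain_results(sample))
```

The order of the two steps matters: unescaping before stripping would also delete any <b> that was genuinely part of the indexed file.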

@vladak
Member

vladak commented Jan 4, 2019

It seems that the summarizer/highlighter kicks in.

@vladak
Member

vladak commented Jan 4, 2019

Yes. SearchController uses SearchEngine.results(), which uses Summarizer, and the Summary used therein is inherently HTML-based. There needs to be a special version of SearchEngine.results() for the API.
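
To illustrate the general idea, here is a small sketch of keeping the highlight information as data and choosing the rendering at the edge, so that a web UI and an API can share the matching logic. It is written in Python for brevity and is not OpenGrok's code: Fragment, split_line, render_html and render_plain are hypothetical names, and the plain substring matching stands in for the real Lucene-based highlighting.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A piece of a matched line; `highlighted` marks a search-term hit."""
    text: str
    highlighted: bool = False


def split_line(line: str, term: str) -> list:
    """Split `line` into fragments around case-sensitive occurrences of `term`."""
    if not term:
        return [Fragment(line)] if line else []
    fragments = []
    pos = 0
    while (hit := line.find(term, pos)) != -1:
        if hit > pos:
            fragments.append(Fragment(line[pos:hit]))
        fragments.append(Fragment(term, highlighted=True))
        pos = hit + len(term)
    if pos < len(line):
        fragments.append(Fragment(line[pos:]))
    return fragments


def render_html(fragments) -> str:
    """Web UI rendering: escape the text and wrap hits in <b>."""
    return "".join(
        f"<b>{html.escape(f.text, quote=False)}</b>"
        if f.highlighted
        else html.escape(f.text, quote=False)
        for f in fragments
    )


def render_plain(fragments) -> str:
    """API rendering: the original text, with no markup and no escaping."""
    return "".join(f.text for f in fragments)


line = "# User Vladimir Kotal <Vladimir.Kotal@oracle.com>"
parts = split_line(line, "Kotal")
print(render_html(parts))
print(render_plain(parts))
```

With this split, the HTML renderer reproduces the kind of output quoted earlier in the thread, while the plain renderer returns the line unchanged. An API could also expose the fragment boundaries as offsets if clients want to do their own highlighting.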

@wy193777
Contributor Author

wy193777 commented Jan 4, 2019

Thanks for your quick help!

@wy193777
Contributor Author

Do you have any idea when this bug will be fixed?

@vladak
Member

vladak commented Jan 12, 2019 via email

@wy193777
Contributor Author

The <b> tags actually don't come only from Summarizer. After digging into the source code, I found that those HTML tags might also come from Context.java, which is extremely complex.

@wy193777
Contributor Author

OK, Context.java calls into PlainLineTokenizer, which is generated from opengrok-indexer/src/main/resources/search/context/PlainLineTokenizer.lex. This lex file also seems to do things like HTML-encoding its output. I guess eliminating <b> is not a simple task.

@vladak
Member

vladak commented Mar 20, 2019 via email

@idodeclare
Contributor

It's fairly straightforward to extend OGKUnifiedHighlighter. I raised PR #2732.

idodeclare added a commit to idodeclare/OpenGrok that referenced this issue Jul 20, 2019
Addresses the source-code related part of oracle#2612
since `OGKUnifiedHighlighter` allows straight-
forward, alternate formatters.

`HistoryContext` is still a custom highlighting,
so it will still return HTML-like content.
idodeclare added a commit to idodeclare/OpenGrok that referenced this issue Oct 9, 2020
Addresses the source-code related part of oracle#2612
since `OGKUnifiedHighlighter` allows straight-
forward, alternate formatters.

`HistoryContext` is still a custom highlighting,
so it will still return HTML-like content.