Tell us about your request. Provide a summary of the request.
The documentation for term query includes the case_insensitive parameter.
Because of how this type of search is implemented, every alphabetic character in such a query doubles the complexity of the search. This consumes a lot of heap memory and can crash nodes through high CPU usage and GC thrashing. Even a relatively short search term (about 16 characters) could require nearly 8 GB of heap.
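To make the doubling concrete, here is a rough Python sketch that counts the distinct upper/lower-case spellings of a term. It only illustrates the 2^n growth; it does not model Lucene's automaton construction or actual heap usage, and the function names are made up for this example.

```python
from itertools import product


def has_case(ch: str) -> bool:
    """True when the character has distinct lower- and upper-case forms."""
    return ch.lower() != ch.upper()


def case_variants(term: str) -> list[str]:
    """Enumerate every upper/lower-case spelling of `term`."""
    choices = [(ch.lower(), ch.upper()) if has_case(ch) else (ch,) for ch in term]
    return ["".join(combo) for combo in product(*choices)]


def variant_count(term: str) -> int:
    """Count the spellings without materializing them: 2 ** (cased characters)."""
    return 2 ** sum(has_case(ch) for ch in term)


if __name__ == "__main__":
    print(case_variants("ab1"))  # ['ab1', 'aB1', 'Ab1', 'AB1']
    for n in (4, 8, 16):
        print(n, variant_count("a" * n))  # 16, 256, 65536
```

Digits and punctuation contribute nothing to the count, which matches the "(alphabetic)" qualifier above.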
This potential impact should be highlighted in the documentation as a warning. Additionally, there are preferred strategies for case-insensitive search that should be presented as alternatives in the docs.
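One alternative that could be documented (my assumption; the final list of strategies still needs technical validation) is to lowercase values at index time with a normalizer on a keyword field, so that a plain term query matches regardless of case. A sketch of the request bodies as Python dictionaries, where the field name `product_code` and the normalizer name `lowercase_normalizer` are hypothetical:

```python
import json

# Body for index creation: a custom normalizer that lowercases keyword values.
index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {"type": "custom", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "properties": {
            # Hypothetical field; the normalizer applies at index and search time.
            "product_code": {"type": "keyword", "normalizer": "lowercase_normalizer"}
        }
    },
}

# Body for the search: an ordinary term query, with no case_insensitive flag.
query_body = {"query": {"term": {"product_code": {"value": "ABC-123"}}}}

if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(json.dumps(query_body, indent=2))
```

Because the normalization happens once per value, query cost no longer depends on the number of letters in the search term.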
I'm happy to write such docs but would hope to get some assistance from @msfroh in validating them technically.
Version: List the OpenSearch version to which this issue applies, e.g. 2.14, 2.12--2.14, or all.
all
Elsewhere, such as in RegexpQueryBuilder, we default to setting max_determinized_states to 10000 (by referencing the constant defined in Lucene as Operations.DEFAULT_DETERMINIZE_WORK_LIMIT).
IMO, we should:
- deprecate all of the existing caseInsensitive*Query methods in AutomatonQueries,
- replace them with calls that specify maxDeterminizedStates,
- add max_determinized_states as a query parameter to any query type that may generate an automaton query, and
- make the default max_determinized_states value for those query types Integer.MAX_VALUE on the 2.x branch (for dangerous backward compatibility) and Operations.DEFAULT_DETERMINIZE_WORK_LIMIT (i.e. 10000) on the main branch so we're safe by default on 3.0.
That way, folks using 2.19 can at least safeguard themselves by explicitly setting max_determinized_states to something reasonable, while 3.0 is safe by default (but we let users risk shooting themselves in the foot if they explicitly ask to).
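As a toy illustration of what such a limit buys, the Python sketch below expands case variants one character at a time and aborts as soon as the work exceeds a budget. This is not Lucene's determinization algorithm, and every name in it is hypothetical; it only shows how a default of 10000 rejects a 16-letter term early, while an effectively unlimited budget lets the expansion run.

```python
import sys

DEFAULT_WORK_LIMIT = 10_000  # mirrors the 10000 default mentioned above


class TooComplexError(Exception):
    """Raised when the expansion exceeds the configured work limit."""


def expand_with_limit(term: str, work_limit: int = DEFAULT_WORK_LIMIT) -> list[str]:
    """Expand `term` into all case variants, failing fast past `work_limit`."""
    variants = [""]
    for ch in term:
        lower, upper = ch.lower(), ch.upper()
        options = (lower, upper) if lower != upper else (ch,)
        variants = [prefix + opt for prefix in variants for opt in options]
        if len(variants) > work_limit:
            raise TooComplexError(
                f"{len(variants)} variants exceeds work limit {work_limit}"
            )
    return variants


if __name__ == "__main__":
    print(len(expand_with_limit("abc")))  # 8
    try:
        expand_with_limit("a" * 16)  # stops once the count passes 10000
    except TooComplexError as exc:
        print("rejected:", exc)
    # Opting out of the safeguard, analogous to Integer.MAX_VALUE on 2.x.
    print(len(expand_with_limit("a" * 12, work_limit=sys.maxsize)))  # 4096
```

The failure happens at the 14th letter (16384 variants), before most of the memory would have been allocated, which is the point of being safe by default.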