Implement ORDER BY BM25 #1434
3a50985
to
3848e94
Compare
(The force push is identical to the code that CI ran previously; it just cleans up the history.)
Looks great! I'm really happy with how much this cleaned up the index based ordering logic in several classes.
Left a handful of minor comments/questions.
src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
test/unit/org/apache/cassandra/index/sai/StorageAttachedIndexTest.java
Outdated
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java
Outdated
src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
Outdated
03036e4
to
a5060e9
Compare
…onsible for ANN index queries. Other global orderings will be represented by a SingleColumnComparator with clustered=true instead.
a5060e9
to
5776439
Compare
… of recomputing scores on the coordinator
5776439
to
e0ea872
Compare
162e02e
to
56a6e0f
Compare
Nice work! I left several minor comments and a few larger questions.
src/java/org/apache/cassandra/index/sai/disk/v1/postings/IntersectingPostingList.java
Outdated
src/java/org/apache/cassandra/index/sai/disk/v1/postings/IntersectingPostingList.java
Outdated
src/java/org/apache/cassandra/index/sai/disk/v1/postings/IntersectingPostingList.java
Outdated
src/java/org/apache/cassandra/index/sai/plan/TopKProcessor.java
Outdated
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java
Outdated
src/java/org/apache/cassandra/index/sai/disk/v1/InvertedIndexSearcher.java
Outdated
…ction in SingleColumnRelation.newEQRestriction. This eliminates the need for skipMerge and special cases in doMergeWith, and moves the issuing of warnings next to the place where the transformation occurs instead of doing it much later in RowFilterValidator (which is no longer needed)
5f5eb0e
to
3d17e2f
Compare
01d3adc
to
b19fead
Compare
2babc86
to
50c4a57
Compare
… approach than ignoring it when serialization fails later
I think the GenericOrderByTest failure is real :(
@jbellis looks like we're duplicating the sort value so that the rows have 4 columns instead of 3. When I update the order by test to use
That error looks like we're not deduping rows properly.
1cfbbba
to
a3d5c67
Compare
…ostly do different things, jamming the logic together in "processPartitions" caused way more complexity than saving a repeated while loop was worth
a3d5c67
to
39652f1
Compare
Something wonky with Butler?
VectorLocalTest and VectorCompactionTest both pass locally.
    // by synthetic +score column.
    boolean cqlReversed = ordering.direction == Ordering.Direction.DESC;
    if (def.position() == ColumnMetadata.NO_POSITION)
        return ordering.expression.isScored() || cqlReversed;
I don't think this hurts anything as is, but is the "isScored()" ever going to return true?
The Ann and BM25 Ordering subclasses override isScored() to return true. Maybe I'm not understanding your point correctly.
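To make the exchange above concrete, here is a minimal sketch of how an overridden `isScored()` feeds into the quoted `return` expression. Everything in it is an assumption for illustration: `ScoredOrderingSketch`, `Expression`, `Ann`, `Bm25`, `Direction`, and `quotedCheck` are hypothetical stand-ins, not the real Cassandra classes, and the sketch makes no claim about what the enclosing method in the PR is called or how its result is used.

```java
// Hypothetical, simplified model of the check discussed in this thread.
// None of these types are the real Cassandra classes.
final class ScoredOrderingSketch {
    enum Direction { ASC, DESC }

    /** Base ordering expression: a plain ordering is not scored. */
    static class Expression {
        boolean isScored() { return false; }
    }

    /** Stand-in for an ANN ordering expression, which overrides isScored() to true. */
    static class Ann extends Expression {
        @Override boolean isScored() { return true; }
    }

    /** Stand-in for a BM25 ordering expression, which overrides isScored() to true. */
    static class Bm25 extends Expression {
        @Override boolean isScored() { return true; }
    }

    /** Evaluates the quoted expression: isScored() || cqlReversed. */
    static boolean quotedCheck(Expression expression, Direction direction) {
        boolean cqlReversed = direction == Direction.DESC;
        return expression.isScored() || cqlReversed;
    }

    public static void main(String[] args) {
        // Unscored expression: the result depends only on the direction.
        System.out.println(quotedCheck(new Expression(), Direction.ASC));  // false
        System.out.println(quotedCheck(new Expression(), Direction.DESC)); // true
        // Scored expression: the result is true regardless of direction.
        System.out.println(quotedCheck(new Bm25(), Direction.ASC));        // true
    }
}
```

Under these assumptions, `isScored()` only changes the outcome for ascending orderings, which is why the question of whether it can ever return true on this code path matters.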
LGTM! Nice work :) just waiting on CI now
✔️ Build ds-cassandra-pr-gate/PR-1434 approved by Butler
### What is the issue
The `NON_DAEMON` IndexRegistry returns `true` for all calls to `supportsExpression`. As a result, the `getEqBehavior` method returns `MATCH` instead of `EQ`, because the index indicates that it can handle both `ANALYZER_MATCHES` expressions and `EQ` expressions. It is an odd edge case, because the javadoc for the `NON_DAEMON` object is:
```java
/**
 * An {@code IndexRegistry} intended for use when Cassandra is initialized in client or tool mode.
 * Contains a single stub {@code Index} which possesses no actual indexing or searching capabilities
 * but enables query validation and preparation to succeed. Useful for tools which need to prepare
 * CQL statements without instantiating the whole ColumnFamilyStore infrastructure.
 */
```
This presents a problem for the eq/match logic, where we want a nuanced way of handling the different cases. My proposal here is to just make it use the EQ behavior, but that might have adverse side effects in untested code.
### What does this PR fix and why was it fixed
The original fix used in #1434 was just to avoid the NPE we hit, but it allowed for the wrong result in eq behavior. This fix returns `EQ` in that case. It's possible this fix has negative consequences that we haven't seen yet, but at the very least, the CNDB tests pass with it.
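The edge case described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Cassandra code: `EqBehaviorSketch`, `Registry`, `STUB`, `naive`, and `withStubGuard` are hypothetical names, and the rule "prefer `MATCH` whenever `ANALYZER_MATCHES` is supported" is an assumed simplification of whatever `getEqBehavior` really does.

```java
// Hypothetical, simplified model of the EQ/MATCH decision described above.
// The types and the decision rule are assumptions, not the real Cassandra API.
final class EqBehaviorSketch {
    enum Operator { EQ, ANALYZER_MATCHES }
    enum EqBehavior { EQ, MATCH }

    /** Minimal registry: reports whether some index supports an operator. */
    interface Registry {
        boolean supportsExpression(Operator op);
    }

    /** Stand-in for the NON_DAEMON stub registry, which answers true to everything. */
    static final Registry STUB = op -> true;

    /** Assumed naive rule: if ANALYZER_MATCHES is supported, behave as MATCH. */
    static EqBehavior naive(Registry registry) {
        return registry.supportsExpression(Operator.ANALYZER_MATCHES)
               ? EqBehavior.MATCH
               : EqBehavior.EQ;
    }

    /** Sketch of the proposed fix: the stub registry always gets EQ. */
    static EqBehavior withStubGuard(Registry registry) {
        if (registry == STUB)
            return EqBehavior.EQ; // the stub has no real analyzer, so do not trust its answer
        return naive(registry);
    }

    public static void main(String[] args) {
        System.out.println(naive(STUB));         // MATCH: the stub claims to support everything
        System.out.println(withStubGuard(STUB)); // EQ: guarded special case
    }
}
```

The guard is an identity check against the stub, so registries backed by real indexes keep the normal decision path; whether that is safe for every caller of the stub is exactly the open question raised in the description.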
What is the issue
https://github.com/riptano/cndb/issues/11725
What does this PR fix and why was it fixed
...
Checklist before you submit for review
NoSpamLogger for log lines that may appear frequently in the logs