[SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder #49304
base: master
Resolved review threads:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala (two threads, outdated)
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
@HeartSaVioR - could you PTAL? Thanks!
@@ -578,7 +578,11 @@ class RocksDBSuite extends AlsoTestWithRocksDBFeatures with SharedSparkSession

    if (isChangelogCheckpointingEnabled) {
      assert(changelogVersionsPresent(remoteDir) === (1 to 50))
      assert(snapshotVersionsPresent(remoteDir) === Range.inclusive(5, 50, 5))
      if (colFamiliesEnabled) {
I might be missing something, but why would we get different snapshot versions with colFamiliesEnabled enabled versus disabled? I thought we write to the changelog the same way in both cases, so shouldn't the results be the same?
Yeah, the only difference is that when we create a new column family, we force snapshot creation by setting a flag. In the tests, we have to create the column family each time if no commit is involved, hence the difference.
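To make the reply above concrete, here is a toy Scala model of the described behavior. It is a sketch under stated assumptions: `ToyCheckpointer`, `createNewColFamily` and `simulate` are hypothetical names, not Spark's RocksDB API. It assumes every commit produces a new version, a snapshot is normally taken every `snapshotInterval` versions, and creating a new column family sets a flag that forces a snapshot at the next commit. It does not reproduce the actual RocksDBSuite loop.

```scala
import scala.collection.mutable

// Hypothetical toy model, not Spark's RocksDB class.
final class ToyCheckpointer(snapshotInterval: Int) {
  private var version = 0L
  private var forceSnapshotOnNextCommit = false
  val snapshotVersions: mutable.ArrayBuffer[Long] = mutable.ArrayBuffer.empty

  // Mirrors the described behavior: creating a new column family sets a flag
  // that forces a snapshot at the next commit.
  def createNewColFamily(): Unit = forceSnapshotOnNextCommit = true

  // Each commit bumps the version; a snapshot is recorded on the regular
  // interval or whenever the force flag is set, and the flag is then cleared.
  def commit(): Long = {
    version += 1
    if (forceSnapshotOnNextCommit || version % snapshotInterval == 0) {
      snapshotVersions += version
      forceSnapshotOnNextCommit = false
    }
    version
  }
}

// Runs `numVersions` commits, optionally creating a column family before each one,
// and returns the versions at which a snapshot was taken.
def simulate(numVersions: Int, interval: Int, colFamiliesEnabled: Boolean): Seq[Long] = {
  val db = new ToyCheckpointer(interval)
  (1 to numVersions).foreach { _ =>
    if (colFamiliesEnabled) db.createNewColFamily()
    db.commit()
  }
  db.snapshotVersions.toSeq
}
```

In this toy model, `simulate(50, 5, colFamiliesEnabled = false)` yields snapshots at 5, 10, ..., 50, while creating a column family before every commit yields a snapshot at every version. That illustrates why the expected snapshot versions can depend on `colFamiliesEnabled` even though the changelog is written the same way.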
What changes were proposed in this pull request?
Move the virtual column family related mapping into the DB layer instead of the encoder.
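A minimal sketch of what keeping this mapping in the DB layer can look like, assuming each virtual column family is identified by a 2-byte ID prefixed to every key stored in one physical RocksDB column family. `VirtualColFamilyMapper` and its methods are hypothetical names chosen for illustration, not the classes touched by this PR. The idea is that the encoder only produces key bytes, while the DB layer owns the name-to-ID mapping and adds or strips the prefix.

```scala
import java.nio.ByteBuffer
import scala.collection.mutable

// Hypothetical sketch: the DB layer (not the key encoder) owns the mapping
// from virtual column family names to short IDs.
final class VirtualColFamilyMapper {
  private val nameToId = mutable.LinkedHashMap.empty[String, Short]
  private var nextId: Short = 0

  // Returns the existing ID for a column family, or assigns the next free one.
  def getOrCreateId(colFamilyName: String): Short =
    nameToId.getOrElseUpdate(colFamilyName, {
      val id = nextId
      nextId = (nextId + 1).toShort
      id
    })

  def idOf(colFamilyName: String): Option[Short] = nameToId.get(colFamilyName)

  // Prepends the 2-byte (big-endian) column family ID to an already-encoded key.
  def encodeKey(colFamilyName: String, key: Array[Byte]): Array[Byte] = {
    val id = idOf(colFamilyName).getOrElse(
      throw new IllegalArgumentException(s"Unknown column family: $colFamilyName"))
    ByteBuffer.allocate(2 + key.length).putShort(id).put(key).array()
  }

  // Splits a stored key back into (column family ID, original key bytes).
  def decodeKey(stored: Array[Byte]): (Short, Array[Byte]) = {
    val buf = ByteBuffer.wrap(stored)
    val id = buf.getShort()
    val key = new Array[Byte](buf.remaining())
    buf.get(key)
    (id, key)
  }
}
```

For example, after `getOrCreateId("default")` and `getOrCreateId("cf1")`, encoding the key bytes `7, 8` under `cf1` yields `0, 1, 7, 8`: the big-endian ID followed by the original key.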
Why are the changes needed?
Keep the abstraction clear around ownership, and expose internal/non-internal key metrics correctly.
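One way a DB layer that owns the column family mapping could report key metrics is to count keys per column family and split the totals into internal and user-visible keys. This is a hypothetical sketch: `ColFamilyKeyCounter` and `KeyMetrics` are illustrative names, and treating a leading `$` as the marker for internal column families is an assumption, not something stated in this PR.

```scala
import scala.collection.mutable

// Hypothetical metrics holder: user-visible keys vs. keys in internal column families.
final case class KeyMetrics(numKeys: Long, numInternalKeys: Long)

final class ColFamilyKeyCounter {
  private val counts = mutable.Map.empty[String, Long].withDefaultValue(0L)

  // ASSUMPTION: internal column families are identified by a leading "$".
  private def isInternal(colFamilyName: String): Boolean = colFamilyName.startsWith("$")

  // Only a put that inserts a previously absent key increases the count.
  def onPut(colFamilyName: String, isNewKey: Boolean): Unit =
    if (isNewKey) counts(colFamilyName) += 1

  // Only a remove of an existing key decreases the count.
  def onRemove(colFamilyName: String, existed: Boolean): Unit =
    if (existed) counts(colFamilyName) -= 1

  // Aggregates per-column-family counts into internal and user-visible totals.
  def metrics: KeyMetrics = {
    val (internal, user) = counts.partition { case (name, _) => isInternal(name) }
    KeyMetrics(numKeys = user.values.sum, numInternalKeys = internal.values.sum)
  }
}
```

Keeping the counts keyed by column family name means the split between internal and user-visible totals is decided in one place, rather than by each encoder.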
With this change, we have the following:
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests, plus newly added unit tests.
Was this patch authored or co-authored using generative AI tooling?
No