[SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder #49304

Open: wants to merge 23 commits into master

@anishshri-db (Contributor) commented Dec 27, 2024

What changes were proposed in this pull request?

Move the virtual column family mapping into the DB layer instead of the encoder.

Why are the changes needed?

Keep the abstraction boundaries clear with respect to ownership, and expose key metrics for internal and non-internal column families correctly.
With this change, we have the following:

  • the encoder is responsible only for encoding based on the key type, such as noPrefix, prefix, or range
  • the underlying DB layer now owns the mapping for virtual column families
  • the DB layer can now also expose metrics for both internal and non-internal column families
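A minimal sketch of the ownership split described above, assuming a hypothetical `VirtualColFamilyDB` that owns the name-to-ID mapping and a `KeyEncoder` that knows nothing about column families. All names here (`VirtualColFamilyDB`, `KeyEncoder`, `createColFamily`, `numKeys`) are illustrative assumptions and are not the actual Spark state store classes; the real implementation differs.

```scala
import java.nio.charset.StandardCharsets
import scala.collection.mutable

// Hypothetical encoder: only turns a key into bytes and is unaware of
// virtual column families.
trait KeyEncoder {
  def encode(key: String): Array[Byte]
}

object NoPrefixKeyEncoder extends KeyEncoder {
  def encode(key: String): Array[Byte] = key.getBytes(StandardCharsets.UTF_8)
}

// Hypothetical DB layer: owns the mapping from column family name to a
// short virtual ID and prefixes every physical key with that ID, so many
// virtual column families can share one physical key space.
class VirtualColFamilyDB {
  private case class ColFamilyInfo(id: Short, isInternal: Boolean)

  private val colFamilies = mutable.LinkedHashMap.empty[String, ColFamilyInfo]
  private val store = mutable.Map.empty[Seq[Byte], Array[Byte]]
  private val keyCounts = mutable.Map.empty[Short, Long].withDefaultValue(0L)
  private var nextId: Short = 0

  // Returns the existing ID, or assigns the next free one.
  def createColFamily(name: String, isInternal: Boolean): Short =
    colFamilies.getOrElseUpdate(name, {
      val info = ColFamilyInfo(nextId, isInternal)
      nextId = (nextId + 1).toShort
      info
    }).id

  // Prefix the already-encoded key with the 2-byte virtual column family ID.
  private def physicalKey(id: Short, encodedKey: Array[Byte]): Seq[Byte] =
    Seq(((id >> 8) & 0xFF).toByte, (id & 0xFF).toByte) ++ encodedKey

  def put(cfName: String, encodedKey: Array[Byte], value: Array[Byte]): Unit = {
    val id = colFamilies(cfName).id
    val pk = physicalKey(id, encodedKey)
    if (!store.contains(pk)) keyCounts(id) += 1 // count only new keys
    store(pk) = value
  }

  def get(cfName: String, encodedKey: Array[Byte]): Option[Array[Byte]] =
    store.get(physicalKey(colFamilies(cfName).id, encodedKey))

  // Key-count metric split by internal vs. non-internal column families.
  def numKeys(internal: Boolean): Long =
    colFamilies.values.filter(_.isInternal == internal).map(cf => keyCounts(cf.id)).sum
}
```

Because the DB layer alone knows which IDs belong to internal column families, it can report metrics for each group without the encoder having to carry that information.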

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests, plus newly added unit tests.

[info] Run completed in 8 minutes, 48 seconds.
[info] Total number of tests run: 305
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 305, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db marked this pull request as ready for review December 28, 2024 06:54
@anishshri-db (Contributor Author):

cc @ericm-db @jingz-db - PTAL, thx!

@anishshri-db changed the title [SPARK-50655][SS] Move virt col family related mapping into db layer instead of encoder [SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder Jan 2, 2025
@anishshri-db (Contributor Author):

@HeartSaVioR - could you PTAL, thanks!

@@ -578,7 +578,11 @@ class RocksDBSuite extends AlsoTestWithRocksDBFeatures with SharedSparkSession

if (isChangelogCheckpointingEnabled) {
  assert(changelogVersionsPresent(remoteDir) === (1 to 50))
  assert(snapshotVersionsPresent(remoteDir) === Range.inclusive(5, 50, 5))
  if (colFamiliesEnabled) {
Contributor:

I might be missing something, but why do we get different snapshot results with colFamiliesEnabled enabled vs. disabled? I thought we write to the changelog the same way in both cases, so shouldn't this be the same?

Contributor Author:

Yeah, the only difference is that when we create a new column family, we force snapshot creation by setting a flag. In the tests, we have to create the column family each time if there is no commit involved, hence the difference.
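To illustrate the explanation above, here is a minimal sketch assuming a hypothetical store that snapshots every `snapshotInterval` versions and, in addition, at the next commit after a column family is created. `SnapshotTrackingStore`, `createColFamily`, and `shouldForceSnapshot` are hypothetical names, not the actual RocksDB state store API.

```scala
import scala.collection.mutable

// Hypothetical store that records which versions received a snapshot.
class SnapshotTrackingStore(snapshotInterval: Int) {
  private var version = 0L
  // Set when a column family is created; forces a snapshot at the next
  // commit regardless of the regular interval.
  private var shouldForceSnapshot = false
  val snapshotVersions = mutable.ArrayBuffer.empty[Long]

  def createColFamily(name: String): Unit = {
    shouldForceSnapshot = true
  }

  // Bumps the version and snapshots if forced or on the interval boundary.
  def commit(): Long = {
    version += 1
    if (shouldForceSnapshot || version % snapshotInterval == 0) {
      snapshotVersions += version
      shouldForceSnapshot = false
    }
    version
  }
}
```

With an interval of 5 and 50 commits, this sketch snapshots versions 5, 10, ..., 50 when no column family is created, but snapshots every version when a column family is created before each commit, which is the kind of divergence the test expectations have to account for.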
