Support Hudi merged view files for partition path updates without compaction #24283
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java
can we add tests please
7198333 to 483e2cb
There is already a test in |
32c670f to 896e81c
@ZacBlanco Feedback is addressed. Please take a look again.
7cd16cc to d0bf390
Thanks for the updates. The configurations look better to me now.
I have one more concern: the files added for these tests take up about 2.2 MB. That is not a lot, but for the health and size of the repository I would like to either generate the Hudi tables during the test or reduce the added files to under 1 MiB.
zacblanco@zac-ibm presto % du -h -s presto-hive/src/test/resources/hudi_mor_part_update
2.2M presto-hive/src/test/resources/hudi_mor_part_update
presto-hive/src/main/java/com/facebook/presto/hive/HudiDirectoryLister.java
presto-hive/src/test/java/com/facebook/presto/hive/TestHudiDirectoryLister.java
@ZacBlanco I have addressed your feedback. Regarding the resource files added for the tests: we currently don't have a Hudi writer integration. One solution is to zip the Hudi test data and unzip it in the tests. The overall size on disk would be smaller, with a little overhead during tests (probably similar to or less than writing Hudi tables). If you think that's OK, I can do that in a separate PR. Please let me know your thoughts.
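The zip-and-unzip idea could look roughly like the sketch below. It is not part of this PR: the class name `HudiTestArchives` and the method `unzip` are hypothetical, and it uses only `java.util.zip` and `java.nio.file` from the JDK. It assumes the target directory is fresh (for example a per-test temp directory), since existing files are not overwritten.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical test helper; names are illustrative and do not exist in the PR.
public final class HudiTestArchives
{
    private HudiTestArchives() {}

    // Extracts every entry of the zip stream under targetDir and returns the
    // normalized absolute target directory.
    public static Path unzip(InputStream archive, Path targetDir)
            throws IOException
    {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path resolved = root.resolve(entry.getName()).normalize();
                // Reject entries that would escape the target directory ("zip slip").
                if (!resolved.startsWith(root)) {
                    throw new IOException("Zip entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(resolved);
                }
                else {
                    Files.createDirectories(resolved.getParent());
                    // Copies only the bytes of the current entry; fails if the file already exists.
                    Files.copy(zip, resolved);
                }
            }
        }
        return root;
    }
}
```

A test could open the archive with `getResourceAsStream` and pass a directory from `Files.createTempDirectory`, then point the table location at the returned path.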
d0bf390 to 1b9e76d
Thank you for the updates. It looks good to me now. On the table size:
Have you tested the size of the zipped table? We might even avoid zipping if we perform the minimum number of operations and make the table as small as possible. Did you try to minimize the number of operations and records on this table before including it in the PR? I'm not sure how you generated this table, but ideally we would generate a table with the smallest amount of data and metadata. For example, this table should have one int-sized column and only a small number (<10?) of records across the smallest number of inserts (1-2?) alongside the partition path update.
These files will get pulled on every clone of the repository. Presto sees 1-2K clones per day, so the current table adds up to about 4 GB of additional data pulled per day. The size reduction needs to happen in the same commit that introduces the table; otherwise the extra data will still exist in the git history and will affect non-shallow clones.
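The daily figure follows from multiplying the added size by the clone count. A minimal sketch of that arithmetic, assuming the 2.2 MB reported by `du` is what each full clone transfers (it ignores git's pack compression, so it is an upper bound) and using decimal units; the class and method names are hypothetical:

```java
// Hypothetical back-of-the-envelope estimate; not part of the PR.
public final class CloneTrafficEstimate
{
    private CloneTrafficEstimate() {}

    // Extra data pulled per day (GB) = added size (MB) * full clones per day / 1000.
    public static double extraGigabytesPerDay(double addedMegabytes, int clonesPerDay)
    {
        return addedMegabytes * clonesPerDay / 1000.0;
    }

    public static void main(String[] args)
    {
        // Range quoted in the review comment: 1-2K clones per day.
        System.out.printf("low:  %.1f GB/day%n", extraGigabytesPerDay(2.2, 1_000));
        System.out.printf("high: %.1f GB/day%n", extraGigabytesPerDay(2.2, 2_000));
    }
}
```

With these inputs the estimate ranges from about 2.2 GB to about 4.4 GB per day, in line with the "up to about 4 GB" figure above.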
Description
Support Hudi merged view files for partition path updates without compaction. This is needed for Merge-on-Read Hudi tables when a record's partition has been updated and the table has not yet been compacted.
(schemaName.tableName) where this support is needed.
HudiDirectoryLister.
Motivation and Context
Support Hudi merged view files for partition path updates without compaction. This is needed for Merge-on-Read Hudi tables when a record's partition has been updated and the table has not yet been compacted.
Impact
Support Hudi merged view files for partition path updates without compaction. This is needed for Merge-on-Read Hudi tables when a record's partition has been updated and the table has not yet been compacted. As a result, even the read-optimized view of Merge-on-Read tables with uncompacted partition path updates should not return duplicates.
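The duplicate problem can be illustrated with a simplified model. This is a sketch only, not Hudi's merge logic and not the code in this PR: it assumes a record key identifies a row across partitions, and that before compaction the old version of a moved record is still visible in its old partition while the new version is visible in the new partition. `Row`, `commitTime`, and `mergeLatest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of merging by record key; not a Hudi API.
public final class PartitionUpdateDedupSketch
{
    private PartitionUpdateDedupSketch() {}

    public static final class Row
    {
        public final String recordKey;
        public final String partitionPath;
        public final long commitTime;

        public Row(String recordKey, String partitionPath, long commitTime)
        {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
            this.commitTime = commitTime;
        }
    }

    // Keeps one row per record key: the one with the highest commit time.
    // Output order follows the first appearance of each key.
    public static List<Row> mergeLatest(List<Row> rows)
    {
        Map<String, Row> latest = new LinkedHashMap<>();
        for (Row row : rows) {
            latest.merge(row.recordKey, row, (a, b) -> b.commitTime > a.commitTime ? b : a);
        }
        return new ArrayList<>(latest.values());
    }
}
```

In this model, a reader that lists each partition independently would return both versions of a moved record; merging by key leaves only the version in the new partition.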
Test Plan
Contributor checklist
Release Notes