Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test for spark streaming to make sure it uses merger when configured #6

Draft
wants to merge 2 commits into
base: create_new_merger_configs
Choose a base branch
from

Conversation

jonvex
Copy link
Owner

@jonvex jonvex commented Sep 25, 2024

Change Logs

Describe context and summary for this change. Highlight if any code was copied.

Impact

Describe any public API or user-facing feature change or any performance impact.

Risk level (write none, low medium or high below)

If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@jonvex jonvex force-pushed the test_merge_mode_streaming branch from 5f1de7b to 15244d9 Compare October 17, 2024 01:22
* Instead of merging, it will just delete the record. Use for testing
*
*/
public class HoodieSparkDeleteRecordMerger extends HoodieSparkRecordMerger {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this especially needed for structured streaming?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I think it will be useful for additional testing as we switch over to merge mode on the write side in 1.1.

}

@Test
def testStructuredStreamingWithMergeMode(): Unit = {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we parameterze over all merge modes? This test only handles custom merge mode and I guess that's why you had to add DeleteRecordMerger, which is fine. But, let's also test other merge modes in streaming (maybe event time merge mode is getting tested due to default).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth the extra CI runtime. This isn't to really test out all the merge modes. It is to test that we can do structured streaming with custom merge mode and the configs are passed through correctly. I parameterized it for mor/cow now because we needed to test the merger for mor as well in the reader.

@jonvex jonvex force-pushed the test_merge_mode_streaming branch from 15244d9 to 6efda9c Compare October 18, 2024 16:45
@jonvex jonvex force-pushed the test_merge_mode_streaming branch from 08b6895 to 77ec168 Compare October 21, 2024 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants