feat(sink): resolve avro Ref to corresponding Record definition #20401

Open
wants to merge 3 commits into
base: main

xiangjinwu
Contributor

@xiangjinwu xiangjinwu commented Feb 6, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This is the sink counterpart of #19601 and #19746. Comparing Ref support in the source and the sink:

  • The sink did not support any form of Ref previously, while the source supported a subset (single-layer) via a hack.
  • The sink unifies schema and datum handling via MaybeData, while the source took 2 PRs to fix separately.
  • The sink uses an owned newtype NamesRef(HashMap<Name, AvroSchema>), while the source uses apache_avro::schema::NamesRef<'s> = HashMap<Name, &'s AvroSchema>. The former requires more memory, while the latter takes more time because names are resolved on every use. We can unify on one approach once we learn which works better in practice.
  • The sink does not need to reject circular references as the source did, because the upstream SQL schema is never circular.
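The owned-versus-borrowed trade-off above can be sketched roughly as follows. This is a toy illustration only: `ToySchema`, `OwnedNames`, `BorrowedNames`, and `collect_borrowed` are hypothetical stand-ins, not the real `apache_avro` types or the code in this PR. It assumes the borrowed table has to be rebuilt from the root schema whenever it is needed, because it cannot easily be stored next to the schema it points into.

```rust
use std::collections::HashMap;

/// Toy stand-in for an Avro schema (hypothetical, not `apache_avro::Schema`).
#[derive(Debug, Clone, PartialEq)]
enum ToySchema {
    Int,
    Str,
    Record { name: String, fields: Vec<(String, ToySchema)> },
    Ref { name: String },
}

/// Borrowed table: stores only references into the root schema.
/// Cheap in memory, but tied to the lifetime `'s` of that schema.
type BorrowedNames<'s> = HashMap<String, &'s ToySchema>;

/// Walk the schema tree and register every named record by reference.
fn collect_borrowed<'s>(schema: &'s ToySchema, out: &mut BorrowedNames<'s>) {
    if let ToySchema::Record { name, fields } = schema {
        out.insert(name.clone(), schema);
        for (_, field_schema) in fields {
            collect_borrowed(field_schema, out);
        }
    }
}

/// Owned table: clones every named record once, so it has no lifetime
/// parameter and can be kept around and reused, at the cost of memory.
struct OwnedNames(HashMap<String, ToySchema>);

impl OwnedNames {
    fn build(root: &ToySchema) -> Self {
        let mut borrowed = BorrowedNames::new();
        collect_borrowed(root, &mut borrowed);
        OwnedNames(
            borrowed
                .into_iter()
                .map(|(name, schema)| (name, schema.clone()))
                .collect(),
        )
    }

    /// Resolve a `Ref` to its definition; any other schema is returned as-is.
    /// An unknown name falls back to the `Ref` itself in this sketch.
    fn lookup<'a>(&'a self, schema: &'a ToySchema) -> &'a ToySchema {
        match schema {
            ToySchema::Ref { name } => self.0.get(name).unwrap_or(schema),
            other => other,
        }
    }
}
```

In this sketch `OwnedNames::build` is called once per schema and `lookup` is then a plain map access, whereas a caller using `BorrowedNames` would call `collect_borrowed` again each time it needs the table.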

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@@ -395,6 +435,8 @@ fn on_field<D: MaybeData>(data_type: &DataType, maybe: D, expected: &AvroSchema)
_ => (expected, OptIdx::NotUnion),
};

let inner = refs.lookup(inner);
Contributor Author

A Ref never resolves to a Union, while a Union may contain a Ref. As a result, we check for a nullable Union first and then resolve the Ref. General Unions (as opposed to nullable ones) are not supported in the sink yet.
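A minimal sketch of that ordering, using toy types rather than the real sink code: `ToySchema`, `unwrap_nullable`, `resolve_field`, and the `NonNullAt` variant are hypothetical names introduced for illustration, and the actual `OptIdx` in the PR may have different variants.

```rust
use std::collections::HashMap;

/// Toy stand-in for an Avro schema (hypothetical, not `apache_avro::Schema`).
#[derive(Debug, Clone, PartialEq)]
enum ToySchema {
    Null,
    Int,
    Record { name: String },
    Ref { name: String },
    Union(Vec<ToySchema>),
}

/// Hypothetical simplification of the `OptIdx` seen in the diff.
#[derive(Debug, PartialEq)]
enum OptIdx {
    NotUnion,
    /// Position of the non-null branch in a two-branch nullable union.
    NonNullAt(usize),
}

/// Step 1: peel off a nullable union `[null, T]` or `[T, null]`.
/// Anything else, including a general union, is passed through unchanged.
fn unwrap_nullable(expected: &ToySchema) -> (&ToySchema, OptIdx) {
    match expected {
        ToySchema::Union(branches) if branches.len() == 2 => {
            match (&branches[0], &branches[1]) {
                (ToySchema::Null, other) => (other, OptIdx::NonNullAt(1)),
                (other, ToySchema::Null) => (other, OptIdx::NonNullAt(0)),
                _ => (expected, OptIdx::NotUnion),
            }
        }
        _ => (expected, OptIdx::NotUnion),
    }
}

/// Step 2: resolve a `Ref` by name; any other schema is returned as-is.
fn lookup<'a>(
    names: &'a HashMap<String, ToySchema>,
    schema: &'a ToySchema,
) -> &'a ToySchema {
    match schema {
        ToySchema::Ref { name } => names.get(name).unwrap_or(schema),
        other => other,
    }
}

/// Unwrap the nullable union first, then resolve the Ref found inside it.
fn resolve_field<'a>(
    names: &'a HashMap<String, ToySchema>,
    expected: &'a ToySchema,
) -> (&'a ToySchema, OptIdx) {
    let (inner, idx) = unwrap_nullable(expected);
    (lookup(names, inner), idx)
}
```

Calling `lookup` directly on a union such as `[null, Ref("Inner")]` would return the union untouched, because the Ref is hidden inside a branch; that is why the unwrap step has to come first in this sketch.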

@xiangjinwu xiangjinwu marked this pull request as ready for review February 6, 2025 10:44