user provided compression/decompression for record batches #81
Conversation
I'm not sure I agree that compression is unrelated to the library's function. Is there a reason why we can't improve the ability to configure the particular codecs rather than rip everything out? Alternatively, could we add a "bring your own compression lib" option without removing the existing options? As it stands, this PR would cause churn for our existing production users without much benefit to them.
This library has never really supported compression fully. The compression part simply isn't production-ready: it doesn't have the testing it needs, and scenarios like choosing custom compression levels are not covered. Making the compression user-provided is the best way to go.
I think that compression is very much a part of the Kafka protocol. However:
For your team this should be equivalent to ripping out the existing compression and adding custom compression logic: just disable all the compression features and then provide a custom compression algorithm. If collaborating with upstream is too much overhead for your team, feel free to fork the project and implement it yourself. FWIW, my project doesn't currently make use of the record-level protocol encoding/decoding, so that's why we haven't encountered any issues with it. We might start using it in the future though.
I have modified the PR so the change might work for all parties. The provided custom compression/decompression method is now an `Option`; this way, existing users only have to pass a `None`.
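A minimal sketch of what such an optional hook could look like; the names `CompressFn` and `encode_records` and the exact signature are illustrative assumptions, not this crate's actual API:

```rust
// Hypothetical sketch: the custom codec is an Option, so existing callers
// keep the built-in behavior by passing None.
type CompressFn = fn(&[u8]) -> Vec<u8>;

fn encode_records(payload: &[u8], compressor: Option<CompressFn>) -> Vec<u8> {
    match compressor {
        // User-provided codec (e.g. a wrapper around an lz4 crate).
        Some(compress) => compress(payload),
        // Library default path; identity here for brevity.
        None => payload.to_vec(),
    }
}

fn main() {
    let records = b"record batch bytes";
    // Existing users: nothing changes beyond passing None.
    let unchanged = encode_records(records, None);
    // New users: bring your own compression (a toy "codec" for illustration).
    let custom = encode_records(records, Some(|b: &[u8]| b.iter().rev().copied().collect()));
    assert_eq!(unchanged, records);
    assert_ne!(custom, unchanged);
}
```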
This largely seems fine, but I've left a few inline comments.
Also, there are a few variations on the API I can think of:
- encode/decode methods could rely on a generic type to provide the custom compression logic without needing an argument to be passed in (sketched after this comment)
  - downside is more complexity
- we could match the compression API to take in the decode/encode function like we currently do for the default encode/decode logic
- I suspect the current implementation might have more room for optimization, but it's hard to say without actually attempting such an optimization
  - I think there might actually be a third option that would be better than both? Maybe we just want to be able to preallocate the second BytesMut to the same size + buffer as the original BytesMut?

But I don't see any of these variations as clearly better than the current PR, so assuming @tychedelia is ok with it, I think let's just go ahead with what we have (after addressing the inline comments I've left).
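To make the first variation concrete, here is a minimal sketch of the generic-type approach; every name in it (`Compressor`, `NoCompression`, `encode_records`) is a hypothetical stand-in, not this crate's API:

```rust
// Hypothetical sketch of the generic-type variation: the codec is selected
// at compile time through a trait, so encode takes no extra argument.
trait Compressor {
    fn compress(input: &[u8]) -> Vec<u8>;
}

// Identity codec standing in for the library's default path.
struct NoCompression;

impl Compressor for NoCompression {
    fn compress(input: &[u8]) -> Vec<u8> {
        input.to_vec()
    }
}

// The encoder is generic over the codec; callers pick one via the type
// parameter, e.g. encode_records::<NoCompression>(..).
fn encode_records<C: Compressor>(payload: &[u8]) -> Vec<u8> {
    C::compress(payload)
}

fn main() {
    let out = encode_records::<NoCompression>(b"record batch bytes");
    assert_eq!(out, b"record batch bytes");
}
```

The codec is resolved at compile time with no function-pointer indirection, but every call site now carries a type parameter, which is the extra complexity the sub-bullet above refers to.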
lgtm, I'll give @tychedelia some time to give any input before merging.
Thanks for making these changes to support both paths. While I understand you view the existing compression options as too buggy to use, I hope you'll consider upstreaming any fixes as you learn more.
While investigating the cause of LZ4 compression issues related to franz-go (see comments here #1651), I found `lz4_flex`, a pure-Rust LZ4 implementation that appears to be safer and faster than the `lz4`/`lz4-sys` crates `kafka-protocol` is using. Now that tychedelia/kafka-protocol-rs#81 allows us to use our own compression, and `lz4`'s configuration of block checksums is broken (fix here 10XGenomics/lz4-rs#52), I thought it would be a good time to swap to `lz4_flex`.
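As a rough sketch of that swap (assuming `lz4_flex` with its `frame` feature enabled; the wrapper functions here are hypothetical, using the LZ4 frame format that Kafka's lz4-coded record batches use):

```rust
use std::io::{Read, Write};

use lz4_flex::frame::{FrameDecoder, FrameEncoder};

// Hypothetical wrapper functions showing the shape of a user-provided codec
// built on lz4_flex's frame format.
fn lz4_compress(input: &[u8]) -> Vec<u8> {
    let mut encoder = FrameEncoder::new(Vec::new());
    encoder.write_all(input).expect("writing to a Vec cannot fail");
    encoder.finish().expect("failed to finalize the lz4 frame")
}

fn lz4_decompress(input: &[u8]) -> Vec<u8> {
    let mut output = Vec::new();
    FrameDecoder::new(input)
        .read_to_end(&mut output)
        .expect("invalid lz4 frame");
    output
}

fn main() {
    let original = b"record batch bytes".repeat(16);
    let compressed = lz4_compress(&original);
    assert_eq!(lz4_decompress(&compressed), original);
}
```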
PR for #80
This makes #79 a non-issue and allows this lib to focus on the Kafka protocol instead of various decompression routines.