
user provided compression/decompression for record batches #81

Merged: 7 commits into tychedelia:main on Sep 20, 2024

Conversation

@pdeva (Contributor) commented Sep 15, 2024

PR for #80

This makes #79 a non-issue and allows this library to focus on the Kafka protocol instead of various decompression routines.

@pdeva pdeva changed the title user provided decompression for record batches user provided compression/decompression for record batches Sep 15, 2024
@tychedelia tychedelia requested a review from rukai September 16, 2024 19:08
@tychedelia tychedelia added the enhancement New feature or request label Sep 16, 2024
@tychedelia (Owner) commented
I'm not sure I agree that compression is unrelated to the library's function. Is there a reason we can't improve the ability to configure the particular codecs, rather than rip everything out? Alternatively, we could add a "bring your own compression lib" option without removing the existing options.

As it stands, this PR would cause churn for our existing production users without much benefit to them.

@pdeva (Contributor, Author) commented Sep 16, 2024

This library has never fully supported compression. Two of the four compression algorithms were contributed by members of our team, and we are constantly having to file bug reports to get fixes in.

The compression code simply isn't production ready: it doesn't have the testing it needs, and scenarios like choosing custom compression levels are not covered.

Making compression user provided is the best way to go.

@rukai (Collaborator) commented Sep 16, 2024

I think that compression is very much a part of the Kafka protocol. And since this crate allows users to encode and decode the entire Kafka protocol, it should support all Kafka-supported compression out of the box.

However:

  • We should put each compression algorithm behind a feature flag, enabled by default (see the Cargo.toml sketch below)
  • Exposing a way for users to provide custom compression algorithms could be nice; I can see use cases where a service might be Kafka compatible but support extra compression algorithms.

For your team this should be equivalent to ripping out the existing compression and adding custom compression logic: just disable all the compression features and then provide a custom compression algorithm.
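
For example, if the codecs sat behind default-on Cargo features as suggested above, opting out might look like this (a sketch only; the feature names shown are illustrative, not the crate's actual flags):

```toml
[dependencies]
# Assumed feature layout: disable all built-in codecs and rely on a
# user-provided compression hook instead.
kafka-protocol = { version = "*", default-features = false }

# Or re-enable only the codecs a deployment actually needs:
# kafka-protocol = { version = "*", default-features = false, features = ["gzip", "zstd"] }
```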

If collaborating with upstream is too much overhead for your team, feel free to fork the project and implement it yourself. Since there is a clear split between the Kafka protocol and the record-level protocol (where compression occurs), you can continue using the upstream version of this library for the Kafka protocol and use a fork of this library for the record-level protocol.

FWIW, my project doesn't currently make use of the record-level protocol encoding/decoding, so that's why we haven't encountered any issues with it. We might start using it in the future though.

@pdeva (Contributor, Author) commented Sep 17, 2024

I have modified the PR so the change should work for all parties.

The provided custom compression/decompression method is now an Option. If the user wants to use the library's built-in compress/decompress, they can just pass None; otherwise, they can provide a function via Some(my_func) to do custom compression/decompression as needed.

This way, existing users only have to pass None when upgrading.
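
To make that concrete, here is a minimal sketch of the shape of such an Option-based hook. All names here (`decode_records`, `DecompressFn`, the `Compression` enum as written) are hypothetical stand-ins for illustration, not the exact signatures merged in this PR:

```rust
use bytes::{Bytes, BytesMut};

// Hypothetical stand-in for the crate's compression enum.
#[derive(Clone, Copy)]
pub enum Compression {
    None,
    Gzip,
    Snappy,
    Lz4,
    Zstd,
}

// A user-supplied decompression routine: gets the raw record payload
// and the codec the batch claims to use, returns the decompressed bytes.
pub type DecompressFn =
    fn(&Bytes, Compression) -> Result<BytesMut, Box<dyn std::error::Error>>;

// Sketch of a decode entry point that takes an optional user hook.
pub fn decode_records(
    buf: &Bytes,
    compression: Compression,
    decompress: Option<DecompressFn>,
) -> Result<BytesMut, Box<dyn std::error::Error>> {
    match decompress {
        // The caller supplied a custom routine: delegate to it.
        Some(f) => f(buf, compression),
        // No hook provided: fall back to the library's built-in codecs.
        None => builtin_decompress(buf, compression),
    }
}

fn builtin_decompress(
    buf: &Bytes,
    _compression: Compression,
) -> Result<BytesMut, Box<dyn std::error::Error>> {
    // Placeholder for the built-in codec dispatch.
    Ok(BytesMut::from(&buf[..]))
}
```

With this shape, existing users upgrade by passing `None`, while a team like pdeva's passes `Some(my_decompress)` and never touches the built-in paths.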

@rukai (Collaborator) left a comment

This largely seems fine, but I've left a few inline comments.

Also, there are a few variations on the API I can think of:

  • encode/decode methods could rely on a generic type to provide the custom compression logic without needing an argument to be passed in (sketched after this comment)
    • downside is more complexity
  • we could match the compression API to take in the decode/encode function like we currently do for the default encode/decode logic
    • I suspect the current implementation might have more room for optimization, but it's hard to say without actually attempting such optimization
    • I think there might actually be a third option that would be better than both: maybe we just want to be able to preallocate the second BytesMut to the same size + buffer as the original BytesMut?

But I don't see any of these variations as clearly better than the current PR, so assuming @tychedelia is OK with it, I think let's just go ahead with what we have (after addressing the inline comments I've left).
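
To illustrate the first variation above (the generic-type approach), here is a hedged sketch; all names are hypothetical rather than taken from the crate:

```rust
use bytes::{Bytes, BytesMut};

// Hypothetical trait capturing the compression hook as a type parameter
// instead of a runtime Option argument.
pub trait Decompressor {
    fn decompress(&self, input: &Bytes) -> Result<BytesMut, Box<dyn std::error::Error>>;
}

// The built-in codecs would implement the trait...
pub struct Builtin;

impl Decompressor for Builtin {
    fn decompress(&self, input: &Bytes) -> Result<BytesMut, Box<dyn std::error::Error>> {
        // Placeholder for real codec dispatch (gzip/snappy/lz4/zstd).
        Ok(BytesMut::from(&input[..]))
    }
}

// ...and decode would be generic over it, so the hook is chosen at
// compile time rather than passed as a runtime argument.
pub fn decode_records<D: Decompressor>(
    buf: &Bytes,
    decompressor: &D,
) -> Result<BytesMut, Box<dyn std::error::Error>> {
    decompressor.decompress(buf)
}
```

This shows the trade-off rukai names: the hook is resolved at compile time with no per-call branch, but every call site now carries a type parameter, which is the extra complexity.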

@rukai (Collaborator) left a comment

lgtm, I'll give @tychedelia some time to give any input before merging.

@tychedelia (Owner) left a comment

Thanks for making these changes to support both paths. While I understand you view the existing compression options as too buggy to use, I hope that you'll consider upstreaming any fixes as you learn more.

@tychedelia tychedelia merged commit cabe835 into tychedelia:main Sep 20, 2024
3 checks passed
jshearer added a commit to estuary/flow that referenced this pull request Sep 23, 2024
While investigating the cause of LZ4 compression issues related to franz-go (see comments here #1651), I found `lz4_flex`, a pure-Rust LZ4 implementation that appears to be safer and faster than the `lz4`/`lz4-sys` crates `kafka-protocol` is using. Now that tychedelia/kafka-protocol-rs#81 allows us to use our own compression, and `lz4`'s configuration of block checksums is broken (fix here 10XGenomics/lz4-rs#52), I thought it would be a good time to swap to `lz4_flex`.