Use receivertest package to test beats receivers #41888
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
While working on elastic#41888 I was benchmarking the filebeatreceiver CreateLogs factory and noticed that the asset decoding in libbeat dominates the cpu and memory profile of the receiver creation. This behavior is expected since asset decoding is intended to occur at startup. However, it's still worthwhile to optimize it if possible. Some time ago I worked on `iobuf.ReadAll` at elastic/elastic-agent-libs#229, an optimized version of `io.ReadAll` that has a better growth algorithm (based on bytes.Buffer) and benefits from the `io.ReaderFrom` optimization. The choice of when to use this is very picky, as using it with a reader that is not an `io.ReaderFrom` can be slower than the standard `io.ReadAll`. For this case we are certain of the reader implementation, so we can use it. Benchmark results show that it is 5% faster and uses 17% less memory. On top of this, using klauspost/compress instead of compress/zlib shaves off an additional 11% of the cpu time. In summary, the cumulative effect of these changes is a 15% cpu time reduction and 18% less memory usage in the asset decoding. After these fixes the profiles are still dominated by the asset decoding, but I guess that is expected; at least it is a bit faster now.
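Not the actual libbeat change, just a minimal sketch of the decoding pattern described above (base64, then zlib, then read everything), assuming the `iobuf` package lives at `github.com/elastic/elastic-agent-libs/iobuf` and using `klauspost/compress` as a drop-in zlib replacement; the `decodeAsset` helper name is made up:

```go
package assetdecode

import (
	"bytes"
	"encoding/base64"
	"fmt"

	"github.com/elastic/elastic-agent-libs/iobuf"
	"github.com/klauspost/compress/zlib"
)

// decodeAsset shows the shape of the optimization: assets are stored as
// base64-encoded, zlib-compressed blobs that are inflated once at startup.
func decodeAsset(data string) ([]byte, error) {
	// Strip the base64 layer first.
	raw, err := base64.StdEncoding.DecodeString(data)
	if err != nil {
		return nil, fmt.Errorf("base64 decode: %w", err)
	}

	// klauspost/compress/zlib is API-compatible with compress/zlib but
	// decompresses faster.
	zr, err := zlib.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, fmt.Errorf("zlib reader: %w", err)
	}
	defer zr.Close()

	// iobuf.ReadAll grows its buffer like bytes.Buffer; per the comment
	// above it is only preferable when the reader implementation is known,
	// which is the case here.
	return iobuf.ReadAll(zr)
}
```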
I investigated trying to use the receivertest package for tests and have some findings. The two main challenges in using this package are writing a generator implementation that creates unique events that can be processed by the receiver, and telling receivertest where it should look for this id in the resulting events.

For the first problem, the way I found to adapt this generator concept to a Beats receiver is to have a generator that writes ndjson lines, each carrying a specific id, to a file that is then consumed by a filestream input (see the sketch below).

Regarding the second challenge, it is a bit more complicated. Unfortunately we cannot use the receivertest as-is: it looks for the id under a hardcoded attribute name. We would have to extend receivertest to have the ability to use a user defined function to look up this ID. I have a prototype ready at main...mauri870:beats:receivertest. I have opened open-telemetry/opentelemetry-collector#12003 upstream as well.
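To make the first half of this concrete, here is a rough sketch of what such a generator could look like. The `fileLogGenerator` name and the ndjson field names are invented, and the `Start`/`Stop`/`Generate` methods and `UniqueIDAttrVal` type are assumed to match the Generator interface of the upstream receivertest package in the collector version being used:

```go
package fbreceivertest

import (
	"fmt"
	"os"
	"strconv"

	"go.opentelemetry.io/collector/receiver/receivertest"
)

// fileLogGenerator is a made-up adapter between the receivertest Generator
// concept and the Beats model: instead of pushing logs into the receiver, it
// appends ndjson lines, each tagged with a unique id, to a file that a
// filestream input is configured to read.
type fileLogGenerator struct {
	path    string
	file    *os.File
	counter int
}

func (g *fileLogGenerator) Start() {
	f, err := os.OpenFile(g.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		panic(err)
	}
	g.file = f
}

func (g *fileLogGenerator) Stop() {
	g.file.Close()
}

// Generate appends one ndjson line carrying a fresh id and returns that id so
// the contract checker can match it against the events the receiver emits.
// The "test_id" field name here is arbitrary; mapping it onto the attribute
// the contract checker inspects is the second challenge described above.
func (g *fileLogGenerator) Generate() []receivertest.UniqueIDAttrVal {
	g.counter++
	id := receivertest.UniqueIDAttrVal(strconv.Itoa(g.counter))
	line := fmt.Sprintf("{\"message\":\"hello\",\"test_id\":%q}\n", string(id))
	if _, err := g.file.WriteString(line); err != nil {
		panic(err)
	}
	return []receivertest.UniqueIDAttrVal{id}
}
```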
@mauri870 is this one still blocked?
Yes, it requires changes to the upstream package. Without those changes, we cannot use it to test the Beats receivers. The alternative would be to copy a subset of the functionality and reproduce the test cases directly in Beats. I would prefer the first option. Unfortunately, I haven't had time to work on either of these in the past few weeks.
Now that I revisited this topic for a moment, I think we can hack our way through without changing things upstream. I'll send a PR soon.
I found a way to use the receivertest without needing changes upstream. I have a PoC here. Unfortunately I hit a roadblock: the test seems to be working fine until we hit the retry scenario. TODO: make sure batch retry actually works with a filebeatreceiver.
The open telemetry project has a `receivertest` module that can be used to test a receiver's contract. I'm particularly interested in the `CheckConsumerContract` function that is used to test the contract between the receiver and the next consumer in the pipeline. This test has a couple of interesting scenarios, such as the next consumer accepting the data, returning non-permanent (retryable) errors, or returning permanent errors.

This test is based on a Generator that is responsible for producing the data used during the test. For this to work properly the beats receiver must implement the `ReceiveLogs(data plog.Logs)` function to receive the logs from this external source. In the case of beats receivers, the beats themselves produce their own data, so we need to come up with a way to adapt this generator concept or implement said function. I believe the messages should be unique, so perhaps having the `benchmark` input output unique strings or an integer counter should be enough.

This seems quite handy to set up once and use the same machinery to test every beats receiver.

Here is an example test:
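The original example is not preserved in this thread, so the following is only a hedged sketch of how `CheckConsumerContract` could be wired up for the filebeat receiver, reusing the generator sketched in the earlier comment. The `fbreceiver` import path, the `Signal` field (older collector versions use `DataType` instead), and the config handling are assumptions:

```go
package fbreceivertest

import (
	"path/filepath"
	"testing"

	"go.opentelemetry.io/collector/pipeline"
	"go.opentelemetry.io/collector/receiver/receivertest"

	// Assumed import path for the filebeat receiver factory.
	fbreceiver "github.com/elastic/beats/v7/x-pack/filebeat/fbreceiver"
)

func TestConsumeLogsContract(t *testing.T) {
	// The generator writes ndjson events to a file for a filestream input.
	logFile := filepath.Join(t.TempDir(), "events.ndjson")
	gen := &fileLogGenerator{path: logFile}

	factory := fbreceiver.NewFactory()

	// The default config would still need a filestream input pointing at
	// logFile; that wiring depends on the receiver's config schema and is
	// omitted here.
	cfg := factory.CreateDefaultConfig()

	receivertest.CheckConsumerContract(receivertest.CheckConsumerContractParams{
		T:             t,
		Factory:       factory,
		Signal:        pipeline.SignalLogs,
		Config:        cfg,
		Generator:     gen,
		GenerateCount: 100,
	})
}
```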
During my initial experimentation with this I found a bug with global state in libbeat that is now resolved by #41475.