The shipper event protocol should be a wrapper for JSON bytes #288
Comments
If the benchmarks are better using serialized JSON instead of google.protobuf.Struct (which is what we expect), then let's make the change to the protocol to use JSON going forward. We can also explore other tweaks to the protocol if the performance tests reveal any other new information we need to take into account.
Initial results

I tested round-tripping (Unmarshal then Marshal) events because in our use case clients will have to Marshal and the shipper will Unmarshal, so the impact of both is important. I also used two types of events. The first (SingleField) stores the data for the event in a single field ("message"). Example:
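A hypothetical SingleField event might look like the following (values are invented for illustration, not the benchmark's actual filebeat output):

```json
{
  "@timestamp": "2022-09-01T12:00:00.000Z",
  "message": "{\"client_ip\":\"10.0.0.1\",\"status\":200}"
}
```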
The second type of event (MultiField) has the data for the event added to the root of the message as fields. Example:
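A hypothetical MultiField event, with the parsed data sitting at the document root (values invented for illustration):

```json
{
  "@timestamp": "2022-09-01T12:00:00.000Z",
  "client_ip": "10.0.0.1",
  "status": 200
}
```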
Both types of events were generated by running filebeat against sample data and using the file output. benchtime was set to 30 seconds. SingleField Results
MultiField Results
The fourth condition was added because clients will still have to spend time marshalling the fields and metadata to strings to send the data, and the shipper will have to spend time unmarshalling to use the fields for processors or to access the metadata. Code: https://github.com/leehinman/elastic-agent-shipper-client/tree/benchmark_base
Just to confirm I am interpreting this correctly, using the MultiField example as a reference:
I would then conclude that we should absolutely switch to using JSON serialization for the metadata and fields. We can iterate on making the JSON serialization faster, or attempt to optimistically avoid unmarshalling the JSON if we don't need to look at the event metadata or fields later. Is there any reason you tested with the fields as `string` instead of `bytes`?
yes
One big problem with avoiding unmarshalling the "Fields" field: the data we need to run processors, even a drop processor or adding a tag, is in there. So I think it is unlikely that we can avoid unmarshalling "Fields".
brain fart. I've switched to bytes and updated results with some more encoders for comparison coming soon.
Updated Results

Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/benchmark_string_fields

Command:

Raw Results
Comparisons are against the original protobuf implementation. "Shallow Protobuf" is with Fields & Metadata as bytes. "Deep Event" is a Go struct that mimics the original protobuf implementation (Fields & Metadata as mapstr, timestamp as string). "goccy" is using https://github.com/goccy/go-json. "CBOR" is using https://github.com/fxamacker/cbor
Observations
One advantage to serializing to JSON for the shipper is that users and teams already writing to Elasticsearch heavily use JSON and have invested time in making it efficient. I don't like the idea of throwing away that existing efficiency when using the shipper. There are multiple alternative JSON serialization libraries in use for efficiency and convenience reasons, and switching to CBOR would force everyone to pick a CBOR library and invest time in performance tuning it. The disk queue isn't the most common use case, and it is generally accepted that writing to disk will be slower. It seems acceptable to pay the cost of serializing from JSON to CBOR there, like we do in Beats today, and we generally don't get complaints about it.
The only other thing worth investigating here is using an alternative protobuf generator as described in #263. Specifically, it could be worth quickly testing whether https://github.com/planetscale/vtprotobuf makes a difference.
Results with vtprotobuf

Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/vtprotobuf

Command:

Raw Results
Comparison against original protobuf implementation
Observations
We spoke about this in the shipper project team meeting, and the vtprotobuf implementation is so much faster that it seems obvious we should use it. It is however not compatible with messages serialized using the default protobuf compiler: https://github.com/planetscale/vtprotobuf#mixing-protobuf-implementations-with-grpc

One way to take advantage of the vtprotobuf optimizations for Go code that can use them is to keep the existing protobuf message definitions, but also support the "wrapped JSON" message proposed in the issue description for clients that can't support an optimized protobuf implementation. #288 (comment)

This would allow us to use the fastest implementation for the majority of the shipper clients, which will be Beats or other Go processes, while giving us a fallback for clients like endpoint security that wouldn't be compatible with the optimized vtprotobuf RPC. There was general support for this approach.

@leehinman one thing I'm not sure we considered is that in the real system Beats will be converting from `beat.Event` to `messages.Event`. Should we benchmark the performance of these conversions as well?
The tests include Marshal & Unmarshal, so messages.Event -> Marshal -> Unmarshal -> messages.Event:
We could test beat.Event -> messages.Event -> Marshal -> Unmarshal -> messages.Event -> beat.Event, but I think most of the time is spent in Marshal/Unmarshal.
The conversion from `beat.Event` to `messages.Event` is the part I'm interested in; the conversion of the mapstrs to `messages.Struct` in particular seems like it could be expensive.
Results adding beat.Event conversion

Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/vtprotobuf

New test case is called `rtBeatsVTMessagesEvent`.

Raw Results
Observations
Follow up questions
Roundtrip code is a copy of what is currently done in the Beats shipper client and server. (The import block was added here for context; the paths are my reading of the benchmark branch, not copied verbatim from it.)

```go
import (
	"github.com/elastic/beats/v7/libbeat/beat"
	"github.com/elastic/elastic-agent-shipper-client/pkg/helpers"
	"github.com/elastic/elastic-agent-shipper-client/pkg/proto/messages"
	"google.golang.org/protobuf/types/known/timestamppb"
)

func rtBeatsVTMessagesEvent(e *beat.Event) {
	// Convert the beat.Event mapstrs into protobuf Struct values.
	meta, err := helpers.NewValue(e.Meta)
	if err != nil {
		panic(err)
	}
	fields, err := helpers.NewValue(e.Fields)
	if err != nil {
		panic(err)
	}
	// Extract the routing information the shipper needs.
	source := &messages.Source{}
	ds := &messages.DataStream{}
	inputIDVal, err := e.Meta.GetValue("input_id")
	if err != nil {
		panic(err)
	}
	source.InputId, _ = inputIDVal.(string)
	streamIDVal, err := e.Meta.GetValue("stream_id")
	if err != nil {
		panic(err)
	}
	source.StreamId, _ = streamIDVal.(string)
	dsType, err := e.Fields.GetValue("data_stream.type")
	if err != nil {
		panic(err)
	}
	ds.Type, _ = dsType.(string)
	dsNamespace, err := e.Fields.GetValue("data_stream.namespace")
	if err != nil {
		panic(err)
	}
	ds.Namespace, _ = dsNamespace.(string)
	dsDataset, err := e.Fields.GetValue("data_stream.dataset")
	if err != nil {
		panic(err)
	}
	ds.Dataset, _ = dsDataset.(string)
	// Build the protobuf event, round-trip it, and convert back.
	m := &messages.Event{
		Timestamp:  timestamppb.New(e.Timestamp),
		Metadata:   meta.GetStructValue(),
		Fields:     fields.GetStructValue(),
		Source:     source,
		DataStream: ds,
	}
	b, err := m.MarshalVT()
	if err != nil {
		panic(err)
	}
	newMessage := &messages.Event{}
	err = newMessage.UnmarshalVT(b)
	if err != nil {
		panic(err)
	}
	be := &beat.Event{}
	be.Timestamp = newMessage.Timestamp.AsTime()
	be.Fields = helpers.AsMap(newMessage.Fields)
	be.Meta = helpers.AsMap(newMessage.Metadata)
}
```
Probably not.
We could also consider just serializing
Spoke about this today, and decided the shipper should just accept the same JSON that an input would be writing to ES without the shipper. This would give us an event format like the following:

```protobuf
message Event {
  // JSON serialized event to be written to ES.
  oneof event {
    bytes json = 1;
  }
}
```

This is using a oneof type because we may find or decide that we could benefit from a more specialized protobuf event in the future (particularly given the results above showing vtprotobuf serialization is faster than JSON), but we can start from where we are today.

We will have to document the structure of the JSON event in the comments; for example, the data stream fields are mandatory for the shipper to be able to construct the index name and apply processors.
@leehinman @cmacknz now that we have measured the performance differences: #288 (comment)
Is there an active PR for this change?
There's no open PR yet; we still need to follow through and make the change in #288 (comment). @jlind23 we can keep this open but move it to a future sprint. I will edit the description with the latest information. I don't see a point in opening a separate implementation issue since this one has all the history already.
Edit: the path forward is described in #288 (comment)
We should change the message definition to simply wrap the JSON serialized event that would be written to Elasticsearch. We will use a oneof type to allow for alternate serialization formats in the future, in particular the alternate vtprotobuf implementation appears to be more efficient than JSON and is the first candidate for an alternate format.
Original Description
This is using a oneof type because we may find or decide that we could benefit from a more specialized protobuf event in the future (particularly given the results above showing vtprotobuf serialization is faster than JSON), but we can start from where we are today.
We will have to document the structure of the JSON event in the comments, for example the data stream fields are mandatory for the shipper to be able to construct the index name and apply processors.
The shipper protocol currently only accepts events serialized to a customized version of the google.protobuf.Struct type. The intent was to allow more efficient serialization of frequently used types like timestamps, and to make processors easier to write and validate, given that the complete set of types they must operate on is known ahead of time.
We currently define both the message metadata and fields as our messages.Struct type https://github.com/elastic/elastic-agent-shipper-client/blob/1fbbb05f0b174053a5b160cdd5836eaed430cdbd/api/messages/publish.proto#L39-L42
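For readers without access to the link, the linked definition boils down to something like the following, paraphrased from the fields used elsewhere in this thread (the field numbers and exact layout are guesses, not the real schema):

```protobuf
message Event {
  // Field numbers here are illustrative only.
  google.protobuf.Timestamp timestamp = 1;
  Source source = 2;
  DataStream data_stream = 3;
  Struct metadata = 4;  // customized google.protobuf.Struct
  Struct fields = 5;    // customized google.protobuf.Struct
}
```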
Given that most processes that will use the shipper are currently designed to serialize their internal event representations to JSON for direct ingestion by Elasticsearch, we should evaluate whether there is a noticeable performance hit introducing the conversion to `messages.Struct` for the shipper. Most processes using the shipper are highly optimized for serializing to JSON and may be noticeably less performant serializing to `messages.Struct` instead.

Specifically, we should benchmark the performance of Filebeat ingesting events using the shipper with the `messages.Struct` type and compare it to the performance of the same setup modified to transport the event as JSON bytes directly.