Provide a way to terminate an aggregation group early in the aggregation processor #5240
@joelmarty, we originally planned for a
The idea here is that when this condition occurs, the group closes. Is this what you would like? Would you be able to create a PR with this feature? We can help if you need some pointers for getting started.
Yes, this is exactly what I need. I am not sure I can work on this feature, since it would be a considerable investment of time, but I can try. If I do work on it, I would like to get opinions on the options I proposed.
Regarding the options, let me see if I understand them:
I think you want to use the existing
I'm not sure what you mean by the tagging. Can you elaborate on this?
I think you may be looking for something slightly different from the original
Concluding when a group has met a condition may be harder, because expressions only work on events, and the group's event is not created until the group is completed. Regarding working on these, implementing a
No, one of the solutions I was proposing is to add both a "tag generator" expression that adds a tag to the
The other solution I was proposing is to use a more generic
Is your feature request related to a problem? Please describe.
I have a pipeline that ingests logs into OpenSearch, and I use the aggregation processor with the `put_map` action. At the moment, the only way a group can close with this action is to wait for `group_duration` to expire. This means that all records that have been merged but whose group is not yet closed still live in memory on the Data Prepper nodes.
For pipelines with high throughput, high latency, or both, where a large `group_duration` has to be specified, a lot of memory is wasted on already-merged records that are only waiting for their group to expire. There should be a way to terminate a group and flush the result to the next processor or sink when you know there is no need to wait.
Describe the solution you'd like
The solution could work in two steps:
The pipeline configuration could look like:
Describe alternatives you've considered (Optional)
Other option: add a `close_when` expression, common to all `AggregateAction` implementations, that provides the custom condition guarding the closure of the group. This expression can be evaluated when `AggregateGroupManager.getGroupsToConclude()` is called, so the changes in `AggregateProcessor` are minimal.
Additional context
The aggregate processor first checks for groups to conclude and then processes the current batch. This logic should be reversed so the events are flushed immediately after the aggregation.
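The proposed `close_when` option and the reversed ordering can be sketched together. This is a minimal, hypothetical Java sketch, not Data Prepper code: `SketchAggregateProcessor`, its `execute` and `getGroupsToConclude` methods, and the use of a `java.util.function.Predicate` over the group's merged fields in place of a compiled Data Prepper expression are all assumptions made for illustration. A group concludes either when `group_duration` has elapsed or when the `close_when` stand-in holds, and the batch is aggregated before the processor checks for groups to conclude.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for the aggregate processor. None of these
// names are Data Prepper APIs; they only illustrate the proposal in this issue.
class SketchAggregateProcessor {

    // One open group: its merged fields (put_map-like) and when it was opened.
    private static final class Group {
        final Map<String, Object> state = new HashMap<>();
        final Instant openedAt;

        Group(Instant openedAt) {
            this.openedAt = openedAt;
        }
    }

    private final Map<String, Group> groups = new HashMap<>();
    private final String keyField;
    private final Duration groupDuration;
    // Hypothetical stand-in for a compiled close_when expression. It is tested
    // against the group's merged state, because no event exists until the group concludes.
    private final Predicate<Map<String, Object>> closeWhen;

    SketchAggregateProcessor(String keyField, Duration groupDuration,
                             Predicate<Map<String, Object>> closeWhen) {
        this.keyField = keyField;
        this.groupDuration = groupDuration;
        this.closeWhen = closeWhen;
    }

    // Processes one batch and returns the merged result of every concluded group.
    List<Map<String, Object>> execute(Collection<? extends Map<String, ?>> batch, Instant now) {
        // 1. Aggregate the current batch first.
        for (Map<String, ?> record : batch) {
            String key = String.valueOf(record.get(keyField));
            groups.computeIfAbsent(key, k -> new Group(now)).state.putAll(record);
        }
        // 2. Then conclude groups, so a group completed by this batch is flushed
        //    in this same call rather than on the next one.
        return getGroupsToConclude(now);
    }

    // A group concludes when group_duration has elapsed OR close_when holds.
    private List<Map<String, Object>> getGroupsToConclude(Instant now) {
        List<Map<String, Object>> concluded = new ArrayList<>();
        Iterator<Group> it = groups.values().iterator();
        while (it.hasNext()) {
            Group group = it.next();
            boolean expired = !now.isBefore(group.openedAt.plus(groupDuration));
            if (expired || closeWhen.test(group.state)) {
                concluded.add(group.state);
                it.remove(); // release the memory held by the concluded group
            }
        }
        return concluded;
    }

    int openGroupCount() {
        return groups.size();
    }
}
```

With a condition such as "the merged group contains an `end` field", a group whose last record arrives in the current batch is returned by that same `execute` call and dropped from memory, instead of waiting up to `group_duration`. Testing the predicate against the merged map sidesteps the limitation raised in the thread (expressions work on events, and the group's event does not exist until the group concludes); a real implementation would still have to decide how to expose that in-progress state to the expression evaluator.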