Allow users to override the user-agent #4015

gregschohn · 2024-01-24T15:37:41Z

Is your feature request related to a problem? Please describe.

We need to distinguish requests made by the Data Prepper instance that our solution uses from requests made by the rest of a cluster's clients.

To migrate data, our solution does a bulk move of data from a source cluster to a target cluster. Independently, individual requests are recorded from the source cluster and replayed to the target, both to keep the target cluster in sync and to compare the behavior of the two clusters.

When we capture traffic, depending on the order in which a customer chooses to perform each step, the capture may overlap with the Data Prepper requests to the source. We'd like to be able to mask out those requests from our replay. At the very least, they would create noisier data for users, and they could cause confusion, as users would see updates replayed on already existing data that was migrated with Data Prepper. Allowing the customer or us to set a unique value that we can easily filter on at the capture side would eliminate this problem and be more efficient (much lower costs).
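To make the capture-side filtering concrete, here is a minimal sketch of what it could look like. It assumes each captured request is a dictionary with a `headers` mapping. The marker value `data-prepper-migration` and the function name `drop_marked_requests` are hypothetical and only serve as illustration.

```python
from typing import Any, Iterable, Iterator, Mapping

# Hypothetical marker value; this issue does not define a specific one.
MIGRATION_USER_AGENT = "data-prepper-migration"


def drop_marked_requests(
    captured: Iterable[Mapping[str, Any]],
    marker: str = MIGRATION_USER_AGENT,
) -> Iterator[Mapping[str, Any]]:
    """Yield only the captured requests whose User-Agent lacks the marker."""
    for request in captured:
        # HTTP header names are case-insensitive, so normalize before lookup.
        headers = {name.lower(): value for name, value in request["headers"].items()}
        if marker not in headers.get("user-agent", ""):
            yield request
```

Requests without a user-agent header are kept, since they cannot have come from the marked Data Prepper instance.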

Describe the solution you'd like
I'd like to have a command line flag to set the user-agent HTTP header for all requests that Data Prepper sends. A default value that differs from the ES/OS user-agent may be beneficial too.

Describe alternatives you've considered (Optional)
Other HTTP header values could work too, but user-agent seems like the most natural one and the easiest to explain. For our greater solution, dealing with the duplicate data better is possible, but 1) it would take considerable effort to mitigate, and 2) it would still be expensive, as we aren't able to remove the data passively.

Additional context
N/A

dlvenable · 2024-01-30T20:39:36Z

@gregschohn, thanks for this request. Do you want this configurable for both the opensearch sink and the opensearch source?

Do you have a proposal on how the user would configure this?

gregschohn · 2024-02-08T20:30:49Z

A command line argument or a setting in the pipeline file would work (so would an environment variable, but that seems like it wouldn't be the best experience for users in general). We'll want the same user-agent for all requests, so a static value loaded once is fine.
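As a rough sketch of the "static value loaded once" idea, the lookup could work as shown below. Everything named here is an assumption for illustration and not existing Data Prepper behavior: the `--user-agent` flag, the `user_agent` settings key, the `data-prepper` default, and the choice to let the command line take precedence over the pipeline file.

```python
import argparse
from typing import Mapping, Sequence

# Hypothetical default; the thread only suggests it differ from the ES/OS default.
DEFAULT_USER_AGENT = "data-prepper"


def resolve_user_agent(argv: Sequence[str], pipeline_settings: Mapping[str, str]) -> str:
    """Resolve the user-agent once at startup: CLI flag, then pipeline setting, then default."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--user-agent", default=None)  # hypothetical flag name
    # Ignore unrelated arguments so this can sit beside other option parsing.
    args, _unknown = parser.parse_known_args(list(argv))
    if args.user_agent:
        return args.user_agent
    # "user_agent" is a hypothetical key in the pipeline settings.
    return pipeline_settings.get("user_agent") or DEFAULT_USER_AGENT
```

The resolved string would then be attached as the user-agent header on every outgoing request.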

Our needs at this time are just for the source - so you can use one user-agent configuration for both or separate ones. We don't have an opinion on that detail.

gregschohn added the untriaged label Jan 24, 2024

github-project-automation bot added this to Data Prepper Tracking Board Jan 24, 2024

github-project-automation bot moved this to Unplanned in Data Prepper Tracking Board Jan 24, 2024

dlvenable added enhancement New feature or request and removed untriaged labels Jan 30, 2024
