Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for variables in outputs and default provider #6602

Merged
merged 17 commits into from
Feb 7, 2025

Conversation

blakerouse
Copy link
Contributor

@blakerouse blakerouse commented Jan 24, 2025

What does this PR do?

Adds support for context variables (ONLY) in outputs. Adds support for a default provide prefix to be defined for variables (default is env). This provides support for the original ${ES_PASSWORD} when used in outputs to still work as it will automatically map that to ${env.ES_PASSWORD} and it will get resolved for the context variables the same as it was being done by go-ucfg.

This includes an improvement to how variables are observed in the composable controller and how it is used by the coordinator. Now when a set of observable's are passed to the composable controller it will return the current set of variables after the debounce time, this ensures that before the variables are substituted that it is using the latest set of variables. Without this change running would always show an error at first with ${env.ES_PASSWORD} is an unknown variable and then less than a few milliseconds it would find it. This change removes that behavior and is able to find the variable on initial render.

Why is it important?

This allows the ability for all context provides to be able used in the outputs configuration. This is useful for using say the kubernetes_secrets provider in the policy for credentials or the new filesource provider to get a value from a file into the outputs. This is done in a way to not break compatibility with ${ES_PASSWORD} that could be used in outputs when it was rendered by go-ucfg, by the addition of a default provider setting (which defaults to env).

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test (great coverage in unit tests)

Disruptive User Impact

None.

How to test this PR locally

Replace a field in the elastic-agent.yml with a variable like from the environment provider.

outputs:
  default:
    type: elasticsearch
    hosts: [127.0.0.1:9200]
    username: "${ES_USER}"
    password: "${ES_PASSWORD}"
    preset: balanced

Run with the variables.

ES_USER=elastic ES_PASSWORD=password ./elastic-agent run -e

Observe that the ES_USER and ES_PASSWORD are substituted.

Related issues

@blakerouse blakerouse added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Jan 24, 2025
@blakerouse blakerouse self-assigned this Jan 24, 2025
Copy link
Contributor

mergify bot commented Jan 24, 2025

This pull request does not have a backport label. Could you fix it @blakerouse? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Jan 24, 2025

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 24, 2025
@blakerouse blakerouse changed the title Add support for variables in outputs and default provider. Add support for variables in outputs and default provider Jan 25, 2025
@blakerouse blakerouse marked this pull request as ready for review January 25, 2025 14:50
@blakerouse blakerouse requested a review from a team as a code owner January 25, 2025 14:50
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm self-requested a review January 30, 2025 16:15
Copy link
Member

@pchila pchila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't manage to get past halfway through the PR so this is not a full review but I already have a couple of comments that can be shared

Comment on lines +86 to +101
"basic no provider var": {
input: NewKey("outputs", NewDict([]Node{
NewKey("default", NewDict([]Node{
NewKey("key", NewStrVal("${name}")),
})),
})),
expected: NewDict([]Node{
NewKey("default", NewDict([]Node{
NewKey("key", NewStrVal("value1")),
})),
}),
vars: mustMakeVarsWithDefault(map[string]interface{}{
"var1": map[string]interface{}{
"name": "value1",
},
}, "var1"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the introduction of the default provider correctly, shouldn't this case test that ${var1} is resolved as ${default_provider.var1} if default_provider is passed to mustMakeVarsWithDefault which then contains the variable ?
I understand that technically it's not different since those are all map[string]any but it's closer to the actual usage for env vars (using var1 as default provider here is a bit confusing since that is already used as variable name elsewhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the confusion. It is testing that it is name is resolved to var1.name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that var1 looks to me as a variable name rather than a provider name. Here it is used as a default provider and I find name a bit misleading. It may be just my own bias though since it's all strings it doesn't matter for testing the functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is misleading name. provider1 would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because most of these tests are testing different providers and each provider is named var1 and the other var2. Just keeping the same naming.

Comment on lines 260 to 266
if test.Error {
assert.Error(t, err)
} else if test.NoMatch {
assert.Error(t, ErrNoMatch, err)
assert.ErrorIs(t, err, ErrNoMatch)
} else {
require.NoError(t, err)
assert.Equal(t, test.Result, res)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this whole block would probably be easier to read as an ErrorAssertionFunc member in the testcase struct, so input and error assertions can be kept together instead of signaling with a boolean

@@ -276,3 +294,15 @@ func varPrefixMatched(val string, key string) bool {
s := strings.SplitN(val, ".", 2)
return s[0] == key
}

func maybeAddDefaultProvider(val string, defaultProvider string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a general solution standpoint, I assumed that the defaultProvider would come into play when resolving variables rather than when adding values: I was expecting that the resolver would first try resolving an expression like ${somevar.somefield.somestuff} "as-is" and only in case of failure it would try to solve again by adding the default provider and resolving ${default_provider.somevar.somefield.somestuff}.

What is being done here is modifying variable names that do not contain "." by prepending defaultProvider. when adding/replacing a variable and it only works for expressions that do not already contain "." which is a strong limitation.

Are there specific concerns/constraints that would make the current change the preferred solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot be done at resolve time because we need it to be known at Observe time. Without it at Observe time the code that was added to run only the providers that are referenced will not work. The issue is also present in the case that the variable references a fetch provider, as we will not be able to know if that variable is truly resolvable until that time occurs to resolve it.

If an expression contains . the text before the first . is always the provider, always! There is no other way to do this, because of the issues I placed above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an expression contains . the text before the first . is always the provider, always! There is no other way to do this, because of the issues I placed above.

Is this documented somewhere? I didn't see it in the PR changes but I didn't look at the whole ast package so I may have missed it. The reason I asked the question is that I suspected that there was a constraint somewhere, it wasn't obvious which and probably other people reading this snippet may have the same doubt.
It could be worth adding a comment on the function itself explaining that it will prepend the defaultProvider only on flat, single-token variable expression like ${somevar}

Copy link
Contributor

@swiatekm swiatekm Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should change the name to addDefaultProviderIfNotPresent? That's clearer to me at first glance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a doc string to the function to explain this better so it is known.

Copy link
Member

@pchila pchila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished the first pass of review. I added some more questions about the concurrent part of the change, specifically about the handleObserved(), observedResult in internal/pkg/composable/controller.go and surrounding code.

A couple of questions about providers that used to exit almost immediately and now block on ctx.Done: is this necessary to implement the functionality or is it something that was fixed "while we're in there" ?

// this new set of vars replaces that set if that current
// value has not been read then it will result in vars state being incorrect
select {
case <-c.ch:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this draining for c.ch added here when it was already present in previous code (line 297 after the change) ?
Is it necessary to drain this before pushing vars in observedResult channel?
Is it still necessary to drain again in line 297 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment right above this says exactly why it is required. It has to be here and is different then the one on line 297, because this drains the channel and then sends it over the observedResult where the other drains the channel and then sends it over that same channel.

Copy link
Member

@pchila pchila Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see what the code is doing, what is not obvious is why it's doing it: who is supposed to read from the observedResult ?
The drain on line 297 is supposed to remove a stale value before pushing the latest but again we don't know who reads from that and why it is a different channel from the observedResult

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a lot more comments in this flow, that should provide a good explanation on what it is doing.

//
// This is a blocking call until the observation is handled and the most recent
// set of variables are returned the caller in the case a change occurred. If no change occurred then
// it will return with a nil array. If changed the current observed state of variables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is "change" calculated? Is it intended as a difference from the last set of variable values that have been provided or just retrieved and observed through previous calls to handleObserved() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only in the case that the observed state changes, not the values in each provider that will come over the Watch channel. But if the change happens to occur during the debounce window then it will have that change in the Observed state.

// Observe sends the observed variables from the AST to the controller.
// Observe instructs the controller to enable the observed providers.
//
// This is a blocking call until the observation is handled and the most recent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an upper limit to the time Observe() will take?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It either resolves or the context is closed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this on the critical path of agent startup?
My main worry here is that we could end up with an agent stuck in the initialization phase similar to when the coordinator is waiting for variables before processing configuration changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry there is a maximum here which is 500ms. I added that to the docstring. It will never fully block indefinitely. It is in the path of variable resolution all previous logic in that area applies.

Comment on lines +261 to +269
default:
if !observed {
vars, err := c.Observe(timeoutCtx, tt.observed)
require.NoError(t, err)
if vars != nil {
setVars = vars
}
observed = true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default block is meant to run at every iteration of the for loop where <-timeoutCtx.Done() and vars := <-c.Watch() would block?
If this has to happen only once could it be called outside of the loop as it was done previously ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only called once, but cannot be done outside of the loop because <-c.Watch() must have a reader. To ensure that it has a reader and is not blocked it does it this way. This removes the need to add locks around setVars.

Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a partial review, gotta say mixing all three changes in a single PR makes it harder to review.

  • The variables in outputs part looks fine.
  • The default provider part looks mostly fine, but like @pchila I'm curious why the substitution doesn't happen at resolution.
  • The change to composable manager logic I'm not sure about. Could we get this one in a separate PR?

@blakerouse
Copy link
Contributor Author

  • The change to composable manager logic I'm not sure about. Could we get this one in a separate PR?

It is required for variable substitution works correct in the outputs. The reason I keep it inside of this PR is because it is directly related to the behavior, and I think it provides more context on why it is done.

@jlind23
Copy link
Contributor

jlind23 commented Feb 4, 2025

@pchila @swiatekm Could you please review this again now that Blake added some changes.

Copy link
Member

@pchila pchila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blakerouse thanks for integrating code review feedback: I feel the added comments help with understanding the reason for parts of this change.
I have a couple of follow-up questions about the channel usage that I asked under the original discussions.
Could you please have a look at those?

@pierrehilbert pierrehilbert added the backport-9.0 Automated backport to the 9.0 branch label Feb 5, 2025
dependabot bot and others added 4 commits February 5, 2025 13:49
…ts/processor/lsmintervalprocessor from 0.3.0 to 0.4.0 (elastic#6533)

* build(deps): bump github.com/elastic/opentelemetry-collector-components/processor/lsmintervalprocessor

Bumps [github.com/elastic/opentelemetry-collector-components/processor/lsmintervalprocessor](https://github.com/elastic/opentelemetry-collector-components) from 0.3.0 to 0.4.0.
- [Release notes](https://github.com/elastic/opentelemetry-collector-components/releases)
- [Commits](elastic/opentelemetry-collector-components@processor/lsmintervalprocessor/v0.3.0...processor/lsmintervalprocessor/v0.4.0)

---
updated-dependencies:
- dependency-name: github.com/elastic/opentelemetry-collector-components/processor/lsmintervalprocessor
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update NOTICE.txt

* Update otel README.md

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
@blakerouse blakerouse requested a review from a team as a code owner February 5, 2025 18:49
@blakerouse
Copy link
Contributor Author

@pchila I have updated the PR with your latest revisions and have answered all your remaining questions. Thanks!

Copy link
Contributor

mergify bot commented Feb 5, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b variables-outputs upstream/variables-outputs
git merge upstream/main
git push upstream variables-outputs

@blakerouse blakerouse removed request for a team and andrzej-stencel February 7, 2025 02:22
Copy link
Member

@pchila pchila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@blakerouse blakerouse merged commit b59d51a into elastic:main Feb 7, 2025
14 checks passed
@blakerouse blakerouse deleted the variables-outputs branch February 7, 2025 15:15
mergify bot pushed a commit that referenced this pull request Feb 7, 2025
Adds support for context variables (ONLY) in outputs. Adds support for a default provide prefix to be defined for variables (default is env). This provides support for the original ${ES_PASSWORD} when used in outputs to still work as it will automatically map that to ${env.ES_PASSWORD} and it will get resolved for the context variables the same as it was being done by go-ucfg.

This includes an improvement to how variables are observed in the composable controller and how it is used by the coordinator. Now when a set of observable's are passed to the composable controller it will return the current set of variables after the debounce time, this ensures that before the variables are substituted that it is using the latest set of variables. Without this change running would always show an error at first with ${env.ES_PASSWORD} is an unknown variable and then less than a few milliseconds it would find it. This change removes that behavior and is able to find the variable on initial render.

(cherry picked from commit b59d51a)
mergify bot pushed a commit that referenced this pull request Feb 7, 2025
Adds support for context variables (ONLY) in outputs. Adds support for a default provide prefix to be defined for variables (default is env). This provides support for the original ${ES_PASSWORD} when used in outputs to still work as it will automatically map that to ${env.ES_PASSWORD} and it will get resolved for the context variables the same as it was being done by go-ucfg.

This includes an improvement to how variables are observed in the composable controller and how it is used by the coordinator. Now when a set of observable's are passed to the composable controller it will return the current set of variables after the debounce time, this ensures that before the variables are substituted that it is using the latest set of variables. Without this change running would always show an error at first with ${env.ES_PASSWORD} is an unknown variable and then less than a few milliseconds it would find it. This change removes that behavior and is able to find the variable on initial render.

(cherry picked from commit b59d51a)
blakerouse added a commit that referenced this pull request Feb 11, 2025
)

Adds support for context variables (ONLY) in outputs. Adds support for a default provide prefix to be defined for variables (default is env). This provides support for the original ${ES_PASSWORD} when used in outputs to still work as it will automatically map that to ${env.ES_PASSWORD} and it will get resolved for the context variables the same as it was being done by go-ucfg.

This includes an improvement to how variables are observed in the composable controller and how it is used by the coordinator. Now when a set of observable's are passed to the composable controller it will return the current set of variables after the debounce time, this ensures that before the variables are substituted that it is using the latest set of variables. Without this change running would always show an error at first with ${env.ES_PASSWORD} is an unknown variable and then less than a few milliseconds it would find it. This change removes that behavior and is able to find the variable on initial render.

(cherry picked from commit b59d51a)

Co-authored-by: Blake Rouse <[email protected]>
blakerouse added a commit that referenced this pull request Feb 12, 2025
)

Adds support for context variables (ONLY) in outputs. Adds support for a default provide prefix to be defined for variables (default is env). This provides support for the original ${ES_PASSWORD} when used in outputs to still work as it will automatically map that to ${env.ES_PASSWORD} and it will get resolved for the context variables the same as it was being done by go-ucfg.

This includes an improvement to how variables are observed in the composable controller and how it is used by the coordinator. Now when a set of observable's are passed to the composable controller it will return the current set of variables after the debounce time, this ensures that before the variables are substituted that it is using the latest set of variables. Without this change running would always show an error at first with ${env.ES_PASSWORD} is an unknown variable and then less than a few milliseconds it would find it. This change removes that behavior and is able to find the variable on initial render.

(cherry picked from commit b59d51a)

Co-authored-by: Blake Rouse <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-9.0 Automated backport to the 9.0 branch Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants