[8.x](backport #6585) Change the default gRPC port to 0 when in a container #6597
What does this PR do?
Changes the default gRPC port used when the agent runs in a container to port 0, which lets the operating system assign any free port. There is at least one user whose rollout of Elastic Agent is experiencing collisions with another application on the same port (nobody else could ever pick 6789 as a default port surely...) and I thought this would be straightforward to fix quickly. I was wrong. I expected ~2 hours and spent 3 days:
elastic-agent/_meta/config/elastic-agent.docker.yml.tmpl
Lines 88 to 89 in 9b8a25f
elastic-agent/deploy/helm/elastic-agent/templates/agent/k8s/_secret.tpl
Line 5 in 9b8a25f
run() function that is the entrypoint of the agent already has a hook for overriding configuration
elastic-agent/internal/pkg/agent/cmd/run.go
Lines 426 to 428 in 4199196
elastic-agent/internal/pkg/agent/cmd/container.go
Line 769 in 4199196
elastic-agent/internal/pkg/agent/cmd/run.go
Line 415 in 9b8a25f
elastic-agent/internal/pkg/agent/application/application.go
Line 99 in 4199196
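For readers unfamiliar with port 0, here is a minimal sketch of the behaviour this change relies on. It is not the agent's code: the helper name listenOnFreePort is made up for illustration, and it uses only the Go standard library. Binding to port 0 asks the operating system for any unused port, and the listener then reports which port it actually got.

```go
package main

import (
	"fmt"
	"net"
)

// listenOnFreePort is a hypothetical helper for this sketch only.
// It binds to loopback with port 0, which asks the OS to pick any
// currently unused TCP port, and returns the listener and that port.
func listenOnFreePort() (net.Listener, int, error) {
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	// The concrete port is only known after the bind succeeds.
	port := lis.Addr().(*net.TCPAddr).Port
	return lis, port, nil
}

func main() {
	first, firstPort, err := listenOnFreePort()
	if err != nil {
		panic(err)
	}
	defer first.Close()

	second, secondPort, err := listenOnFreePort()
	if err != nil {
		panic(err)
	}
	defer second.Close()

	// Both listeners are open at once, so the OS hands out two
	// different ports instead of colliding on a fixed default.
	fmt.Println("first listener port:", firstPort)
	fmt.Println("second listener port:", secondPort)
}
```

Because the port is only known after binding, anything that needs to connect back to the server has to be told the assigned port rather than assume a fixed one.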
Note: We should eventually stop using local TCP at all for the control protocol between sub-processes, and switch to unix sockets / named pipes. The capability for this was added in #4249, but it was left disabled because endpoint-security doesn't support it yet (because the upstream gRPC C++ client doesn't support it on Windows). We have now removed endpoint-security from our containers, which would allow us to switch to unix sockets there, but this change required an elastic-agent-client package update and we need to test that every client we have has it first. This was more testing effort than I wanted to take on now, but I will create a follow-up issue to do this.
Why is it important?
This automatically avoids port collisions between Elastic Agents using hostNetwork: true on Kubernetes, as our DaemonSet does by default, and other applications (or other Elastic Agents).
Disruptive User Impact
None, but in case I'm wrong about this I made it possible to choose a specific port with an environment variable.
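As a rough illustration of that escape hatch, the sketch below resolves a port from an environment variable and falls back to 0 when it is unset. The variable name EXAMPLE_AGENT_GRPC_PORT and the function resolvePort are assumptions invented for this example; the actual variable name and parsing logic are defined in the PR's diff, not here.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// exampleGRPCPortEnv is a hypothetical variable name used only in this
// sketch; it is not the name the agent actually reads.
const exampleGRPCPortEnv = "EXAMPLE_AGENT_GRPC_PORT"

// resolvePort returns the port given in raw, or 0 (let the OS choose)
// when raw is empty. Values outside 0-65535 are rejected.
func resolvePort(raw string) (uint16, error) {
	if raw == "" {
		return 0, nil
	}
	// bitSize 16 makes ParseUint fail for anything above 65535.
	p, err := strconv.ParseUint(raw, 10, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", raw, err)
	}
	return uint16(p), nil
}

func main() {
	port, err := resolvePort(os.Getenv(exampleGRPCPortEnv))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("gRPC port:", port)
}
```

Keeping 0 as the fallback means the override is strictly opt-in: deployments that set nothing get the collision-free behaviour, and only those that need a fixed port have to configure one.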
How to test this PR locally