[8.x](backport #6585) Change the default gRPC port to 0 when in a container #6597

Merged on Jan 24, 2025 (1 commit).

@mergify mergify bot commented Jan 24, 2025

What does this PR do?

Changes the default gRPC port used when the agent runs in a container to port 0, which tells the operating system to pick any free port. At least one user's Elastic Agent rollout is colliding with another application on the same port (surely nobody else could ever pick 6789 as a default port...). I thought this would be straightforward to fix quickly. I was wrong. I expected ~2 hours and spent 3 days:

  1. It looked like we already had a separate default config file for Docker containers that I could just update:
    # # port for the GRPC server that spawned processes connect back to.
    # port: 6789
  2. This didn't work because the Helm chart defines its own configuration separately. I also didn't want to define the default in more than one place, because we'll probably change it again later (see the note below).
  3. The run() function, which is the agent's entrypoint, already has a hook for overriding the configuration:
    if override != nil {
        override(cfg)
    }
    The container command already uses this hook to override the logging configuration:
    func logToStderr(cfg *configuration.Configuration) {
    but of course, extending this override to change the gRPC port didn't work either.
  4. It turns out we load the configuration from disk at least twice at startup, once in
    cfg, err := configuration.NewFromConfig(rawConfig)
    and again in
    cfg, err := configuration.NewFromConfig(rawConfig)
    The second load didn't apply the overrides. They were only applied in the first case, because that is where we created our logger, and the only existing overrides were for logging 🤦. We could probably tidy this up, but I've already spent enough time on this one.
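Steps 3 and 4 come down to a common pitfall: an override hook that is applied to one configuration load but not to a later one, so the later load silently reverts it. The Go sketch below models this under heavy simplification. `Configuration`, `loadConfig`, `loadWithOverride`, `containerOverride`, `runBuggy` and `runFixed` are all hypothetical names standing in for `configuration.NewFromConfig`, the `run()` entrypoint and its `override` hook; they are not the agent's real code.

```go
package main

import "fmt"

// Configuration is a hypothetical, heavily simplified stand-in for
// the agent's configuration.Configuration type.
type Configuration struct {
	GRPCPort    int
	LogToStderr bool
}

// loadConfig stands in for configuration.NewFromConfig(rawConfig):
// every call returns a fresh value built from the on-disk defaults.
func loadConfig() *Configuration {
	return &Configuration{GRPCPort: 6789}
}

// containerOverride mirrors the idea behind logToStderr: the container
// command mutates the loaded configuration before it is used.
func containerOverride(cfg *Configuration) {
	cfg.LogToStderr = true
	cfg.GRPCPort = 0 // 0 = let the OS choose a free port when listening
}

// loadWithOverride loads the configuration and applies the optional
// override hook, so every caller sees the overridden values.
func loadWithOverride(override func(*Configuration)) *Configuration {
	cfg := loadConfig()
	if override != nil {
		override(cfg)
	}
	return cfg
}

// runBuggy applies the override only to the first load (used for the
// logger); the second load, used for the gRPC server, ignores it.
func runBuggy(override func(*Configuration)) (loggerCfg, serverCfg *Configuration) {
	loggerCfg = loadWithOverride(override)
	serverCfg = loadConfig()
	return loggerCfg, serverCfg
}

// runFixed routes both loads through the same helper.
func runFixed(override func(*Configuration)) (loggerCfg, serverCfg *Configuration) {
	return loadWithOverride(override), loadWithOverride(override)
}

func main() {
	_, buggy := runBuggy(containerOverride)
	_, fixed := runFixed(containerOverride)
	fmt.Println(buggy.GRPCPort, fixed.GRPCPort) // 6789 0
}
```

Funnelling every load through one helper is one way to make an override stick; it is not necessarily the approach this PR took.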

Note: We should eventually stop using local TCP for the control protocol between sub-processes altogether and switch to Unix sockets / named pipes. Support for this was added in #4249, but it was left disabled because endpoint-security doesn't support it yet (the upstream gRPC C++ client doesn't support it on Windows). We have now removed endpoint-security from our containers, which would allow us to switch to Unix sockets there, but that capability required an elastic-agent-client package update, and we first need to verify that every client we have includes it. That was more testing effort than I wanted to take on now, but I will create a follow-up issue for it.

Why is it important?

This automatically avoids port collisions between Elastic Agents running with hostNetwork: true on Kubernetes (as our DaemonSet does by default) and other applications, including other Elastic Agents.
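The reason port 0 avoids collisions can be shown with a short Go sketch using the standard `net` package: each listener that asks for port 0 gets a distinct free port from the operating system, while a second bind to a port already in use fails. The helper `listenEphemeral` is hypothetical. A real implementation also has to hand the resolved port to the sub-processes that connect back; that part is not shown.

```go
package main

import (
	"fmt"
	"net"
)

// listenEphemeral binds a loopback TCP listener on port 0, which asks
// the operating system to assign any free port, and reports the port
// that was actually chosen.
func listenEphemeral() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	// Two processes in one network namespace (such as the host network
	// shared by hostNetwork: true pods) each ask for port 0.
	a, portA, err := listenEphemeral()
	if err != nil {
		panic(err)
	}
	defer a.Close()
	b, portB, err := listenEphemeral()
	if err != nil {
		panic(err)
	}
	defer b.Close()
	fmt.Println(portA != portB) // true

	// By contrast, binding a fixed port that is already taken fails,
	// which is the kind of collision a hard-coded default can cause.
	dup, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", portA))
	if err == nil {
		dup.Close()
	}
	fmt.Println("second bind to a port in use failed:", err != nil)
}
```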

Disruptive User Impact

None, but in case I'm wrong about this, I made it possible to choose a specific port with an environment variable.
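The environment-variable escape hatch can be sketched as follows. The PR text does not name the variable, so `EXAMPLE_GRPC_PORT` below is purely hypothetical, as is the helper `containerGRPCPort`; how the real option is parsed, and whether invalid values are rejected, is also an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// grpcPortEnv is a HYPOTHETICAL variable name used only for this
// sketch; the PR does not state the real name.
const grpcPortEnv = "EXAMPLE_GRPC_PORT"

// containerGRPCPort returns 0 (let the OS pick a port) when the
// variable is unset or empty, the parsed port when it is a valid port
// number, and an error for any other value.
func containerGRPCPort(lookup func(string) (string, bool)) (int, error) {
	v, ok := lookup(grpcPortEnv)
	if !ok || v == "" {
		return 0, nil
	}
	port, err := strconv.Atoi(v)
	if err != nil || port < 0 || port > 65535 {
		return 0, fmt.Errorf("invalid %s value %q", grpcPortEnv, v)
	}
	return port, nil
}

func main() {
	// os.LookupEnv has the signature func(string) (string, bool).
	port, err := containerGRPCPort(os.LookupEnv)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("gRPC port:", port)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the default-to-0 logic testable without touching the process environment.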

How to test this PR locally

```
INSTANCE_PROVISIONER=kind SNAPSHOT=true GOTEST_FLAGS="-test.run TestKubernetesAgentHelm" TEST_PLATFORMS="kubernetes/arm64/1.31.0/basic" mage -v integration:kubernetes
```

This is an automatic backport of pull request #6585 done by [Mergify](https://mergify.com).

* Override the container command's gRPC port to 0 by default.

* Test that two containers don't have port collisions on the same host.

* Move container override next to regular defaults.

* Add changelog.

* Fix application.New call in unit test.

* Silence lint warning.

(cherry picked from commit a61ad8c)
@mergify mergify bot added the backport label Jan 24, 2025
@mergify mergify bot requested a review from a team as a code owner January 24, 2025 15:19
@mergify mergify bot removed the request for review from a team January 24, 2025 15:19
@mergify mergify bot requested review from michalpristas and pchila January 24, 2025 15:19
@github-actions github-actions bot added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Jan 24, 2025
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@cmacknz cmacknz enabled auto-merge (squash) January 24, 2025 17:12
@cmacknz cmacknz merged commit bfdeedd into 8.x Jan 24, 2025
15 checks passed
@cmacknz cmacknz deleted the mergify/bp/8.x/pr-6585 branch January 24, 2025 20:40