[8.17](backport #6444) [otel/kube-stack]: Add gateway collector #6990

Merged: 2 commits merged into 8.17 from mergify/bp/8.17/pr-6444 on Feb 25, 2025

mergify[bot] (Contributor) commented Feb 24, 2025

What does this PR do?

This PR adds a new K8s deployment of the EDOT collector named "gateway". Its main purpose is to simplify the daemonset collector configuration and to unify the managed and self-managed scenarios. The gateway collector configuration contains all of Elastic's custom OTel components needed to transform signals in self-managed scenarios, which are currently configured in the daemonset collector.

Elastic components configured in the gateway collectors (previously in the daemonset):

  • signaltometrics
  • elasticinframetrics
  • elastictrace
  • lsminterval

Another important change is that the gateway collectors now forward the OTel data to Elasticsearch; the daemonset and cluster collector configurations have been updated to export all collected data to the corresponding gateway OTLP endpoint. Although the daemonset collectors are still configured to collect the auto-instrumentation OTLP data, that data is load balanced (using the loadbalancing exporter) across the gateway collectors based on the service name.

Additional context: https://github.com/elastic/opentelemetry-dev/issues/587

Why is it important?

The key benefit of this architecture is that it decouples data collection from data transformation (e.g. APM enrichment and aggregations): for managed scenarios, users only need to remove (or comment out) the "gateway" collector configuration. Moving the data processing from a k8s "daemonset" to a "deployment" also eases horizontal scaling, since the number of gateway replicas no longer has to match the number of nodes.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

This is an automatic backport of pull request #6444 done by [Mergify](https://mergify.com).

* feat: move telemetry aggregation and forwarding to gateway

* ci: use Elastic envs in gateway

* chore: add changelog entry

* fix: format values file

* feat: add apm loadbalancing

* chore: increase resource limits

* revert resource limits increase

* chore: remove config warnings

* docs: add Gateway collectors section

* revert: enable daemonset storagechecks

* rename metrics/otel pipeline and use signaltometrics

* unify k8s and host metrics pipelines

* use default traceID as loadbalancing routing_key

* chore: reuse k8s integration test helpers

* format values with Helm linter

* replace loadbalancing in favor of headless otlp

* Update testing/integration/otel_helm_test.go

Co-authored-by: Panos Koutsovasilis

* Update testing/integration/otel_helm_test.go

Co-authored-by: Panos Koutsovasilis

* rename k8s values options helper function

* move process attributes remove processor to gateway

* add batch processor for aggregation pipeline

* enable compression for cluster otlp connections

* chore: remove elastic endpoint references

* fix: do not generate service's signals for non apm data

* Revert "fix: do not generate service's signals for non apm data"

This reverts commit ffa6620.

* fix: set agent.name as edot-collector

* fix: enable daemon hostNetwork

* set unknown as default signaltometrics agent.name resource attribute

* remove signaltometrics for metrics-only services

---------

Co-authored-by: Panos Koutsovasilis
(cherry picked from commit daed81e)
mergify bot requested a review from a team as a code owner on February 24, 2025 15:33
mergify bot added the backport label on Feb 24, 2025
mergify bot requested reviews from andrzej-stencel and pchila and removed the request for a team on February 24, 2025 15:33
rogercoll (Contributor) commented:

This requires the routing connector to be backported first: #6996

elasticmachine (Contributor) commented Feb 25, 2025

💚 Build Succeeded

cc @rogercoll

rogercoll merged commit d1b3210 into 8.17 on Feb 25, 2025 (14 checks passed)
rogercoll deleted the mergify/bp/8.17/pr-6444 branch on February 25, 2025 10:45