[RFC] Dogstatsd in containerized environments #195
We're running several Kubernetes clusters, with Datadog running as a DaemonSet. We have it configured to listen to Then we have each host configured with a link-local address, say We do the same thing with Consul, which runs as one instance per node. This takes a bit of extra configuration on the nodes and some reconfiguration of each running app, but it's working pretty well for us.
We're looking at sidecar-ing https://github.com/DataDog/docker-dd-agent/tree/master/dogstatsd because we have some very high-volume dogstatsd submitters, and that will let us account for dogstatsd overhead when scheduling a container. Edit: and we do see overhead; one of our apps submits such a volume of metrics to dogstatsd that we routinely see CPU utilization north of 60%.
Dogstatsd in containerized environments
Overview
This RFC tries to address the use case of emitting custom metrics from containerized applications via dogstatsd. Our goal is to support a broad base of orchestrators and environments.
Problem
Clients can use one of several libraries to send UDP packets to dogstatsd (running alongside an agent or standalone), for their custom metrics to be forwarded.
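For readers unfamiliar with the wire format, here is a minimal sketch of what a client library does when it emits a metric. It assumes the standard dogstatsd datagram layout (`name:value|type`, with optional `|@sample_rate` and `|#tags`). The function names `format_metric` and `send_metric` are illustrative and are not taken from any official library.

```python
import socket


def format_metric(name, value, metric_type, tags=None, sample_rate=None):
    """Build a dogstatsd datagram: name:value|type[|@rate][|#tag1,tag2]."""
    payload = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        payload += f"|@{sample_rate}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one datagram over UDP; no acknowledgement is ever received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    send_metric(format_metric("page.views", 1, "c", tags=["env:dev"]))
```

Because the transport is connectionless, the only thing the client needs to know is the address of the dogstatsd server, which is the problem the rest of this RFC is about.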
For metric transmission to work, we need:
Currently, the recommended deployment is to bind dogstatsd to port 8125 on the host and send the UDP packets there. But other deployment scenarios have been requested (see the section Dogstatsd deployment scenarios).
As dogstatsd adds the host tag to the metrics (if it is not already present), clients need to talk to the dogstatsd on their own host and not a random one in the cluster. This is why using load-balanced IPs or a single dogstatsd per cluster is not currently supported.
Clients using the trace agent will want consistent behavior for both dogstatsd and APM. The solution should work for both. Currently, the trace libraries all have specific host & port options and recommend host port binding.
Officially supported libraries:
Clients may also use other community-developed statsd libraries; maintaining compatibility with them is preferable.
Constraints
Dogstatsd deployment scenarios
Case A: One dogstatsd per host, local traffic only
Our recommended deployment so far is to run dogstatsd (standalone or with an agent), either in a container or directly on the host system. In both cases, we bind to port 8125 on the host IP. All major orchestrators support this, and we should enable it in our official installation methods:
Pros:
Cons:
Case B: A load-balanced dogstatsd pool for the cluster
In this scenario, several dogstatsd instances are load-balanced behind a common IP/DNS name. They can be running with agents, or on dedicated containers.
Pros:
Cons:
There are three ways to address this:
Case C: Dogstatsd as a sidecar container in every pod
Kubernetes pods & Rancher services allow you to run a container alongside your application, and many k8s users want this solution. This could allow container-specific tags to be added, but we could also implement that on the common dogstatsd by matching the originating IP with the orchestrators’ information.
Possible solutions
Binding dogstatsd to the host IP and using the default network gateway
On vanilla Docker, containers use bridge networking: every container uses the host as its default gateway to reach the external network. This is why the datadogpy library parses /proc/net/route to determine the host’s IP address.
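The gateway lookup described above can be sketched as follows. This is not datadogpy's actual code; the function names are illustrative. It assumes the Linux /proc/net/route layout (interface, destination, gateway as the first three columns) and a little-endian host, where addresses appear as byte-reversed hex.

```python
import socket
import struct


def default_gateway_from_route_table(route_table):
    """Return the default gateway IP from /proc/net/route contents, or None."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        destination, gateway = fields[1], fields[2]
        if destination == "00000000":  # destination 0.0.0.0 is the default route
            # Assumption: little-endian host, so the hex digits are byte-reversed.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None


def detect_host_ip(path="/proc/net/route"):
    """Read the route table from disk; return None if it is unavailable."""
    try:
        with open(path) as route_file:
            return default_gateway_from_route_table(route_file.read())
    except OSError:
        return None
```

On a default Docker bridge network this typically yields the bridge address of the host, which is only useful if dogstatsd is reachable on that address.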
Pros:
Cons:
Pass host IP & port via environment variables and modify libraries
Adoption of Container Network Interface (CNI) plugins is rising rapidly, and assuming the host is the default network gateway does not work on these systems. While we can design CNI- or orchestrator-specific fixes, implementing and maintaining them in every client library would not be ideal.
The most maintainable solution is to separate the detection logic from the client libraries and pass the IP and port in two environment variables set in the application’s container. If they are not present, localhost will be used by default.
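A minimal sketch of the client-side lookup, assuming two environment variables. The names `DOGSTATSD_HOST_IP` and `DOGSTATSD_PORT` are hypothetical placeholders, since this RFC does not fix them. Falling back to port 8125 when the port value is missing or not an integer is also an assumption.

```python
import os

# Hypothetical variable names; the RFC does not specify them.
HOST_ENV_VAR = "DOGSTATSD_HOST_IP"
PORT_ENV_VAR = "DOGSTATSD_PORT"
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8125


def resolve_dogstatsd_address(environ=None):
    """Return (host, port) from the environment, defaulting to localhost:8125."""
    environ = os.environ if environ is None else environ
    host = environ.get(HOST_ENV_VAR) or DEFAULT_HOST
    raw_port = environ.get(PORT_ENV_VAR)
    try:
        port = int(raw_port) if raw_port else DEFAULT_PORT
    except ValueError:
        # Assumption: an unparsable port falls back to the default.
        port = DEFAULT_PORT
    return host, port
```

Keeping the lookup this small is the point of the proposal: any orchestrator-specific detection happens outside the library, at deployment time, and the library only reads the result.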
Pros:
Cons:
Recommended client implementation logic
Unfortunately, UDP packet drops can’t be detected, so a try-failover scenario can’t be implemented.
Provide a dogstatsd-proxy system
We could provide a proxy listening on localhost:8125 and forwarding metrics to a dogstatsd server, specified either via the environment variables (see above) or through custom logic.
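To make the forwarding behavior concrete, here is a rough Python stand-in for such a proxy. It is only an illustration of the datagram relay, not the proposed socat or dogstatsd implementation. The names `forward_one` and `run_proxy` are illustrative, and the upstream address is assumed to be resolved elsewhere.

```python
import socket

MAX_DATAGRAM_SIZE = 65535  # upper bound for a single UDP payload


def forward_one(listen_sock, out_sock, upstream):
    """Receive one datagram and relay it unchanged; return its size in bytes."""
    data, _sender = listen_sock.recvfrom(MAX_DATAGRAM_SIZE)
    out_sock.sendto(data, upstream)
    return len(data)


def run_proxy(upstream, listen=("127.0.0.1", 8125)):
    """Relay every datagram received on `listen` to `upstream`, forever."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listen_sock, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as out_sock:
        listen_sock.bind(listen)
        while True:
            forward_one(listen_sock, out_sock, upstream)


if __name__ == "__main__":
    # Assumption: the real dogstatsd server is reachable at this address.
    run_proxy(("10.0.0.5", 8125))
```

With a relay like this in the application's network namespace, unmodified statsd clients can keep sending to localhost:8125. The relay does not change the payload, so any host tagging still happens on the receiving dogstatsd.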
The initial implementation could use socat, and allow several use cases:
Later on, we could implement a proxy mode in dogstatsd (once version 6, written in Go, is production-ready), which could allow:
This proxy would be distributed as:
Pros:
Cons:
eBPF
TBD