Add pod name to context token and logging #11532

Merged on Oct 25, 2023 (1 commit)

Conversation

@adleong (Member) commented Oct 24, 2023

When the destination controller logs that it has received a message from, or sent a message to, a data plane proxy, the log does not say which data plane pod it is communicating with. This can make it difficult to diagnose issues that span the data plane and control plane.

We add a pod field to the context token that proxies include in requests to the destination controller. We add this pod name to the logging context so that it shows up in log messages. In order to accomplish this, we had to plumb through logging context in a few places where it previously had not been. This gives us a more complete logging context and more information in each log message.

An example log message with this fuller logging context is:

time="2023-10-24T00:14:09Z" level=debug msg="Sending destination add: add:{addrs:{addr:{ip:{ipv4:183762990}  port:8080}  weight:10000  metric_labels:{key:\"control_plane_ns\"  value:\"linkerd\"}  metric_labels:{key:\"deployment\"  value:\"voting\"}  metric_labels:{key:\"pod\"  value:\"voting-7475cb974c-2crt5\"}  metric_labels:{key:\"pod_template_hash\"  value:\"7475cb974c\"}  metric_labels:{key:\"serviceaccount\"  value:\"voting\"}  tls_identity:{dns_like_identity:{name:\"voting.emojivoto.serviceaccount.identity.linkerd.cluster.local\"}}  protocol_hint:{h2:{}}}  metric_labels:{key:\"namespace\"  value:\"emojivoto\"}  metric_labels:{key:\"service\"  value:\"voting-svc\"}}" addr=":8086" component=endpoint-translator context-ns=emojivoto context-pod=web-767f4484fd-wmpvf remote="10.244.0.65:52786" service="voting-svc.emojivoto.svc.cluster.local:8080"

Note the context-pod field.

Additionally, we have tested this when no pod field is included in the context token (e.g. when handling requests from a pod which does not yet add this field) and confirmed that the context-pod log field is empty, but no errors occur.
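
For readers who want a concrete picture of the behavior described above, here is a minimal sketch. It is not the code from this PR. It assumes the context token is a JSON object with `ns` and `pod` keys. The names `contextToken`, `parseContextToken`, and `logFields` are hypothetical, and a plain map stands in for the controller's real logging context. Only the `context-ns` and `context-pod` field names are taken from the example log message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// contextToken is a hypothetical shape for the token a proxy sends.
// The JSON encoding and the "ns" key are assumptions for illustration;
// the "pod" field is the one this PR describes adding.
type contextToken struct {
	Ns  string `json:"ns,omitempty"`
	Pod string `json:"pod,omitempty"`
}

// parseContextToken decodes a raw token. Malformed input yields an empty
// token instead of an error, so logging never blocks a request.
func parseContextToken(raw string) contextToken {
	var tok contextToken
	if err := json.Unmarshal([]byte(raw), &tok); err != nil {
		return contextToken{}
	}
	return tok
}

// logFields returns the key/value pairs to attach to each log message
// for the request. A real controller would pass these to its logger.
func logFields(tok contextToken) map[string]string {
	return map[string]string{
		"context-ns":  tok.Ns,
		"context-pod": tok.Pod,
	}
}

func main() {
	tokens := []string{
		// Token from a proxy that already sends the pod name.
		`{"ns":"emojivoto","pod":"web-767f4484fd-wmpvf"}`,
		// Token from an older proxy that does not send it yet.
		`{"ns":"emojivoto"}`,
	}
	for _, raw := range tokens {
		fields := logFields(parseContextToken(raw))
		fmt.Printf("context-ns=%s context-pod=%q\n",
			fields["context-ns"], fields["context-pod"])
	}
}
```

In this sketch the second token produces an empty `context-pod` value and no error, which mirrors the backward-compatible behavior tested above.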

adleong requested a review from a team as a code owner October 24, 2023 22:31
adleong merged commit 4e7a588 into main Oct 25, 2023
35 checks passed
adleong deleted the alex/token-logging branch October 25, 2023 20:48
mateiidavid added a commit that referenced this pull request Oct 27, 2023
This edge release includes a fix for the `ServiceProfile` CRD resource schema.
The schema incorrectly required `not` response matches to be arrays, while the
in-cluster validator parsed `not` response matches as objects. In addition, an
issue has been fixed in `linkerd profile`. When used with the `--open-api`
flag, it would not strip trailing slashes when generating a resource from
swagger specifications.

* Fixed an issue where trailing slashes wouldn't be stripped when generating
  `ServiceProfile` resources through `linkerd profile --open-api` ([#11519])
* Fixed an issue in the `ServiceProfile` CRD schema. The schema incorrectly
  required that a `not` response match should be an array, which the service
  profile validator rejected since it expected an object. The schema has been
  updated to properly indicate that `not` values should be an object ([#11510];
  fixes [#11483])
* Improved logging in the destination controller by adding the client pod's
  name to the logging context. This will improve visibility into the messages
  sent and received by the control plane from a specific proxy ([#11532])
* Fixed an issue in the destination controller where the metadata API would not
  initialize a `Job` informer. The destination controller uses the metadata API
  to retrieve `Job` metadata, and relies mostly on informers. Without an
  initialized informer, an error message would be logged, and the controller
  relied on direct API calls ([#11541]; fixes [#11531])

[#11541]: #11541
[#11532]: #11532
[#11531]: #11531
[#11519]: #11519
[#11510]: #11510
[#11483]: #11483

Signed-off-by: Matei David <[email protected]>
@mateiidavid mateiidavid mentioned this pull request Oct 27, 2023
adleong added a commit that referenced this pull request Nov 16, 2023
@adleong adleong mentioned this pull request Nov 16, 2023
adleong added a commit that referenced this pull request Nov 16, 2023
This stable release improves observability for the control plane by adding
additional logging to the destination controller and by adding histograms which
can detect Kubernetes informer lag. It also adds the ability to configure
protocol detection.

* Improved logging in the destination controller by adding the client pod's
  name to the logging context. This will improve visibility into the messages
  sent and received by the control plane from a specific proxy ([#11532])
* helm: Introduce configurable values for protocol detection ([#11536])
* Fixed an issue where the Destination controller could stop processing service
  profile updates, if a proxy subscribed to those updates stops reading them;
  this is a followup to the issue [#11491] fixed in [stable-2.14.2] ([#11546])
* In the Destination controller, added informer lag histogram metrics to track
  whenever the Kubernetes objects watched by the controller are falling behind
  the state in the kube-apiserver ([#11534])
* proxy: Fix grpc_status metric labels for inbound traffic

[stable-2.14.2]: https://github.com/linkerd/linkerd2/releases/tag/stable-2.14.2
[#11532]: #11532
[#11536]: #11536
[#11546]: #11546
[#11534]: #11534
[#11491]: #11491

---------

Signed-off-by: Matei David <[email protected]>
Signed-off-by: Alex Leong <[email protected]>
Co-authored-by: Matei David <[email protected]>