k8s-pod-restart-info-collector is a simple K8s customer controller that watches for Pods changes and collects K8s Pod restart reasons, logs, and events to Slack channel when a Pod restarts.
For more information, see the blog on Medium: Automated Troubleshooting of Kubernetes (K8s) Pods Issues
This project is actively used and maintained by Airwallex DevOps team.
Here are two Slack screenshots of the example messages.
As shown below, by clicking “Show more”, we can see the Reason, “Pod Status”, “Pod Events”, “Node Status and Events”, and “Pod Logs Before Restart”.
export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T0SEVS2SG/xxxxx
go run .
Replace the slackWebhookUrl
, clusterName
and slackChannel
.
helm upgrade --install k8s-pod-restart-info-collector ./helm \
--set slackWebhookUrl="https://hooks.slack.com/services/Change-Me" \
--set clusterName="Change-Me" \
--set slackChannel="Change-Me"
Check Commands:
# check commands
kubectl get pod,deploy,sa,secret -l app.kubernetes.io/instance=k8s-pod-restart-info-collector
helm status k8s-pod-restart-info-collector
helm get values k8s-pod-restart-info-collector
helm get manifest k8s-pod-restart-info-collector
helm get all k8s-pod-restart-info-collector
# see logs
kubectl logs deployment/k8s-pod-restart-info-collector -f
Run a debug-pod
to verify the collector:
kubectl run debug-pod --image=alpine -- date;sleep 30
kubectl get pod debug-pod -w
To uninstall/delete the k8s-pod-restart-info-collector
helm release:
helm uninstall k8s-pod-restart-info-collector
The command removes all the Kubernetes components associated with the chart and deletes the release.
Name | Description | Value |
---|---|---|
clusterName |
K8s cluster name (Display on slack message) | required |
slackWebhookUrl |
Slack webhook URL | required |
slackUsername |
Slack username (Display on slack message) | default: "k8s-pod-restart-info-collector" |
slackChannel |
Slack channel name | default: "restart-info-nonprod" |
muteSeconds |
The time to mute duplicate pod alerts | default: "600" |
ignoredNamespaces |
A comma-separated list of namespaces to ignore | default: "" |
ignoredPodNamePrefixes |
A comma-separated list of pod name prefixes to ignore | default: "" |
-
When will the collector send Pod restart messages to Slack channel?
When a Pod restarts. However, if one of the following conditions is met, the messages are not sent.
- Pod restartCount > 30
- In the previous 10 minutes, the same Pod restart message was sent
-
How to customize slack channel for each pods
Adding
alert-slack-channel: "your-slack-channel-name"
to Pod annotations or labels. For example, a label:alert-slack-channel: "restart-info-nonprod"
-
How to set the kubernetes icon as the Slack profile photo?
The alert messages use a custom emoji naming
:kubernetes:
as the profile photo. You need to upload the kubernete.png to Slack to add this custom emoji. Please refer to Add custom emoji
Please refer to:
- https://github.com/kubernetes/sample-controller/blob/master/docs/controller-client-go.md
- https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/controllers.md
- https://github.com/kubernetes/client-go/tree/master/examples/workqueue
Copyright [2022] [Airwallex (Hong Kong) Limited]
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.