Troubleshooting guide update #720

Merged
merged 2 commits into from
Mar 4, 2025
22 changes: 21 additions & 1 deletion documentation/Troubleshooting.md
@@ -47,6 +47,7 @@ This section provides troubleshooting information for Kubemarine and Kubernetes
- [Upgrade Procedure to v1.28.3 Fails on ETCD Step](#upgrade-procedure-to-v1283-fails-on-etcd-step)
- [kubectl logs and kubectl exec fail](#kubectl-logs-and-kubectl-exec-fail)
- [OpenSSH server becomes unavailable during cluster installation on Centos9](#openssh-server-becomes-unavailable-during-cluster-installation-on-centos9)
- [Packets Loss During the Transmission Between Nodes](#packets-loss-during-the-transmission-between-nodes)

# Kubemarine Errors

@@ -2012,4 +2013,23 @@ Failed to start OpenSSH server daemon.
- Ensure that critical services such as OpenSSH are upgraded when their dependencies, like OpenSSL, are updated.
- Test updates in a staging environment to catch compatibility issues before deployment.

**Note**: Not applicable.
**Note**: Not applicable.

## Packets loss during the transmission between nodes

### Description
Packets are lost during transmission between nodes that are located in different subnets. Under high network load, the loss shows up as TCP retransmissions or as UDP packets that never arrive. The root cause is the performance of the IaaS-level routers: routing is generally slower than switching.

### Alerts
Not applicable.

### Stack trace(s)
Not applicable.

### How to solve
Reschedule the pods in the cluster so that the pods generating significant network load run on nodes in the same subnet, or move all nodes of the cluster into the same subnet.
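
To decide which nodes can host the high-traffic pods together, it helps to know which nodes share a subnet. The following is a minimal sketch, not part of Kubemarine: the node names and addresses are hypothetical, the `group_nodes_by_subnet` helper is illustrative, and the `/24` prefix length is an assumption that must be replaced with the real prefix of your network. The internal addresses can be taken, for example, from the `INTERNAL-IP` column of `kubectl get nodes -o wide`.

```python
import ipaddress
from collections import defaultdict


def group_nodes_by_subnet(node_ips, prefix_len=24):
    """Map each subnet (as a string) to the sorted node names inside it.

    prefix_len is an assumption; use the real prefix length of your network.
    """
    groups = defaultdict(list)
    for name, ip in node_ips.items():
        # strict=False derives the network address from a host address.
        subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(subnet)].append(name)
    return {subnet: sorted(names) for subnet, names in groups.items()}


# Hypothetical inventory: node name -> internal IP address.
nodes = {
    "worker-1": "192.168.0.11",
    "worker-2": "192.168.0.12",
    "worker-3": "192.168.1.21",
}

groups = group_nodes_by_subnet(nodes)
for subnet, names in sorted(groups.items()):
    print(subnet, names)
```

Nodes listed under the same subnet exchange traffic without crossing a router, so they are candidates for hosting the pods that generate the load. One possible way to keep such pods together is to label those nodes and reference the label in the pod specification through `nodeSelector` or node affinity.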

### Recommendations
- Avoid routing traffic between nodes of the same cluster when the network load is high.

**Note**: Not applicable.