Caution users that they must reduce replica count by 1 when scaling in #963
Merged
Commits
9f753ad Caution users that they must reduce replica count by 1 when scaling in (JakeSCahill)
acda24b Consistent phrasing (JakeSCahill)
4b4da44 Consistent format (JakeSCahill)
9939fb3 DOC-516 Improve scale and decommission docs based on CS feedback (JakeSCahill)
8d9340b Clarifications (JakeSCahill)
6584ed6 Merge branch 'main' into decom (JakeSCahill)
937d66a Update caption (JakeSCahill)
4c59edc Merge branch 'decom' of https://github.com/redpanda-data/docs into decom (JakeSCahill)
3ab6bf7 Clarify the purpose of decommissioning in the first sentence (JakeSCahill)
ede3cb5 Update k-decommission-brokers.adoc (JakeSCahill)
6ec556e Update k-nodewatcher.adoc (JakeSCahill)
f823ef6 Merge branch 'main' into decom (Feediver1)
bdbb24b Move legend above flowchart (JakeSCahill)
329 changes: 215 additions & 114 deletions
modules/manage/pages/kubernetes/k-decommission-brokers.adoc
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,211 @@ | ||||||
= Install the Nodewatcher Controller | ||||||
:page-categories: Management | ||||||
:env-kubernetes: true | ||||||
:description: pass:q[The Nodewatcher controller is an emergency backstop for Redpanda clusters that use PersistentVolumes (PVs) for the Redpanda data directory. When a node running a Redpanda Pod suddenly goes offline, Nodewatcher detects the lost node, retains the associated PV, and removes the corresponding PersistentVolumeClaim (PVC). This workflow allows the Redpanda Pod to be rescheduled on a new node without losing critical data.] | ||||||
|
||||||
{description} | ||||||
|
||||||
:warning-caption: Emergency use only | ||||||
|
||||||
[WARNING] | ||||||
==== | ||||||
The Nodewatcher controller is intended only for emergency scenarios (for example, node hardware or infrastructure failures). *Never use the Nodewatcher controller as a routine method for removing brokers.* If you want to remove brokers, see xref:manage:kubernetes/k-decommission-brokers.adoc[Decommission brokers] for the correct procedure. | ||||||
==== | ||||||
|
||||||
:warning-caption: Warning | ||||||
|
||||||
== Why use Nodewatcher? | ||||||
|
||||||
If a worker node hosting a Redpanda Pod suddenly fails or disappears, Kubernetes might leave the associated PV and PVC in an _attached_ or _in-use_ state. Without Nodewatcher (or manual intervention), the Redpanda Pod cannot safely reschedule to another node because the volume is still recognized as occupied. Also, the default reclaim policy might delete the volume, risking data loss. Nodewatcher automates the steps needed to retain the volume and remove the stale PVC, so Redpanda Pods can move to healthy nodes without losing the data in the original PV. | ||||||
|
||||||
== How Nodewatcher works | ||||||
|
||||||
When the controller detects events that indicate a Node resource is no longer available, it does the following: | ||||||
|
||||||
- For each Redpanda Pod on that Node, it identifies the PVC (if any) the Pod was using for its storage. | ||||||
- It sets the reclaim policy of the affected PersistentVolume (PV) to `Retain`. | ||||||
- It deletes the associated PersistentVolumeClaim (PVC) to allow the Redpanda broker Pod to reschedule onto a new, operational node. | ||||||
|
||||||
[mermaid] | ||||||
.... | ||||||
flowchart TB | ||||||
%% Define classes | ||||||
classDef systemAction fill:#F6FBF6,stroke:#25855a,stroke-width:2px,color:#20293c,rx:5,ry:5 | ||||||
|
||||||
A[Node fails] --> B{Is Node<br>running Redpanda?}:::systemAction | ||||||
B -- Yes --> C[Identify Redpanda Pod PVC]:::systemAction | ||||||
C --> D[Set PV reclaim policy to 'Retain']:::systemAction | ||||||
D --> E[Delete PVC]:::systemAction | ||||||
E --> F[Redpanda Pod<br>is rescheduled]:::systemAction | ||||||
B -- No --> G[Ignore event]:::systemAction | ||||||
.... | ||||||
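
The workflow above corresponds roughly to two manual `kubectl` operations. The following is a minimal sketch under stated assumptions, not the controller's actual implementation: the function name `retain_and_release`, its argument order, and the `KUBECTL` override variable are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: manual equivalent of the retain-then-delete steps.
# Set KUBECTL=echo to preview the commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"

retain_and_release() {
  if [ "$#" -ne 3 ]; then
    echo "usage: retain_and_release <pv-name> <pvc-name> <namespace>" >&2
    return 2
  fi
  local pv="$1" pvc="$2" namespace="$3"

  # Step 1: keep the volume (and its data) when the claim goes away.
  $KUBECTL patch persistentvolume "$pv" \
    --patch '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' || return 1

  # Step 2: delete the stale claim so the Pod can be rescheduled.
  $KUBECTL delete persistentvolumeclaim "$pvc" --namespace "$namespace"
}
```

The patch runs before the delete on purpose: if the claim were removed while the reclaim policy was still `Delete`, the volume could be removed along with it.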
|
||||||
== Prerequisites | ||||||
|
||||||
- An existing Redpanda cluster in Kubernetes. | ||||||
- Sufficient RBAC permissions for Nodewatcher to read and modify PVs, PVCs, and Node resources. | ||||||
Review comment: Link to RBAC doc? |
||||||
|
||||||
== Install Nodewatcher | ||||||
|
||||||
[tabs] | ||||||
====== | ||||||
Helm + Operator:: | ||||||
+ | ||||||
-- | ||||||
|
||||||
You can install the Nodewatcher controller as part of the Redpanda Operator or as a sidecar on each Pod that runs a Redpanda broker. When you install the controller as part of the Redpanda Operator, the controller monitors all Redpanda clusters running in the same namespace as the Redpanda Operator. If you want the controller to manage only a single Redpanda cluster, install it as a sidecar on each Pod that runs a Redpanda broker, using the Redpanda resource. | ||||||
|
||||||
To install the Nodewatcher controller as part of the Redpanda Operator: | ||||||
|
||||||
. Deploy the Redpanda Operator with the Nodewatcher controller: | ||||||
+ | ||||||
[,bash,subs="attributes+",lines=7+8] | ||||||
---- | ||||||
helm repo add redpanda https://charts.redpanda.com | ||||||
helm repo update | ||||||
helm upgrade --install redpanda-controller redpanda/operator \ | ||||||
--namespace <namespace> \ | ||||||
--set image.tag={latest-operator-version} \ | ||||||
--create-namespace \ | ||||||
--set additionalCmdFlags={--additional-controllers="nodeWatcher"} \ | ||||||
--set rbac.createAdditionalControllerCRs=true | ||||||
---- | ||||||
+ | ||||||
- `--additional-controllers="nodeWatcher"`: Enables the Nodewatcher controller. | ||||||
- `rbac.createAdditionalControllerCRs=true`: Creates the required RBAC rules for the Redpanda Operator to monitor the Node resources and update PVCs and PVs. | ||||||
|
||||||
|
||||||
. Deploy a Redpanda resource: | ||||||
+ | ||||||
.`redpanda-cluster.yaml` | ||||||
[,yaml] | ||||||
---- | ||||||
apiVersion: cluster.redpanda.com/v1alpha2 | ||||||
kind: Redpanda | ||||||
metadata: | ||||||
name: redpanda | ||||||
spec: | ||||||
chartRef: {} | ||||||
clusterSpec: {} | ||||||
---- | ||||||
+ | ||||||
```bash | ||||||
kubectl apply -f redpanda-cluster.yaml --namespace <namespace> | ||||||
``` | ||||||
|
||||||
To install the Nodewatcher controller as a sidecar: | ||||||
|
||||||
.`redpanda-cluster.yaml` | ||||||
[,yaml,lines=11+13+15] | ||||||
---- | ||||||
apiVersion: cluster.redpanda.com/v1alpha2 | ||||||
kind: Redpanda | ||||||
metadata: | ||||||
name: redpanda | ||||||
spec: | ||||||
chartRef: {} | ||||||
clusterSpec: | ||||||
statefulset: | ||||||
sideCars: | ||||||
controllers: | ||||||
enabled: true | ||||||
run: | ||||||
- "nodeWatcher" | ||||||
rbac: | ||||||
enabled: true | ||||||
---- | ||||||
|
||||||
- `statefulset.sideCars.controllers.enabled`: Enables the controllers sidecar. | ||||||
- `statefulset.sideCars.controllers.run`: Enables the Nodewatcher controller. | ||||||
- `rbac.enabled`: Creates the required RBAC rules for the controller to monitor the Node resources and update PVCs and PVs. | ||||||
|
||||||
-- | ||||||
Helm:: | ||||||
+ | ||||||
-- | ||||||
[tabs] | ||||||
==== | ||||||
--values:: | ||||||
+ | ||||||
.`nodewatcher-controller.yaml` | ||||||
[,yaml,lines=4+6+8] | ||||||
---- | ||||||
statefulset: | ||||||
sideCars: | ||||||
controllers: | ||||||
enabled: true | ||||||
run: | ||||||
- "nodeWatcher" | ||||||
rbac: | ||||||
enabled: true | ||||||
---- | ||||||
+ | ||||||
- `statefulset.sideCars.controllers.enabled`: Enables the controllers sidecar. | ||||||
- `statefulset.sideCars.controllers.run`: Enables the Nodewatcher controller. | ||||||
- `rbac.enabled`: Creates the required RBAC rules for the controller to monitor the Node resources and update PVCs and PVs. | ||||||
|
||||||
--set:: | ||||||
+ | ||||||
[,bash,lines=4-6] | ||||||
---- | ||||||
helm upgrade --install redpanda redpanda/redpanda \ | ||||||
--namespace <namespace> \ | ||||||
--create-namespace \ | ||||||
--set statefulset.sideCars.controllers.enabled=true \ | ||||||
--set statefulset.sideCars.controllers.run={"nodeWatcher"} \ | ||||||
--set rbac.enabled=true | ||||||
---- | ||||||
+ | ||||||
- `statefulset.sideCars.controllers.enabled`: Enables the controllers sidecar. | ||||||
- `statefulset.sideCars.controllers.run`: Enables the Nodewatcher controller. | ||||||
- `rbac.enabled`: Creates the required RBAC rules for the controller to monitor the Node resources and update PVCs and PVs. | ||||||
|
||||||
==== | ||||||
-- | ||||||
====== | ||||||
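
If you installed Nodewatcher as a sidecar, one way to confirm that the sidecar is present is to list the container names of a broker Pod. This is a minimal sketch: the container name `redpanda-controllers` is taken from the log command later on this page, while the function name `has_controllers_sidecar` and the `KUBECTL` override variable are hypothetical.

```shell
# Hypothetical helper: succeed if the Pod has a redpanda-controllers container.
KUBECTL="${KUBECTL:-kubectl}"

has_controllers_sidecar() {
  local pod="$1" namespace="$2" names
  # jsonpath prints the container names separated by spaces.
  names="$($KUBECTL get pod "$pod" --namespace "$namespace" \
    --output jsonpath='{.spec.containers[*].name}')" || return 1
  # Pad with spaces so that only whole names match.
  case " $names " in
    *" redpanda-controllers "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_controllers_sidecar <pod-name> <namespace> && echo "sidecar present"`. This check does not apply when the controller runs as part of the Redpanda Operator.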
|
||||||
== Test the Nodewatcher controller | ||||||
|
||||||
. Test the Nodewatcher controller by deleting a Node resource: | ||||||
+ | ||||||
[,bash] | ||||||
---- | ||||||
kubectl delete node <node-name> | ||||||
---- | ||||||
+ | ||||||
NOTE: This step is for testing purposes only. | ||||||
|
||||||
. Monitor the logs of the Nodewatcher controller: | ||||||
+ | ||||||
-- | ||||||
- If you're running the Nodewatcher controller as part of the Redpanda Operator: | ||||||
+ | ||||||
[,bash] | ||||||
---- | ||||||
kubectl logs -l app.kubernetes.io/name=operator -c manager --namespace <namespace> | ||||||
---- | ||||||
|
||||||
- If you're running the Nodewatcher controller as a sidecar: | ||||||
+ | ||||||
[,bash] | ||||||
---- | ||||||
kubectl logs <pod-name> --namespace <namespace> -c redpanda-controllers | ||||||
---- | ||||||
-- | ||||||
+ | ||||||
You should see that the controller successfully deleted the PVC of the Pod that was running on the deleted Node resource. | ||||||
+ | ||||||
[,bash] | ||||||
---- | ||||||
kubectl get persistentvolumeclaim --namespace <namespace> | ||||||
---- | ||||||
|
||||||
. Verify that the reclaim policy of the PV is set to `Retain` to allow you to recover the node, if necessary: | ||||||
+ | ||||||
[,bash] | ||||||
---- | ||||||
kubectl get persistentvolume --namespace <namespace> | ||||||
---- | ||||||
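
To make the `Retain` check easier to read, the output can be narrowed to the relevant fields. This is a minimal sketch that assumes standard `kubectl` custom-columns output; the function name `show_reclaim_policies` and the `KUBECTL` override variable are hypothetical.

```shell
# Hypothetical helper: list each PV with its reclaim policy, phase, and claim.
KUBECTL="${KUBECTL:-kubectl}"

show_reclaim_policies() {
  # PersistentVolumes are cluster-scoped, so no namespace flag is needed.
  $KUBECTL get persistentvolume \
    --output custom-columns='NAME:.metadata.name,POLICY:.spec.persistentVolumeReclaimPolicy,STATUS:.status.phase,CLAIM:.spec.claimRef.name'
}
```

A volume handled by Nodewatcher should show `Retain` in the `POLICY` column. A retained volume whose claim was deleted typically reports the `Released` phase.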
|
||||||
After the Nodewatcher controller has finished, xref:manage:kubernetes/k-decommission-brokers.adoc[decommission the broker] that was removed from the node. This is necessary to prevent a potential loss of quorum and ensure cluster stability. | ||||||
|
||||||
NOTE: Make sure to use the `--force` flag when decommissioning the broker with xref:reference:rpk/rpk-redpanda/rpk-redpanda-admin-brokers-decommission.adoc[`rpk redpanda admin brokers decommission`]. This flag is required when the broker is no longer running. |
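
As an illustration of the forced decommission, the command can be run through `kubectl exec` from a broker Pod that is still running. This is a sketch under stated assumptions: the container name `redpanda`, the function name `force_decommission`, and the `KUBECTL` override variable are hypothetical, and the broker ID is passed as a positional argument. See the linked reference for the authoritative syntax.

```shell
# Hypothetical helper: force-decommission a broker that is no longer running.
KUBECTL="${KUBECTL:-kubectl}"

force_decommission() {
  local namespace="$1" pod="$2" broker_id="$3"
  # Reject anything that is not a non-negative integer before running rpk.
  case "$broker_id" in
    ''|*[!0-9]*)
      echo "broker ID must be a non-negative integer" >&2
      return 2
      ;;
  esac
  # "redpanda" is an assumed container name; adjust it for your deployment.
  $KUBECTL exec "$pod" --namespace "$namespace" --container redpanda -- \
    rpk redpanda admin brokers decommission "$broker_id" --force
}
```

For example, `force_decommission <namespace> <pod-name> <broker-id>`, where `<pod-name>` is a healthy broker Pod and `<broker-id>` is the ID of the broker that was lost with the node.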