CP/DP split: Add leader election #3092
base: change/control-data-plane-split
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@                      Coverage Diff                      @@
##       change/control-data-plane-split   #3092     +/-   ##
===================================================================
+ Coverage                       89.74%   89.96%   +0.21%
===================================================================
  Files                             109      116       +7
  Lines                           11150    12478    +1328
  Branches                           50       50
===================================================================
+ Hits                            10007    11226    +1219
- Misses                           1083     1182      +99
- Partials                           60       70      +10
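The percentages in the table can be reproduced from the line counts. This is a minimal sketch under two assumptions: Codecov computes coverage as hits / (hits + misses + partials), and it truncates to two decimal places instead of rounding. The helper names `coverage_hundredths` and `fmt` are illustrative only.

```python
import math
from fractions import Fraction


def coverage_hundredths(hits: int, misses: int, partials: int) -> int:
    """Coverage in hundredths of a percent, truncated rather than rounded.

    Assumption: coverage = hits / (hits + misses + partials).
    """
    lines = hits + misses + partials
    return hits * 10000 // lines


def fmt(hundredths: int) -> str:
    """Format hundredths of a percent, e.g. 8974 -> '89.74%'."""
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


base = coverage_hundredths(10007, 1083, 60)  # change/control-data-plane-split
head = coverage_hundredths(11226, 1182, 70)  # #3092

# Assumption: the delta comes from the exact ratios and is truncated afterwards.
exact_delta = Fraction(11226, 12478) - Fraction(10007, 11150)
delta = math.floor(exact_delta * 10000)

print(fmt(base), fmt(head), "+" + fmt(delta))  # 89.74% 89.96% +0.21%
```

Ordinary rounding of the same ratios would give 89.75%, 89.97% and a delta of +0.22%. The displayed +0.21% is therefore consistent with the truncation assumption.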
So, will the leader lease also change when we scale down the pods? Did you check that case?
Er, if we scale down, the leader lease doesn't change: the unready non-leader NGF pods are the ones that get terminated. Probably some Kubernetes magic.
Okay, good to know! Thank you. The PR looks good to me.
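The scale-down behavior described above most likely comes from the ReplicaSet controller's pod-deletion ordering rather than from anything NGF-specific. When a Deployment scales down, pods that are unscheduled, not yet running, or not Ready are picked for deletion before Ready ones. Below is a simplified sketch of that ordering. The pod names are hypothetical. The real controller also applies further tie-breakers, such as the `controller.kubernetes.io/pod-deletion-cost` annotation and pod age, and those are left out here.

```python
from dataclasses import dataclass
from typing import List

# Simplified subset of the ReplicaSet controller's ranking (assumption):
# pods that sort first are deleted first.
PHASE_RANK = {"Pending": 0, "Unknown": 1, "Running": 2}


@dataclass
class Pod:
    name: str
    scheduled: bool
    phase: str
    ready: bool


def deletion_key(pod: Pod) -> tuple:
    # False sorts before True, so: unscheduled < scheduled,
    # Pending < Unknown < Running, not Ready < Ready.
    return (pod.scheduled, PHASE_RANK[pod.phase], pod.ready)


def pods_to_delete(pods: List[Pod], replicas: int) -> List[str]:
    """Names of the pods removed when scaling down to `replicas`."""
    surplus = max(len(pods) - replicas, 0)
    return [p.name for p in sorted(pods, key=deletion_key)[:surplus]]


# Hypothetical NGF pods: one Ready leader and two Unready backups.
pods = [
    Pod("ngf-leader", scheduled=True, phase="Running", ready=True),
    Pod("ngf-backup-1", scheduled=True, phase="Running", ready=False),
    Pod("ngf-backup-2", scheduled=True, phase="Running", ready=False),
]
print(pods_to_delete(pods, replicas=1))  # ['ngf-backup-1', 'ngf-backup-2']
```

The backups report Unready, so they sort ahead of the leader. The leader pod, and with it the lease, survives the scale-down. The lease can still move if the leader pod itself is deleted or fails for some other reason.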
I would rephrase this PR/commit message a bit. We already support leader election for failover. These changes specifically allow the data plane pods to connect only to the leader. If the control plane is scaled, only the leader is marked as Ready and the backups are Unready, so the data plane doesn't connect to them.
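Unready backups get no data plane connections because a Kubernetes Service routes only to endpoints whose pods report Ready. This is a minimal sketch of that filtering. It assumes the data plane reaches the control plane through a Service that does not set `publishNotReadyAddresses`. The pod names and IPs are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodStatus:
    name: str
    ip: str
    ready: bool  # outcome of the pod's readiness probe


def service_endpoints(pods: List[PodStatus]) -> List[str]:
    """IPs a Service routes to: only pods whose Ready condition is true.

    Assumes publishNotReadyAddresses is unset (false).
    """
    return [p.ip for p in pods if p.ready]


# Hypothetical control plane pods behind the Service.
control_plane = [
    PodStatus("ngf-a", "10.0.0.11", ready=True),   # current leader
    PodStatus("ngf-b", "10.0.0.12", ready=False),  # backup, Unready
    PodStatus("ngf-c", "10.0.0.13", ready=False),  # backup, Unready
]
print(service_endpoints(control_plane))  # ['10.0.0.11']
```

When leadership moves and the readiness results flip, the endpoint list follows. New data plane connections then go to the new leader.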
Would you mind updating the image in the design document that we fixed?
193d232 to 265b813
Here's what the logs look like when an NGF Pod starts up and acquires the leader lease:
And here's what a non-leader NGF Pod looks like when it's been running for a while:
@salonichf5, re-requesting your review since a large chunk of restructuring was done.
Add leader election so that data plane pods connect only to the leader NGF pod. If the control plane is scaled, only the leader is marked as Ready and the backups are Unready, so the data plane doesn't connect to them.
Problem: We want the NGF control plane to fail over to another pod when the control plane pod goes down.
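Failover comes from the lease. The leader keeps renewing it, and if the leader pod goes down and stops renewing, another candidate can take the lease once it expires. This is a minimal in-memory sketch of that rule, assuming a single shared lease with a hypothetical 15-second duration. Real Kubernetes leader election (for example client-go's `leaderelection` package) stores the lease in a `coordination.k8s.io` Lease object. It also adds renew deadlines, retry periods, and conflict handling, none of which this sketch models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for a leader-election lease (simplified sketch)."""

    holder: Optional[str] = None
    renew_time: float = 0.0
    duration: float = 15.0  # seconds; hypothetical lease duration

    def try_acquire_or_renew(self, candidate: str, now: float) -> bool:
        """Return True if `candidate` holds the lease after this attempt."""
        expired = self.holder is None or now >= self.renew_time + self.duration
        if self.holder == candidate or expired:
            # The holder renews; anyone may take an empty or expired lease.
            self.holder = candidate
            self.renew_time = now
            return True
        return False


lease = Lease()
print(lease.try_acquire_or_renew("ngf-a", now=0.0))  # True: ngf-a becomes leader
print(lease.try_acquire_or_renew("ngf-b", now=5.0))  # False: still held by ngf-a
# ngf-a goes down and stops renewing; after the lease expires, ngf-b takes over.
print(lease.try_acquire_or_renew("ngf-b", now=20.0))  # True
print(lease.holder)  # ngf-b
```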
Solution: Only the leader pod is marked as Ready by Kubernetes, so all connections from data plane pods go to the leader pod.
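Marking only the leader as Ready amounts to a readiness check that passes only while the pod holds the lease. This is a minimal sketch. It assumes the leader-election library calls a hook when leadership starts and stops; client-go's `leaderelection` package exposes `OnStartedLeading` and `OnStoppedLeading` callbacks for this. The class and method names below are hypothetical and are not NGF's actual code.

```python
import threading
from typing import Tuple


class LeaderReadiness:
    """Readiness check that passes only while this pod holds the leader lease.

    Hypothetical sketch; not NGF's actual implementation.
    """

    def __init__(self) -> None:
        # Election callbacks and probe requests may run on different threads.
        self._lock = threading.Lock()
        self._leading = False

    def on_started_leading(self) -> None:
        """Hook for the leader-election 'started leading' callback."""
        with self._lock:
            self._leading = True

    def on_stopped_leading(self) -> None:
        """Hook for the leader-election 'stopped leading' callback."""
        with self._lock:
            self._leading = False

    def readyz(self) -> Tuple[int, str]:
        """Return (HTTP status, body) for a readiness probe request."""
        with self._lock:
            if self._leading:
                return 200, "ok"
            return 503, "not the leader"


check = LeaderReadiness()
print(check.readyz())  # (503, 'not the leader')
check.on_started_leading()
print(check.readyz())  # (200, 'ok')
```

A kubelet `httpGet` readiness probe treats status codes from 200 to 399 as success. Serving the status from `readyz()` on the probe path would therefore keep backups Unready until they acquire the lease.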
Testing: Added unit tests.
Closes #2850