Allow configuration of kubernetes leader election parameters #663

Open
davidhao3300 opened this issue Dec 11, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@davidhao3300

Describe the problem to be solved

Hello, thanks again for this great library. Some of our clusters have a large number of nodes (>1000), and the leader election has become a significant portion of the requests made to the k8s API server. As a stopgap, we're considering tweaking the leader election parameters to reduce the number of lease calls made to the k8s API server.

Currently those values are hard-coded here.
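A minimal sketch of what making these values configurable could look like, using only the Go standard library. The flag names, the `ElectionConfig` type, and the 15s/10s/2s defaults are assumptions for illustration and are not taken from the Spegel codebase. The validation mirrors the ordering that client-go's leader elector requires: the lease duration must exceed the renew deadline, and the renew deadline must exceed the retry period multiplied by its jitter factor of 1.2.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"time"
)

// jitterFactor mirrors the JitterFactor constant (1.2) in client-go's
// leaderelection package.
const jitterFactor = 1.2

// ElectionConfig is a hypothetical holder for the three timing parameters.
type ElectionConfig struct {
	LeaseDuration time.Duration
	RenewDeadline time.Duration
	RetryPeriod   time.Duration
}

// Validate rejects combinations that client-go's leader elector would refuse.
func (c ElectionConfig) Validate() error {
	if c.RetryPeriod <= 0 {
		return errors.New("retry period must be positive")
	}
	if c.LeaseDuration <= c.RenewDeadline {
		return errors.New("lease duration must be greater than renew deadline")
	}
	if c.RenewDeadline <= time.Duration(jitterFactor*float64(c.RetryPeriod)) {
		return errors.New("renew deadline must be greater than retry period * 1.2")
	}
	return nil
}

// parseElectionConfig reads the (hypothetical) flags and validates the result.
// The defaults are placeholders, not necessarily the values Spegel hard-codes.
func parseElectionConfig(args []string) (ElectionConfig, error) {
	fs := flag.NewFlagSet("spegel", flag.ContinueOnError)
	var c ElectionConfig
	fs.DurationVar(&c.LeaseDuration, "leader-election-lease-duration", 15*time.Second,
		"how long a lease is valid before other candidates may take it over")
	fs.DurationVar(&c.RenewDeadline, "leader-election-renew-deadline", 10*time.Second,
		"how long the leader keeps trying to renew before giving up")
	fs.DurationVar(&c.RetryPeriod, "leader-election-retry-period", 2*time.Second,
		"how long candidates wait between attempts to acquire or renew")
	if err := fs.Parse(args); err != nil {
		return ElectionConfig{}, err
	}
	return c, c.Validate()
}

func main() {
	cfg, err := parseElectionConfig(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid leader election config:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

The validated values would then be passed to the leader election setup in place of the hard-coded durations.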

Proposed solution to the problem

A couple of questions/notes:

  • Happy to make the PR; I see enough examples in the codebase to handle this.
  • If we tweak the numbers to reduce the number of lease calls, this lengthens the window during which we may have no leader. What happens during that leaderless period? Will existing nodes have a potentially stale peer list, while new nodes are unable to discover their peers?
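To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes every non-leader pod reads the Lease about once per retry period, and that after a leader disappears a new one is elected roughly one lease duration plus one retry period later. Both are simplifications (client-go adds jitter to the retry interval, so the real request rate is somewhat lower), and the function names are made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// leaseReadsPerSecond estimates the Lease read rate seen by the API server,
// assuming each of the nodes-1 non-leader pods polls once per retry period.
func leaseReadsPerSecond(nodes int, retryPeriod time.Duration) float64 {
	if nodes <= 1 || retryPeriod <= 0 {
		return 0
	}
	return float64(nodes-1) / retryPeriod.Seconds()
}

// approxLeaderlessWindow is a rough estimate of how long the cluster can be
// without a leader: the lease has to expire, then a candidate has to retry.
func approxLeaderlessWindow(leaseDuration, retryPeriod time.Duration) time.Duration {
	return leaseDuration + retryPeriod
}

func main() {
	const nodes = 1001 // illustrative cluster size
	settings := []struct{ lease, retry time.Duration }{
		{15 * time.Second, 2 * time.Second},
		{60 * time.Second, 10 * time.Second},
		{120 * time.Second, 30 * time.Second},
	}
	for _, s := range settings {
		fmt.Printf("lease=%v retry=%v: ~%.1f lease reads/s, leaderless window ~%v\n",
			s.lease, s.retry,
			leaseReadsPerSecond(nodes, s.retry),
			approxLeaderlessWindow(s.lease, s.retry))
	}
}
```

Under these assumptions the request rate scales with the node count divided by the retry period, while the leaderless window grows with the lease duration, so the two goals pull in opposite directions.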

Another option we're considering is using the Kubernetes endpoints to discover peers (the equivalent of kubectl get endpoints spegel, but for port 5001), which might circumvent the need for a leader to discover peers. I'm not clear on whether this would cause weird behavior, and would appreciate feedback there.

@davidhao3300 davidhao3300 added the enhancement New feature or request label Dec 11, 2024
@davidhao3300
Author

Another possibility is to split the leader election as a separate deployment from the worker daemonset:

  • A leader election set of pods will vie for the lease. The leader publishes its key as a configmap (ideally a service, but I noticed the lease holder ID is not a straightforward IP).
  • The daemonset acts as a set of worker pods that never vie for leadership, and instead watch the configmap to get the leader key.

@phillebaba
Member

This brings up an interesting point. Leader election was probably never built for applications running in a daemonset, especially in such a large cluster.

I do see how this is a problem for you. Using leader election is probably not the best long-term solution for Spegel. It was chosen as the best solution to two problems that had to be solved for bootstrapping. The first is that for a peer to connect to another, it needs the ID of that peer, which includes a randomly generated public key. The second is that all peers need to agree on the same set of peer(s) to initially connect to. If a random peer were selected, a split cluster could in theory be created.

I am happy to discuss alternatives to solve this problem. The two main things that any solution needs to provide are that the public key is shared and that the same peers are selected. One option, for example, is to choose the oldest Spegel instances, but that does not solve the public key sharing part.
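As an illustration of the "same peers need to be selected" requirement, here is a minimal sketch of picking the oldest instances deterministically. The `Instance` type and `oldestInstances` function are hypothetical, and the sketch assumes every pod sees the same list of instances (for example from the Kubernetes API). Sorting by creation time with the name as a tie-breaker means every pod arrives at the same answer regardless of the order in which the list was returned. It does not address how the public key would be shared.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Instance is a hypothetical view of a running Spegel pod.
type Instance struct {
	Name    string
	IP      string
	Created time.Time
}

// oldestInstances returns the k oldest instances. Ties on creation time are
// broken by name so that the result does not depend on input order.
func oldestInstances(all []Instance, k int) []Instance {
	sorted := make([]Instance, len(all))
	copy(sorted, all) // leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool {
		if !sorted[i].Created.Equal(sorted[j].Created) {
			return sorted[i].Created.Before(sorted[j].Created)
		}
		return sorted[i].Name < sorted[j].Name
	})
	if k < 0 {
		k = 0
	}
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	base := time.Date(2024, 12, 11, 0, 0, 0, 0, time.UTC)
	pods := []Instance{
		{Name: "spegel-c", IP: "10.0.0.3", Created: base.Add(2 * time.Minute)},
		{Name: "spegel-a", IP: "10.0.0.1", Created: base},
		{Name: "spegel-b", IP: "10.0.0.2", Created: base.Add(time.Minute)},
	}
	for _, p := range oldestInstances(pods, 2) {
		fmt.Println(p.Name, p.IP)
	}
}
```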

@davidhao3300
Author

Happy new years!

How do you feel about having a small "leader-election" group of pods, and then a daemonset that actually performs caching? I think this lets us keep the simple-to-understand leader election without spamming the Kubernetes API.

@phillebaba
Member

I think it is a good solution for users with large clusters. We could make this an opt-in feature that is disabled by default. When enabled, a Deployment with n (3?) replicas is created, and those replicas do leader election among themselves. All other DaemonSet pods then use the leader to bootstrap. This way we will have a known number of pods doing leader election in very large clusters.

There has to be a better solution for this problem that I am not seeing yet, but striving for perfection won't solve things today, so it's better to revisit this at a later time.


2 participants