Provide Specs for Kubernetes Cluster #27
I'm interested in running anna and cloudburst in a different flavor of Kubernetes, specifically GKE. From skimming the repo, I see mention of a mesh and an ELB. Other than that, I see you are using kops rather than the AWS managed Kubernetes offering, so I wonder if there are specific needs in terms of control over the Kubernetes components, the virtual machines, or the network(s) between them.

Comments
Hi @winmillwill -- thanks for your question. You're right that we have been using kops. The easy part of porting to a managed k8s service is the YAML specs for each service component -- you can find these in the repo. The parts that will require work are the cluster management code. I'm happy to chat more if this is something you are going to be working on!
Thanks for the info and the pointers. GKE has optional autoscaling built in -- you just tick a checkbox, and if the total requested resources across pods exceed what is available, you get another node. I don't know whether it has limitations that would require a more involved approach for some use cases, nor do I know whether other services provide a comparable feature or what limitations they might have. I would lean toward making autoscaling optional. It seems like scaling the number of nodes, automatically or otherwise, is going to be decided by whoever administers the k8s cluster, and adoption and experimentation are easier if the user doesn't have to be a cluster administrator.

Another issue I see from looking around in the repo is that the EBS DaemonSets expect the disks that kops attaches when it provisions the instance group. It would be more portable if we instead used a StatefulSet that takes advantage of the cloud provider's facilities for provisioning volumes dynamically (see the sketch after this comment). Is there a known issue with that approach?

Regarding the general use of DaemonSets and hostNetwork, is the idea to avoid the additional latency of iptables and so on? I'm also curious about the use of hostIPC, whether the non-stateful DaemonSets can be Deployments instead, and which containers depend on being on the same node as some other container.
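To make that concrete, here is a minimal sketch of the kind of StatefulSet I mean. The names (`anna-memory`, the image, the `standard-ssd` StorageClass) are placeholders rather than anything from this repo; the interesting part is `volumeClaimTemplates`, which asks the cloud provider to provision a disk per pod instead of relying on disks attached by kops:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: anna-memory            # placeholder name
spec:
  serviceName: anna-memory     # headless Service governing the pods
  replicas: 3
  selector:
    matchLabels:
      app: anna-memory
  template:
    metadata:
      labels:
        app: anna-memory
    spec:
      containers:
        - name: anna
          image: example/anna:latest   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data         # the container just sees a normal directory
  volumeClaimTemplates:                # one PVC (and thus one PV) per pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard-ssd # provider-specific class, sketched later
        resources:
          requests:
            storage: 100Gi
```

Each pod gets its own PersistentVolumeClaim (`data-anna-memory-0`, `data-anna-memory-1`, ...), and the claims survive pod rescheduling.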
100% agreed re: removing autoscaling from the purview of the user and relying on out-of-the-box autoscaling as much as possible. Like I said, those services weren't available when we started this project, so we weren't able to take advantage of them. You're right about the AWS-specific disk setup. I'm not familiar with how StatefulSets interact with cloud provider provisioning of disks -- if you have some pointers about that, I'm happy to take a look.

Regarding the use of hostNetwork and hostIPC, the main constraint is that clients need direct IP-level access to the individual server nodes.
Sorry for the delay in replying.

The way a StatefulSet gets disks is that it has a persistent volume claim template that provides a PersistentVolumeClaim for each pod, which results in a PersistentVolume getting created. The cloud provider hooks in here much like it does when you create a LoadBalancer Service. You can tune this by creating StorageClass resources that spec how to create the disks, and those can be referenced from a PVC (and from a PVC template in a StatefulSet); a sketch follows below. This arrangement allows the operator of the workload to just specify the number of replicas, without having to add taints to nodes or tolerations for those taints to the workload, and without caring particularly about which nodes in the cluster the workload pods get scheduled on, beyond the normal anti-affinity rules for making sure that, e.g., pods go in different AZs. The docs on the disk machinery are here: https://kubernetes.io/docs/concepts/storage/persistent-volumes/

For a ConfigMap, the API you consume is essentially a YAML object where the keys are file names and the values are file contents; in each container in the pod spec you can choose a directory to mount that ConfigMap, and its files will appear in that directory (example below). In the application you can watch for filesystem events or just continuously check the interesting file paths to determine whether the config needs to be reloaded, or you can put it on the operator to ensure the pods get cycled after a change to the ConfigMap.

IIUC, the issue with IP-level access is that, much like with Cassandra, we need to do client-side load balancing. At my day job, we provide a LoadBalancer Service for each Cassandra pod that needs out-of-cluster access, and a headless Service for the StatefulSet that runs the Cassandra pods, so that in-cluster clients can just use the headless Service and out-of-cluster clients can use the DNS records or IP addresses of the LoadBalancers. Different organizations can handle this differently, e.g. by putting their Kubernetes cluster in the same VPC as the out-of-cluster clients or in a peered VPC.

From what we've discussed, I feel like I can work up a POC for some things I'm thinking about. Thanks for the help!
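To illustrate the StorageClass piece described above, here is a sketch of a class that a volume claim template could reference. The provisioner shown is the in-tree GCE one; on AWS it would be `kubernetes.io/aws-ebs` instead, and the parameters here are just examples:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-ssd               # referenced by storageClassName in the PVC template
provisioner: kubernetes.io/gce-pd  # in-tree GCE provisioner; AWS would use kubernetes.io/aws-ebs
parameters:
  type: pd-ssd                     # GCE disk type; illustrative
volumeBindingMode: WaitForFirstConsumer  # provision the disk in the AZ where the pod lands
reclaimPolicy: Delete
```

WaitForFirstConsumer delays provisioning until the pod is scheduled, so the disk is created in the same AZ as the pod that will use it.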
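Here is a sketch of the ConfigMap-as-files pattern described above; the names, mount path, and config contents are made up for illustration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: anna-config                # placeholder name
data:
  anna-config.yml: |               # each key becomes a file under the mount path
    threads: 4
    replication: 2
---
apiVersion: v1
kind: Pod
metadata:
  name: anna-example
spec:
  containers:
    - name: anna
      image: example/anna:latest   # placeholder image
      volumeMounts:
        - name: config
          mountPath: /etc/anna     # container sees /etc/anna/anna-config.yml
  volumes:
    - name: config
      configMap:
        name: anna-config
```

Files under a ConfigMap volume mount are refreshed by the kubelet some time after the ConfigMap changes, which is what makes the watch-and-reload approach possible (mounting individual keys with subPath disables that refresh).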
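And a sketch of the two-Service arrangement for the client-side load-balancing case: a headless Service for in-cluster clients, plus one LoadBalancer Service per pod that needs out-of-cluster access, pinned to a pod via the `statefulset.kubernetes.io/pod-name` label that Kubernetes adds to StatefulSet pods. The names and port are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: anna-memory                # headless Service: per-pod DNS records, no cluster IP
spec:
  clusterIP: None
  selector:
    app: anna-memory
  ports:
    - name: kvs
      port: 6200                   # placeholder port
---
apiVersion: v1
kind: Service
metadata:
  name: anna-memory-0-lb           # one of these per pod that needs external access
spec:
  type: LoadBalancer
  selector:
    statefulset.kubernetes.io/pod-name: anna-memory-0  # label set automatically on StatefulSet pods
  ports:
    - name: kvs
      port: 6200
```

In-cluster clients resolve `anna-memory-0.anna-memory` (and so on) through the headless Service; out-of-cluster clients use the external address of each LoadBalancer.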