Image Backup Controller ensures that every image used by running Deployments and DaemonSets comes from our backup registry, cloning any external image into it. Once an image is cloned, the controller updates the resource spec and rolls out the new backup image.
We need to watch Deployment and DaemonSet resources and detect which images are external to our backup registry.
When an external image is detected, we clone it to the destination backup registry and then update the resource spec, rolling out the updated version.
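As a rough illustration of the detection step, the sketch below decides whether an image already belongs to the backup registry and, if not, derives a backup name for it. The registry value and the naming scheme (`nginx:1.14.0` becoming `library_nginx:1.14.0`) are assumptions inferred from the sample output later in this README, not taken from the controller source; `isBackupImage` and `backupImageName` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// backupRegistry is an assumed value inferred from the sample output in this README.
const backupRegistry = "docker.io/marcosquesada"

// isBackupImage reports whether image already points at the backup registry.
func isBackupImage(image, registry string) bool {
	return strings.HasPrefix(image, registry+"/")
}

// backupImageName maps an external image reference to a flat repository in
// the backup registry, e.g. "nginx:1.14.0" -> "<registry>/library_nginx:1.14.0".
// Hypothetical scheme; digest references (name@sha256:...) are not handled.
func backupImageName(image, registry string) string {
	name, tag := image, "latest"
	// A ':' after the last '/' separates the tag; an earlier ':' is a registry port.
	if i := strings.LastIndex(image, ":"); i > strings.LastIndex(image, "/") {
		name, tag = image[:i], image[i+1:]
	}
	parts := strings.Split(name, "/")
	// Drop an explicit registry host such as "docker.io" or "localhost:5000".
	if len(parts) > 1 && (strings.ContainsAny(parts[0], ".:") || parts[0] == "localhost") {
		parts = parts[1:]
	}
	// Docker Hub official images live under the implicit "library/" namespace.
	if len(parts) == 1 {
		parts = append([]string{"library"}, parts...)
	}
	return fmt.Sprintf("%s/%s:%s", registry, strings.Join(parts, "_"), tag)
}

func main() {
	for _, img := range []string{"nginx:1.14.0", "fluentd:latest", "docker.io/marcosquesada/library_nginx:1.14.0"} {
		if isBackupImage(img, backupRegistry) {
			fmt.Println(img, "already backed up")
			continue
		}
		fmt.Println(img, "->", backupImageName(img, backupRegistry))
	}
}
```

Note that dropping the registry host means images with the same path on different registries would map to the same backup name; a real scheme would need to account for that.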
The first design choice was to implement two separate controllers, one watching Deployments and one watching DaemonSets. Both share the whole logic and differ only in the concrete resource type they handle, so they can be implemented generically, sharing almost everything.
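A minimal sketch of the generic approach, assuming a hypothetical `Workload` interface that hides the only difference between the two kinds: how to reach the pod template images. `Deployment`, `DaemonSet` and `podTemplate` here are simplified stand-ins rather than the `apps/v1` types, and this `GenericController` is illustrative, not the project's actual type.

```go
package main

import (
	"fmt"
	"strings"
)

// Workload is a hypothetical interface over what Deployments and DaemonSets
// share for this controller: the pod template container images.
type Workload interface {
	Kind() string
	Images() []string
	SetImage(i int, image string)
}

// podTemplate stands in for spec.template.spec.containers.
type podTemplate struct{ containers []string }

func (p *podTemplate) Images() []string             { return p.containers }
func (p *podTemplate) SetImage(i int, image string) { p.containers[i] = image }

// Deployment and DaemonSet are simplified stand-ins for the apps/v1 types.
type Deployment struct{ podTemplate }
type DaemonSet struct{ podTemplate }

func (*Deployment) Kind() string { return "Deployment" }
func (*DaemonSet) Kind() string  { return "DaemonSet" }

// GenericController holds the shared logic once for any Workload type.
type GenericController[T Workload] struct {
	registry string
}

// ExternalImages returns the indexes of images outside the backup registry.
func (c GenericController[T]) ExternalImages(w T) []int {
	var idx []int
	for i, img := range w.Images() {
		if !strings.HasPrefix(img, c.registry+"/") {
			idx = append(idx, i)
		}
	}
	return idx
}

// Rewrite replaces every external image using mapImage and returns how many changed.
func (c GenericController[T]) Rewrite(w T, mapImage func(string) string) int {
	idx := c.ExternalImages(w)
	for _, i := range idx {
		w.SetImage(i, mapImage(w.Images()[i]))
	}
	return len(idx)
}

func main() {
	const registry = "docker.io/marcosquesada"
	deployments := GenericController[*Deployment]{registry: registry}
	daemonSets := GenericController[*DaemonSet]{registry: registry}

	d := &Deployment{podTemplate{containers: []string{"nginx:1.14.0"}}}
	ds := &DaemonSet{podTemplate{containers: []string{registry + "/library_fluentd:latest"}}}

	fmt.Println(d.Kind(), deployments.ExternalImages(d))  // Deployment [0]
	fmt.Println(ds.Kind(), daemonSets.ExternalImages(ds)) // DaemonSet []
}
```

Adding another kind only requires a new type that satisfies `Workload`; the detection and rewrite logic stays in one place.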
This approach was used in the first implementation, but after completing it and thinking it through, this basic scenario shows some corner cases. It also lacks external visibility: workloads are rolled out without any state being reflected in the system.
Apart from that, if we need to increase concurrency, the same image backup could be created more than once at the same time.
Example: a slow backup process is started and we receive another event from the same resource; the basic controller implementation is not aware of backups already in progress.
This opens the door to an 'operator' implementation, where the Deployment/DaemonSet controllers act as producers of image backup tasks, and the ImageBackup controller takes care of generating the backups and reflecting their progress through the status subresource.
In the end this model works collaboratively, similar to the relation between Deployments, ReplicaSets and Pods. Our Deployment/DaemonSet controllers watch the progress of image backup tasks through their status, and complete the rollout once all backup tasks for the resource have completed.
The flow works as follows:
- the Deployment/DaemonSet controllers watch Ready objects from non-restricted namespaces
- on a Deployment/DaemonSet create/update event, the controller spots images (initContainers/containers) that do not come from the backup registry
- it checks whether a related image backup task exists
- if one is found, it checks its execution state and continues
- if none is found, it creates an image backup task for it
- the ImageBackup controller processes image backup executions, reporting progress in the CRD status subresource
- on successful execution, an expiration timer takes care of removing the image backup
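The workload side of this flow can be sketched as a single reconcile pass. This is a simplified, in-memory model under stated assumptions: `tasks` stands in for ImageBackup lookups keyed by source image, the `Phase` values are hypothetical, `backupName` is a caller-supplied naming function, and a `nil` result means "requeue and check again later".

```go
package main

import (
	"fmt"
	"strings"
)

// Phase is a hypothetical ImageBackup status phase.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseCompleted Phase = "Completed"
)

// reconcileWorkload runs one pass over a workload's (init)container images.
// It creates missing backup tasks and returns the rewritten image list only
// when every backup has completed; nil means "requeue and check later".
func reconcileWorkload(images []string, registry string, tasks map[string]Phase, backupName func(string) string) []string {
	updated := make([]string, len(images))
	ready := true
	for i, img := range images {
		if strings.HasPrefix(img, registry+"/") {
			updated[i] = img // already served by the backup registry
			continue
		}
		switch phase, found := tasks[img]; {
		case !found:
			tasks[img] = PhasePending // create the ImageBackup task
			ready = false
		case phase != PhaseCompleted:
			ready = false // task exists but is still in progress
		}
		updated[i] = backupName(img)
	}
	if !ready {
		return nil
	}
	return updated
}

func main() {
	registry := "docker.io/marcosquesada"
	// Hypothetical naming that only covers Docker Hub short names.
	name := func(img string) string { return registry + "/library_" + img }
	tasks := map[string]Phase{}
	images := []string{"nginx:1.14.0"}

	fmt.Println(reconcileWorkload(images, registry, tasks, name)) // [] (task created, requeue)
	tasks["nginx:1.14.0"] = PhaseCompleted                         // the ImageBackup controller finishes the copy
	fmt.Println(reconcileWorkload(images, registry, tasks, name)) // [docker.io/marcosquesada/library_nginx:1.14.0]
}
```

Rolling out only when every image is ready avoids partially updated pod templates and extra rollouts.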
The final model is more flexible, as it can easily be extended to other workload kinds that share the GenericController.
Project assumptions:
- pods are already running in the cluster, so Deployments/DaemonSets are expected to be Ready
- we are only interested in create/update events
- ignore events from banned namespaces (kube-proxy)
- we are not interested in any other workload types (StatefulSets, Jobs, CronJobs...)
From these assumptions we can define which events we want to watch; predicates implement each of the restrictions. The overall idea is to execute Reconcile only when an image backup must be generated and rolled out.
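A standard-library-only sketch of how the assumptions above could be folded into one filter. In controller-runtime such checks would typically live in a `predicate.Funcs` passed to the builder with `WithEventFilter`; the `event` struct, the banned namespace list and the readiness rule (ready count equals desired count) are simplifications assumed for illustration.

```go
package main

import "fmt"

// EventType mirrors the four event kinds controller-runtime predicates handle.
type EventType int

const (
	Create EventType = iota
	Update
	Delete
	Generic
)

// event is a hypothetical, trimmed view of a Deployment/DaemonSet event.
type event struct {
	Type      EventType
	Namespace string
	Desired   int32 // spec.replicas, or status.desiredNumberScheduled for a DaemonSet
	Ready     int32 // status.readyReplicas, or status.numberReady for a DaemonSet
}

// bannedNamespaces is an illustrative value only.
var bannedNamespaces = map[string]bool{"kube-system": true}

// shouldReconcile applies the project assumptions as a single event filter.
func shouldReconcile(e event) bool {
	if e.Type != Create && e.Type != Update {
		return false // only create/update events are relevant
	}
	if bannedNamespaces[e.Namespace] {
		return false // never touch restricted namespaces
	}
	return e.Desired > 0 && e.Ready == e.Desired // workload must be Ready
}

func main() {
	fmt.Println(shouldReconcile(event{Type: Update, Namespace: "nginx", Desired: 1, Ready: 1}))       // true
	fmt.Println(shouldReconcile(event{Type: Create, Namespace: "kube-system", Desired: 1, Ready: 1})) // false
	fmt.Println(shouldReconcile(event{Type: Delete, Namespace: "fluentd", Desired: 1, Ready: 1}))     // false
}
```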
The whole project has been developed using Kubebuilder (3.4.1) and a local Minikube (v1.25.2). Code generation has been used extensively to create the project scaffolding and manifests, and kustomize is used in the background to patch the manifests before applying them. This project aims to follow the Kubernetes Operator pattern.
It uses controllers, which provide a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster.
kubebuilder init --domain k8slab.io --repo github.com/marcosQuesada/image-backup-controller
kubebuilder create api --group k8slab.io --version v1alpha1 --kind Deployment
kubebuilder create api --group k8slab.io --version v1alpha1 --kind DaemonSet
kubebuilder create api --group k8slab.io --version v1alpha1 --kind ImageBackup
After defining the ImageBackup CRD types, generate the DeepCopy methods and manifests:
make generate
make manifests
Install the CRDs on the K8s cluster (I'm using a local Minikube):
make install
The controller can be run locally (make sure the required backup registry credentials are provided as env vars):
make run
Build and push the controller image:
make docker-build IMG=marcosquesada/image-backup-controller:latest
make docker-push IMG=marcosquesada/image-backup-controller:latest
Create the namespace and the backup registry credentials secret, then deploy the controller:
kubectl create ns image-backup
kubectl create secret generic backup-registry-secret --from-literal=username=xxxx --from-literal=password=xxxx -n image-backup
make deploy
To delete the CRDs from the cluster:
make uninstall
Undeploy the controller from the cluster:
make undeploy
Example Deployment and DaemonSet manifests are provided in the config/samples/ folder; install them with:
kubectl apply -f config/samples/
Before the backup image rollout:
kubectl get deployments nginx -n nginx -o jsonpath='{.spec.template.spec.containers[*].image}'
nginx:1.14.0
kubectl get daemonset fluentd -n fluentd -o jsonpath='{.spec.template.spec.containers[*].image}'
fluentd:latest
After the image update rollout:
kubectl get deployments nginx -n nginx -o jsonpath='{.spec.template.spec.containers[*].image}'
docker.io/marcosquesada/library_nginx:1.14.0
kubectl get daemonset fluentd -n fluentd -o jsonpath='{.spec.template.spec.containers[*].image}'
docker.io/marcosquesada/library_fluentd:latest
- improve BDD controller testing, as it is the critical core component
- include meta conditions on the ImageBackup CRD reflecting resource transitions
- improve security: move from secret env vars to an imagePullSecrets keychain
- use an autogenerated informer for the ImageBackup CRD
- fire relevant events (record.EventRecorder)
- CI integration
- grant permissions to the Prometheus server so that it can scrape protected metrics
- deploy the Prometheus stack and check the integration (pending)
kubectl create clusterrolebinding metrics --clusterrole=metrics-reader --serviceaccount=image-backup:controller-manager
- Deployment example: https://asciinema.org/a/w3UQAtuttNZZp2Yrlue4hQBCa
- Daemonset example: https://asciinema.org/a/8YPmVeqD7YvnBfjdpZW7NG986