From cc61b5a5b872f48c5c2187945c979177c97b94b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?V=C3=A1clav=20Muzik=C3=A1=C5=99?=
Date: Wed, 15 Dec 2021 11:35:18 +0100
Subject: [PATCH 1/6] Keycloak.X Operator

Co-authored-by: jonathanvila
Co-authored-by: andreaTP
---
 design/keycloak.x/operator.md | 121 ++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)
 create mode 100644 design/keycloak.x/operator.md

diff --git a/design/keycloak.x/operator.md b/design/keycloak.x/operator.md
new file mode 100644
index 0000000..d7acf16
--- /dev/null
+++ b/design/keycloak.x/operator.md
@@ -0,0 +1,121 @@
+# Keycloak.X Operator
+
+## Motivation for a new operator
+
+The current Operator, written in Go, has served us and the community well so far, but increasing challenges are paving the way for a complete rewrite.
+
+* The codebase is hard to maintain because of organic growth and accumulated technical debt
+* The Keycloak community is keener on Java and has less Go expertise
+* The current project needs some high-cost maintenance tasks to be performed in order to use the latest features (e.g. webhooks) and receive the latest fixes and patches, specifically:
+  * upgrading the Go version from 1.13 to 1.17
+  * upgrading the Operator SDK and the dependencies
+
+  Those upgrades will require creating a completely new project, using different libraries, moving, and in some cases, rewriting the components, e.g. the whole testsuite.
+* The current approach around CRDs no longer fits the long-term vision for cloud-native Keycloak as it is very error-prone and fragile.
+* A Java operator can share business objects with the Keycloak main codebase, increasing code reuse and dramatically reducing the chances of introducing bugs in the translation process from Kubernetes resources.
+* A unified codebase will make it easy to test and debug the entire system.
+* A new operator will embrace, from the ground up, the new cloud-native capabilities of upcoming Keycloak releases such as the Quarkus distribution and Store.X, making them first-class citizens and improving the overall user experience.
+
+
+## Features
+
+---
+**NOTE**
+
+The primary target of the operator is to make it easy to achieve production-grade installations of Keycloak.X.
+
+---
+
+### Configuring Keycloak deployment
+
+The operator will use a CRD representing the Keycloak installation. The CRD will expose the following configuration options:
+* custom Keycloak image; default base image will be used if not specified
+* Git configuration to fetch static business objects configuration (see below)
+* manual horizontal and vertical scaling
+* pod and node affinity, tolerations
+* SSL config, truststore
+
+Since most of the configuration options will come from the Keycloak.X distribution itself, the CRD will also expose appropriate fields for passing any distribution related options to the container, like database connection details (obviously without any credentials), SPIs configuration, etc.
+
+### Configuring Keycloak business objects using Kubernetes resources
+
+The new operator will expose two main ways of configuring business objects in Keycloak (e.g.: Realms, Roles, Clients, etc.) in addition to the built-in Dynamic configuration through the REST API/UI console:
+* Static file based configuration stored in Git to enable GitOps operations
+* Static configuration through Kubernetes API
+
+Static configuration will be strictly read-only, therefore two-way syncing is not going to be needed.
+
+Static configuration is going to provide an immutable and cloud native way for managing business objects in Keycloak that can be easily moved between environments (e.g. dev, stage, prod) in a predictible manner. This feature will leverage the new Store.X which enables federation of the configuration from multiple sources (static/dynamic) by architecting the storage layer.
+
+#### Static configuration through Git
+
+The `Keycloak` CRD will enable defining a specific commit (identified by a hash) in a Git repository containing the static configuration in the form of JSON/YAML files. To update the configuration the user will simply change a commit hash in a `Keycloak` CR and the operator will roll out the new configuration to all pods.
+
+#### Static configuration through Kubernetes API
+
+The operator will leverage dedicated CRD(s), initially, there will be only one `Realm` CRD directly translated from Keycloak's [RealmRepresentation](https://github.com/keycloak/keycloak/blob/c7134fd5390d7c650b3dfd4bd2a2855157042271/core/src/main/java/org/keycloak/representations/idm/RealmRepresentation.java). A Realm includes all subresources. As a result, it is going to be possible to configure every object in Keycloak through this CR even though for some of them it won't be recommended (e.g. Users). To implement this, the operator will simply translate the CRs to YAML files and mount them to Keycloak pods, again leveraging Store.X.
+
+It's purpose of the upcoming Store.X initiative to provide a full fledged static configuration backend for Keycloak but there will be a mid-term preview to enable bulk imports at startup time leveraging the REST api.
+
+### Keycloak versions alignment
+
+The operator will have its release cycle aligned with Keycloak. Each operator version will support only one corresponding Keycloak version.
+
+### Upgrades
+
+In order to upgrade to a newer Keycloak version, the operator will be upgraded first to ensure full compatibility with the operand.
+
+If custom Keycloak image is not used, the operator will use a default base image. After the operator is upgraded, it automatically upgrades Keycloak too using a newer base image.
+
+
+In case a custom Keycloak image is used, the image will need to be rebuilt to perform the upgrade. This is not going to be operator's responsibility as building a custom image often requires a complex process incl. CI/CD pipelines. After the operator is upgraded, it won't manage any existing Keycloak instances until its custom image is manually rebuilt using the right Keycloak base image aligned with the operator and updated and the image coordinates are updated in the CR.
+
+### Reaugmentation process in Kubernetes
+
+We will be leveraging Kubernetes volumes to act as "caches" for the [augmented/configured](https://quarkus.io/guides/reaugmentation) version of Keycloak.
+An initial POC to show the concept has been drafted here:
+https://github.com/andreaTP/poc-mutable-jar
+
+We will use Kubernetes volumes to cache the augmented version of the binaries.
+The artifacts in the kubernetes volume will be produced by an init-container and the operation is going to result in a noop in case the volume has already been populated by a compatible augmentation.
+
+### Connecting to a database
+
+A Postgres DB will have to be provisioned externally, it's not Keycloak Operator's responsibility to manage a database. The DB credentials will be stored as K8s Secrets.
+
+In long-term plan we'll add a limited integration with a Postgres Operator to leverage its backup functionalities for Keycloak upgrades.
+
+### Observability
+
+The operator will provide CR metrics as well as it will provide integration with Prometheus, Grafana and AlertManager for both operator and operand metrics. This will be addressed in an upcoming design proposal.
+
+### Ingresses
+
+The operator will provide an out-of-the-box experience using an opinionated default Ingress configuration.
+
+
+## Codebase
+
+The code for the new operator will be organized as a Maven sub-module in the main GitHub `keycloak/keycloak` repository.
+Dependency management will automatically piggy-back on the Keycloak BOM of the Quarkus distribution guaranteeing compliance of the used library versions.
+
+It will use the Java Operator SDK and its Quarkus extension. This implies the usage of the Fabric8 Kubernetes client.
+
+## Kubernetes compatibility matrix
+* OpenShift >=4.7
+* Vanilla Kubernetes >=1.20
+
+Other Kubernetes distributions are supported only on a best-effort basis.
+
+## Distribution
+
+The Operator deployment is going to be performed leveraging OLM, which provides both a CLI approach via `Subscription` objects for managing the operator installation and a UI in OpenShift. The Operator as such is going to be distributed as a container image.
+
+## Migration from the old operator
+There will be no direct migration path. Generic migration steps will be documented.
+
+## Future considerations
+
+### Autonomous operator
+
+Our long-term vision includes making the operator autonomous to some extent, basically making it a [Level 5 operator](https://operatorframework.io/operator-capabilities/). It should be able to understand the operand's metrics and act on them by automatically scaling and healing the Keycloak deployment.
\ No newline at end of file

From 418e36919761d45f64bc663d82eb1b972821b8df Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?V=C3=A1clav=20Muzik=C3=A1=C5=99?=
Date: Fri, 17 Dec 2021 13:26:32 +0100
Subject: [PATCH 2/6] Elaborate on Ingress and credentials.

Co-authored-by: jonathanvila
Co-authored-by: andreaTP
---
 design/keycloak.x/operator.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/design/keycloak.x/operator.md b/design/keycloak.x/operator.md
index d7acf16..2bc7d27 100644
--- a/design/keycloak.x/operator.md
+++ b/design/keycloak.x/operator.md
@@ -45,7 +45,7 @@ The new operator will expose two main ways of configuring business objects in Ke
 
 Static configuration will be strictly read-only, therefore two-way syncing is not going to be needed.
 
-Static configuration is going to provide an immutable and cloud native way for managing business objects in Keycloak that can be easily moved between environments (e.g. dev, stage, prod) in a predictible manner. This feature will leverage the new Store.X which enables federation of the configuration from multiple sources (static/dynamic) by architecting the storage layer.
+Static configuration is going to provide an immutable and cloud native way for managing business objects in Keycloak that can be easily moved between environments (e.g. dev, stage, prod) in a predictable manner. This feature will leverage the new Store.X which enables federation of the configuration from multiple sources (static/dynamic) by re-architecting the storage layer.
 
 #### Static configuration through Git
 
@@ -55,7 +55,9 @@ The `Keycloak` CRD will enable defining a specific commit (identified by a hash
 
 The operator will leverage dedicated CRD(s), initially, there will be only one `Realm` CRD directly translated from Keycloak's [RealmRepresentation](https://github.com/keycloak/keycloak/blob/c7134fd5390d7c650b3dfd4bd2a2855157042271/core/src/main/java/org/keycloak/representations/idm/RealmRepresentation.java). A Realm includes all subresources. As a result, it is going to be possible to configure every object in Keycloak through this CR even though for some of them it won't be recommended (e.g. Users). To implement this, the operator will simply translate the CRs to YAML files and mount them to Keycloak pods, again leveraging Store.X.
 
-It's purpose of the upcoming Store.X initiative to provide a full fledged static configuration backend for Keycloak but there will be a mid-term preview to enable bulk imports at startup time leveraging the REST api.
+It will be possible to store any credentials as Secrets in K8s, leveraging [Keycloak Vault functionality](https://www.keycloak.org/docs/latest/server_admin/index.html#_vault-administration) where possible.
+
+The purpose of the upcoming Store.X initiative is to provide a full-fledged static configuration backend for Keycloak, but there will be a mid-term preview to enable bulk imports at startup time leveraging the REST API.
 
 ### Keycloak versions alignment
 
@@ -91,7 +93,7 @@ The operator will provide CR metrics as well as it will provide integration with
 
 ### Ingresses
 
-The operator will provide an out-of-the-box experience using an opinionated default Ingress configuration.
+The operator will provide an out-of-the-box experience using an opinionated default Ingress (Route on OpenShift) configuration. This configuration will support further "manual" modification. Additionally, it will be possible to completely disable this feature.
 
 
 ## Codebase

From b101eb766cf0e06616fbfdae9a759523d2e2a60d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?V=C3=A1clav=20Muzik=C3=A1=C5=99?=
Date: Fri, 17 Dec 2021 13:38:49 +0100
Subject: [PATCH 3/6] Elaborate on DB.

Co-authored-by: jonathanvila
Co-authored-by: andreaTP
---
 design/keycloak.x/operator.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/design/keycloak.x/operator.md b/design/keycloak.x/operator.md
index 2bc7d27..474ec2c 100644
--- a/design/keycloak.x/operator.md
+++ b/design/keycloak.x/operator.md
@@ -83,7 +83,7 @@ The artifacts in the kubernetes volume will be produced by an init-container and
 
 ### Connecting to a database
 
-A Postgres DB will have to be provisioned externally, it's not Keycloak Operator's responsibility to manage a database. The DB credentials will be stored as K8s Secrets.
+A Postgres DB instance will have to be provisioned externally, it's not Keycloak Operator's responsibility to manage a database. The DB credentials will be stored as K8s Secrets.
 
 In long-term plan we'll add a limited integration with a Postgres Operator to leverage its backup functionalities for Keycloak upgrades.
 
From b4ccee5e8abf0dca3a94da4514aabbba568352e0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?V=C3=A1clav=20Muzik=C3=A1=C5=99?=
Date: Fri, 17 Dec 2021 17:30:20 +0100
Subject: [PATCH 4/6] Proxy, rolling upgrades.

---
 design/keycloak.x/operator.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/design/keycloak.x/operator.md b/design/keycloak.x/operator.md
index 474ec2c..312ab61 100644
--- a/design/keycloak.x/operator.md
+++ b/design/keycloak.x/operator.md
@@ -69,9 +70,10 @@ In order to upgrade to a newer Keycloak version, the operator will be upgraded f
 
 If custom Keycloak image is not used, the operator will use a default base image. After the operator is upgraded, it automatically upgrades Keycloak too using a newer base image.
 
-
 In case a custom Keycloak image is used, the image will need to be rebuilt to perform the upgrade. This is not going to be operator's responsibility as building a custom image often requires a complex process incl. CI/CD pipelines. After the operator is upgraded, it won't manage any existing Keycloak instances until its custom image is manually rebuilt using the right Keycloak base image aligned with the operator and updated and the image coordinates are updated in the CR.
 
+Store.X will allow zero-downtime rolling upgrades (a Keycloak upgrade performed pod by pod) that will ensure that Keycloak cluster will remain available even when upgrade fails on one of the pods.
+
 ### Reaugmentation process in Kubernetes
 
 We will be leveraging Kubernetes volumes to act as "caches" for the [augmented/configured](https://quarkus.io/guides/reaugmentation) version of Keycloak.
@@ -95,6 +96,10 @@ The operator will provide CR metrics as well as it will provide integration with
 
 The operator will provide an out-of-the-box experience using an opinionated default Ingress (Route on OpenShift) configuration. This configuration will support further "manual" modification. Additionally, it will be possible to completely disable this feature.
 
+### Outgoing requests proxy settings
+
+The operator will respect the standard `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` environment variables for any outgoing requests. These variables are by default [set by OLM](https://docs.openshift.com/container-platform/4.9/operators/admin/olm-configuring-proxy-support.html). The operator will also pass them to Keycloak pods to leverage the built-in support for them. This behaviour will be overridable.
+
 
 ## Codebase
 

From 8f45ed545ab38d3c59124a8b268cb1bc847044f1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?V=C3=A1clav=20Muzik=C3=A1=C5=99?=
Date: Mon, 10 Jan 2022 18:09:59 +0100
Subject: [PATCH 5/6] Tweaks for upgrades and DB

---
 design/keycloak.x/operator.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/design/keycloak.x/operator.md b/design/keycloak.x/operator.md
index 312ab61..9975e10 100644
--- a/design/keycloak.x/operator.md
+++ b/design/keycloak.x/operator.md
@@ -71,7 +71,8 @@ If custom Keycloak image is not used, the operator will use a default base image
 
 In case a custom Keycloak image is used, the image will need to be rebuilt to perform the upgrade. This is not going to be operator's responsibility as building a custom image often requires a complex process incl. CI/CD pipelines. After the operator is upgraded, it won't manage any existing Keycloak instances until its custom image is manually rebuilt using the right Keycloak base image aligned with the operator and updated and the image coordinates are updated in the CR.
 
-Store.X will allow zero-downtime rolling upgrades (a Keycloak upgrade performed pod by pod) that will ensure that Keycloak cluster will remain available even when upgrade fails on one of the pods.
+Store.X will optionally allow zero-downtime rolling upgrades (a Keycloak upgrade performed pod by pod), ensuring that the Keycloak cluster remains available even when the upgrade fails on one of the pods.
+A "Recreate" upgrade strategy (all pods gracefully shut down and re-created with the new image) will also be available.
 
 ### Reaugmentation process in Kubernetes
 
@@ -84,7 +85,7 @@ The artifacts in the kubernetes volume will be produced by an init-container and
 
 ### Connecting to a database
 
-A Postgres DB instance will have to be provisioned externally, it's not Keycloak Operator's responsibility to manage a database. The DB credentials will be stored as K8s Secrets.
+A Postgres DB instance will have to be provisioned externally; managing a database is not the Keycloak Operator's responsibility. The DB credentials will be fetched from K8s Secrets.
 
 In long-term plan we'll add a limited integration with a Postgres Operator to leverage its backup functionalities for Keycloak upgrades.
 

From 93a7b6f2185decf5e65a2dce72f80f1d06aae9d4 Mon Sep 17 00:00:00 2001
From: andreaTP
Date: Wed, 12 Jan 2022 18:17:08 +0000
Subject: [PATCH 6/6] Add a note on PodTemplate

---
 design/keycloak.x/operator.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/design/keycloak.x/operator.md b/design/keycloak.x/operator.md
index 9975e10..8a9dbe9 100644
--- a/design/keycloak.x/operator.md
+++ b/design/keycloak.x/operator.md
@@ -37,6 +37,13 @@ The operator will use a CRD representing the Keycloak installation. The CRD will
 
 Since most of the configuration options will come from the Keycloak.X distribution itself, the CRD will also expose appropriate fields for passing any distribution related options to the container, like database connection details (obviously without any credentials), SPIs configuration, etc.
 
+#### PodTemplate
+
+The new operator will provide specific and opinionated CRD fields to tune the Keycloak deployment for the most common use cases, but one of those knobs will be a customizable [`PodTemplate`](https://kubernetes.io/docs/concepts/workloads/pods/#pod-templates), together with the ability to add additional, arbitrary `Volumes`.
+The provided `PodTemplate` will be merged with the properties set by the operator, acting as an "escape hatch" to configure any Kubernetes property that is not explicitly exposed by the CRD API.
+
+This approach has already proven successful in, among others, the [Flink](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template) and [Spark](https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template) projects.
+
 ### Configuring Keycloak business objects using Kubernetes resources
 
 The new operator will expose two main ways of configuring business objects in Keycloak (e.g.: Realms, Roles, Clients, etc.) in addition to the built-in Dynamic configuration through the REST API/UI console:
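
Note on PATCH 6/6: the `PodTemplate` merge is described only in prose, so a small sketch may help reviewers reason about it. The Python below is an illustration, not the operator's implementation (which is planned in Java). The function name `merge_pod_template`, the sample field values, and above all the precedence rule are assumptions: the design does not say which side wins on conflict, so this sketch lets operator-set values override user-supplied ones, merges nested mappings recursively, and replaces lists and scalars wholesale.

```python
from copy import deepcopy


def merge_pod_template(user, operator):
    """Overlay operator-set fields on a user-supplied pod template.

    Assumption (not stated in the design): on conflict the operator's
    value wins. Nested dicts are merged recursively; lists and scalars
    are replaced as a whole.
    """
    merged = deepcopy(user)  # never mutate the user's template
    for key, value in operator.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_pod_template(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical user-provided template: fields the CRD does not expose.
user_template = {
    "metadata": {"labels": {"team": "iam"}},
    "spec": {"priorityClassName": "high", "volumes": [{"name": "extra"}]},
}

# Hypothetical fields the operator sets on every Keycloak pod.
operator_fields = {
    "metadata": {"labels": {"app": "keycloak"}},
    "spec": {"serviceAccountName": "keycloak"},
}

merged = merge_pod_template(user_template, operator_fields)
```

With these inputs the merged template carries both labels, keeps the user's `priorityClassName` and extra volume, and gains the operator's `serviceAccountName`. A real implementation would also need a rule for list-valued fields such as `containers`, which this sketch deliberately leaves out.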