Roadmap

General

Planned features

  • Automated verification process: Run tests automatically to verify changes.

    Most tests in the project require a Kubernetes cluster and some additional prerequisites, e.g. istio. Currently, the Gerrit open-source community does not have these resources. At SAP, we plan to run verification in our internal systems. The results won't be publicly viewable, but the systems could already vote on changes. Builds would only be triggered if a maintainer votes +1 on the Build-Approved label.

    Builds can be moved to a public CI at a later point in time.

  • Automated publishing of container images: Publishing container images will happen automatically on ref-updated using a CI.

  • Support for multiple Gerrit versions: All currently supported Gerrit versions will also be supported in k8s-gerrit.

    Currently, container images used by this project are only published for a single Gerrit version, which is updated on an irregular schedule. Introducing stable branches for each Gerrit version will make it possible to maintain container images for multiple Gerrit versions. Gerrit binaries will be updated with each official release and more frequently on master. This will be (at least partially) automated.

  • Integration test suite: A test suite that can be used to test a GerritCluster.

    A GerritCluster running in a Kubernetes cluster consists of multiple components. Having a suite of automated tests would greatly help to verify deployments in development landscapes before they go into production.

Gerrit Operator

Version 1.0

Implemented features

  • High-availability: Primary Gerrit StatefulSets will have limited support for horizontal scaling.

    Scaling has been enabled using the high-availability plugin. Primary Gerrits will run in Active/Active configuration. Currently, two primary Gerrit instances, i.e. 2 pods in a StatefulSet, are supported.

  • Global RefDB support: Global RefDB is required for Active/Active configurations of multiple primary Gerrits.

    The Global RefDB support is required for high-availability as described in the previous point. The Gerrit Operator automatically sets up Gerrit to use a Global RefDB implementation. The following implementations are supported:


    The Gerrit Operator does not set up the database used for the Global RefDB. It does, however, manage plugin/module installation and configuration in Gerrit.

  • Full support for Nginx: The integration of Ingresses managed by the Nginx ingress controller now supports automated routing.

    Instead of requiring users to use different subdomains for the different Gerrit deployments in the GerritCluster, requests are now automatically routed to the respective deployments. SSH still has to be set up manually, since this requires setting up the routing in the Nginx ingress controller itself.

Planned features

  • Versioning of CRDs: Provide migration paths between API changes in CRDs.

    At the moment, updates to the CRDs are done without providing a migration path. This means a complete reinstallation of CRDs, Operator, CRs and dependent resources is required. This is not acceptable in a production environment. Thus, the operator will always support the last two versions of each CRD, if applicable, and provide a migration path between those versions.

  • Log collection: Support adding a sidecar that runs a log collection agent to send the logs of all components to a logging stack.

    Planned supported log collectors:

  • Support for additional Ingress controllers: Add support for setting up routing configurations for additional Ingress controllers.

    Additional ingress controllers might include:

Version 1.x

Potential features

  • Support for additional log collection agents:

    Additional log collection agents might include:

    • fluentbit
    • Option to add a custom sidecar
  • Additional ValidationWebhooks: Proactively avoid unsupported configurations.

    ValidationWebhooks are already used to reject unsupported configurations, e.g. deploying more than one primary Gerrit CustomResource per GerritCluster. So far, not all such cases are covered. Thus, the set of validations will be expanded further.

  • Better test coverage: More tests are required to find bugs earlier.

  • Automated reload of plugins: Reload plugins on configuration change.

    Configuration changes in plugins typically don't require a restart of Gerrit, only a reload of the plugin. To avoid unnecessary downtime of pods, the Gerrit Operator will reload only the affected plugins instead of restarting all pods if nothing but a plugin's configuration changed.

  • Externalized (re-)indexing: Alleviate load caused by online reindexing.

    On large Gerrit sites, online reindexing due to a) schema migrations or b) initialization of a new site might take weeks and use a lot of resources, which might cause performance issues. This is not acceptable in production. The current plan to solve this issue is to implement a separate Gerrit deployment (GerritIndexer) that is not exposed to clients and that takes over the task of online reindexing. The GerritIndexer will mount the same repositories and will share events via the high-availability plugin. However, it will access repositories in read-only mode.
    This solves the two scenarios named above as follows:

    a) Schema migrations: If a Gerrit update including a schema migration for an index is applied, the Gerrit instances serving clients will be configured to continue to use the old schema. Online reindexing will be disabled in those instances. The GerritIndexer will have online reindexing enabled and will start to build the new index version. As soon as it has finished, i.e. it could start to use the new index version as the read index, it will make a copy of the new index and publish it, e.g. using a shared filesystem. A restart of the Gerrit instances serving clients will then be triggered. During this restart, the new index will be copied into the site. Because index entries may have been updated after the new index version was published, the entries updated in the meantime will then be reindexed.

    b) Initialization of a new site: If Gerrit is scaled horizontally, a new instance will be started with an empty index, i.e. it has to build the complete index. To avoid this, the GerritIndexer deployment will continuously keep a copy of the indexes up-to-date. It will regularly be stopped and a copy of the index will be stored in a shared volume. This copy can be used as a base for new instances, which then only have to update index entries that were changed in the meantime.

  • Autoscaling: Automatically scale Gerrit deployments based on usage.

    Metrics like the number of available workers in the thread pools could be used to decide when to scale the Gerrit deployment horizontally. This would allow the deployment to adapt dynamically to the current load, which helps to save costs and resources.
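
As an illustration of the autoscaling item above, the following is a minimal sketch of how a scaling decision could be derived from the share of idle workers in a thread pool. It is not part of the Gerrit Operator: `ScalingPolicy`, `desired_replicas`, and all thresholds and replica limits are hypothetical names and values chosen for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical thresholds; nothing here is an existing operator setting."""
    min_replicas: int = 1
    max_replicas: int = 4
    scale_up_below: float = 0.1    # scale up if fewer than 10% of workers are idle
    scale_down_above: float = 0.6  # scale down if more than 60% of workers are idle


def desired_replicas(current: int, idle_workers: int, pool_size: int,
                     policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the replica count suggested by the idle-worker ratio.

    The count changes by at most one step per call and is clamped to the
    limits of the policy.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    idle_ratio = idle_workers / pool_size
    if idle_ratio < policy.scale_up_below:
        target = current + 1
    elif idle_ratio > policy.scale_down_above:
        target = current - 1
    else:
        target = current
    return max(policy.min_replicas, min(policy.max_replicas, target))
```

Changing the replica count by only one step per evaluation is a deliberate simplification that avoids large swings; a real implementation would probably also need a cool-down period between scaling actions.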

Version 2.0

Potential features

  • Multi region support: Support setups that are distributed over multiple regions.

    Supporting Gerrit installations that are distributed over multiple regions would make it possible to serve clients all over the world without large differences in latency. It would also improve availability and reduce the risk of data loss.
    Such a setup could be achieved by using the multi-site setup.

  • Remove the dependency on shared storage: Use completely independent sites instead of sharing a filesystem for some site components.

    NFS and other shared filesystems might cause performance issues on larger Gerrit installations due to latencies. A potential solution might be to use the multi-site setup to separate the sites of all instances and to use events and replication to share the state.

  • Shared index: Using an external centralized index, e.g. OpenSearch, instead of x copies of a Lucene index.

    Maintaining x copies of an index, where x is the number of Gerrit instances in a GerritCluster, is unnecessarily expensive, since the same write transactions potentially have to be performed x times. Using a single centralized index would resolve this issue.

  • Shared cache: Using an external centralized cache for all Gerrit instances.

    Using a single cache for all Gerrit instances will reduce the number of computations for each Gerrit instance, since not every instance will have to keep its own copy up-to-date.

  • Sharding: Shard a site based on repositories.

    Repositories served by a single GerritCluster might be quite diverse, e.g. ranging from a few kilobytes to several gigabytes, or from repositories seeing high traffic to others that are barely fetched. It is not trivial to configure Gerrit to work optimally for all repositories. Being able to shard at least the Gerrit Replicas would help to serve all repositories optimally.
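
To make the sharding idea above more concrete, here is a minimal sketch of one possible way to assign repositories to shards. It uses a stable hash of the repository name, which is the simplest option and not a design decided by this roadmap; `shard_for` and the shard count are hypothetical. An assignment that accounts for repository size or traffic, as motivated above, would need additional input.

```python
import hashlib


def shard_for(repository: str, shard_count: int) -> int:
    """Map a repository name to a shard index in the range [0, shard_count).

    A cryptographic hash is used instead of Python's built-in hash() so that
    the result is stable across processes and restarts.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(repository.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

With such a function, every component that routes a request could compute the target shard from the repository name alone, without consulting a shared lookup table. The drawback is that changing the shard count reassigns most repositories.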