Templated COSA command execution #1739

Merged (5 commits, Sep 30, 2020)
13 changes: 10 additions & 3 deletions Makefile

```diff
@@ -9,9 +9,9 @@ DESTDIR ?=
 # W504 line break after binary operator
 PYIGNORE ?= E128,E241,E402,E501,E722,W503,W504
 
-.PHONY: all check flake8 pycheck unittest clean mantle mantle-check install
+.PHONY: all check flake8 pycheck unittest clean mantle mantle-check install entry entry-check
 
-all: mantle
+all: mantle entry
 
 src:=$(shell find src -maxdepth 1 -type f -executable -print)
 pysources=$(shell find src -type f -name '*.py') $(shell for x in $(src); do if head -1 $$x | grep -q python; then echo $$x; fi; done)
@@ -30,7 +30,7 @@ endif
 .%.shellchecked: %
 	./tests/check_one.sh $< $@
 
-check: ${src_checked} ${tests_checked} ${cwd_checked} flake8 pycheck mantle-check
+check: ${src_checked} ${tests_checked} ${cwd_checked} flake8 pycheck mantle-check entry-check
 	echo OK
 
 pycheck:
@@ -60,6 +60,12 @@ mantle:
 mantle-check:
 	cd mantle && $(MAKE) test
 
+entry:
+	cd entrypoint && $(MAKE)
+
+entry-check:
+	cd entrypoint && $(MAKE) test
+
 install:
 	install -d $(DESTDIR)$(PREFIX)/lib/coreos-assembler
 	install -D -t $(DESTDIR)$(PREFIX)/lib/coreos-assembler $$(find src/ -maxdepth 1 -type f)
@@ -74,3 +80,4 @@ install:
 	ln -sf coreos-assembler $(DESTDIR)$(PREFIX)/bin/cosa
 	install -d $(DESTDIR)$(PREFIX)/lib/coreos-assembler/tests/kola
 	cd mantle && $(MAKE) install DESTDIR=$(DESTDIR)
+	cd entrypoint && $(MAKE) install DESTDIR=$(DESTDIR)
```
1 change: 1 addition & 0 deletions entrypoint/.gitignore

```
bin/*
```
33 changes: 33 additions & 0 deletions entrypoint/Makefile

```make
# Directory containing this Makefile.
my_dir = $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))
version = $(shell date +%Y-%m-%d).$(shell git rev-parse --short HEAD)~$(shell test -n "`git status -s`" && echo dirty || echo clean)
cosa_dir = $(shell test -d /usr/lib/coreos-assembler && echo /usr/lib/coreos-assembler)
ldflags=-X main.version=${version} -X main.cosaDir=${cosa_dir}

PREFIX ?= /usr
DESTDIR ?=
ARCH:=$(shell uname -m)

pkgs := $(shell go list -mod=vendor ./...)

.PHONY: build
build: test
	@mkdir -p bin
	find cmd -iname '*.go' -exec go build -i -ldflags "${ldflags}" -mod vendor -v -o bin {} \;

.PHONY: fmt
fmt:
	gofmt -d -e -l $(shell find . -iname "*.go" -not -path "./vendor/*")

.PHONY: test
test:
	go test -mod=vendor -i ${pkgs}
	go test -mod=vendor -cover ${pkgs}

.PHONY: clean
clean:
	@go clean .
	@rm -rf bin

.PHONY: install
install: clean build
	install -v -D -t $(DESTDIR)$(PREFIX)/bin bin/entry
```
150 changes: 150 additions & 0 deletions entrypoint/README.md
# Entrypoint
> **Review comment (Member):** This is a fantastic write-up! One nit, which I do not think is worth addressing in this PR itself: I believe it would be best to have the "what and how to use" up front, followed by the "why". More people will be interested in what it is and how to use it before they end up asking what brought us from where we are today to this pattern.


Introduced as [part of PR 1739](https://github.com/coreos/coreos-assembler/pull/1739), the GoLang Entrypoint is a CI-specific alternative "entrypoint" for executing CoreOS Assembler (COSA).

## Jenkins Pipelines are Greedy

One of the lessons learned from the RHCOS pipelines is that they are expensive in terms of resources. Only a few steps (build and KOLA) actually require KVM access, and most commands do not need 4GB of RAM. However, the entire pipeline, from start to finish, has to run with the resource guarantees of its most resource-intensive step. To wit:
- Base builds need ~5GB of disk space for cache, 4GB of RAM, and KVM access.
- KOLA testing requires 1-2GB of RAM per thread.
- When building all the artifacts, 60GB of ephemeral storage is needed before compression. On over-subscribed nodes, we have had to play games with PVCs in order to avoid node evictions for consuming too much disk space.
- The OCP cluster for the RHCOS pipeline only has a few nodes that are capable of running the pipeline. Development, developer, and production-delivery pipelines all compete for the same nodes. This has led to pipeline-launched pods being evicted during later stages.
- Environmental differences between the COSA CI, FCOS, and RHCOS pipelines have resulted in considerable drift.

Running multiple Jenkins pods is one way to deal with this. Yet each Jenkins-launched pod requires both a COSA and an Agent container. In the RHCOS case, we actually have to run three containers (COSA, Jenkins, and the Message Bus container), adding an extra 512MB to each pod, all scheduled on an over-subscribed resource. Never mind the maintenance cost of Jenkins itself.

The other problem with Jenkins is that you _need_:
- The COSA image name. The RHCOS pipeline has to know which version of COSA to use, since there is a different version for each RHCOS release.
- An Agent and a Master image for Jenkins itself.
- A pantheon of Cloud publication secrets stored either in Jenkins or in OCP itself. When the RHCOS pipeline started using OpenShift secrets, we ended up mounting roughly a dozen `volumeMounts` into each pod.
- The agents often time out, and perform poorly over high-latency connections.

While it is possible to run cross-cluster with Jenkins, in reality it is almost never done. In fact, for the RHCOS pipelines, we have three teams running various versions of Jenkins and pipeline editions. The niceties of Jenkins in this world are, in fact, liabilities. A common theme for the production-delivery team is various Jenkins issues. Worse, for each architecture of RHCOS we have to stand up Jenkins and then populate an OCP namespace.

## Jenkins has become a templating engine

In the RHCOS case, a considerable amount of Groovy has been written to parse, check, and emit `cosa <TARGET> <CLI ARGS>` commands. The FCOS pipeline is easier to maintain and understand; the RHCOS pipeline is a special snowflake of variable-controlled rules. The complexity of RHCOS's pipeline comes from the business rules requiring special logic: FCOS's pipeline only has to support a single configuration, while RHCOS has to support at least three releases on four architectures.

Initially, the RHCOS pipeline used an OpenShift BuildConfig with envVars. Over time, almost all of these envVars, and even Jenkins job parameters, were removed. As it turns out, converting YAML to envVars to Groovy to Shell is ripe for type errors; this was especially true when dealing with truthiness.

To help deal with truthiness, the RHCOS pipeline grew the "JobSpec" (as in Jenkins Job). The JobSpec was an attempt at creating a declarative method of setting variables for the pipeline to consume. This idea allows the RHCOS pipeline to run entirely via GitOps, without having to deal with type-conversion errors or needing envVars from a BuildConfig to provide dynamic behavior.
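
Purely as an illustration of the idea (the field names below are hypothetical, not the actual RHCOS JobSpec schema), a strictly typed Go struct sidesteps the truthiness problem: a malformed boolean fails at parse time instead of silently becoming a truthy string:

```go
// Hypothetical sketch: a strictly typed JobSpec. Field names are
// illustrative only, not the real RHCOS schema.
package spec

import (
	"os"

	"gopkg.in/yaml.v2"
)

// JobSpec declares what the pipeline should do; booleans are real
// booleans, so "false" can never be mistaken for a truthy string.
type JobSpec struct {
	Release      string `yaml:"release"`
	Architecture string `yaml:"architecture"`
	CloudUpload  bool   `yaml:"cloud_upload"`
}

// Load reads and strictly decodes a JobSpec from a YAML file.
func Load(path string) (*JobSpec, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	js := &JobSpec{}
	// UnmarshalStrict rejects unknown keys and type mismatches:
	// e.g. `cloud_upload: "yes"` is an error, not a truthy string.
	if err := yaml.UnmarshalStrict(data, js); err != nil {
		return nil, err
	}
	return js, nil
}
```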

## The Cost of Developing Jenkins

The problem with using Jenkins as a templating engine is that it is incredibly inefficient for testing. Consider:
1. changes have to be committed first
1. then a build has to be started
1. a human has to check the run
1. repeat, repeat...

The problem with this model is:
- it requires developers to invest in Jenkins and, by extension, Groovy
- it is wasteful in terms of resources and reinforces git-smashing
- there is no way to have a pure logic-check run -- the commands have to actually be run

Some work has been done to CI the CI, which introduced its own class of problems.

## The Problem

While the JobSpec provided a means of GitOps-controlled execution of the Jenkins job, *it was simply wall-papering over a glaring missing feature in COSA: the lack of a non-CLI interface into COSA*. A review of the three COSA CI pipelines shows that Jenkins provides:
- launching pods
- templating COSA commands
- a pretty web UI

In fact, the problem is a bit deeper:
- COSA assumes an order of operations. This order of operations is codified in the code but is not documented; the Jenkins pipelines are quasi-authoritative for it. To wit: `cosa build` must precede a `cosa buildextend`, some artifacts require the `metal*` artifacts, while others require just the `qcow2`.
- The CLI interface is inconsistent. Some commands are Bash, others Python, and they use differently styled arguments.
- The notion of `build` is based on positional perception: COSA considers building the OSTree a build, but by default it also builds the `qcow2`. Pipelines consider creating `artifacts` as "building". And users consider a "build" to be _all_ the produced artifacts.
- When COSA changes, all the pipelines have to change.

## The Solution

The `entrypoint` is proposed as the "thing" that provides stable interface(s) into COSA. [Previously, an envVar interface](https://github.com/coreos/enhancements/pull/1) was proposed; bluntly speaking, the idea was not well received. The `entrypoint` seeks to provide a set of interfaces into COSA that:
- provide a means of file-based instructions to COSA
- provide a means of templating the COSA commands
- initially, render the RHCOS JobSpec into templated COSA commands
- act as a CI `ENTRYPOINT` for COSA containers built to run in OCP
- run COSA as a first-class OpenShift Custom Builder
- provide testable code for parsing the commands
- avoid migrating the Jenkins pipelines to Tekton

While `entrypoint` currently supports the RHCOS JobSpec, it is anticipated that other "specifications" will be introduced, such as OCP's BuildConfig specification.

## GoLang to the rescue

The bulk of the COSA code is either Bash or Python. [It has previously been proposed that we support commands in GoLang](https://github.com/coreos/coreos-assembler/issues/1668). And since COSA swallowed Mantle, there is already a third language in the tree: GoLang.

GoLang was chosen over Bash or Python for several reasons:
- GoLang is a compiled language. For something acting as an orchestrator, run-time compilation or scripts are too fragile.
- GoLang is strictly typed. Going from strictly typed GoLang to the loosely typed Bash or Python is "safer" than the reverse.
- The contributing developers of COSA prefer Bash or GoLang over Python.
- GoLang templating is commonly used in the OpenShift program (see the sketch after this list).
- Bash is largely untestable.
- GoLang avoids previous COSA disputes regarding order of operations and style.
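
To make the templating point concrete, here is a minimal sketch, using a hypothetical template string and variable names rather than the entrypoint's actual code, of rendering COSA commands with Go's standard `text/template` package:

```go
// Minimal sketch of templating COSA commands from JobSpec-like values.
// The template string and artifact list are illustrative only.
package main

import (
	"os"
	"text/template"
)

func main() {
	const cmd = "cosa buildextend-{{.Artifact}} --build {{.BuildID}}\n"
	tmpl := template.Must(template.New("cosa").Parse(cmd))
	// Render one fully templated command per artifact.
	for _, artifact := range []string{"metal", "openstack"} {
		_ = tmpl.Execute(os.Stdout, struct {
			Artifact, BuildID string
		}{artifact, "latest"})
	}
}
```

Run with `go run`, this prints one fully rendered `cosa` command per artifact; the real entrypoint would execute them rather than print them.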

## Why not OpenShift Templates?

An early lesson learned writing the RHCOS pipeline is that while an OpenShift template is trivial to write, templates tend to pollute the namespace. OpenShift templates are great for deploying an application, but become tedious when deploying arbitrary configurations. For example, using an OpenShift template to deploy test, dev, and production configurations could require three separate deployments when all that changes is a single variable.

The vision of the `entrypoint` is to create a templated execution of COSA based on the file interface. That is, instead of managing different deployments, COSA will take a configuration (the JobSpec) and `run-steps`. A single `buildconfig` can service the needs of both developers and production environments.

## Jenkins as a helper

Jenkins is NOT going away in this world view. Rather, Jenkins will simply no longer be directly scheduling the pods. A new set of COSA CI libs will be created that provide wrappers around the `oc` binary for calling the OpenShift BuildConfig.

An example invocation might look like:
```groovy
stage("build") {
    parallel x86: {
        cosa_oc("creds-x86", "build.steps")
    }, aarch64: {
        cosa_oc("creds-aarch64", "build.steps")
    }
}
```

Where `cosa_oc` is the wrapper that:
- sets `KUBECONFIG=creds-*`
- creates a `build.tar` containing the `JobSpec` and `build.steps`
- calls `oc start-build bc/cosa-priv --from-archive=build.tar --follow=true`

In this world, the secrets would exist outside of Jenkins; they would be stored in the OpenShift environment and referenced in the `buildConfig` itself. Then `entrypoint`, which will support the OpenShift `buildConfig` spec, will:
- unpack `build.tar`
- find the `jobspec` and the `build.steps`
- execute the steps

Since the builds are using `buildConfigs`, each "build" is repeatable.
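
A rough sketch of what executing those steps could look like in Go; the `build.steps` file name comes from the prose above, while everything else (one `cosa` command per line, `#` comments allowed) is an assumption:

```go
// Hypothetical sketch: run each line of build.steps as a COSA command.
package main

import (
	"bufio"
	"log"
	"os"
	"os/exec"
	"strings"
)

func main() {
	f, err := os.Open("build.steps")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and comments
		}
		// Assume each step is a full "cosa <target> <args>" invocation.
		args := strings.Fields(line)
		cmd := exec.Command(args[0], args[1:]...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			log.Fatalf("step %q failed: %v", line, err)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```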

Ideally, there would be BuildConfigs for:
- privileged execution for builds that need direct /dev/kvm access
- privileged execution for testing
- unprivileged execution for publication steps

## Development and Pipeline Parity

A profound pain point for COSA _and_ pipeline development is that environmental differences between the developer (and, by extension, their pet container) and the COSA, FCOS, and RHCOS pipelines can cause a rousing round of "fix-a-bug whack-a-mole" (where the code works in one pipeline, but not another). `entrypoint` seeks to solve that by removing Jenkins from the pod execution and allowing the developer to run pipeline code locally. That is, a developer should have reasonable assurance that if steps run locally via `podman run -it --entrypoint /usr/bin/entry coreos-assembler....` succeed, they will succeed in the pipelines as well.

## `cosa remote`

In the "Jenkins as a Helper" section, a curious opening appears -- the ability to run `cosa` commands _remotely_ in an OpenShift Cluster.

For those unlucky enough to obtain their internet access from a major US-based cable monopoly, an incredible pain point is the "build-upload" cycle:
1. developer begs around for $CLOUD credentials
1. they hack on COSA, RPMs, overrides, etc.
1. build
1. upload
1. do something else while a 20G image is slurped up at 250Kb/s...
1. repeat steps 2-5

By having COSA as a `buildConfig`, we can now have a `cosa remote` command (sketched after this list) that:
- creates a `devel.tar` of `src`, `overrides`, and local COSA hacks, together with a JobSpec and `build.steps`
- calls `oc start-build bc/cosa-priv --from-archive=devel.tar --env=DEVELOPER_MODE=1 --follow=true`
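
A hedged sketch of what the client side of `cosa remote` might do, shelling out to `tar` and `oc`. The BuildConfig name and the `oc start-build` flags mirror the prose above; the archive contents and file names are assumptions:

```go
// Hypothetical sketch of the client side of `cosa remote`.
package main

import (
	"log"
	"os"
	"os/exec"
)

// run executes a command, streaming its output to the terminal.
func run(name string, args ...string) {
	cmd := exec.Command(name, args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatalf("%s: %v", name, err)
	}
}

func main() {
	// Bundle the local hacks, jobspec, and steps for the remote build.
	run("tar", "-cf", "devel.tar", "src", "overrides", "jobspec.yaml", "build.steps")
	// Kick off the remote build and stream its logs back.
	run("oc", "start-build", "bc/cosa-priv",
		"--from-archive=devel.tar", "--env=DEVELOPER_MODE=1", "--follow=true")
}
```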

When the buildConfig starts, it would unpack `devel.tar` and then exec into the developer's local COSA environment running remotely. This would mean:
1. the developer does not have to get their own credentials
1. the build happens close to the source
1. when pushing the build, the developer's in-house broadband is not used
1. development time can be significantly reduced

## In Conclusion

The rationale behind draining the pipeline logic out of Jenkins and into COSA is a question of developer efficiency, satisfaction, and reducing the operational burden.