Distributed mode phase 1 #98

Merged · 47 commits · Jun 24, 2024
Changes from 38 commits

Commits (47):
37a9024
prepare the controller for the split
iandyh Apr 17, 2023
4e722c4
controller into cmd
iandyh Jul 7, 2023
46442b3
better build
iandyh Jul 24, 2023
8c54c37
provide an entrypoint for building the controller
iandyh Jul 24, 2023
1ac484a
support build the controller locally
iandyh Jul 26, 2023
4dd10ea
use simplified Dockerfile
iandyh Aug 10, 2023
007124b
install shibuya runtime with helm
iandyh Aug 10, 2023
f099f7d
use dockerhub base image for multi platform build
iandyh Aug 10, 2023
8ed44d1
remove unused file
iandyh Aug 10, 2023
0a3a22b
support distributed mode and non-distributed mode
iandyh Aug 10, 2023
7e96fcc
support kubeconfig cm in the manifests
iandyh Aug 10, 2023
b595242
add helm to the docs
iandyh Aug 13, 2023
64a804d
add some docs about distributed mode
iandyh Aug 13, 2023
602383d
support labels and envvars in the templates
iandyh Aug 22, 2023
5f168e4
support resources and context name
iandyh Aug 23, 2023
47f365b
support annotations
iandyh Aug 24, 2023
0fff565
support labels in controllers
iandyh Aug 24, 2023
9dcc536
better operator usage
iandyh Aug 24, 2023
f25520b
support envvars in controller as well
iandyh Aug 24, 2023
2dbe18c
fix white spaces in templates
iandyh Aug 24, 2023
e7d4922
fix template
iandyh Sep 20, 2023
01ace68
build the image using GitHub actions
iandyh Sep 21, 2023
0f13923
fix install
iandyh Sep 22, 2023
969565e
support the new way of building the images
iandyh Sep 22, 2023
2abbeed
supporting node affinities and tolerations
iandyh Sep 25, 2023
b862321
support building helm charts
iandyh Sep 25, 2023
709a3de
configure auth for helm
iandyh Sep 25, 2023
9b0567b
use helm chart releaser
iandyh Sep 25, 2023
c4b19b6
support image pull secrets
iandyh Sep 29, 2023
3e76c73
fix service name and labels
iandyh Sep 29, 2023
66316db
fix imagepull secrets
iandyh Sep 29, 2023
75e3dad
controller should be a deployment not a job
iandyh Sep 29, 2023
ca66702
improve the format
iandyh Dec 7, 2023
6812909
add the port name
iandyh Dec 7, 2023
1a46d94
render the json list
iandyh Dec 7, 2023
b52e014
fix labels and format
iandyh Dec 7, 2023
c2de939
handle the auth keys that required for connecting with gcp services
iandyh Dec 7, 2023
7d81fd0
fix missing configs
iandyh Dec 8, 2023
928ae0f
fix configmap template in tolerations
iandyh Feb 16, 2024
1d17ce2
fix configmap template in tolerations
iandyh Feb 16, 2024
6f4e5d9
add master branch to event hook
iandyh Feb 20, 2024
fafce57
specify helm version
iandyh Feb 20, 2024
e1adab8
should use the color of value files
iandyh May 17, 2024
6afd126
add some logs
Jun 13, 2024
4cbd4ad
add some logs
Jun 13, 2024
f71f135
trigger the goroutine funcs first
Jun 13, 2024
354fdb2
move engine health checks out from isolated controller loop
Jun 13, 2024
72 changes: 72 additions & 0 deletions .github/workflows/build-publish.yml
@@ -0,0 +1,72 @@
# This workflow will build a docker container, publish it to Google Container Registry, and deploy it to GKE when there is a push to the "master" branch.
#
# To configure this workflow:
#
# 1. Ensure that your repository contains the necessary configuration for your Google Kubernetes Engine cluster, including deployment.yml, kustomization.yml, service.yml, etc.
#
# 2. Create and configure a Workload Identity Provider for GitHub (https://github.com/google-github-actions/auth#setting-up-workload-identity-federation)
#
# 3. Change the values for the GAR_LOCATION, GKE_ZONE, GKE_CLUSTER, IMAGE, REPOSITORY and DEPLOYMENT_NAME environment variables (below).
#
# For more support on how to run the workflow, please visit https://github.com/google-github-actions/setup-gcloud/tree/master/example-workflows/gke-kustomize

name: Build and Deploy to GCP registry

on:
push:
branches: [ "split" ]

env:
GAR_LOCATION: asia-northeast1 # TODO: update region of the Artifact Registry
IMAGE: shibuya

jobs:
setup-build-publish-deploy:
name: Setup, Build, Publish
runs-on: ubuntu-20.04
environment: production

steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0

- name: Set up Go
uses: actions/setup-go@v4
with:
go-version: '1.17'

- id: 'auth'
uses: 'google-github-actions/auth@v1'
with:
credentials_json: '${{ secrets.GCP_CREDENTIALS }}'
token_format: 'access_token'

- name: Docker configuration
run: |-
echo '${{ steps.auth.outputs.access_token }}' | docker login -u oauth2accesstoken --password-stdin https://$GAR_LOCATION-docker.pkg.dev

# Build the Docker image
- name: Build api
run: |-
cd shibuya && make api_image component=api

- name: Build controller
run: |-
cd shibuya && make controller_image component=controller

- name: Configure Git
run: |
git config user.name "$GITHUB_ACTOR"
git config user.email "[email protected]"

- name: Install Helm
uses: azure/setup-helm@v3

- name: Run chart-releaser
uses: helm/[email protected]
with:
charts_dir: shibuya/install
env:
CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
16 changes: 14 additions & 2 deletions README.md
@@ -15,7 +15,8 @@ Collection is the unit where the actual tests are managed. Therefore, multiple t
Pre-requisites:
1. Kind (https://kind.sigs.k8s.io)
2. kubectl (https://kubernetes.io/docs/tasks/tools/install-kubectl)
3. Docker (https://docs.docker.com/install) *On OSX please increase your docker machine's spec or you may face performance issues*
3. Helm (https://helm.sh/docs/intro/install/)
4. Docker (https://docs.docker.com/install) *On OSX please increase your docker machine's spec or you may face performance issues*


Run `make` to start local cluster
@@ -30,6 +31,17 @@ Then you can go to http://localhost:8080 to check.

Note: local Shibuya does not have authentication, so you need to set `shibuya` as the owner of the project. The same applies if you turn off authentication in the config file.

## Distributed mode (WIP)

In order to improve the scalability of Shibuya, we are going to split the single Shibuya process into three components:

- apiserver
- controller
- engine metric streamer (does not exist yet)

By default, the local setup runs in non-distributed mode. You can enable distributed mode by setting `runtime.distributed_mode` to `true`, as shown below.
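For example, assuming the flag is exposed as a value named `runtime.distributed_mode` in the `install/shibuya` Helm chart added by this change (this value path is an assumption based on the key name above), the local install could be switched to distributed mode with:

helm upgrade --install shibuya install/shibuya --set runtime.distributed_mode=true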


### Production setup

Please read the makefile to understand what components are needed and how to set them up in detail.
@@ -66,4 +78,4 @@ Please read the makefile to understand what components are needed and how to set

- Adding support for more executor types, for example Gatling. Technically speaking, Shibuya can support any executor as long as it can provide real-time metrics data in some way.
- Manage multiple contexts in one controller.
- Better Authentication
- Better Authentication
36 changes: 0 additions & 36 deletions kubernetes/shibuya.yaml

This file was deleted.

18 changes: 13 additions & 5 deletions makefile
@@ -35,11 +35,11 @@ grafana: grafana/

.PHONY: shibuya
shibuya: shibuya/ kubernetes/
cp shibuya/config_tmpl.json shibuya/config.json
cd shibuya && sh build.sh
docker build -f shibuya/docker-local/Dockerfile --build-arg env=local -t shibuya:local shibuya
kind load docker-image shibuya:local --name shibuya
kubectl -n $(shibuya-controller-ns) replace -f kubernetes/shibuya.yaml --force
docker build -f shibuya/Dockerfile --build-arg env=local -t api:local shibuya
kind load docker-image api:local --name shibuya
helm uninstall shibuya || true
helm upgrade --install shibuya install/shibuya

.PHONY: jmeter
jmeter: shibuya/engines/jmeter
@@ -84,4 +84,12 @@ ingress-controller:
# if you need to debug the controller, please use the makefile in the ingress controller folder
# And update the image in the config.json
docker build -t shibuya:ingress-controller -f ingress-controller/Dockerfile ingress-controller
kind load docker-image shibuya:ingress-controller --name shibuya
kind load docker-image shibuya:ingress-controller --name shibuya

.PHONY: controller
controller:
cd shibuya && sh build.sh controller
docker build -f shibuya/Dockerfile --build-arg env=local --build-arg="binary_name=shibuya-controller" -t controller:local shibuya
kind load docker-image controller:local --name shibuya
helm uninstall shibuya || true
helm upgrade --install shibuya install/shibuya
2 changes: 2 additions & 0 deletions shibuya/.gitignore
@@ -0,0 +1,2 @@
*.tgz
shibuya-install/*
35 changes: 6 additions & 29 deletions shibuya/Dockerfile
@@ -1,41 +1,18 @@
FROM gcr.io/shibuya-214807/golang:1.17-stretch AS builder
FROM ubuntu:18.04

RUN apt-get update && apt-get install -y curl
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl \
&& chmod +x ./kubectl \
&& mv ./kubectl /usr/local/bin/kubectl

WORKDIR /go/src/shibuya

ENV GO111MODULE on
ADD go.mod .
ADD go.sum .
RUN go mod download

COPY . /go/src/shibuya

RUN GOOS=linux GOARCH=amd64 go build -ldflags="-w -s" -o /go/bin/shibuya

# Use only binaries from above image for running the app
FROM gcr.io/shibuya-214807/ubuntu:18.04

COPY --from=builder /go/bin/shibuya /usr/local/bin/shibuya
COPY --from=builder /usr/local/bin/kubectl /usr/local/bin/kubectl
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

RUN mkdir /auth
ADD ./shibuya-gcp.json /auth/shibuya-gcp.json
ARG binary_name=shibuya
ADD ./build/${binary_name} /usr/local/bin/${binary_name}

ENV GOOGLE_APPLICATION_CREDENTIALS /auth/shibuya-gcp.json

ARG env=local
ENV env ${env}
ARG lab_image=""
ENV lab_image ${lab_image}
ARG proxy=""
ENV http_proxy ${proxy}
ENV https_proxy ${proxy}

COPY config/kube_configs /root/.kube
COPY config.json /config.json
COPY ./ui/ /
CMD ["shibuya"]
ENV binary=${binary_name}
CMD ${binary}
27 changes: 27 additions & 0 deletions shibuya/Makefile
@@ -0,0 +1,27 @@
registry=$(GAR_LOCATION)-docker.pkg.dev/$(GCP_PROJECT)
repository = shibuya
tag=$(GITHUB_SHA)
img=$(registry)/$(repository)/$(component):$(tag)

.PHONY: api_build
api_build:
sh build.sh

.PHONY: api_image
api_image: api_build
docker build -t $(img) -f Dockerfile .
docker push $(img)

.PHONY: controller_build
controller_build:
sh build.sh controller

.PHONY: controller_image
controller_image: controller_build
docker build -t $(img) -f Dockerfile --build-arg="binary_name=shibuya-controller" .
docker push $(img)

.PHONY: helm_charts
helm_charts:
helm create shibuya-install
helm package shibuya-install/
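For reference, a local invocation of these targets might look like the following (the project name is a placeholder; the registry location matches the workflow env above; in the GitHub Actions workflow, GITHUB_SHA is provided automatically and docker is already logged in to the registry):

cd shibuya
GAR_LOCATION=asia-northeast1 GCP_PROJECT=my-gcp-project GITHUB_SHA=$(git rev-parse HEAD) make api_image component=api
GAR_LOCATION=asia-northeast1 GCP_PROJECT=my-gcp-project GITHUB_SHA=$(git rev-parse HEAD) make controller_image component=controller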
4 changes: 3 additions & 1 deletion shibuya/api/main.go
@@ -27,9 +27,11 @@ type ShibuyaAPI struct {
}

func NewAPIServer() *ShibuyaAPI {
return &ShibuyaAPI{
c := &ShibuyaAPI{
ctr: controller.NewController(),
}
c.ctr.StartRunning()
return c
}

type JSONMessage struct {
2 changes: 2 additions & 0 deletions shibuya/build.sh
@@ -8,6 +8,8 @@ go mod download
case "$target" in
"jmeter") GOOS=linux GOARCH=amd64 go build -ldflags="-w -s" -o build/shibuya-agent $(pwd)/engines/jmeter
;;
"controller") GOOS=linux GOARCH=amd64 go build -ldflags="-w -s" -o build/shibuya-controller $(pwd)/controller/cmd
;;
*)
GOOS=linux GOARCH=amd64 go build -ldflags="-w -s" -o build/shibuya
esac
1 change: 1 addition & 0 deletions shibuya/config/init.go
@@ -120,6 +120,7 @@ var defaultIngressConfig = IngressConfig{
type ShibuyaConfig struct {
ProjectHome string `json:"project_home"`
UploadFileHelp string `json:"upload_file_help"`
DistributedMode bool `json:"distributed_mode"`
DBConf *MySQLConfig `json:"db"`
ExecutorConfig *ExecutorConfig `json:"executors"`
DashboardConfig *DashboardConfig `json:"dashboard"`
14 changes: 14 additions & 0 deletions shibuya/controller/cmd/main.go
Review thread:

Collaborator:
I thought there would be no data-integrity issues caused by race conditions with this separation, since all of this processing just wipes out unnecessary resources. Do you have any concerns about multi-threaded processing on the controller?

Contributor (author):
Hmm, that depends on how we define multi-threaded support.
If we look at the whole system (controller, api, etc.), I have no concerns, because this is the same as before: the processes/threads just run either together (multi-threaded) or separately (multi-process). The underlying logic does not change.
If we look at just the controller, it should not have multiple threads/processes, because it is essentially just a checker with a forever loop that reads state from either the DB or a k8s cluster. If we added another checker (via multi-threading), we would need to implement something like a worker queue to split the work, which does not exist right now. Another checker would essentially do the same work, which could be problematic, so we should avoid that.
In terms of the scalability of the controller, I don't have great concerns for now, because I expect it to be a lightweight component (most of its time is likely spent on IO operations anyway).
Hopefully that answers your questions.

Collaborator:
Understood. I'm about 80% sure of what you've described above, but not 100%, so it's better to visualize all the operations the controller and api provide (especially the goroutines); otherwise it's a bit difficult to see the entire picture. I'm about to do that on the Confluence page, though it's time-consuming and will take a while to illustrate everything. I'll send review requests to you piece by piece.
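To make the single-checker design described above concrete, here is a minimal, illustrative sketch of the forever-loop pattern (not the actual controller code; the function names and interval are invented for this example):

package main

import (
	"log"
	"time"
)

// runChecker drives a single reconciliation loop. Because only one loop
// exists, no worker queue or locking is needed; adding a second checker
// would first require splitting the work between them.
func runChecker(check func() error, interval time.Duration) {
	for {
		if err := check(); err != nil {
			log.Printf("check failed: %v", err) // log and keep looping
		}
		time.Sleep(interval)
	}
}

func main() {
	runChecker(func() error {
		// e.g. read running collections from the DB / k8s and purge stale ones
		return nil
	}, 30*time.Second)
}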

@@ -0,0 +1,14 @@
package main

import (
"github.com/rakutentech/shibuya/shibuya/controller"
log "github.com/sirupsen/logrus"
)

// This func keeps track of all the running engines. It should rely only on the data in the db
// and make the necessary queries to the scheduler.
func main() {
log.Info("Controller is running in distributed mode")
controller := controller.NewController()
controller.IsolateBackgroundTasks()
}
6 changes: 3 additions & 3 deletions shibuya/controller/garbage.go
@@ -10,7 +10,7 @@ import (
log "github.com/sirupsen/logrus"
)

func (c *Controller) checkRunningThenTerminate() {
func (c *Controller) CheckRunningThenTerminate() {
jobs := make(chan *RunningPlan)
for w := 1; w <= 3; w++ {
go func(jobs <-chan *RunningPlan) {
@@ -107,7 +107,7 @@ func isCollectionStale(rh *model.RunHistory, launchTime time.Time) (bool, error)
return true, nil
}

func (c *Controller) autoPurgeDeployments() {
func (c *Controller) AutoPurgeDeployments() {
for {
deployedCollections, err := c.Scheduler.GetDeployedCollections()
if err != nil {
@@ -151,7 +151,7 @@ func (c *Controller) autoPurgeDeployments() {
// Last time used is defined as:
// 1. If none of the collections has a run, it will be the last launch time of the engines of a collection
// 2. If any of the collections has a run, it will be the end time of that run
func (c *Controller) autoPurgeProjectIngressController() {
func (c *Controller) AutoPurgeProjectIngressController() {
projectLastUsedTime := make(map[int64]time.Time)
ingressLifespan, err := time.ParseDuration(config.SC.IngressConfig.Lifespan)
if err != nil {
33 changes: 22 additions & 11 deletions shibuya/controller/main.go
@@ -44,17 +44,6 @@ func NewController() *Controller {
}
c.schedulerKind = config.SC.ExecutorConfig.Cluster.Kind
c.Scheduler = scheduler.NewEngineScheduler(config.SC.ExecutorConfig.Cluster)

// First we do is to resume the running plans
// This method should not be moved as later goroutines rely on it.
c.resumeRunningPlans()
go c.streamToApi()
go c.readConnectedEngines()
go c.checkRunningThenTerminate()
go c.fetchEngineMetrics()
go c.cleanLocalStore()
go c.autoPurgeDeployments()
go c.autoPurgeProjectIngressController()
return c
}

@@ -70,6 +59,28 @@ type ApiMetricStreamEvent struct {
PlanID string `json:"plan_id"`
}

func (c *Controller) StartRunning() {
// The first thing we do is resume the running plans.
// This call should not be moved, as later goroutines rely on it.
c.resumeRunningPlans()
go c.streamToApi()
go c.readConnectedEngines()
go c.fetchEngineMetrics()
go c.cleanLocalStore()
if !config.SC.DistributedMode {
log.Info("Controller is running in non-distributed mode!")
go c.IsolateBackgroundTasks()
}
}

// In distributed mode, this func runs as a standalone process.
// In non-distributed mode, it runs as a goroutine.
func (c *Controller) IsolateBackgroundTasks() {
c.CheckRunningThenTerminate()
go c.AutoPurgeDeployments()
go c.AutoPurgeProjectIngressController()
}

func (c *Controller) streamToApi() {
for {
select {