Add scaleup step size feature #132

kazuki-hanai · 2023-11-02T05:29:31Z

What this PR does

Add scaleupStepSize and scaleUpInterval field to control scale up feature
- scaleUpStepSize: The minimum value by which the scaling will increase in one step.
- scaleUpInterval: The time interval at which the scaling up will occur.

Why we need it

In a scaling up situation, the rapid growth of processing units (PU) can cause an increase in latency for Spanner by rearrangement of splits. To prevent this issue, we propose a controlling method for scaling up the PUs.

Which issue(s) this PR fixes

api/v1beta1/spannerautoscaler_types.go

docs/crd-reference.md

Co-authored-by: Junichiro Takagi <[email protected]>

kazuki-hanai · 2023-11-10T07:00:05Z

@tjun Thanks for corrections!

tkuchiki · 2023-11-15T06:37:17Z

api/v1beta1/spannerautoscaler_types.go

@@ -108,6 +108,14 @@ type ScaleConfig struct {
 	// The cool down period between two consecutive scaledown operations. If this option is omitted, the value of the `--scale-down-interval` command line option is taken as the default value.
 	ScaledownInterval *metav1.Duration `json:"scaledownInterval,omitempty"`

+	// The maximum number of processing units which can be added in one scale-up operation. It can be a multiple of 100 for values < 1000, or a multiple of 1000 otherwise.
+	// +kubebuilder:default=2000
+	ScaleupStepSize int `json:"scaleupStepSize,omitempty"`


For compatibility, could you please make the default value zero and allow unlimited scaling?

I fixed it to zero and if scaleupStepSize = 0, I made no scale up size limitation.
https://github.com/mercari/spanner-autoscaler/pull/132/files#diff-1984e42ef0a0d5e7edf140b3ae6decf3fd6e1b9bd4ee351d4dcd8cf7213bb537R569-R577

tkuchiki · 2023-11-15T06:37:44Z

api/v1beta1/spannerautoscaler_types.go

+
+	// How often autoscaler is reevaluated for scale up.
+	// The warm up period between two consecutive scaleup operations. If this option is omitted, the value of the `--scale-up-interval` command line option is taken as the default value.
+	ScaleupInterval *metav1.Duration `json:"scaleupInterval,omitempty"`


Same as ScaleupStepSize

I fixed it to 10 seconds because default interval for scaleup was set to 10 seconds before introducing scaleup feature.
https://github.com/mercari/spanner-autoscaler/pull/132/files#diff-1984e42ef0a0d5e7edf140b3ae6decf3fd6e1b9bd4ee351d4dcd8cf7213bb537L488

api/v1beta1/spannerautoscaler_webhook.go

tkuchiki · 2023-11-15T06:39:53Z

cmd/main.go

@@ -60,6 +60,7 @@ var (
 	enableLeaderElection = flag.Bool("leader-elect", false, "Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.")
 	leaderElectionID     = flag.String("leader-elect-id", "", "Lease name for leader election.")
 	scaleDownInterval    = flag.Duration("scale-down-interval", 55*time.Minute, "The scale down interval.")
+	scaleUpInterval      = flag.Duration("scale-up-interval", 55*time.Minute, "The scale up interval.")


Same as the comment of ScaleupStepSize #132 (comment)

Co-authored-by: tkuchiki <[email protected]>

cmd/main.go

Co-authored-by: tkuchiki <[email protected]>

tkuchiki

I tested the following manifests. The behavior seems to be fine.

Set scalupStepSize and scaleupInterval

apiVersion: spanner.mercari.com/v1beta1
kind: SpannerAutoscaler
metadata:
  labels:
    app.kubernetes.io/name: spannerautoscaler
    app.kubernetes.io/instance: test-instance
    app.kubernetes.io/part-of: spanner-autoscaler
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: spanner-autoscaler
  name: spannerautoscaler-sample-beta
  namespace: test
spec:
  targetInstance:
    projectId: test-project
    instanceId: test-instance
  authentication:
    iamKeySecret:
      namespace: test
      name: spanner-autoscaler-gcp-sa
      key: service-account
  scaleConfig:
    processingUnits:
      min: 100
      max: 1000
    targetCPUUtilization:
      highPriority: 30
    scaledownStepSize: 200
    scaledownInterval: 3m
    scaleupStepSize: 100
    scaleupInterval: 3m

Set scaleupInterval

apiVersion: spanner.mercari.com/v1beta1
kind: SpannerAutoscaler
metadata:
  labels:
    app.kubernetes.io/name: spannerautoscaler
    app.kubernetes.io/instance: test-instance
    app.kubernetes.io/part-of: spanner-autoscaler
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: spanner-autoscaler
  name: spannerautoscaler-sample-beta
  namespace: test
spec:
  targetInstance:
    projectId: test-project
    instanceId: test-instance
  authentication:
    iamKeySecret:
      namespace: test
      name: spanner-autoscaler-gcp-sa
      key: service-account
  scaleConfig:
    processingUnits:
      min: 100
      max: 10000
    targetCPUUtilization:
      highPriority: 30
    scaledownStepSize: 1000
    scaledownInterval: 3m
    scaleupInterval: 3m

cmd/main.go

Co-authored-by: tkuchiki <[email protected]>

kazuki-hanai · 2023-12-05T05:35:54Z

@tkuchiki Thank you for reviewing. I checked your comments and suggestions and responded those.

internal/controller/spannerautoscaler_controller_test.go

Co-authored-by: Junichiro Takagi <[email protected]>

tkuchiki

LGTM

internal/controller/spannerautoscaler_controller.go

rustycl0ck · 2023-12-11T07:13:37Z

internal/controller/spannerautoscaler_controller.go

@@ -485,7 +500,7 @@ func (r *SpannerAutoscalerReconciler) needUpdateProcessingUnits(log logr.Logger,
 		log.Info("no need to scale", "currentPU", currentProcessingUnits, "currentCPU", sa.Status.CurrentHighPriorityCPUUtilization)
 		return false

-	case desiredProcessingUnits > currentProcessingUnits && r.clock.Now().Before(sa.Status.LastScaleTime.Time.Add(10*time.Second)):
+	case desiredProcessingUnits > currentProcessingUnits && r.clock.Now().Before(sa.Status.LastScaleTime.Time.Add(getOrConvertTimeDuration(sa.Spec.ScaleConfig.ScaleupInterval, r.scaleUpInterval))):


If the scaleUpInterval is zero, then I think it might try to update Spanner instance back to back without allowing any time for Google Cloud to actually perform the scale-up operation. Did it work fine during real scale-up testing?

@rustycl0ck
I think it's better to set default scaleupInterval as 10 seconds because default interval was 10 seconds before this PR. How do you think?

Setting default scaleUpInterval as less than 60 doesn't work anything because reconcile interval is 60 seconds. So I set the default value as 60 seconds.

rustycl0ck · 2023-12-11T07:17:22Z

internal/controller/spannerautoscaler_controller.go

+	if suStepSize < 1000 && sa.Status.CurrentProcessingUnits+suStepSize > 1000 {
+		suStepSize = 1000
+	}


For suStepSize == 0 (which is the default), suStepSize will always be set to 1000 if the current size of spanner instance is more than 1000 PUs. Is this expected?

internal/controller/spannerautoscaler_controller_test.go

rustycl0ck · 2023-12-11T07:40:00Z

internal/controller/spannerautoscaler_controller_test.go

@@ -146,7 +148,7 @@ var _ = Describe("Check Update Nodes", func() {
 		Expect(got).To(BeFalse())
 	})

-	It("needs to scale up nodes because cooldown interval is only applied to scale down operations", func() {
+	It("does not need to scale up nodes because enough time has not elapsed since last update", func() {


This test case should have 4 condition checks:

scales down when outside scaleDownInterval

does not scale down when inside scaleDownInterval

scales up when outside scaleUpInterval

does not scale up when inside scaleUpInterval

Current PR reduces it to only check conditions number 2 and 4

Thank you for reviewing.
I fixed it in a85f59e

rustycl0ck · 2023-12-11T07:49:21Z

internal/controller/spannerautoscaler_controller_test.go

+	Entry("should not scale", 25, 200, 30, 100, 1000, 2000, 2000, 200),
+	Entry("should scale up 1", 50, 300, 30, 100, 1000, 2000, 2000, 600),
+	Entry("should scale up 2", 50, 1000, 30, 1000, 10000, 2000, 2000, 2000),
+	Entry("should scale up 3", 50, 900, 40, 100, 5000, 2000, 2000, 2000),
+	Entry("should scale down 1", 30, 500, 50, 100, 10000, 2000, 2000, 400),
+	Entry("should scale down 2", 30, 5000, 50, 1000, 10000, 2000, 2000, 4000),
+	Entry("should scale down 3", 25, 1000, 65, 300, 10000, 800, 2000, 400),
+	Entry("should scale down 4", 25, 800, 65, 300, 10000, 800, 2000, 400),
+	Entry("should scale down 5", 25, 700, 65, 300, 10000, 800, 2000, 300),
+	Entry("should scale up to max PUs 1", 50, 300, 30, 100, 400, 2000, 2000, 400),
+	Entry("should scale up to max PUs 2", 50, 3000, 30, 1000, 4000, 2000, 2000, 4000),
+	Entry("should scale down to min PUs 1", 0, 500, 50, 100, 1000, 2000, 2000, 100),
+	Entry("should scale down to min PUs 2", 0, 5000, 50, 1000, 10000, 5000, 2000, 1000),
+	Entry("should scale down to min PUs 3", 0, 5000, 50, 100, 10000, 5000, 2000, 100),
+	Entry("should scale down with ScaledownStepSize 1", 30, 10000, 50, 5000, 10000, 2000, 2000, 8000),
+	Entry("should scale down with ScaledownStepSize 2", 30, 10000, 50, 5000, 12000, 200, 2000, 9000),
+	Entry("should scale down with ScaledownStepSize 3", 30, 10000, 50, 5000, 12000, 100, 2000, 9000),
+	Entry("should scale down with ScaledownStepSize 4", 30, 10000, 50, 5000, 12000, 900, 2000, 9000),
+	Entry("should scale down with ScaledownStepSize 5", 25, 2000, 65, 300, 10000, 500, 2000, 1000),
+	Entry("should scale down with ScaledownStepSize 6", 25, 2000, 65, 300, 10000, 800, 2000, 1000),
+	Entry("should scale down with ScaledownStepSize 7", 25, 1000, 65, 300, 10000, 500, 2000, 500),
+	Entry("should scale down with ScaledownStepSize 8", 20, 800, 75, 300, 10000, 200, 2000, 600),
+	Entry("should scale down with ScaledownStepSize 9", 25, 2000, 50, 300, 10000, 500, 2000, 2000),
+	Entry("should scale down with ScaledownStepSize 10", 25, 2000, 50, 300, 10000, 1000, 2000, 2000),
+	Entry("should scale down with ScaledownStepSize 11", 25, 2000, 70, 300, 10000, 400, 2000, 1000),
+	Entry("should scale down with ScaledownStepSize 12", 25, 1000, 70, 300, 10000, 400, 2000, 600),
+	Entry("should scale down with ScaledownStepSize 13", 20, 2000, 50, 300, 10000, 200, 2000, 1000),
+	Entry("should scale down with ScaledownStepSize 14", 20, 2000, 75, 300, 10000, 200, 2000, 1000),


In these old test cases, the default value of scaleupStepSize has been set to 2000, but the actual default value is 0. So when this value is not being tested for any effect, it should be set to the default value.

I fixed this in c4ef133

rustycl0ck · 2023-12-11T08:00:33Z

internal/controller/spannerautoscaler_controller_test.go

+var _ = Describe("Get and overwrite scaleup interval", func() {
+	var testReconciler *SpannerAutoscalerReconciler
+	controllerScaleUpInterval := 55 * time.Minute
+
+	BeforeEach(func() {
+		By("Creating a test reconciler")
+		testReconciler = &SpannerAutoscalerReconciler{
+			scaleUpInterval: controllerScaleUpInterval,
+			clock:           testingclock.NewFakeClock(fakeTime),
+			log:             logr.Discard(),
+		}
+	})
+
+	It("should get controller default scaleup interval", func() {
+		want := controllerScaleUpInterval
+		sa := &spannerv1beta1.SpannerAutoscaler{
+			Status: spannerv1beta1.SpannerAutoscalerStatus{
+				LastScaleTime:          metav1.Time{Time: fakeTime.Add(-time.Minute)},
+				CurrentProcessingUnits: 1000,
+				DesiredProcessingUnits: 2000,
+				InstanceState:          spannerv1beta1.InstanceStateReady,
+			},
+			Spec: spannerv1beta1.SpannerAutoscalerSpec{
+				ScaleConfig: spannerv1beta1.ScaleConfig{},
+			},
+		}
+		got := getOrConvertTimeDuration(sa.Spec.ScaleConfig.ScaleupInterval, testReconciler.scaleUpInterval)
+		Expect(got).To(Equal(want))
+	})
+
+	It("should override default scaleup interval with custom SpannerAutoscaler configuration value", func() {
+		want := 20 * time.Minute
+		scaleupInterval := metav1.Duration{
+			Duration: want,
+		}
+		sa := &spannerv1beta1.SpannerAutoscaler{
+			Status: spannerv1beta1.SpannerAutoscalerStatus{
+				LastScaleTime:          metav1.Time{Time: fakeTime.Add(-time.Minute)},
+				CurrentProcessingUnits: 2000,
+				DesiredProcessingUnits: 1000,
+				InstanceState:          spannerv1beta1.InstanceStateReady,
+			},
+			Spec: spannerv1beta1.SpannerAutoscalerSpec{
+				ScaleConfig: spannerv1beta1.ScaleConfig{
+					ScaleupInterval: &scaleupInterval,
+				},
+			},
+		}
+
+		got := getOrConvertTimeDuration(sa.Spec.ScaleConfig.ScaleupInterval, testReconciler.scaleUpInterval)
+		Expect(got).To(Equal(want))
+	})
+})


This test case just seems to be checking the getOrConvertTieDuration function response. If so, maybe we can have a simpler test case just for that function, which doesn't need any reconciler or SpannerAutoscaler resources.

I made it simpler in b453448

Co-authored-by: Ravi <[email protected]>

kazuki-hanai · 2023-12-20T05:59:03Z

@tkuchiki @tjun @rustycl0ck I responded all reviews you all requested. Please review it again 🙇‍♂️

rustycl0ck

Except the currentCPU values above 100 in all test cases, rest all looks ok.

internal/controller/spannerautoscaler_controller_test.go

rustycl0ck · 2023-12-25T04:30:46Z

internal/controller/spannerautoscaler_controller_test.go

+	Entry("should scale up with ScaleupStepSize when currentPU is equal to zero", 150, 100, 10, 100, 10000, 2000, 0, 2000),
+	Entry("should scale up with ScaleupStepSize when currentPU is lower than 1000", 100, 100, 10, 100, 10000, 2000, 700, 800),
+	Entry("should scale up with ScaleupStepSize when currentPU is lower than 1000", 100, 300, 10, 100, 10000, 2000, 700, 1000),
+	Entry("should scale up with ScaleupStepSize when currentPU is lower than 1000", 100, 800, 10, 100, 10000, 2000, 700, 2000),
+	Entry("should scale up with ScaleupStepSize when currentPU is lower than 1000", 300, 800, 100, 100, 10000, 2000, 2000, 3000),
+	Entry("should scale up with ScaleupStepSize when currentPU is equal to 1000", 300, 1000, 100, 100, 10000, 2000, 700, 2000),
+	Entry("should scale up with ScaleupStepSize when currentPU is equal to 1000", 400, 1000, 100, 100, 10000, 2000, 3000, 4000),
+	Entry("should scale up with ScaleupStepSize when currentPU is equal to 1000", 400, 1000, 100, 100, 10000, 2000, 4000, 5000),
+	Entry("should scale up with ScaleupStepSize when currentPU is more than 1000", 200, 2000, 100, 100, 10000, 2000, 700, 3000),
+	Entry("should scale up with ScaleupStepSize when currentPU is more than 1000", 200, 2000, 100, 100, 10000, 2000, 2000, 4000),
+	Entry("should scale up with ScaleupStepSize when currentPU is more than 1000", 200, 2000, 100, 100, 10000, 2000, 3000, 5000),


In all of these test cases the currentCPU value is 100, 150, 200, 300 or 400. But real CPU usage will probably be capped at 100%. It would be better to use realistic CPU usage values between 0-100% for these test cases to be useful.

[nit] Since multiple entries have same name, if a test case actually fails, then it is not possible to know which values caused the failure. So either make the test case name string unique with more description, or add iterator numbers to differentiate between identical test case names.

I fixed currentCPU value in 2117743

cmd/main.go

Co-authored-by: Ravi <[email protected]>

kazuki-hanai added 3 commits October 25, 2023 16:13

Replace Duration document because of an expired link

162054e

Add scaleup parameters to crd-reference

01b621d

[WIP] Implement scaleup

b193983

github-actions bot added the size/M label Nov 2, 2023

kazuki-hanai added 2 commits November 2, 2023 14:45

fmt

14a6477

Add scaleup interval test

4a07c8e

github-actions bot added size/L and removed size/M labels Nov 9, 2023

kazuki-hanai added 2 commits November 9, 2023 17:49

fix

172fb2e

add generated deepcopy

a5577e3

kazuki-hanai marked this pull request as ready for review November 9, 2023 08:52

kazuki-hanai requested a review from a team as a code owner November 9, 2023 08:52

kazuki-hanai changed the title ~~[WIP] Add scaleup function~~ Add scaleup function Nov 10, 2023

fix

43e5d1b

tjun reviewed Nov 10, 2023

View reviewed changes

api/v1beta1/spannerautoscaler_types.go Outdated Show resolved Hide resolved

docs/crd-reference.md Outdated Show resolved Hide resolved

kazuki-hanai and others added 2 commits November 10, 2023 15:59

Update api/v1beta1/spannerautoscaler_types.go

a065979

Co-authored-by: Junichiro Takagi <[email protected]>

Update docs/crd-reference.md

e78d4af

Co-authored-by: Junichiro Takagi <[email protected]>

tkuchiki reviewed Nov 15, 2023

View reviewed changes

kazuki-hanai and others added 4 commits November 16, 2023 12:37

Update api/v1beta1/spannerautoscaler_webhook.go

3aa0a50

Co-authored-by: tkuchiki <[email protected]>

Fix scaleup when default value

7f9f2b7

Fix scaleupInterval logic

9b33ac6

Fix scaleupInterval default

dd009a9

kazuki-hanai requested a review from tkuchiki November 28, 2023 05:54

tkuchiki reviewed Nov 28, 2023

View reviewed changes

cmd/main.go Outdated Show resolved Hide resolved

Update cmd/main.go

15972a1

Co-authored-by: tkuchiki <[email protected]>

tkuchiki reviewed Nov 29, 2023

View reviewed changes

tkuchiki reviewed Nov 30, 2023

View reviewed changes

cmd/main.go Outdated Show resolved Hide resolved

Update cmd/main.go

8434469

Co-authored-by: tkuchiki <[email protected]>

tkuchiki requested a review from rustycl0ck December 6, 2023 03:25

tjun reviewed Dec 6, 2023

View reviewed changes

internal/controller/spannerautoscaler_controller_test.go Outdated Show resolved Hide resolved

Update internal/controller/spannerautoscaler_controller_test.go

09f507a

Co-authored-by: Junichiro Takagi <[email protected]>

tkuchiki previously approved these changes Dec 7, 2023

View reviewed changes

rustycl0ck reviewed Dec 11, 2023

View reviewed changes

Update internal/controller/spannerautoscaler_controller.go

0d848d8

Co-authored-by: Ravi <[email protected]>

kazuki-hanai dismissed tkuchiki’s stale review via 0d848d8 December 11, 2023 10:52

kazuki-hanai and others added 8 commits December 11, 2023 19:54

Update internal/controller/spannerautoscaler_controller_test.go

44fca5c

Co-authored-by: Ravi <[email protected]>

Set default ScaleUpInterval as 10 seconds

87783d8

Fix small

e7f95ab

Set default ScaleUpInterval as 60 seconds

ae47fad

Fix suStepSize validation

46809d5

Fix test if scaledown and scaleup intervals function works

a85f59e

Fix default value of scaleupStepSize in scaledownStepSize test

c4ef133

Make test for getOrConvertTimeDuration simple

b453448

kazuki-hanai force-pushed the add-scaleup branch from da07818 to b453448 Compare December 20, 2023 05:57

kazuki-hanai requested review from tjun, rustycl0ck and tkuchiki December 20, 2023 05:57

rustycl0ck reviewed Dec 25, 2023

View reviewed changes

kazuki-hanai and others added 3 commits December 25, 2023 14:14

Remove no required setup block

19ad040

Update cmd/main.go

bb399c5

Co-authored-by: Ravi <[email protected]>

Fix ScaleupStepSize test

2117743

kazuki-hanai requested a review from rustycl0ck December 26, 2023 05:26

rustycl0ck approved these changes Dec 28, 2023

View reviewed changes

Merge branch 'master' into add-scaleup

8adf381

tkuchiki approved these changes Jan 9, 2024

View reviewed changes

tkuchiki merged commit 82ac424 into mercari:master Jan 9, 2024
7 checks passed

tkuchiki added the release/minor Indicates minor update to action-release-label label Jan 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add scaleup step size feature #132

Add scaleup step size feature #132

kazuki-hanai commented Nov 2, 2023 •

edited

Loading

kazuki-hanai commented Nov 10, 2023

tkuchiki Nov 15, 2023 •

edited

Loading

kazuki-hanai Nov 28, 2023

tkuchiki Nov 15, 2023

kazuki-hanai Nov 28, 2023

tkuchiki Nov 15, 2023 •

edited

Loading

tkuchiki left a comment

kazuki-hanai commented Dec 5, 2023

tkuchiki left a comment

rustycl0ck Dec 11, 2023

kazuki-hanai Dec 18, 2023 •

edited

Loading

kazuki-hanai Dec 18, 2023

rustycl0ck Dec 11, 2023

rustycl0ck Dec 11, 2023

kazuki-hanai Dec 20, 2023

rustycl0ck Dec 11, 2023

kazuki-hanai Dec 20, 2023

rustycl0ck Dec 11, 2023

kazuki-hanai Dec 20, 2023

kazuki-hanai commented Dec 20, 2023

rustycl0ck left a comment

rustycl0ck Dec 25, 2023

kazuki-hanai Dec 26, 2023

Add scaleup step size feature #132

Add scaleup step size feature #132

Conversation

kazuki-hanai commented Nov 2, 2023 • edited Loading

What this PR does

Why we need it

Which issue(s) this PR fixes

kazuki-hanai commented Nov 10, 2023

tkuchiki Nov 15, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tkuchiki Nov 15, 2023 • edited Loading

Choose a reason for hiding this comment

tkuchiki left a comment

Choose a reason for hiding this comment

Set scalupStepSize and scaleupInterval

Set scaleupInterval

kazuki-hanai commented Dec 5, 2023

tkuchiki left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kazuki-hanai Dec 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kazuki-hanai commented Dec 20, 2023

rustycl0ck left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kazuki-hanai commented Nov 2, 2023 •

edited

Loading

tkuchiki Nov 15, 2023 •

edited

Loading

tkuchiki Nov 15, 2023 •

edited

Loading

kazuki-hanai Dec 18, 2023 •

edited

Loading