
Horizontal Scaling #159

Merged: 12 commits from feature/horizontal-scaling merged into master on Nov 14, 2016

Conversation

@inz commented Oct 10, 2016

Adds a new optimization model for horizontal scaling that is intended to replace the original MiniZinc model. The original functionality is not yet fully mirrored, however. (Closes #156. Eventually.)

While I initially wanted to have an unconstrained model that chooses the optimal combination and number of resources from all available instance types, I had to simplify the model to restrict recommendations per ingredient to one resource type and a multiplier (num_resources) representing the number of required instances. The original, unconstrained model is in the commit history for provenance. The unconstrained model unfortunately did not produce results in reasonable time, as the number of possible resource assignments grows unreasonably large for complex applications and large numbers of users.
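
For illustration, a minimal MiniZinc sketch of the simplified formulation (one resource type plus an instance count per ingredient); all identifiers and bounds here are mine, not necessarily those of the model in this PR:

```minizinc
% Illustrative sketch only; names, bounds, and units are invented.
int: num_ingredients;
int: num_types;                      % number of available instance types
set of int: Ingredients = 1..num_ingredients;
set of int: ResourceTypes = 1..num_types;

array[ResourceTypes] of int: cpu;    % in 1/100 cores
array[ResourceTypes] of int: ram;    % in MB
array[ResourceTypes] of int: cost;   % per instance

array[Ingredients] of int: cpu_req;  % aggregate requirement per ingredient
array[Ingredients] of int: ram_req;

% Decision: one resource type and a multiplier per ingredient.
array[Ingredients] of var ResourceTypes: assignment;
array[Ingredients] of var 1..1024: num_resources;

constraint forall(i in Ingredients) (
  num_resources[i] * cpu[assignment[i]] >= cpu_req[i] /\
  num_resources[i] * ram[assignment[i]] >= ram_req[i]
);

solve minimize sum(i in Ingredients) (num_resources[i] * cost[assignment[i]]);
```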

In addition to the new MiniZinc model, we now also represent partial CPU cores as reported by the providers (e.g., for the Amazon t2.* bursting instances and the Google f1- and g1-series). To accommodate this, CPU cores are represented in hundredths of a core in the model.
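
For example (with illustrative values), an f1-micro advertised with 0.2 cores would enter the model as 20, and a full core as 100:

```minizinc
% Hundredths-of-a-core encoding (values are illustrative):
% f1-micro ~0.2 cores -> 20, g1-small ~0.5 -> 50, n1-standard-1 = 1.0 -> 100
array[ResourceTypes] of int: cpu = [20, 50, 100];
```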

The output of the optimization is compatible with the original model, with the addition of a num_resources array that represents the number of instances required to fulfill the ingredient constraints.

Things left to do/discuss

  • Currently, the model will only show recommendations for horizontal scaling. As discussed in Horizontal scaling (#156), we might want to let users select whether they want vertical or horizontal scaling recommendations. For this, I see several possibilities:
    • a) At recommendation generation time (i.e., when hitting 'generate recommendation'), we add a 'vertical/horizontal' checkbox. This would restore the original functionality and allow for the generation of purely vertically scaled recommendations. (But: while I think these vertical scaling recommendations are instructive for clearly seeing the savings potential, I'm not sure we want to prominently show recommendations that basically turn every ingredient into a SPOF.)
    • b) We add a 'scale horizontally' flag to CPU and RAM workloads. With this, we can mirror Cloudorado functionality. For each ingredient, we would compute constraints as before, but add array[Ingredients] of bool: distribute_cpu; and array[Ingredients] of bool: distribute_ram; to the optimization model, and update the model to restrict recommendations to resources that fulfill CPU and/or RAM constraints on a single instance for ingredients with distribute_[cpu|ram] == false (see the sketch after this list). With this approach, we could model ingredients that need, e.g., at least 1 GB of RAM for every instance and scale according to computed CPU constraints (we would also need to set RAM growth per user to 0 then, otherwise the RAM requirement continues to grow).
    • c) We add more complex scaling policies, e.g., minimum and maximum number of instances per ingredient, or minimum RAM/CPU per instance. With this, we can model more complex deployments than currently possible with Cloudorado. We need to supply additional information to the model, such as array[Ingredients] of int: min_num_instances;, array[Ingredients] of int: max_num_instances;, array[Ingredients] of int: min_cpu_per_instance;, and array[Ingredients] of int: min_ram_per_instance;. Overall ingredient constraints are computed as before; the 'per instance' fields would not change with the number of users. This would then allow us to model constraints such as 'a PostgreSQL ingredient should have ≥2 instances, ≤5 instances, 12 MB RAM per user, 1/2500th CPU per user, min 512 MB RAM per instance, min 1 vCPU per instance'.
  • For UI adjustments to request and show horizontally scaled recommendations, see Horizontal Scaling (cloud-stove-ui#66).
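
A sketch of the distribute flags from option b), extending the illustrative model above (again, not the actual model code): if a workload may not be distributed, a single instance must satisfy it by itself; otherwise the aggregate over all instances counts.

```minizinc
array[Ingredients] of bool: distribute_cpu;
array[Ingredients] of bool: distribute_ram;

constraint forall(i in Ingredients) (
  % non-distributable workloads must fit on one instance
  if distribute_cpu[i]
    then num_resources[i] * cpu[assignment[i]] >= cpu_req[i]
    else cpu[assignment[i]] >= cpu_req[i]
  endif
  /\
  if distribute_ram[i]
    then num_resources[i] * ram[assignment[i]] >= ram_req[i]
    else ram[assignment[i]] >= ram_req[i]
  endif
);
```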

Implementation summary

  • A new MiniZinc model to generate horizontal scaling recommendations. It finds one resource type per ingredient, together with a number of instances, e.g., 4x n1-standard-2.
  • We now have scaling_constraints attached to ingredients with a max_number_of_instances attribute to restrict the number of allowed instances per resource for generated recommendations (with 0 for no restriction).
  • A new `scaling_workload` attached to an ingredient lets users decide between vertical and horizontal scaling per ingredient.
  • Partial CPU cores reported by providers are now considered in the model (this change, while not closely related to horizontal scaling, came with the new MiniZinc recommendation model).

@inz inz added the discussion label Oct 10, 2016
@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 October 10, 2016 08:31 Inactive
@joe4dev commented Oct 10, 2016

👍👍 thanks

  • Considering partial CPUs makes total sense to improve fairness.
  • a) I think a global switch doesn't make much sense. The ability to scale out is rather component- (i.e., ingredient-) based.
  • b) This makes the most sense to me. Some ingredients, such as web servers, are eligible for horizontal scaling, whereas for other ingredients, such as DBs, vertical scaling might be more appropriate. Why do we need to set the "RAM growth per user to 0"? Doesn't the workload model yield the required amount of RAM (based on min + growth), which can subsequently be used to calculate the distribution split (considering min, including the number of users it can serve, + growth)? It doesn't seem trivial to find the optimum though 🤔 (kind of a knapsack problem, as we discussed earlier)
  • c) At the current state, I don't think we gain much by demanding even more input from the user (especially regarding instance count boundaries). However, I clearly see the benefit of min_cpu_per_instance and min_ram_per_instance 😏 Can't we "reuse" the minimum values from the workloads for this purpose? 🤔

@inz commented Oct 10, 2016

  • a) agreed.
  • b) I think we have a slight misunderstanding here. If I understand correctly, you are thinking of a 'scale horizontally' flag per ingredient (which we'll call b.1), whereas I was thinking of one flag per workload, similar to Cloudorado, i.e., a 'scale CPU horizontally' and a 'scale RAM horizontally' flag.
    • b.1) I like that the UI with a single 'scale horizontally' flag per ingredient would be simpler than a more complex variant where distribution is decided per workload. We could implement this with one additional parameter for the model, e.g., array[Ingredients] of int: max_num_instances, where we set the maximum number of allowed instances to 0 for no restriction and to 1 for vertical scaling (see the sketch below). This way, we could later introduce more complex scaling rules (i.e., a maximum number of allowed instances) without changing the model.
    • b.2) My thinking here was the following (heavily inspired by Cloudorado, of course): if we have separate scaling flags (Cloudorado calls it 'distribute') for CPU and RAM, then for an ingredient with 'no distribute' for RAM and 'distribute' for CPU, each chosen resource by itself must fulfill the RAM requirement, while the aggregate CPU requirement can be fulfilled by multiple resources. For example, with min 2 GB RAM (no distribute) and 1 CPU per 1000 users, only resources with ≥2 GB RAM are eligible, but for 10k users the 10 CPUs can come from multiple resources (each with ≥2 GB RAM). Hence, RAM growth per user should be 0; otherwise the instances would have to get bigger and bigger with more users, when we basically just wanted to specify a 'min RAM per instance' of 2 GB.
  • c) I agree, we should keep additional user input to a minimum. Ad 'reusing' the minimum values from the workloads: Yes, see b.2 above.
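
A sketch of the b.1 parameter on top of the illustrative model from the PR description (names are mine, not the model's actual identifiers):

```minizinc
array[Ingredients] of int: max_num_instances;

% 0 = no restriction; 1 effectively forces vertical scaling.
constraint forall(i in Ingredients) (
  max_num_instances[i] > 0 -> num_resources[i] <= max_num_instances[i]
);
```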

@inz commented Oct 10, 2016

Ad b.1) One thing we should consider, however: for, e.g., a DB master that we only scale vertically, we would then add a separate ingredient for the horizontally scaled DB slaves.

Overall, I think this approach (b.1) is very reasonable. If you agree I will go ahead and implement the changes in the model and the UI. We can implement more complex scaling scenarios (b.2, c) later if necessary.

@joe4dev commented Oct 10, 2016

  • b.1) If it doesn't slow down MiniZinc, the "maximum number of allowed instances" approach sounds fine 👍
  • b.2) Sounds reasonable. What I meant by considering growth is the (non-trivial) optimization where one takes into account the number of additional users an instance can serve in order to find an optimal distribution. Example: distributing 12 GB using 6× 2 GB instances (min RAM) might be less optimal than using 4× 3 GB instances. To obtain how many users can be served with the additional 1 GB, one would need the RAM growth, right? (A toy calculation follows below.)
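
(To make the growth argument concrete, here is a toy calculation with invented numbers: assume 1 GB minimum RAM per instance plus 2 MB RAM growth per user. A 2 GB instance then serves (2048 − 1024) / 2 = 512 users, a 3 GB instance (3072 − 1024) / 2 = 1024 users. So 6× 2 GB and 4× 3 GB both provide 12 GB, but serve 3072 vs. 4096 users; the comparison is only possible if the RAM growth per user is known.)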

[DB master-slave] Agree, that special case would need some extra treatment.

Yes, that's fine for me 👍👍
If you have no further comments on PR #145, you could merge. I can then test on staging and migrate production too. Then I can safely run the provider updaters to resolve #160.
Afterwards: what do you think I should work on next?

@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 October 12, 2016 10:04 Inactive
@inz inz force-pushed the feature/horizontal-scaling branch from 41950ec to c1c1e8c Compare October 12, 2016 11:43
@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 October 12, 2016 11:43 Inactive
@inz commented Oct 12, 2016

A very interesting error on Wercker:

 test_hierarchical_region_constraint#DeploymentRecommendationTest (19.96s)
        --- expected
        +++ actual
        @@ -1 +1 @@
        -[2577369412, 2577369412, 4116750498]
        +[4116750498, 2577369412, 2577369412]
        test/models/deployment_recommendation_test.rb:74:in `block in <class:DeploymentRecommendationTest>'

I am moderately confused: Array#collect should execute in order. For the tested recommendation, the first two resources are from Azure and the last is from Google, so the 4116750498 region at the beginning doesn't make sense, and neither does the 2577369412 region for the third resource.

@joe4dev commented Oct 12, 2016

Mhmmm 🙃
Is the error reproducible? Have you tried to retry the build and gotten the same outcome?

@inz commented Oct 12, 2016

Hm... no: the second wercker build succeeded. But apparently it is kind of reproducible after all.

Locally, it works fine with guard, but I just ran a local wercker build and it failed with the same message. When I run the local wercker build again, it fails again. I suspect that's because `wercker build` always rebuilds the complete container, which doesn't happen when I hit 'retry' in the web app. 😕

@joe4dev commented Oct 12, 2016

My local Wercker build finally (after ages) passed on the first try 🙄:

[screenshot: wercker-local-build]

@inz commented Oct 12, 2016

Hmm. Interesting. Well I guess then we blame Wercker and assume that it'll work...

@joe4dev joe4dev mentioned this pull request Oct 12, 2016
@joe4dev commented Oct 12, 2016

Tried another time locally after deleting ~/.wercker; passed again.
After merging and rebasing on #162, #165, and #166, we'll see what happens with the build.

@inz commented Oct 12, 2016

Agreed. I'll finish the UI tomorrow, then we can rebase and roll it out.


@inz commented Oct 13, 2016

This PR is now ready for review.

@inz inz force-pushed the feature/horizontal-scaling branch from c1c1e8c to 874cec5 Compare October 13, 2016 10:27
@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 October 13, 2016 10:27 Inactive
@inz inz force-pushed the feature/horizontal-scaling branch from 874cec5 to b4a81a6 Compare October 13, 2016 13:00
@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 October 13, 2016 13:00 Inactive
@inz inz force-pushed the feature/horizontal-scaling branch from b4a81a6 to 7b74651 Compare October 13, 2016 13:01
@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 October 13, 2016 13:01 Inactive
@inz inz added this to the Sensitivity Analysis v1.0 milestone Oct 13, 2016
@inz commented Oct 14, 2016

I drafted a blog post to introduce the scaling policies: https://medium.com/cloud-stove/e1410f816b73

@inz inz force-pushed the feature/horizontal-scaling branch from 7b74651 to 32ec91c Compare October 25, 2016 10:52
@inz inz had a problem deploying to fathomless-escarpment-2-pr-159 October 25, 2016 10:52 Failure
@inz inz had a problem deploying to fathomless-escarpment-2-pr-159 October 25, 2016 11:01 Failure
@inz inz force-pushed the feature/horizontal-scaling branch from f65ac5c to 32ec91c Compare October 25, 2016 11:07
@inz inz had a problem deploying to fathomless-escarpment-2-pr-159 October 25, 2016 11:08 Failure
inz added 9 commits October 25, 2016 13:21
  • Seems to work well when I disable inter-ingredient traffic, but is really slow if traffic is considered.
  • It will now parse as a JSON object (with one empty ingredient at the top of the list).
  • It works fine for smallish RAM and CPU constraints, but takes forever with big values (e.g., 100 GB RAM).
  • Instead of finding the ideal combination among all resources per ingredient, we now search for the ideal number of resources from a single resource type to fulfill all constraints. With this simplification, recommendations are generated quickly. The new model is a drop-in replacement for the original model, except for the additional `resource_count` result. Also, adjust the model to make vCPUs integers again (i.e., vCPUs * 100).
  • Using `max_num_resources` you can limit the number of resources assigned to an ingredient. Set it to `0` for no restrictions. Currently, the model generated by the app does not yet contain a scaling constraint and just defaults to `0`.
  • The scaling constraint specifies the maximum number of instances allowed for any ingredient. The scaling workload is currently a boolean flag to indicate whether the ingredient should be scaled horizontally. The template seed ingredients already use the scaling constraint to keep the DB master as a single instance with multiple horizontally scaled DB slaves. To ease the transition, `Ingredient#update_constraints` does not fail if no scaling workload exists; a default scaling workload that activates horizontal scaling is created instead (see `Ingredient#scaling_workload`).
@inz inz force-pushed the feature/horizontal-scaling branch from 32ec91c to 1427e08 Compare October 25, 2016 11:21
@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 October 25, 2016 11:21 Inactive
```diff
  'regions' => region_codes,
  'vm_cost' => '475.42',
- 'total_cost' => 475416
+ 'total_cost' => 475419
```
@inz commented:

total_cost changed because the number of different resources in the recommendation is added to the costs as a tie-breaker.

For providers like Digital Ocean, where instance costs and specs scale exactly linearly, we have multiple "optimal" solutions and by adding the number of assigned resources to the costs we prefer recommendations with fewer resources.
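
In terms of the illustrative model sketched in the PR description, the tie-breaking objective presumably looks something like this (a sketch, not the actual model code):

```minizinc
% Add the total instance count to the cost as a tie-breaker: among
% cost-equal solutions, the solver prefers fewer resources.
solve minimize
  sum(i in Ingredients) (num_resources[i] * cost[assignment[i]])
  + sum(i in Ingredients) (num_resources[i]);
```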

@joe4dev replied:

Good point! Agreed on preferring solutions with fewer resources 👍
We should keep that in mind if we want to use total_cost one day 😉

@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 October 26, 2016 14:12 Inactive
@inz commented Oct 31, 2016

This PR should be ready. Please review and merge.

/cc @joe4dev

Copying an application set every ingredient to use the horizontal scaling scheme
instead of mirroring the scheme of the copy template.
@inz inz temporarily deployed to fathomless-escarpment-2-pr-159 November 3, 2016 13:30 Inactive
User authentication is done by the application controller by default
@joe4dev left a comment

In general it seems ok to me.

However, the Amazon recommendations for horizontal scaling do not behave the same as the Google ones with regard to partial CPU cores. Google chooses f1-micro (5x) instances with 0.2 cores, whereas Amazon always chooses m3.medium (1x), which is the smallest instance with a full CPU core.
It seems to me that Amazon instances with partial core counts are not considered 😏

Comment about instance choice:
Obviously, horizontal scaling often chooses the weakest instances (e.g., f1-micro, basic-a0). This might sometimes not be suitable for production deployment.

@@ -0,0 +1,75 @@
class ScalingWorkloadsController < ApplicationController

before_action :authenticate_user!
@joe4dev:

Not needed as already present in ApplicationController (rationale: protect every endpoint by default to avoid security breaches when adding new controllers)

Fixed in 770b13c


@inz commented Nov 7, 2016

I'm not sure there is a problem with the recommendations: Amazon t2 instances have a pretty bad price per CPU, so maybe they are not part of recommendations simply because m3 is cheaper? And the Google f1-micro is actually cheaper per full CPU than n1 (though more expensive per GB RAM). So I guess the recommendations kind of make sense.

Yes, horizontal scaling will always choose the smallest instance. I would expect recommendations to include other instance types once RAM requirements are high (since specialized instances should be cheaper per GB RAM) or CPU requirements are very high (since specialized instances should be cheaper per core). We could of course add a minimum RAM/CPU threshold to the workload to prevent these instance types from showing up in recommendations.

Nevertheless, I would merge this PR now if there are no other issues.

@joe4dev commented Nov 14, 2016

Agree 👍

@joe4dev joe4dev merged commit 15f1172 into master Nov 14, 2016
@joe4dev joe4dev deleted the feature/horizontal-scaling branch November 14, 2016 10:13