Strange weighted round robin behaviour with special configuration of weights #44
Comments
Real world experiment

tl;dr: With the use of a test application, we can observe a momentary dip in traffic down to zero before recovering and shifting to the desired state.

Setup

I am attempting to reproduce this behaviour using a test application in the staging environment. The test application consists of 3 origins on separate clusters. The initial weights have been set similar to what was observed initially. With this, the next step is to generate some load over a period of time, shift traffic, and observe.

Observations

Looking at the splunk log distribution for this test app, we can clearly see the obvious dip in traffic and the increase in traffic for the other two origins to account for the loss. Some things to note: this dip in traffic did not cause any requests to fail; all requests were routed successfully with a status of 200. I conducted this experiment an additional 2 times and observed the same behaviour, where traffic would momentarily dip to zero before recovering immediately. As an additional measure, I decided to continue shifting traffic to this origin by setting its weight from 66 -> 100. However, this shift did not cause a sudden dip in traffic.
hi @kalindudc, this is a great post explaining the potential issue in lua-resty-balancer, and I agree with your idea on RR. IIRC, the Nginx weighted rotation algorithm should look like the following.
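For reference, nginx's smooth weighted round-robin works roughly as follows: on each pick every peer's `current_weight` is increased by its weight, the peer with the largest `current_weight` is chosen, and that peer's `current_weight` is then reduced by the total of all weights. A minimal sketch in lua (ignoring nginx's `effective_weight` failure handling):

```lua
-- Minimal sketch of nginx-style smooth weighted round-robin.
-- peers: array of { id = ..., weight = ..., current_weight = 0 }
local function smooth_wrr_pick(peers)
    local best, total = nil, 0
    for _, peer in ipairs(peers) do
        peer.current_weight = peer.current_weight + peer.weight
        total = total + peer.weight
        if best == nil or peer.current_weight > best.current_weight then
            best = peer
        end
    end
    best.current_weight = best.current_weight - total
    return best.id
end
```

This interleaves the lower-weight peer's picks evenly through the sequence instead of clustering them, which should avoid the long gaps described in this issue.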
@kalindudc Nice catch, let's follow nginx's implementation. Patches welcome.
I recently discovered that if any two weights are relatively prime, then the GCD optimization will fail, because the GCD drops to 1. For example, with 1 instance of weight 100 and 99 instances of weight 1, each round will waste 99 iterations to touch the last instance.
@jizhuozhi For now, the round-robin is an
Also,
Hello, @doujiang24. Thanks for your reply; I will follow this blog to learn how Tengine implements VNSWRR. Before I knew about VNSWRR, I preferred to use the EDF algorithm to implement the weighted load balancing algorithm. When using the binary heap implementation, its space complexity is O(n). Without changing the existing WRR implementation, I will provide a separate EDF implementation for selection.
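As a rough reference for the EDF idea (a sketch under assumptions, not an actual patch): each node keeps a virtual deadline that advances by `1 / weight` whenever it is picked, and the node with the earliest deadline is picked next. With a binary heap each pick costs O(log n); the sketch below uses a plain O(n) scan for brevity.

```lua
-- Sketch of EDF (earliest deadline first) weighted selection.
-- Uses an O(n) scan instead of the binary heap mentioned above, for brevity.
local function edf_new(nodes)  -- nodes: map of id -> weight
    local entries = {}
    for id, weight in pairs(nodes) do
        entries[#entries + 1] = { id = id, weight = weight, deadline = 1 / weight }
    end
    return entries
end

local function edf_pick(entries)
    local best
    for _, entry in ipairs(entries) do
        if best == nil or entry.deadline < best.deadline then
            best = entry
        end
    end
    best.deadline = best.deadline + 1 / best.weight  -- advance the virtual deadline
    return best.id
end
```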
Description
tl;dr: With the use of a test script, we can observe a momentary dip in traffic down to zero due to the weighted round robin algorithm.
Setup
I am attempting to reproduce this behaviour using a test lua script that exercises the weighted round robin (WRR) algorithm directly. The ingress_nginx balancer uses round robin (RR) as its default load balancing algorithm, which is the default implementation provided by openresty.
Weights Configuration
With this, the next step is to use the WRR algorithm to pick a node for a number of cycles and investigate the distribution of picked nodes. We can repeat this with the second set of weight configurations to observe the distribution of nodes under the WRR algorithm. A sketch of this test setup is shown below.
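A minimal sketch of such a test script, assuming the `resty.roundrobin` API from lua-resty-balancer (`new` with an id -> weight table, `find` to pick a node). The concrete weight values here are assumptions chosen to be consistent with the `GCD`/`MAX_WEIGHT` figures discussed later (100/100/25 for the first configuration, 100/100/66 for the second); the actual script and configurations are attached at the end of this issue.

```lua
-- Hypothetical test harness; the weight values below are assumptions, not
-- necessarily the exact configurations behind the graphs in this issue.
local roundrobin = require "resty.roundrobin"

local first_config  = { ["10.0.0.1"] = 100, ["10.0.0.2"] = 100, ["10.0.0.3"] = 25 }
local second_config = { ["10.0.0.1"] = 100, ["10.0.0.2"] = 100, ["10.0.0.3"] = 66 }

local function run(nodes, cycles)
    local rr = roundrobin:new(nodes)
    local counts = {}
    for i = 1, cycles do
        local id = rr:find()
        counts[id] = (counts[id] or 0) + 1
        print(i, id)                 -- per-cycle pick, to plot the pattern over time
    end
    for id, n in pairs(counts) do
        print(id, n, n / cycles)     -- overall distribution
    end
end

run(first_config, 300)
run(second_config, 300)
```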
Observations
For this test I am using a `lua` script to explicitly call the WRR algorithm for 300 cycles. The distribution of nodes for the first weight configuration is constant with little variation; this is expected. The overall distribution of traffic follows the weight configuration.

For the next weight configuration, we can observe that although the overall distribution of traffic adheres to the relative weights, the distribution of traffic is not constant. The node `10.0.0.3` does not get picked by WRR for some time, and then the algorithm picks each node as if they were equal in weight. We can also see that this pattern repeats: after some time, node `10.0.0.3` is again not picked by WRR for a while.

We can observe similar results regardless of the number of cycles. The fault in this implementation is more apparent when the number of cycles is less than the number of cycles required for the algorithm to pick node `10.0.0.3`.

Openresty WRR implementation
The openresty implementation is an accurate implementation of WRR. However, this implementation is only accurate over large finite sets of picks when distributing traffic to weighted nodes; it is not viable for real time traffic with varying weights. This algorithm is an interleaved WRR implementation which utilizes the greatest common divisor (GCD) of the weights to calculate the probability that a node is picked based on its weight.

The algorithm initially starts at a maximum probability, where only nodes with weights >= the weight of the `last_picked` node qualify for the next pick. The `last_picked` node is randomized on initialization, so during the first pick a random node is used to represent the `last_picked` node. On each pick the algorithm iterates through the set of nodes until a node is picked, and on every full iteration of all nodes it increases the chance a node is picked by `(GCD / MAX_WEIGHT) * 100%`.

For example, for the first configuration of nodes we have `GCD = 25` and `MAX_WEIGHT = 100`, so we pick each node in the following order.
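As a rough illustration (the exact order depends on the randomized starting node; assume here that the scan starts at `10.0.0.1` and visits nodes in address order): at `cw = 100` the algorithm picks `10.0.0.1, 10.0.0.2`, again at `cw = 75` and `cw = 50`, and then `10.0.0.1, 10.0.0.2, 10.0.0.3` at `cw = 25`, after which `cw` resets to `100`. Over one full period the pick counts are 4 : 4 : 1, matching the 100 : 100 : 25 weights.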
Although node `10.0.0.3` only has a weight of `25` compared to `100` for the other two nodes, the algorithm quickly increases its chance of being picked (by `25%` every complete cycle). However, for the second configuration of nodes we have `GCD = 2` and `MAX_WEIGHT = 100`. This only allows an increase of `2%` per complete cycle of nodes, which results in the pattern where node `10.0.0.3` is not picked. After the initial pick of node `10.0.0.3`, we can see a uniform distribution for all nodes. After a while, when the probability of a pick becomes <= `0%`, we reset back to `MAX_WEIGHT` and this pattern repeats.

When running this algorithm for a large number of cycles the overall weighted distribution is correct, however this still leads to an incorrect distribution of nodes over smaller intervals. A simplified sketch of this selection loop is shown below.
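The following is a simplified sketch of this kind of GCD-based interleaved selection (not the exact `roundrobin.lua` source, but the scheme described above): a threshold `cw` starts at the maximum weight, only nodes whose weight is >= `cw` qualify, and `cw` is lowered by the GCD after every full pass over the nodes, wrapping back to the maximum once it reaches zero.

```lua
-- Simplified sketch of GCD-based interleaved WRR (not the exact
-- resty.roundrobin implementation, but the same selection scheme).
local function gcd(a, b)
    while b ~= 0 do
        a, b = b, a % b
    end
    return a
end

local function iwrr_new(nodes)  -- nodes: array of { id = ..., weight = ... }
    local g, max = 0, 0
    for _, node in ipairs(nodes) do
        g = gcd(g, node.weight)
        max = math.max(max, node.weight)
    end
    return {
        nodes = nodes,
        gcd = g,
        max = max,
        cw = max,                       -- current weight threshold
        last = math.random(#nodes),     -- last picked index, randomized
    }
end

local function iwrr_find(self)
    while true do
        self.last = self.last % #self.nodes + 1
        if self.last == 1 then
            -- completed a full pass over all nodes: lower the threshold,
            -- which widens the set of nodes that qualify
            self.cw = self.cw - self.gcd
            if self.cw <= 0 then
                self.cw = self.max      -- reset back to the maximum weight
            end
        end
        local node = self.nodes[self.last]
        if node.weight >= self.cw then
            return node.id
        end
    end
end
```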
Possible Solution
I am suggesting a slight modification to this algorithm to avoid this scenario. Instead of the GCD, we can utilize the smallest weight to better distribute the picked nodes.

Now, instead of resetting the probability of a pick back to `max_weight` when it becomes <= `0%`, we can allow a guaranteed pick for the next node. This avoids the cases where a node can be skipped entirely when using the `lowest_weight` to increase the pick chance; see the sketch below.
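A minimal sketch of this modification, reusing the simplified structure from the sketch above (the actual modified `roundrobin.lua` is attached at the end of this issue): the threshold steps down by the lowest weight instead of the GCD, and once it would drop to or below zero the very next node is guaranteed a pick before the threshold resets to the maximum.

```lua
-- Sketch of the proposed modification (same simplified structure as the
-- earlier sketch; the actual modified roundrobin.lua is attached below).
local function mwrr_new(nodes)  -- nodes: array of { id = ..., weight = ... }
    local lowest, max = math.huge, 0
    for _, node in ipairs(nodes) do
        lowest = math.min(lowest, node.weight)
        max = math.max(max, node.weight)
    end
    return {
        nodes = nodes,
        lowest = lowest,        -- step size: the smallest weight, not the GCD
        max = max,
        cw = max,
        last = math.random(#nodes),
        force_pick = false,     -- set once the threshold has been exhausted
    }
end

local function mwrr_find(self)
    while true do
        self.last = self.last % #self.nodes + 1
        if self.last == 1 then
            self.cw = self.cw - self.lowest
            if self.cw <= 0 then
                -- instead of jumping straight back to the maximum, guarantee
                -- that the next node is picked, then restart from the maximum
                self.force_pick = true
                self.cw = self.max
            end
        end
        local node = self.nodes[self.last]
        if self.force_pick or node.weight >= self.cw then
            self.force_pick = false
            return node.id
        end
    end
end
```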
With this solution I have conducted the same test for `20`, `300`, and `1000` cycles using the second weight configuration.

20 Cycles
300 Cycles
1000 Cycles
We can see that the overall distribution of nodes still follows their respective weights, and we no longer have the pattern where node `10.0.0.3` is not picked for long periods of time. Even at smaller intervals (20 cycles) the distribution of nodes is correct.

Some Limitations
Even with the above solution we can still have configurations where this issue persists. For example, the following configuration will still have a similar distribution of nodes even with the modified algorithm.
We can avoid this by introducing an offset to `self.lowest` or by setting a minimum value. However, this will distort the relative weight distribution over small numbers of cycles.

Conclusion
I do not have a perfect solution to this problem, but it is affecting real world traffic that relies on this algorithm to accurately distribute requests across a set of nodes. Something to keep in mind is that nginx's implementation of the WRR algorithm is vastly different from the one implemented here, and on first inspection it doesn't look like that algorithm suffers from the same limitation. It is possible that algorithm could be adopted here. I will include a real world test with production data as a comment below in this issue.
Test script and setup
Setup
Script
Modified `roundrobin.lua`
Execute with
Data used in graphs: https://docs.google.com/spreadsheets/d/1ba570tbELUbG_N-Q-5vq15EyOP0x6wCd6X2Pu1ZAPrQ/edit?usp=sharing