operator stuck in scale down loop #398
Comments
I have not added any group-based allocation awareness, and the logs are not helping either. Let me know if you need further details about the setup.
When I manually reduce the index replica count, es-operator also reduces the StatefulSet replicas. Why is es-operator not able to reduce the index replica count itself? Any ideas?
@otrosien I ran the operator from my local machine and was able to pinpoint the code that causes the issue. It's this function that prevents the scale-down, i.e. it does not scale down when the shard-to-node ratio is 1. I tried changing
I think we never considered min-shards-per-node being equal to max-shards-per-node. Is there any reason for that kind of setup?
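To make the shard-to-node arithmetic from the comments above concrete, here is a minimal sketch. It is not the es-operator code; the function names are hypothetical, and it assumes the index had been scaled up to 1 replica (2 primaries x 2 copies = 4 shards on 4 nodes), which the issue does not state explicitly.

```python
import math


def shard_to_node_ratio(total_shards: int, nodes: int) -> float:
    """Average number of shards per data node."""
    return total_shards / nodes


def can_remove_one_node(total_shards: int, nodes: int, max_shards_per_node: int) -> bool:
    """Hypothetical check: removing a node must not push the ratio above the maximum."""
    if nodes <= 1:
        return False
    return shard_to_node_ratio(total_shards, nodes - 1) <= max_shards_per_node


def allowed_node_range(total_shards: int, min_spn: int, max_spn: int) -> tuple[int, int]:
    """Node counts that keep the ratio within [min_spn, max_spn]."""
    lowest = math.ceil(total_shards / max_spn)
    highest = total_shards // min_spn
    return lowest, highest


# Assumed state after scale-up: 2 primaries with 1 replica each = 4 shards on 4 nodes.
print(can_remove_one_node(4, 4, max_shards_per_node=1))  # 4/3 exceeds 1, so this is False
print(allowed_node_range(4, 1, 1))  # only exactly 4 nodes satisfy min = max = 1
print(allowed_node_range(2, 1, 1))  # after dropping one index replica: exactly 2 nodes
```

With minShardsPerNode equal to maxShardsPerNode, the only valid node count is exactly the shard count, so removing nodes alone always violates the maximum. A scale-down would have to lower the index replica count and the node count in the same step.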
Expected Behavior
When CPU load is below scaleDownCPUBoundary, the index replica count should be reduced, and the node count should go down with it.
Actual Behavior
When CPU load is below scaleDownCPUBoundary, the index replica count is not reduced, so the number of nodes does not go down.
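As a rough illustration of how a scaling hint such as the "Scaling hint: DOWN" in the logs might be derived from the CPU boundaries, here is a hypothetical sketch. It assumes that every CPU sample in the threshold window must cross a boundary before a hint is emitted; the operator's real rule may differ, and the function name is made up for this example.

```python
from typing import Sequence


def scaling_hint(cpu_samples: Sequence[float],
                 scale_up_boundary: float,
                 scale_down_boundary: float) -> str:
    """Return "UP", "DOWN" or "NONE" for a window of CPU usage samples (percent).

    Assumption: a hint is only given when all samples agree.
    """
    if not cpu_samples:
        return "NONE"
    if all(sample > scale_up_boundary for sample in cpu_samples):
        return "UP"
    if all(sample < scale_down_boundary for sample in cpu_samples):
        return "DOWN"
    return "NONE"


# Boundaries taken from the scaling options in this issue (50 up, 40 down).
print(scaling_hint([12.0, 9.5, 11.0], 50, 40))
print(scaling_hint([70.0, 65.0, 81.0], 50, 40))
print(scaling_hint([45.0, 30.0, 55.0], 50, 40))
```

A DOWN hint only says that CPU usage permits a scale-down. Whether the node or index replica count actually changes still depends on the shards-per-node bounds.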
Logs -
time="2024-03-19T06:27:48Z" level=info msg="Waiting for operation to stop" eds=es-mci-data namespace=mci
time="2024-03-19T06:27:49Z" level=info msg="Terminating operator loop." eds=es-mci-data namespace=mci
time="2024-03-19T06:27:50Z" level=info msg="Waiting for operation to stop" eds=es-mci-data namespace=mci
time="2024-03-19T06:27:50Z" level=error msg="Failed to operate resource: failed to update status: Put \"https://10.10.0.1:443/apis/zalando.org/v1/namespaces/mci/elasticsearchdatasets/es-mci-data/status?timeout=30s\": context canceled"
time="2024-03-19T06:27:50Z" level=info msg="Terminating operator loop." eds=es-mci-data namespace=mci
time="2024-03-19T06:28:19Z" level=info msg="Scaling hint: DOWN" eds=es-mci-data namespace=mci
time="2024-03-19T06:28:49Z" level=info msg="Scaling hint: DOWN" eds=es-mci-data namespace=mci
time="2024-03-19T06:29:19Z" level=info msg="Scaling hint: DOWN" eds=es-mci-data namespace=mci
Steps to Reproduce the Problem
I have a simple setup: one ES cluster with 1 master and 1 EDS managed by es-operator. I have a single index with 2 shards.
Scaling options:
enabled: true
minReplicas: 1
maxReplicas: 6
minShardsPerNode: 1
maxShardsPerNode: 1
minIndexReplicas: 0
maxIndexReplicas: 5
scaleUpCPUBoundary: 50
scaleUpCooldownSeconds: 60
scaleUpThresholdDurationSeconds: 30
scaleDownCPUBoundary: 40
scaleDownCooldownSeconds: 60
scaleDownThresholdDurationSeconds: 30
diskUsagePercentScaledownWatermark: 75
When I start a basic busybox load generator, CPU usage increases and es-operator scales up by increasing the replica count of the index. When I stop the load generator, CPU usage goes down, but the replica count is not updated, so the number of nodes remains high.
Specifications