awx-operator helm chart on GKE Autopilot requires manual intervention #1919

labmonkey42 · 2024-07-12T18:36:03Z

Please confirm the following

I agree to follow this project's code of conduct.
I have checked the current issues for duplicates.
I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

As noted in #1115, GKE Autopilot automatically changes pod resource requests and/or limits. The specific defect described in that issue occurs because Autopilot clusters with the bursting feature disabled change pod resources to set limits equal to the requests value when both values are defined. This means that awx-operator-controller-manager is left with insufficient memory, and will always be OOMKilled.
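To make the failure mode concrete, here is a minimal Python sketch of the adjustment as described above. It only models the behavior reported in this issue, not Autopilot's actual implementation. The function name and the CPU and memory values are hypothetical placeholders, not the chart's real defaults.

```python
import copy


def apply_no_burst_adjustment(resources):
    """Model of the behavior described in this issue (not Autopilot's real code):
    when both a request and a limit are defined for a resource, the limit is
    lowered to the request value."""
    adjusted = copy.deepcopy(resources)
    requests = adjusted.get("requests", {})
    limits = adjusted.get("limits", {})
    for name in requests:
        if name in limits:
            limits[name] = requests[name]
    return adjusted


# Hypothetical values, for illustration only.
declared = {
    "requests": {"cpu": "50m", "memory": "32Mi"},
    "limits": {"cpu": "1500m", "memory": "960Mi"},
}
effective = apply_no_burst_adjustment(declared)
print(effective["limits"])
```

With a small request and a much larger limit, the container ends up capped at the request value, which is the situation that leads to the OOMKilled status.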

In my case, I deploy the awx-operator helm chart with an Ansible playbook, and thus I can use a kubernetes.core.k8s_json_patch task on the Deployment to set the requests value equal to the limits value immediately after the helm chart is deployed, effectively circumventing the problem.
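The patch itself can be sketched in Python as a list of RFC 6902 (JSON Patch) operations, the general shape a JSON patch task works with. This is an illustration under assumptions, not the exact task from my playbook: the helper name, the container names, and the resource values are hypothetical, and the Deployment is represented as a plain dict.

```python
def build_requests_patch(deployment, container_name):
    """Return RFC 6902 operations that set requests equal to limits
    for the named container in a Deployment manifest (plain dict)."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    for index, container in enumerate(containers):
        if container.get("name") != container_name:
            continue
        limits = container.get("resources", {}).get("limits")
        if not limits:
            raise ValueError(f"container {container_name!r} defines no limits")
        path = f"/spec/template/spec/containers/{index}/resources/requests"
        # Per RFC 6902, "add" on an existing object member replaces its value.
        return [{"op": "add", "path": path, "value": dict(limits)}]
    raise ValueError(f"container {container_name!r} not found")


# Hypothetical manifest fragment; names and values are assumptions.
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "sidecar"},
        {
            "name": "awx-manager",
            "resources": {
                "requests": {"cpu": "50m", "memory": "32Mi"},
                "limits": {"cpu": "1500m", "memory": "960Mi"},
            },
        },
    ]}}}
}
patch = build_requests_patch(deployment, "awx-manager")
print(patch)
```

Looking the container up by name rather than hard-coding its index keeps the patch path correct if the order of containers in the Deployment changes.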

What #1115 didn't spell out, and what I'm specifically requesting here, is the ability to set the resource requests and limits for the awx-manager container in the Deployment directly from the helm chart values. This would let us handle the issue before deploying the chart, and keep the play from re-adjusting these values on every subsequent run.

AWX Operator version

multiple

AWX version

multiple

Kubernetes platform

other (please specify in additional information)

Kubernetes/Platform version

multiple

Modifications

no

Steps to reproduce

1. Install the awx-operator helm chart on any GKE Autopilot cluster with the bursting feature disabled.
2. Watch the status of the awx-operator-controller-manager-<rand> Pod.

Expected results

The awx-operator-controller-manager-<rand> Pod advances through setup stages

Actual results

The awx-operator-controller-manager-<rand> Pod cannot advance and is consistently OOMKilled

Additional information

No response

Operator Logs

No response


djyasin · 2024-07-24T17:27:05Z

Hello @labmonkey42,
This would be a good question for the Ansible Community Forum. We would suggest raising this there.

Thank you for your time!

labmonkey42 · 2024-07-24T17:30:25Z

> Hello @labmonkey42, This would be a good question

This is not a question. I'm telling you that this is a problem that needs a change to the chart in order to fix.

> for the Ansible Community Forum. We would suggest raising this there.

This is a product issue requiring modification of the product to resolve.

> Thank you for your time!

Thank you.

labmonkey42 · 2024-07-24T17:38:15Z

I should also note that since the original report of this issue I've done a clean install with my playbook and determined that simply patching the deployment object is a tricky workaround. If the helm task waits for the chart to deploy successfully, it will always fail due to the OOMKilled issue. If I specify wait: false to the helm task, then the patch task enters into a race with helm such that k8s_json_patch may or may not patch the deployment after helm creates it.
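One way to narrow that race is to poll until the Deployment exists before patching it. The Python sketch below shows only the polling logic under assumptions: `wait_until_present` and `fake_lookup` are hypothetical names, and the lookup is a stand-in for a real cluster query. It does not remove the window in which the pod can be OOMKilled before the patch lands, which is why chart-level values would still be the cleaner fix.

```python
import time


def wait_until_present(lookup, timeout=60.0, interval=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll lookup() until it returns a non-None object or the timeout expires."""
    deadline = clock() + timeout
    while True:
        found = lookup()
        if found is not None:
            return found
        if clock() >= deadline:
            raise TimeoutError("object did not appear before the timeout")
        sleep(interval)


# Stand-in for a cluster query: the object "appears" on the third attempt.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        return None
    return {"kind": "Deployment",
            "metadata": {"name": "awx-operator-controller-manager"}}


# Skip real sleeping in the demonstration.
found = wait_until_present(fake_lookup, sleep=lambda seconds: None)
print(attempts["count"], found["metadata"]["name"])
```

Passing `sleep` and `clock` as parameters keeps the helper easy to exercise without waiting in real time.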

D1StrX · 2024-08-13T07:30:15Z

I also see the AWX manager being OOMKilled, and patched it manually with a memory limit of 1500mb.
The AWX CRD should include a section for awx_manager_resource_requirements (and perhaps someone could check whether the AWX manager can be made less memory hungry). All upgrades fail here without manual intervention.

oraNod · 2024-09-04T18:32:17Z

Hi @labmonkey42

The helm chart code has moved to a new repository, ansible/awx-operator-helm. You can find more information about this move in the recent forum post about changes to the AWX operator installation methods.

We now plan to close this issue because it is no longer relevant to the code in this repository. If you think the issue is still valid and needs to be fixed, please recreate it in the ansible/awx-operator-helm repository.

Thank you.

labmonkey42 · 2024-09-04T18:56:47Z

Migrated to ansible-community/awx-operator-helm#14

github-actions bot added needs_triage community labels Jul 12, 2024

oraNod added the Helm label Sep 3, 2024

oraNod closed this as completed Sep 4, 2024

labmonkey42 mentioned this issue Sep 4, 2024

Expose resources for awx-controller-manager in chart ansible-community/awx-operator-helm#14

Closed
