awx-operator helm chart on GKE Autopilot requires manual intervention #1919

Closed
3 tasks done
labmonkey42 opened this issue Jul 12, 2024 · 6 comments

Comments

@labmonkey42

labmonkey42 commented Jul 12, 2024

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

As noted in #1115, GKE Autopilot automatically changes pod resource requests and/or limits. The specific defect described in that issue occurs because Autopilot clusters with the bursting feature disabled rewrite pod resources so that limits equal requests when both values are defined. The memory limit is therefore lowered to the request value, which leaves awx-operator-controller-manager with insufficient memory, so it is always OOMKilled.

In my case, I deploy the awx-operator helm chart with an Ansible playbook, and thus I can use a kubernetes.core.k8s_json_patch task on the Deployment to set the requests value equal to the limits value immediately after the helm chart is deployed, effectively circumventing the problem.
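A minimal sketch of such a patch task, for illustration only. It assumes the Deployment is named awx-operator-controller-manager (matching the Pod name prefix), and it uses hypothetical variables for the namespace, the index of the awx-manager container in the Pod spec, and the resource values, which would have to be copied from the limits the chart actually sets.

```yaml
# Sketch only: all variable names below are illustrative assumptions.
- name: Set awx-manager requests equal to its limits
  kubernetes.core.k8s_json_patch:
    api_version: apps/v1
    kind: Deployment
    namespace: "{{ awx_operator_namespace }}"
    name: awx-operator-controller-manager
    patch:
      # JSON Patch "add" on an existing object member replaces its value,
      # so this works whether or not requests are already defined.
      - op: add
        path: "/spec/template/spec/containers/{{ awx_manager_index }}/resources/requests"
        value:
          cpu: "{{ awx_manager_cpu_limit }}"
          memory: "{{ awx_manager_memory_limit }}"
```

With requests and limits equal, Autopilot should have nothing to lower, which is the effect the workaround relies on.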

What #1115 didn't spell out, and what I'm specifically requesting here, is the ability to set the resource requests and limits for the awx-manager container in the Deployment directly from the helm chart values. This would let us handle the issue before deploying the chart, and would stop the play from re-adjusting these values on every subsequent run.
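To illustrate the request, a values file might look roughly like the fragment below. The key names are hypothetical and are not part of the chart at the time of this report, and the quantities are placeholders rather than recommended sizes.

```yaml
# Hypothetical values.yaml fragment: these keys do NOT exist in the chart today.
controllerManager:
  awxManager:
    resources:
      # Requests and limits are kept equal so Autopilot has nothing to rewrite.
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 500m
        memory: 1Gi
```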

AWX Operator version

multiple

AWX version

multiple

Kubernetes platform

other (please specify in additional information)

Kubernetes/Platform version

multiple

Modifications

no

Steps to reproduce

1. Install the awx-operator helm chart on any GKE Autopilot cluster with the bursting feature disabled.
2. Watch the status of the awx-operator-controller-manager-<rand> Pod.

Expected results

The awx-operator-controller-manager-<rand> Pod advances through its setup stages.

Actual results

The awx-operator-controller-manager-<rand> Pod cannot advance and is consistently OOMKilled.

Additional information

No response

Operator Logs

No response

@djyasin
Member

djyasin commented Jul 24, 2024

Hello @labmonkey42,
This would be a good question for the Ansible Community Forum. We would suggest raising this there.

Thank you for your time!

@labmonkey42
Author

> Hello @labmonkey42, This would be a good question

This is not a question. I'm telling you that this is a problem that needs a change to the chart in order to fix.

> for the Ansible Community Forum. We would suggest raising this there.

This is a product issue requiring modification of the product to resolve.

> Thank you for your time!

Thank you.

@labmonkey42
Author

I should also note that since the original report I've done a clean install with my playbook and found that simply patching the Deployment object is an unreliable workaround. If the helm task waits for the chart to deploy successfully, it always fails because of the OOMKilled issue. If I pass wait: false to the helm task, the patch task races with helm, so k8s_json_patch may run before helm has created the Deployment and then has nothing to patch.
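One way the race might be narrowed is to poll for the Deployment between the two tasks. The sketch below is an assumption-laden illustration, not a tested fix: the chart_ref presumes a helm repository already added under the alias awx-operator, and the namespace variable and retry counts are hypothetical.

```yaml
# Sketch only: chart_ref, variable names, and retry counts are assumptions.
- name: Install the chart without waiting for readiness
  kubernetes.core.helm:
    name: awx-operator
    chart_ref: awx-operator/awx-operator
    release_namespace: "{{ awx_operator_namespace }}"
    create_namespace: true
    wait: false

- name: Wait until helm has created the Deployment
  kubernetes.core.k8s_info:
    api_version: apps/v1
    kind: Deployment
    namespace: "{{ awx_operator_namespace }}"
    name: awx-operator-controller-manager
  register: awx_operator_deployment
  # k8s_info returns matching objects in "resources"; retry until one appears.
  until: awx_operator_deployment.resources | length > 0
  retries: 30
  delay: 5

# The k8s_json_patch task would follow here, once the Deployment is known to exist.
```

This addresses only the ordering problem. A later helm upgrade would presumably restore the chart's original values, so the patch would still have to be reapplied on each run.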

@D1StrX

D1StrX commented Aug 13, 2024

I also experience OOMKilled on the AWX manager, and patched it manually with a memory limit of 1500mb.
The AWX CRD should include a section for awx_manager_resource_requirements. (And perhaps check whether the AWX manager can be made less memory hungry.) All upgrades are failing here without manual intervention.

oraNod added the Helm label Sep 3, 2024
@oraNod
Contributor

oraNod commented Sep 4, 2024

Hi @labmonkey42

The helm chart code has moved to a new repository, ansible/awx-operator-helm. You can find more information about this move in the recent forum post about changes to the AWX operator installation methods.

We now plan to close this issue because it is no longer relevant to the code in this repository. If you think the issue is still valid and needs to be fixed, please recreate it in the ansible/awx-operator-helm repository.

Thank you.

@labmonkey42
Author

Migrated to ansible-community/awx-operator-helm#14
