[AVM Module Issue]: AuthorizationFailed #157

Raphael-kainos · 2024-12-09T13:46:58Z

Check for previous/existing GitHub issues

I have checked for previous/existing GitHub issues

Issue Type?

Bug

(Optional) Module Version

0.10.0

(Optional) Correlation Id

No response

Description

The module terraform-azurerm-avm-ptn-alz does not deploy successfully when using Azure DevOps agents. The deployment results in an AuthorizationFailed error indicating the service principal does not have authorization to perform action 'Microsoft.Management/managementGroups/read'. The process often gets stuck at level 0 or 1 and does not progress further.

I have attempted the following steps to mitigate the issue, but none resolved the problem:

- Utilised a self-hosted agent instead, with the required permissions.
- Granted the service principal the Owner role and the Azure Landing Zones Management Group Contributor and Reader roles at the tenant root level.
- Adjusted settings for timeouts, delays, and retries to account for propagation and eventual consistency issues.

This is exclusive to ADO; when using GitHub or deploying locally, it deploys with no errors. Please note that in all scenarios I used the same service principal with the same permissions, and ADO is the only method that failed.

Despite these efforts, the issue persists, suggesting that the module may not be fully compatible with Azure DevOps agent workflows. Please advise on whether additional configuration or updates to the module are required to resolve this issue.

paul-e-martin · 2024-12-16T14:05:25Z

I am also seeing the exact same error.

Raphael-kainos · 2024-12-18T13:35:54Z

Still experiencing this issue. Can we get an update, please?

matt-FFFFFF · 2024-12-18T16:00:25Z

Hi,

We successfully use ADO to deploy this module in the alz-terraform-accelerator, therefore I do not believe that there is any specific issue with ADO agents.

Adding @jaredfholgate who has done more testing than I here.

This issue can occur because of the time it takes for permissions to reconcile. There have been some changes made to azapi to address this, but they are not released yet.

In testing with a locally built provider from the main branch I have noticed that these issues do not occur but until a provider version is released then we are a little stuck.

#RR

jaredfholgate · 2024-12-19T11:00:44Z

I'll run some testing when I get chance and see if I can replicate. It looks like this is being run via an Accelerator deployment given the role definition shown.

Raphael-kainos · 2024-12-19T11:17:52Z

> I'll run some testing when I get chance and see if I can replicate. It looks like this is being run via an Accelerator deployment given the role definition shown.

No, this isn't being run via the Accelerator deployment. I just reused the role definition from the module deployment. I've also tried assigning it the Owner, Contributor, and User Access Administrator roles at the root tenant level, but I still encounter the same error.

jaredfholgate · 2024-12-19T11:24:33Z

> I'll run some testing when I get chance and see if I can replicate. It looks like this is being run via an Accelerator deployment given the role definition shown.
>
> No, this isn't being run via the Accelerator deployment. I just reused the role definition from the module deployment. I've also tried assigning it the Owner, Contributor, and User Access Administrator roles at the root tenant level, but I still encounter the same error.

I'm trying it with the accelerator now. I'll share the new module with you assuming everything works and it may help to isolate the problem.

matt-FFFFFF · 2024-12-19T12:18:14Z

FYI this was the PR that improves behaviour for resources at MG scope:

Azure/terraform-provider-azapi#681

jaredfholgate · 2024-12-19T12:55:07Z

> I'll run some testing when I get chance and see if I can replicate. It looks like this is being run via an Accelerator deployment given the role definition shown.
>
> No, this isn't being run via the Accelerator deployment. I just reused the role definition from the module deployment. I've also tried assigning it the Owner, Contributor, and User Access Administrator roles at the root tenant level, but I still encounter the same error.

Hi @Raphael-kainos. It is not clear whether a second plan / apply resolves this for you? Are you saying it never works, not even after a retry?

jaredfholgate · 2024-12-19T13:01:52Z

So far I have been unable to replicate this specific issue.

To reproduce, I used the preview version of the Azure Verified Modules starter module for the accelerator.

The following is not yet GA, but may help you to resolve your problem. It is expected to be GA at end of January.

If you want to try the same, you can find the details here:

- New starter module: https://github.com/Azure/alz-terraform-accelerator/tree/main/templates/platform_landing_zone
- I ran this using the accelerator; there are some preview docs here: https://azure.github.io/Azure-Landing-Zones/accelerator/startermodules/terraform-platform-landing-zone/
- I used these config files:
  - Bootstrap config: https://github.com/Azure/alz-terraform-accelerator/blob/main/templates/platform_landing_zone/examples/bootstrap/inputs-azure-devops.yaml
  - Platform Landing Zone config: https://github.com/Azure/alz-terraform-accelerator/blob/main/templates/platform_landing_zone/examples/full-multi-region/hub-and-spoke-vnet.tfvars

My command to deploy was: Deploy-Accelerator -inputs "C:\acc-test\config\inputs-azure-devops.yaml", "C:\acc-test\config\hub-and-spoke-vnet.tfvars" -output "C:\acc-test\output"

I used the full multi-region config to test everything, but you could probably use the management only one to replicate your issue: https://github.com/Azure/alz-terraform-accelerator/blob/main/templates/platform_landing_zone/examples/management-only/management.tfvars

jaredfholgate · 2024-12-19T13:05:08Z

Given what Matt said about the known issue with retry, this will hopefully have a solution soon. However, please confirm whether a second plan / apply solves the problem for you? If not, does using the accelerator code I shared solve it for you?

If you still get the issue after that, then there must be an environment specific problem that would require further investigation.

paul-e-martin · 2024-12-19T13:44:36Z

> I'll run some testing when I get chance and see if I can replicate. It looks like this is being run via an Accelerator deployment given the role definition shown.
>
> No, this isn't being run via the Accelerator deployment. I just reused the role definition from the module deployment. I've also tried assigning it the Owner, Contributor, and User Access Administrator roles at the root tenant level, but I still encounter the same error.
>
> Hi @Raphael-kainos. It is not clear whether a second plan / apply resolves this for you? Are you saying it never works, not even after a retry?

In mine, if I run a 2nd plan/apply, it will fail. This is due to the management groups actually getting created but, I guess, not being "logged" in the state file, so Terraform wants to create them again.

If I delete the management groups that had the issue, there is no guarantee that the plan/apply will succeed. I have had instances where it does apply, and others where we get the same AuthorizationFailed error.

When checking the Azure portal when the failure occurs, it does appear to take time for the inherited permissions to apply from the parent management group.

Not sure if any of that helps.
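Since the inherited permissions seem to arrive some time after the management group is created, a single immediate retry may not be enough. As a rough sketch only (this is not what the module or azapi does internally), a POSIX shell helper could poll a read-only check until it succeeds or a deadline passes. The function name `retry_until_deadline` and the example check in the comment are hypothetical, chosen for illustration.

```shell
# retry_until_deadline MAX_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or MAX_SECONDS have elapsed.
# Returns 0 on success, 1 if the deadline passes first.
retry_until_deadline() {
  rud_max_seconds="$1"
  rud_interval="$2"
  shift 2
  rud_start="$(date +%s)"
  until "$@"; do
    rud_now="$(date +%s)"
    if [ $((rud_now - rud_start)) -ge "$rud_max_seconds" ]; then
      return 1
    fi
    sleep "$rud_interval"
  done
  return 0
}

# Hypothetical usage: wait up to 20 minutes, checking every 30 seconds,
# for the pipeline identity to be able to read a management group.
# retry_until_deadline 1200 30 az account management-group show --name "my-mg"
```

A deadline rather than a fixed attempt count is used here because the thread describes the delay in terms of elapsed time, not number of tries.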

Raphael-kainos · 2024-12-19T15:04:13Z

> I'll run some testing when I get chance and see if I can replicate. It looks like this is being run via an Accelerator deployment given the role definition shown.
>
> No, this isn't being run via the Accelerator deployment. I just reused the role definition from the module deployment. I've also tried assigning it the Owner, Contributor, and User Access Administrator roles at the root tenant level, but I still encounter the same error.
>
> Hi @Raphael-kainos. It is not clear whether a second plan / apply resolves this for you? Are you saying it never works, not even after a retry?
>
> In mine, if i run a 2nd plan/apply, it will fail. This is due to the management groups actually getting created, but i guess not "logged" into the statefile, so terraform wants to create them again.
>
> If i delete the management groups that had the issue, there is no guarantee that the plan/apply will succeed. I have had instances where it does apply, others where we get the same AuthorizationFailed error.
>
> When checking the Azure portal when the failure occurs, it does appear to take time for the inherited permissions to apply from the parent management group.
>
> Not sure if any of that helps.

Yes @jaredfholgate, this is exactly what happened when I tried a 2nd plan/apply. I'm going to try the accelerator code and get back to you as soon as possible. Thanks.

Raphael-kainos · 2024-12-23T20:59:12Z

Hi @jaredfholgate I tried the accelerator code and encountered the same error. This suggests it might be an environment issue, but I’m not sure what the cause could be. I’m not using a self-hosted agent at the moment—could that be the issue?

Since I’m working with the exact same codebase as you, it’s a bit puzzling. Any pointers for investigation would be greatly appreciated

Raphael-kainos · 2024-12-24T17:10:56Z

> FYI this was the PR that improves behaviour for resources at MG scope:
>
> Azure/terraform-provider-azapi#681

I can only assume it is related to this current bug, and that I will have to wait for the newest version of azapi. It's just odd because you guys do not have this problem.

matt-FFFFFF · 2024-12-26T21:43:07Z

Based on what you've both said I think this will be fixed with 2.2

Raphael-kainos · 2025-01-04T20:45:08Z

The recent update to azapi doesn’t seem to address the issue I’m experiencing with Azure DevOps deployments. I’m still encountering the same error, which primarily appears to be related to RBAC propagation, specifically within Azure DevOps.

I’ve tested this in two different tenants, one of which has significantly fewer role assignments, but the problem persists. I’m currently stuck on why this is happening. When I check the portal after receiving the error, I notice also that the RBAC permissions for the managed identity assigned to Azure DevOps take a significant amount of time to propagate. This delay affects not only nested management groups but occasionally even the top-level management group.

@paul-e-martin Are you still experiencing the same issue? Also, @jaredfholgate, I’m using the accelerator configuration you provided, both with the deployment management group only and with all modules. Can you think of any differences in your setup that might help investigate this issue, considering you’ve mentioned you’re currently using Azure DevOps for deployments?

paul-e-martin · 2025-01-04T21:15:56Z

@Raphael-kainos will be testing again on Monday, will update here with the results.

What authentication are you using? My deployment is with a managed identity and the workload identity federation. Will do a test using a service principal, to see if the propagation of RBAC is different.

Raphael-kainos · 2025-01-04T22:33:05Z

Ok @paul-e-martin, that would be very helpful. I’ve primarily been using managed identities and workload identity federation, as this is what is utilised in the ALZ accelerator. However, I also tried using a service principal but encountered the same issues.

paul-e-martin · 2025-01-04T22:49:31Z

Wondering if it has anything to do with where the DevOps agents are located, and therefore which Azure API endpoints they hit. I have mine set up as the managed DevOps agents in uksouth. Might try another region on Monday as well.

jaredfholgate · 2025-01-06T13:19:35Z

Our tests run in multiple regions. The Azure DevOps org is homed in uksouth. The last end to end test I ran was uksouth.

We do often see issues with role assignments being slow, but they are eventually consistent. I've never seen a permanent failure. We have increased the retry timeouts in the accelerator too.

With regards to the issue being specific to Azure DevOps, the only thing I can think of is that GitHub is refreshing its access token, but Azure DevOps is not. The Terraform provider has some logic built in for GitHub to automatically get and use the ID token. It uses the request URL behind the scenes for this.

I am currently in discussion with someone working to implement this for the azurerm backend for Azure DevOps. That same logic could be applied to the 3 providers if it works. It's possible that using ARM_OIDC_REQUEST_TOKEN and ARM_OIDC_REQUEST_URL could resolve this. I will take a look into this as time allows as they have recently been exposed in Azure DevOps.

jaredfholgate · 2025-01-07T15:51:13Z

Further to this, my colleague is working on a proper solution in the azurerm provider and the azurerm backend in Terraform Core that will support token refresh for OIDC. You can see his latest update here: hashicorp/terraform#34322 (comment)

azapi already has support for token refresh. See the second option here, which uses ARM_OIDC_REQUEST_TOKEN and ARM_OIDC_AZURE_SERVICE_CONNECTION_ID: https://registry.terraform.io/providers/Azure/azapi/latest/docs/guides/service_principal_oidc#configuring-the-service-principal-in-terraform

Once they are all released, I'll be able to update the Accelerator pipelines to support token refresh.
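As a minimal sketch of the azapi option mentioned above, assuming a shell step that runs before Terraform in the same session: the two `ARM_OIDC_*` names come from the linked guide, and `ARM_USE_OIDC` is the provider's switch for OIDC authentication. `PIPELINE_ACCESS_TOKEN` and `SERVICE_CONNECTION_ID` are hypothetical placeholders; how a real pipeline supplies the job access token and the service connection ID is an assumption here and would need checking against the Azure DevOps and azapi documentation.

```shell
# Hypothetical inputs: in a real pipeline these would be mapped in from the job.
# The fallback values below only exist so the sketch runs outside a pipeline.
PIPELINE_ACCESS_TOKEN="${PIPELINE_ACCESS_TOKEN:-example-token}"
SERVICE_CONNECTION_ID="${SERVICE_CONNECTION_ID:-00000000-0000-0000-0000-000000000000}"

# Ask the provider to authenticate with OIDC.
export ARM_USE_OIDC="true"
# Token the provider presents when requesting a fresh ID token.
export ARM_OIDC_REQUEST_TOKEN="$PIPELINE_ACCESS_TOKEN"
# Service connection the ID token is requested for.
export ARM_OIDC_AZURE_SERVICE_CONNECTION_ID="$SERVICE_CONNECTION_ID"
```

This only covers azapi; per the comment above, the equivalent support in azurerm and the azurerm backend was still in progress at the time.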

## Overview/Summary

Increase timeouts to help with ADO eventual consistency issue

## This PR fixes/adds/changes/removes

1. Azure/terraform-azurerm-avm-ptn-alz#157
2. Azure/ALZ-PowerShell-Module#269

### Breaking Changes

None

## Testing Evidence

Please provide any testing evidence to show that your Pull Request works/fixes as described and planned (include screenshots, if appropriate).

## As part of this Pull Request I have

- [x] Checked for duplicate [Pull Requests](https://github.com/Azure/alz-terraform-accelerator/pulls)
- [x] Associated it with relevant [issues](https://github.com/Azure/alz-terraform-accelerator/issues), for tracking and closure.
- [x] Ensured my code/branch is up-to-date with the latest changes in the `main` [branch](https://github.com/Azure/alz-terraform-accelerator/tree/main)
- [x] Performed testing and provided evidence.
- [x] Updated relevant and associated documentation.

jaredfholgate · 2025-01-11T12:22:54Z

Hi @Raphael-kainos and @paul-e-martin

I was eventually able to reproduce this issue. I changed my root management group and it started happening.

As such I have been able to find a way to resolve it. You can see the PRs I have linked here to fix it in the Accelerator. Eventually, I think the token refresh in the provider may also help.

In the short term, setting the environment variable AZAPI_RETRY_GET_AFTER_PUT_MAX_TIME solves the issue. I found the permissions were consistent between 10 and 15 minutes after creation of the management group, so you could probably set it to 20m. I have set it to 60m to be on the safe side given there is no impact of having a longer timeout for this use case.
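A minimal sketch of that workaround, assuming a shell step that runs in the same session as `terraform apply` (in a pipeline, the variable could equally be set as a job-level environment variable):

```shell
# 60m is the value described above; 20m is the suggested lower bound.
export AZAPI_RETRY_GET_AFTER_PUT_MAX_TIME="60m"

# Echo the value so the pipeline log records what the provider will see.
echo "AZAPI_RETRY_GET_AFTER_PUT_MAX_TIME=${AZAPI_RETRY_GET_AFTER_PUT_MAX_TIME}"
```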

Given I believe this solves the originally raised issue, I am going to close this issue now. Please re-open it if you find that it does not solve it for you though.

I continue to work with PG on implementing the token refresh for Azure DevOps OIDC and that will hopefully come in the next few months and potentially help to reduce the time it takes, but I can't be 100% sure it will.

CC: @matt-FFFFFF

Raphael-kainos added Language: Terraform 🌐 This is related to the Terraform IaC language Needs: Triage 🔍 Maintainers need to triage still labels Dec 9, 2024

microsoft-github-policy-service bot added Type: Bug 🐛 Something isn't working Status: Response Overdue 🚩 When an issue/PR has not been responded to for X amount of days labels Dec 9, 2024

microsoft-github-policy-service bot added the Needs: Author Feedback 👂 Awaiting feedback from the issue/PR author label Dec 18, 2024

matt-FFFFFF removed Needs: Author Feedback 👂 Awaiting feedback from the issue/PR author Needs: Triage 🔍 Maintainers need to triage still Status: Response Overdue 🚩 When an issue/PR has not been responded to for X amount of days labels Dec 18, 2024

matt-FFFFFF self-assigned this Dec 18, 2024

jaredfholgate added the Needs: Author Feedback 👂 Awaiting feedback from the issue/PR author label Dec 19, 2024

microsoft-github-policy-service bot added Needs: Attention 👋 Reply has been added to issue, maintainer to review and removed Needs: Author Feedback 👂 Awaiting feedback from the issue/PR author labels Dec 19, 2024

matt-FFFFFF added Needs: External Changes ⚒️ When an issue/PR requires changes that are outside of the control of the module. e.g. to an RP. and removed Needs: Attention 👋 Reply has been added to issue, maintainer to review labels Dec 26, 2024

jaredfholgate closed this as completed Jan 11, 2025
