[CRASH][v4.4.3][ESP32-S3] "Timed out waiting for completion of AES Interrupt" (IDFGH-9264) #10647
Comments
This is really bad behavior. Crashing is not acceptable during an OTA update (where I use AES CBC). Can we return an error instead? Should we increase this timeout? Is 2 seconds really enough?
PR to do something more friendly: https://github.com/espressif/esp-idf/pull/10648/files
Hmm, I don't really see how this could happen. It would help a lot if you were able to find a way to reproduce it. Adding an error return instead of abort seems reasonable, but I would still like to get to the bottom of how this happened.
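For readers following along, the "error return instead of abort" idea discussed above can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the linked PR: `aes_wait_done`, `aes_wait_result_t`, and `fake_hw_done` are hypothetical names, and a bounded poll count stands in for the real interrupt wait and its timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; the real driver uses its own error values. */
typedef enum {
    AES_WAIT_OK = 0,
    AES_WAIT_ERR_TIMEOUT = -1
} aes_wait_result_t;

/* Hypothetical callback that reports whether the operation has completed. */
typedef bool (*aes_done_fn)(void *ctx);

/* Poll for completion at most max_polls times. On timeout, return an
 * error code to the caller instead of calling abort(). */
static aes_wait_result_t aes_wait_done(aes_done_fn is_done, void *ctx,
                                       uint32_t max_polls)
{
    assert(is_done != NULL);
    for (uint32_t i = 0; i < max_polls; i++) {
        if (is_done(ctx)) {
            return AES_WAIT_OK;
        }
    }
    return AES_WAIT_ERR_TIMEOUT;
}

/* Stand-in for the hardware: reports "done" once the countdown that
 * ctx points to has reached zero. */
static bool fake_hw_done(void *ctx)
{
    uint32_t *remaining = ctx;
    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

With a return code, a caller such as an OTA routine can fail the update cleanly and retry rather than rebooting mid-update.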
@chipweinberger are you by any chance using the Digital Signature peripheral as part of your OTA process? (or at all in your app)
No, I don't use the DS peripheral for OTA. Wi-Fi + AES. Yes, it would be great to repro. I've only hit it once. Thanks for merging the band-aid fix.
Oh, one thing! When I hit this I was debugging a big memory leak in my OTA code! I was allocating a FreeRTOS semaphore in internal RAM ~20 times a second. My OTA would fail due to out of memory halfway into the process. Maybe OTA + AES + memory exhaustion. I am also using PSRAM, not sure if related.
Is it possible to create a small reproducible use-case for this issue? So far, we have not been able to recreate this scenario on our side (tried different configurations including SPIRAM).
I appreciate your effort! I remember reproducing it a couple of times at least, but I imagine a minimal example will take a while for me to create. Did you try creating a new mutex every time new OTA data arrived? I was doing the update over Wi-Fi.
No, I didn't. Do you mean OTA with low internal memory could run into this situation?
Yes. Low internal memory, which was then fully exhausted during the OTA update. I started the update with 20KB of internal RAM and, due to a bug, was allocating and acquiring a mutex every 4KB of the OTA update.
I am hitting this problem every time. Increasing the main stack size to 6000 bytes makes it work. I do not know what is eating up the stack memory yet. However this likely means that the problem happens when the task performing the AES operations runs out of stack. |
For certain data lengths, the last input descriptor was not appended correctly, so the EOF flag in the DMA descriptor linked list was set at the wrong location. This stalled the peripheral, which kept expecting more data, and the code eventually timed out waiting for the AES completion interrupt. Required configs for this issue: CONFIG_MBEDTLS_HARDWARE_AES, CONFIG_SOC_AES_SUPPORT_DMA. This observation is similar to the issue reported in #10647. To recreate the issue, start an AES-GCM DMA operation with a data length of 12280 bytes; this should stall the operation forever. The fix traces the entire descriptor list and then appends the extra-bytes descriptor at the correct position (as the last node).
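The fix described above (walk the whole descriptor list, then append the extra-bytes descriptor as the last node so that only it carries EOF) could look roughly like the sketch below. It is a simplified illustration, not the ESP-IDF code: `dma_desc_t` is a hypothetical struct rather than the real descriptor layout, and both function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified DMA descriptor (not the ESP-IDF layout). */
typedef struct dma_desc {
    const uint8_t *buf;     /* data buffer for this node */
    size_t len;             /* number of bytes in buf */
    bool eof;               /* end-of-frame marker; only the last node sets it */
    struct dma_desc *next;  /* next descriptor, NULL at the tail */
} dma_desc_t;

/* Trace the list to its true tail and append `extra` there.
 * EOF is cleared on every earlier node and set on `extra` only,
 * so the peripheral sees the end of input at the right place. */
static dma_desc_t *dma_desc_append_tail(dma_desc_t *head, dma_desc_t *extra)
{
    assert(extra != NULL);
    extra->next = NULL;
    extra->eof = true;
    if (head == NULL) {
        return extra;
    }
    dma_desc_t *tail = head;
    while (tail->next != NULL) {
        tail->eof = false;
        tail = tail->next;
    }
    tail->eof = false;
    tail->next = extra;
    return head;
}

/* Sanity check: true when EOF is set on the last node and nowhere else. */
static bool dma_desc_eof_only_on_last(const dma_desc_t *head)
{
    for (const dma_desc_t *d = head; d != NULL; d = d->next) {
        if (d->eof != (d->next == NULL)) {
            return false;
        }
    }
    return head != NULL;
}
```

If the extra descriptor were linked after an earlier node, or EOF were left on a node in the middle, the peripheral would either stop early or wait for data that never arrives, which matches the stall described in the commit message.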
Update: We have added a few fixes to the AES DMA port layer recently; the following are pointers for the v4.4 branch. The first is a fix for a heap corruption issue that may occur with specific AES DMA lengths, and the second is specific to incorrect DMA descriptor setup. Please have a look at them; they are relevant to this issue. If you could not recreate "AES timeout" issue on latest Thanks.
I'm still seeing this error in v5.1.2. As a workaround I'm using CONFIG_MBEDTLS_AES_USE_INTERRUPT=n
we otherwise hit the error below when parsing mainnet block 812548: "Timed out waiting for completion of AES Interrupt" (espressif/esp-idf#10647). This measure does not incur a memory or latency cost.
Same. It was happening on 5.0.2 for me and still on 5.1.2 after upgrading.
I'm facing the same issue while generating the private key for a Rainmaker application. In order to make it work I have to increase
Answers checklist.
IDF version.
v4.4.
Operating System used.
macOS
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
ESP32-S3 Dev Kit C
Power Supply used.
USB
What is the expected behavior?
AES CBC should complete.
What is the actual behavior?
Hits AES timeout.
Steps to reproduce.
I've never hit this before. Might be temperamental.
Debug Logs.
More Information.
No response