Fix s2 and s3 Cache_Count_Flash_Pages rom function wrapper (IDFGH-14493) #15262

Open: wants to merge 1 commit into master

EliteTK commented Jan 22, 2025

Description

The Cache_Count_Flash_Pages ROM function on the S2 and S3 counts all pages which are mapped to page 0 of flash as a single page, because the Cache_Flash_To_SPIRAM_Copy function attempts to map all pages mapped to flash page 0 to one PSRAM page.

As this function can be called for multiple regions, it needs to track whether a page mapped to page 0 has already been accounted for by a previous call. It does this using the page0_mapped in-out parameter. This logic contains an error[0]:

```c
// This code was reverse engineered from the ROM with the help of Ghidra and
// then cleaned up
uint32_t Cache_Count_Flash_Pages(uint32_t bus, uint32_t *page0_mapped)
{
    volatile uint32_t *mmu_table = (volatile uint32_t *)DR_REG_MMU_TABLE;
    uint32_t start, end;
    if (bus == CACHE_IBUS) {
        start = CACHE_IROM_MMU_START;
        end = CACHE_IROM_MMU_END;
    } else {
        start = CACHE_DROM_MMU_START;
        end = CACHE_DROM_MMU_END;
    }

    uint32_t count, page0_count = 0, valid_flash_count = 0;
    for (uint32_t i = start >> 2; i < end >> 2; i++) {
        uint32_t mapping = mmu_table[i];
        if ((mapping & (SOC_MMU_INVALID | SOC_MMU_TYPE)) == (SOC_MMU_VALID | SOC_MMU_ACCESS_FLASH)) {
            if ((mapping & SOC_MMU_VALID_VAL_MASK) == 0) page0_count++;
            valid_flash_count++;
        }
    }
    // BUG: If page0_count is 0, 1 is still added
    // Should be `if (page0_count != 0 && *page0_mapped == 0) {`
    if (*page0_mapped == 0) {
        count = valid_flash_count + 1 - page0_count;
    } else {
        count = valid_flash_count - page0_count;
    }
    *page0_mapped += page0_count;
    return count;
}
```

The current Cache_Count_Flash_Pages wrapper in ESP-IDF attempts to compensate for this bug by checking whether the page0_mapped parameter was changed by the call and reducing the count by one if it was not.

This, however, over-compensates in situations where the initial value of page0_mapped was not zero, as the code above only miscounts when it was zero.

This patch fixes the wrapper so that it compensates for the bug only when the final page0_mapped value is 0.
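
Why the final value is the right thing to test can be checked with a short case analysis. In this sketch, $m_0$ and $m_1$ denote `*page0_mapped` on entry and on return, $p$ is `page0_count`, $v$ is `valid_flash_count`, and $[\cdot]$ is 1 when the condition holds and 0 otherwise; these symbols are introduced here for illustration. The "correct" count assumes, as described above, that all pages mapped to flash page 0 share a single PSRAM page, and that $m_1 = m_0 + p$ does not wrap, so $m_1 = 0$ exactly when $m_0 = 0$ and $p = 0$.

```math
\begin{aligned}
c_{\text{correct}} &= v - p + [\,p > 0 \land m_0 = 0\,] \\
c_{\text{ROM}} &= v - p + [\,m_0 = 0\,] \\
c_{\text{ROM}} - c_{\text{correct}} &= [\,m_0 = 0 \land p = 0\,] = [\,m_1 = 0\,] \\
c_{\text{old}} - c_{\text{correct}} &= [\,m_1 = 0\,] - [\,m_1 = m_0\,] = -[\,m_0 \neq 0 \land p = 0\,]
\end{aligned}
```

Subtracting $[m_1 = 0]$ cancels the ROM error in every case, whereas the old check ($[m_1 = m_0]$) under-counts by one whenever `page0_mapped` is already non-zero and the region has no page-0 mappings.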

Testing

I used an ESP32-S3-based LCD board to run code which made use of CONFIG_SPIRAM_FETCH_INSTRUCTIONS and CONFIG_SPIRAM_RODATA. I added extra verbose logging in mmu_config_psram_text_segment and mmu_config_psram_rodata_segment to check the calculated page count and compare it to the change in page_id after the SPIRAM copy is performed.

Since my code did not result in page 0 getting mapped (I'm not sure how to arrange for that organically), I manually triggered the issue by initialising page0_mapped to 1, which caused the original code to under-count the total number of required pages by 1 on each call of the function.

With the fix applied, the count matched the jump in page number resulting from subsequent calls to Cache_Flash_To_SPIRAM_Copy. This was tested both in cases which did and in cases which did not trigger the bug.
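
The scenario above can be reproduced off-target with a small host-side model. This is a sketch under stated assumptions: `rom_count_model` is a hypothetical stand-in for the ROM function that takes the per-region tallies (`valid_flash_count`, `page0_count`) directly instead of walking the real MMU table, and `wrapper_old`, `wrapper_fixed` and `expected_pages` are illustrative names for the previous compensation, the corrected compensation and a reference count. None of them are ESP-IDF symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the ROM counting logic (not the real ROM symbol):
 * the caller passes the per-region tallies instead of an MMU table. */
uint32_t rom_count_model(uint32_t valid_flash_count, uint32_t page0_count,
                         uint32_t *page0_mapped)
{
    uint32_t count;
    if (*page0_mapped == 0) {
        /* ROM bug reproduced: 1 is added even when page0_count is 0 */
        count = valid_flash_count + 1 - page0_count;
    } else {
        count = valid_flash_count - page0_count;
    }
    *page0_mapped += page0_count;
    return count;
}

/* Previous workaround (modelled): subtract 1 whenever this call did not
 * change page0_mapped, i.e. the region had no page-0 pages. */
uint32_t wrapper_old(uint32_t valid, uint32_t p0, uint32_t *page0_mapped)
{
    uint32_t before = *page0_mapped;
    uint32_t pages = rom_count_model(valid, p0, page0_mapped);
    if (*page0_mapped == before) {
        pages--;
    }
    return pages;
}

/* Corrected workaround (modelled): the ROM over-counts only when
 * page0_mapped is still 0 on return (0 on entry, no page-0 pages found). */
uint32_t wrapper_fixed(uint32_t valid, uint32_t p0, uint32_t *page0_mapped)
{
    uint32_t pages = rom_count_model(valid, p0, page0_mapped);
    if (*page0_mapped == 0) {
        pages--;
    }
    return pages;
}

/* Reference count: every non-page-0 flash page needs its own PSRAM page,
 * and all page-0 pages (across every call) share one PSRAM page. */
uint32_t expected_pages(uint32_t valid, uint32_t p0, uint32_t page0_before)
{
    return valid - p0 + ((p0 != 0 && page0_before == 0) ? 1u : 0u);
}
```

With `*page0_mapped` starting at 1 and an illustrative region of 10 valid flash pages, none of them mapped to page 0, `wrapper_old` returns 9 while `wrapper_fixed` and `expected_pages` both return 10, matching the one-page under-count described above.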

Checklist

Before submitting a Pull Request, please ensure the following:

  • 🚨 This PR does not introduce breaking changes.
    NOTE: This change would only break code which would already break due to a lack of PSRAM space.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed.
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

The Cache_Count_Flash_Pages ROM function on the S2 and S3 counts all
pages which are mapped to page 0 of flash as a single page, because the
Cache_Flash_To_SPIRAM_Copy function attempts to map all pages mapped to
flash page 0 to one PSRAM page.

As this function can be called for multiple regions, it needs to track
whether a page mapped to page 0 has already been accounted for by a
previous call. It does this using the page0_mapped in-out parameter.
This logic contains an error:

```c
if (*page0_mapped == 0) {
    // BUG: If page0_count is 0, 1 is still added
    count = valid_flash_count + 1 - page0_count;
} else {
    count = valid_flash_count - page0_count;
}
*page0_mapped += page0_count;
return count;
```

The current Cache_Count_Flash_Pages wrapper in ESP-IDF attempts to
compensate for this bug by checking whether the page0_mapped parameter
was changed by the call and reducing the count by one if it was not.

This, however, over-compensates in situations where the initial value
of page0_mapped was not zero, as the code above only miscounts when it
was zero.

This patch fixes the wrapper so that it compensates for the bug only
when the final page0_mapped value is 0.

EliteTK (author) commented Jan 22, 2025

I've resubmitted this as the CI complained about the branch name.

github-actions bot commented Jan 22, 2025

Messages
📖 🎉 Good Job! All checks are passing!

👋 Hello EliteTK, we appreciate your contribution to this project!


📘 Please review the project's Contributions Guide for key guidelines on code, documentation, testing, and more.

🖊️ Please also make sure you have read and signed the Contributor License Agreement for this project.

This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modifies the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect:

We do welcome contributions in the form of bug reports, feature requests and pull requests via this public GitHub repository.

This GitHub project is a public mirror of our internal git repository.

1. An internal issue has been created for the PR and we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved, we synchronize it into our internal git repository.
4. In the internal git repository we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
   - At this point we may make some adjustments to the proposed change, or extend it by adding tests or documentation.
5. If the change is approved and passes the tests, it is merged into the default branch.
6. On the next sync from the internal git repository, the merged change will appear in this public GitHub repository.

Generated by 🚫 dangerJS against c29d8b9

@espressif-bot espressif-bot added the Status: Opened Issue is new label Jan 22, 2025
@github-actions github-actions bot changed the title Fix s2 and s3 Cache_Count_Flash_Pages rom function wrapper Fix s2 and s3 Cache_Count_Flash_Pages rom function wrapper (IDFGH-14493) Jan 22, 2025
EliteTK added a commit to EliteTK/esp-hal that referenced this pull request Jan 23, 2025
This implementation mirrors how the ESP-IDF implements this feature
(based on the `Cache_Flash_To_SPIRAM_Copy` ROM function), except that
it differs in a few key ways:

The ESP-IDF appears to map `.text` and `.rodata` into the first and
second 128 cache pages respectively (looking at the linker scripts, I'm
not sure how, but a runtime check confirmed that this appears to be the
case). This is reflected in how the `Cache_Count_Flash_Pages` and
`Cache_Flash_To_SPIRAM_Copy` ROM functions, and the ESP-IDF code calling
them, work. The count function can only be made to count flash pages
within the first 256 pages (out of the 512 on the ESP32-S3). Likewise,
the copy function will only copy flash pages which are mapped within
the first 256 entries (across two calls). As esp-hal maps `.text` and
`.rodata` differently, these ROM functions are technically not
appropriate if more than 256 pages of flash (`.text` and `.rodata`
combined) are in use by the application.
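
A quick way to check whether an application stays within that limit is to count the MMU pages spanned by its `.text` and `.rodata` segments. The helpers below are hypothetical (not part of esp-hal or ESP-IDF), and the 64 KiB MMU page size is an assumption about the ESP32-S2/S3 flash MMU; at that size, 256 entries correspond to 256 × 64 KiB = 16 MiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed MMU page size (64 KiB) and the 256-entry window that the ROM
 * count/copy functions can cover, as described above. */
#define ASSUMED_MMU_PAGE_SIZE 0x10000u
#define ROM_COPY_WINDOW_PAGES 256u

/* Number of MMU pages needed to map `size` bytes starting at virtual
 * address `vaddr`; a segment may straddle page boundaries. */
uint32_t pages_spanned(uint32_t vaddr, uint32_t size)
{
    if (size == 0) {
        return 0;
    }
    uint32_t first = vaddr / ASSUMED_MMU_PAGE_SIZE;
    uint32_t last = (vaddr + size - 1) / ASSUMED_MMU_PAGE_SIZE;
    return last - first + 1;
}

/* Hypothetical check: do .text and .rodata together fit in the window?
 * Conservative: a page shared by both segments is counted twice. */
bool fits_rom_copy_window(uint32_t text_vaddr, uint32_t text_size,
                          uint32_t rodata_vaddr, uint32_t rodata_size)
{
    uint32_t total = pages_spanned(text_vaddr, text_size) +
                     pages_spanned(rodata_vaddr, rodata_size);
    return total <= ROM_COPY_WINDOW_PAGES;
}
```

Counting pages spanned, rather than dividing the total size by the page size, matters because a segment that starts mid-page can need one more entry than its size alone suggests.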

Additionally, both functions contain bugs: the IDF attempts to work
around one of them incorrectly and does not appear to be aware of the
other. Details of these bugs can be found on the IDF issue/PR
tracker[0][1].

As a result, this commit contains a heavily modified Rust rewrite of
the reverse-engineered ROM code, combined with a loose port of the
ESP-IDF code.

There are three additional noteworthy differences from the ESP-IDF version
of the code:

1. The ESP-IDF allows the `.text` and `.rodata` segments to be mapped
   independently, so that only one of them need be mapped, but the
   current version of this code does not allow this flexibility. It
   could be implemented by checking the address of each page entry
   against the segment locations to determine which segment each
   address belongs to.
2. The ESP-IDF calls
   `cache_ll_l1_enable_bus(..., cache_ll_l1_get_bus(..., SOC_EXTRAM_DATA_HIGH, 0));`
   (functions from the ESP-IDF) in order to "Enable the most high bus,
   which is used for copying FLASH `.text` to PSRAM", but on the
   ESP32-S3, after careful inspection, these calls turn out to be a
   no-op, as the address passed to `cache_ll_l1_get_bus` results in an
   empty cache bus mask. It's currently unclear to me whether this is a
   bug in the ESP-IDF code or whether this code (which, from cursory
   investigation, is probably not a no-op on the -S2) is solely
   targeting the ESP32-S3.
3. The ESP-IDF calls `Cache_Flash_To_SPIRAM_Copy` with an icache address
   when copying `.text` and a dcache address when copying `.rodata`.
   This affects which cache the reads go through. The writes, however,
   always go through a "spare page" (a name I came up with during
   reverse engineering) via the dcache. This code performs all reads
   through the dcache. I don't know whether there's a good reason to
   read through the matching cache when doing the copy, and reading
   everything through the dcache doesn't appear to have any negative
   impact.
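
The address check suggested in item 1 could look roughly like the following. It is a sketch under assumptions: MMU entry `i` is taken to map the virtual address `bus_base + i * page size` with a 64 KiB page, and the segment bounds are supplied by the caller (in practice they might come from linker symbols). `classify_mmu_entry`, `struct range` and `enum segment` are hypothetical names, not esp-hal or ESP-IDF API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MMU page size; see the lead-in. */
#define ASSUMED_MMU_PAGE_SIZE 0x10000u

enum segment { SEG_NONE, SEG_TEXT, SEG_RODATA };

struct range {
    uint32_t start; /* inclusive virtual address */
    uint32_t end;   /* exclusive virtual address */
};

/* True if the page [page_start, page_start + page size) overlaps r. */
static int page_overlaps(uint32_t page_start, struct range r)
{
    uint32_t page_end = page_start + ASSUMED_MMU_PAGE_SIZE;
    return page_start < r.end && r.start < page_end;
}

/* Classify the MMU entry at `index`, assuming entry i maps the virtual
 * address bus_base + i * page size. A page overlapping both segments is
 * reported as .text. */
enum segment classify_mmu_entry(uint32_t index, uint32_t bus_base,
                                struct range text, struct range rodata)
{
    uint32_t vaddr = bus_base + index * ASSUMED_MMU_PAGE_SIZE;
    if (page_overlaps(vaddr, text)) {
        return SEG_TEXT;
    }
    if (page_overlaps(vaddr, rodata)) {
        return SEG_RODATA;
    }
    return SEG_NONE;
}
```

With such a classifier, only the entries belonging to the segment(s) selected for copying would be counted and copied, which is what would allow `.text` and `.rodata` to be moved to PSRAM independently.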

[0]: espressif/esp-idf#15262
[1]: espressif/esp-idf#15263