arm64/executor: pmevcntr0_el0 always get 0 in el1. #115

Open
Kingfish404 opened this issue Dec 2, 2024 · 2 comments

Kingfish404 commented Dec 2, 2024

On my machine, template_l1d_flush_reload and template_l1d_prime_probe don't work as I expect. I traced the problem to pmevcntr0_el0.

Using https://github.com/Kingfish404/pmu-utils/, I verified that pmevcntr0_el0 can count events such as L1D cache refills (0x3) at EL0 (Linux user mode).

However, on the feature-arm-port branch of this repo, I added the following test code at the end of int config_pfc(void) in src/arm64/executor/measurement.c:

    volatile uint64_t test_mem[1024];
    uint64_t pmu_value0, pmu_value1, pmu_value2, pmu_value3;
    asm volatile("mrs %0, pmevcntr0_el0\n" : "=r"(pmu_value0)::);
    for (int i = 0; i < 100; i++)
    {
        asm volatile("mrs %0, PMCCNTR_EL0" : "=r"(pmu_value2));
        asm volatile(
            "dc civac, %0\n"
            "dsb sy\n"
            "isb\n"
            "dc ivac, %0\n"
            ".rept 1024\n"
            "nop\n"
            ".endr\n"
            "ldr x0, [%0]\n" ::"r"(&test_mem[0]) : "x0");
        asm volatile("mrs %0, PMCCNTR_EL0" : "=r"(pmu_value3));
    }
    asm volatile("mrs %0, pmevcntr0_el0\n" : "=r"(pmu_value1)::);
    printk(KERN_ERR "PMU value: %llu, %llu\n", pmu_value1 - pmu_value0, pmu_value0);
    printk(KERN_ERR "PMU value timer: %llu\n", pmu_value3 - pmu_value2);

The PMU value is always 0. Is there any setup required for ARM that is not covered by this repo or its documentation?
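For context, here is a minimal sketch of the register values that, as far as I understand the Armv8-A PMUv3 architecture, must be programmed before pmevcntr0_el0 can count L1D refills at EL1. It is not the repo's config_pfc; the helper names (pmevtyper_value, pmcntenset_value, pmcr_enable_value, hypothetical_enable_counter0_l1d_refill) are made up for illustration. It also assumes nothing else (Linux perf, a hypervisor via MDCR_EL2) reprograms or restricts the counters afterwards.

```c
#ifdef __KERNEL__
#include <linux/types.h>
#else
#include <stdint.h>
#endif

/* Architectural bit positions (Armv8-A PMUv3). */
#define PMCR_E               (1u << 0)   /* PMCR_EL0.E: global counter enable */
#define PMCR_P               (1u << 1)   /* PMCR_EL0.P: reset all event counters */
#define PMEVTYPER_P          (1u << 31)  /* 1 = do NOT count at EL1 */
#define PMEVTYPER_U          (1u << 30)  /* 1 = do NOT count at EL0 */
#define PMEVTYPER_EVT_MASK   0xFFFFu     /* evtCount field (upper bits RES0 on older PMUs) */
#define EVT_L1D_CACHE_REFILL 0x03u       /* event number used in this issue */

/* Value for PMEVTYPER<n>_EL0: event number plus EL0/EL1 filter bits. */
static inline uint32_t pmevtyper_value(uint32_t event, int count_el1, int count_el0)
{
    uint32_t v = event & PMEVTYPER_EVT_MASK;
    if (!count_el1)
        v |= PMEVTYPER_P;
    if (!count_el0)
        v |= PMEVTYPER_U;
    return v;
}

/* Value for PMCNTENSET_EL0: bit n enables event counter n. */
static inline uint32_t pmcntenset_value(unsigned counter)
{
    return 1u << counter;
}

/* New PMCR_EL0 value: keep existing bits, enable globally, reset event counters. */
static inline uint64_t pmcr_enable_value(uint64_t old_pmcr)
{
    return old_pmcr | PMCR_E | PMCR_P;
}

#if defined(__aarch64__) && defined(__KERNEL__)
/* Hypothetical helper: program counter 0 for L1D refills, counting at EL0 and EL1. */
static inline void hypothetical_enable_counter0_l1d_refill(void)
{
    uint64_t pmcr;

    asm volatile("msr pmevtyper0_el0, %0"
                 :: "r"((uint64_t)pmevtyper_value(EVT_L1D_CACHE_REFILL, 1, 1)));
    asm volatile("msr pmcntenset_el0, %0" :: "r"((uint64_t)pmcntenset_value(0)));
    asm volatile("mrs %0, pmcr_el0" : "=r"(pmcr));
    asm volatile("msr pmcr_el0, %0" :: "r"(pmcr_enable_value(pmcr)));
    asm volatile("isb"); /* make the new PMU configuration visible before measuring */
}
#endif
```

If any of the three steps is missing (event type, per-counter enable, global enable), or the P filter bit is set, a read of pmevcntr0_el0 at EL1 would be expected to stay at 0.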

Tested Environment:

$ lscpu
Architecture:             aarch64
  CPU op-mode(s):         32-bit, 64-bit
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                ARM
  Model name:             Cortex-A76
    Model:                1
    Thread(s) per core:   1
    Core(s) per cluster:  4
    Socket(s):            -
    Cluster(s):           1
    Stepping:             r4p1
    CPU(s) scaling MHz:   62%
    CPU max MHz:          2400.0000
    CPU min MHz:          1500.0000
    BogoMIPS:             108.00
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
$ uname -a
Linux yujin-desktop 6.8.0-1013-raspi #14-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct  2 15:14:53 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

@OleksiiOleksenko
Contributor

Hi @Kingfish404. Thank you for the report!

We don't maintain the ARM port branch (at least not yet); the branch was more of an experimental effort.

However, the issue sounds familiar. I'll see if I can find a solution, and I'll get back to you after the holidays.

@cwshugg
Collaborator

cwshugg commented Jan 2, 2025

Hey @Kingfish404 - thanks for bringing this up. I've run into this issue as well; when testing outside of a Cortex-A72 on a Raspberry Pi 4, I also observed that pmevcntr0_el0 was not counting events properly. IIRC, when I was debugging this last year, I couldn't get L1D cache refills, or even simple events like CPU cycles, to count with this setup.

I suspect there is something wrong with the configuration code in the config_pfc function, within measurement.c.
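One way to narrow this down would be to read back the PMU registers right after config_pfc and decode them. Below is a rough sketch of such a check; it is not code from the repo. The function name pmu_diagnose and the enum are hypothetical, and the EL1 filter test assumes the kernel runs in Non-secure state, where (per my reading of the Arm ARM) events at EL1 are counted only when the NSK bit equals the P bit.

```c
#include <stdint.h>

/* Hypothetical bitmask of reasons why an event counter would not count at EL1. */
enum pmu_problem {
    PMU_OK               = 0,
    PMU_GLOBAL_DISABLED  = 1 << 0, /* PMCR_EL0.E is 0 */
    PMU_COUNTER_DISABLED = 1 << 1, /* PMCNTENSET_EL0 bit for this counter is 0 */
    PMU_EL1_FILTERED     = 1 << 2, /* PMEVTYPER P/NSK bits exclude Non-secure EL1 */
    PMU_WRONG_EVENT      = 1 << 3  /* evtCount field is not the expected event */
};

/*
 * Decode raw register values (as read with mrs) for one event counter.
 * Assumption: Non-secure state, so EL1 is counted only if NSK (bit 29) == P (bit 31).
 */
static unsigned pmu_diagnose(uint64_t pmcr, uint64_t pmcntenset, uint64_t pmevtyper,
                             unsigned counter, unsigned expected_event)
{
    unsigned problems = PMU_OK;
    unsigned p_bit = (unsigned)((pmevtyper >> 31) & 1u);
    unsigned nsk_bit = (unsigned)((pmevtyper >> 29) & 1u);

    if ((pmcr & 1u) == 0)
        problems |= PMU_GLOBAL_DISABLED;
    if (((pmcntenset >> counter) & 1u) == 0)
        problems |= PMU_COUNTER_DISABLED;
    if (p_bit != nsk_bit)
        problems |= PMU_EL1_FILTERED;
    if ((pmevtyper & 0xFFFFu) != expected_event)
        problems |= PMU_WRONG_EVENT;

    return problems;
}
```

In the kernel module, the three inputs would come from mrs reads of pmcr_el0, pmcntenset_el0 and pmevtyper0_el0, and the returned mask could be printed with printk next to the counter values.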

Oleksii - I don't currently have a testing environment to debug this, but I'm happy to help resolve this in any way I can.
