
Why does accessing memory cause numerous pgfaults? #300

Open
xfan1024 opened this issue Dec 21, 2024 · 1 comment

Comments


xfan1024 commented Dec 21, 2024

Description

I've noticed that on the SG2042, extensive memory access operations in user mode result in significant time spent in kernel mode. It seems that many page faults are occurring in the kernel.

Typically, page faults happen during the initial memory access or if the system has swap enabled. However, even with swap disabled and after the initial access, there are still numerous page faults. Are these page faults necessary? If not, can they be optimized?
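One way to probe whether first-touch faults are avoidable is to pre-populate a mapping with MAP_POPULATE, which asks the kernel to fault all pages in at mmap time. Below is a minimal Python sketch of that idea (Linux-only; mmap.MAP_POPULATE was only exposed to Python in 3.9, hence the getattr fallback). Whether pre-population would also avoid the later, post-warm-up faults seen on the SG2042 is exactly the open question here.

```python
import mmap

SIZE = 16 * 1024 * 1024  # 16 MiB demo buffer

# MAP_POPULATE asks the kernel to fault the pages in at mmap() time, so the
# write below should not trigger one minor fault per page. Linux-only; the
# constant is missing on other platforms and older Pythons, so fall back to 0.
flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | getattr(mmap, "MAP_POPULATE", 0)
buf = mmap.mmap(-1, SIZE, flags=flags)

buf[:] = b"\x00" * SIZE  # touch every page
buf.close()
```

This only addresses the expected initial faults; it does not explain faults recurring on already-touched memory.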

Steps to reproduce

pgfault.py

This script monitors the system-wide page-fault count (the pgfault counter in /proc/vmstat), printing the per-second delta.

import time

def read_pgfault():
    with open("/proc/vmstat", "r") as f:
        for line in f:
            if line.startswith("pgfault "):  # trailing space: match the exact key, not e.g. pgmajfault
                return int(line.split()[1])
    return 0

def main():
    previous_pgfault = None
    while True:
        current_pgfault = read_pgfault()
        if previous_pgfault is not None:
            diff = current_pgfault - previous_pgfault
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} Current pgfault: {current_pgfault}, Diff: {diff}")
        previous_pgfault = current_pgfault
        time.sleep(1)

if __name__ == "__main__":
    main()
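Since swap is disabled, it may help to confirm these are minor faults: /proc/vmstat also exposes pgmajfault, the subset of faults that required I/O. A small variant of read_pgfault that returns several counters at once (read_counters is a hypothetical helper name, not part of the script above; it returns zeros where /proc/vmstat is unavailable):

```python
def read_counters(names=("pgfault", "pgmajfault")):
    """Return the requested /proc/vmstat counters as a dict (0 if absent)."""
    counters = dict.fromkeys(names, 0)
    try:
        with open("/proc/vmstat") as f:
            for line in f:
                # each line is "<key> <value>"
                key, _, value = line.partition(" ")
                if key in counters:
                    counters[key] = int(value)
    except FileNotFoundError:  # not on Linux
        pass
    return counters
```

If pgmajfault stays flat while pgfault climbs during the test stage, the extra faults are minor faults resolved entirely in memory.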

memtest.c

This is the test program: 64 threads each repeatedly rewrite their slice of a 1 GiB buffer. Compile with: gcc -O2 -pthread memtest.c -o memtest

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <pthread.h>

#define NUM_THREADS 64
#define NUM_ELEMENTS ((size_t)((1ull * 1024 * 1024 * 1024) / sizeof(uint64_t) / NUM_THREADS))
#define NUM_ITERATIONS 128

struct thread_data
{
    uint64_t *data;
    size_t elements;
    size_t iterations;
};

void memtest(uint64_t *data, size_t elements)
{
    for (size_t i = 0; i < elements; i++)
        data[i] = (uint64_t)i;
}

void *thread_memtest(void *arg)
{
    struct thread_data *data = (struct thread_data *)arg;
    for (size_t i = 0; i < data->iterations; i++)
        memtest(data->data, data->elements);
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t threads[NUM_THREADS];
    struct thread_data thread_data[NUM_THREADS];
    
    for (size_t i = 0; i < NUM_THREADS; i++)
    {
        thread_data[i].data = malloc(NUM_ELEMENTS * sizeof(uint64_t));
        if (!thread_data[i].data)
        {
            perror("malloc");
            return 1;
        }
        thread_data[i].elements = NUM_ELEMENTS;
        thread_data[i].iterations = NUM_ITERATIONS;
    }

    printf("press enter to warm up");
    fflush(stdout); /* prompt has no trailing newline, so flush before blocking */
    getchar();
    for (size_t i = 0; i < NUM_THREADS; i++)
        memtest(thread_data[i].data, thread_data[i].elements);

    printf("press enter to start test");
    fflush(stdout); /* flush the unterminated prompt before blocking */
    getchar();
    for (size_t i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, thread_memtest, &thread_data[i]);

    for (size_t i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

Test Results

Test on SG2042 (linux 6.6)

warm up stage

2024-12-21 12:25:30 Current pgfault: 747897, Diff: 0
2024-12-21 12:25:31 Current pgfault: 762836, Diff: 14939
2024-12-21 12:25:32 Current pgfault: 781113, Diff: 18277
2024-12-21 12:25:33 Current pgfault: 781113, Diff: 0

test stage

A large number of pgfaults occur here, even though every page was already touched during the warm-up stage.

2024-12-21 12:25:34 Current pgfault: 781113, Diff: 0
2024-12-21 12:25:35 Current pgfault: 781247, Diff: 134
2024-12-21 12:25:36 Current pgfault: 781247, Diff: 0
2024-12-21 12:25:37 Current pgfault: 781247, Diff: 0
2024-12-21 12:25:38 Current pgfault: 785357, Diff: 4110
2024-12-21 12:25:39 Current pgfault: 800029, Diff: 14672
2024-12-21 12:25:40 Current pgfault: 817000, Diff: 16971
2024-12-21 12:25:41 Current pgfault: 834280, Diff: 17280
2024-12-21 12:25:43 Current pgfault: 836192, Diff: 1912
2024-12-21 12:25:44 Current pgfault: 836320, Diff: 128
2024-12-21 12:25:45 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:46 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:47 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:48 Current pgfault: 836362, Diff: 42
2024-12-21 12:25:49 Current pgfault: 836362, Diff: 0

Test on x86_64

Only the warm-up stage causes page faults; the test stage causes essentially none.

warm up stage

2024-12-22 01:39:34 Current pgfault: 1235160, Diff: 0
2024-12-22 01:39:35 Current pgfault: 1268376, Diff: 33216
2024-12-22 01:39:36 Current pgfault: 1268376, Diff: 0

test stage

These 134 pgfaults are presumably caused by starting the threads (e.g. allocating their stacks), not by accessing the data array.

2024-12-22 01:39:38 Current pgfault: 1268376, Diff: 0
2024-12-22 01:39:39 Current pgfault: 1268510, Diff: 134
2024-12-22 01:39:40 Current pgfault: 1268510, Diff: 0


xfan1024 commented Dec 21, 2024

On the earlier linux-6.1.55, the aggregate throughput of 64 threads accessing memory concurrently was sometimes even below 10 MB/s (the combined speed of all threads, not per-thread).
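For reference, an aggregate write-throughput figure can be estimated in a few lines. This is a rough single-threaded Python sketch, not equivalent to the 64-thread C test; the number it prints is only a ballpark, but it makes a sub-10 MB/s result easy to spot.

```python
import time

SIZE = 64 * 1024 * 1024          # 64 MiB buffer
CHUNK = b"\xff" * (1 << 20)      # write in 1 MiB chunks

buf = bytearray(SIZE)            # zero-filled, so pages are touched during allocation

start = time.perf_counter()
for off in range(0, SIZE, len(CHUNK)):
    buf[off:off + len(CHUNK)] = CHUNK
elapsed = time.perf_counter() - start

print(f"wrote {SIZE / 1e6:.0f} MB in {elapsed:.3f}s "
      f"({SIZE / elapsed / 1e6:.0f} MB/s)")
```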

unicornx pushed a commit to unicornx/linux-riscv that referenced this issue Jan 15, 2025
DC driver is using two different values to define the maximum number of
surfaces: MAX_SURFACES and MAX_SURFACE_NUM. Consolidate MAX_SURFACES as
the unique definition for surface updates across DC.

It fixes page fault faced by Cosmic users on AMD display versions that
support two overlay planes, since the introduction of cursor overlay
mode.

[Nov26 21:33] BUG: unable to handle page fault for address: 0000000051d0f08b
[  +0.000015] #PF: supervisor read access in kernel mode
[  +0.000006] #PF: error_code(0x0000) - not-present page
[  +0.000005] PGD 0 P4D 0
[  +0.000007] Oops: Oops: 0000 [sophgo#1] PREEMPT SMP NOPTI
[  +0.000006] CPU: 4 PID: 71 Comm: kworker/u32:6 Not tainted 6.10.0+ sophgo#300
[  +0.000006] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0131 01/30/2024
[  +0.000007] Workqueue: events_unbound commit_work [drm_kms_helper]
[  +0.000040] RIP: 0010:copy_stream_update_to_stream.isra.0+0x30d/0x750 [amdgpu]
[  +0.000847] Code: 8b 10 49 89 94 24 f8 00 00 00 48 8b 50 08 49 89 94 24 00 01 00 00 8b 40 10 41 89 84 24 08 01 00 00 49 8b 45 78 48 85 c0 74 0b <0f> b6 00 41 88 84 24 90 64 00 00 49 8b 45 60 48 85 c0 74 3b 48 8b
[  +0.000010] RSP: 0018:ffffc203802f79a0 EFLAGS: 00010206
[  +0.000009] RAX: 0000000051d0f08b RBX: 0000000000000004 RCX: ffff9f964f0a8070
[  +0.000004] RDX: ffff9f9710f90e40 RSI: ffff9f96600c8000 RDI: ffff9f964f000000
[  +0.000004] RBP: ffffc203802f79f8 R08: 0000000000000000 R09: 0000000000000000
[  +0.000005] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f96600c8000
[  +0.000004] R13: ffff9f9710f90e40 R14: ffff9f964f000000 R15: ffff9f96600c8000
[  +0.000004] FS:  0000000000000000(0000) GS:ffff9f9970000000(0000) knlGS:0000000000000000
[  +0.000005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000005] CR2: 0000000051d0f08b CR3: 00000002e6a20000 CR4: 0000000000350ef0
[  +0.000005] Call Trace:
[  +0.000011]  <TASK>
[  +0.000010]  ? __die_body.cold+0x19/0x27
[  +0.000012]  ? page_fault_oops+0x15a/0x2d0
[  +0.000014]  ? exc_page_fault+0x7e/0x180
[  +0.000009]  ? asm_exc_page_fault+0x26/0x30
[  +0.000013]  ? copy_stream_update_to_stream.isra.0+0x30d/0x750 [amdgpu]
[  +0.000739]  ? dc_commit_state_no_check+0xd6c/0xe70 [amdgpu]
[  +0.000470]  update_planes_and_stream_state+0x49b/0x4f0 [amdgpu]
[  +0.000450]  ? srso_return_thunk+0x5/0x5f
[  +0.000009]  ? commit_minimal_transition_state+0x239/0x3d0 [amdgpu]
[  +0.000446]  update_planes_and_stream_v2+0x24a/0x590 [amdgpu]
[  +0.000464]  ? srso_return_thunk+0x5/0x5f
[  +0.000009]  ? sort+0x31/0x50
[  +0.000007]  ? amdgpu_dm_atomic_commit_tail+0x159f/0x3a30 [amdgpu]
[  +0.000508]  ? srso_return_thunk+0x5/0x5f
[  +0.000009]  ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu]
[  +0.000377]  ? srso_return_thunk+0x5/0x5f
[  +0.000009]  ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x160/0x390 [drm]
[  +0.000058]  ? srso_return_thunk+0x5/0x5f
[  +0.000005]  ? dma_fence_default_wait+0x8c/0x260
[  +0.000010]  ? srso_return_thunk+0x5/0x5f
[  +0.000005]  ? wait_for_completion_timeout+0x13b/0x170
[  +0.000006]  ? srso_return_thunk+0x5/0x5f
[  +0.000005]  ? dma_fence_wait_timeout+0x108/0x140
[  +0.000010]  ? commit_tail+0x94/0x130 [drm_kms_helper]
[  +0.000024]  ? process_one_work+0x177/0x330
[  +0.000008]  ? worker_thread+0x266/0x3a0
[  +0.000006]  ? __pfx_worker_thread+0x10/0x10
[  +0.000004]  ? kthread+0xd2/0x100
[  +0.000006]  ? __pfx_kthread+0x10/0x10
[  +0.000006]  ? ret_from_fork+0x34/0x50
[  +0.000004]  ? __pfx_kthread+0x10/0x10
[  +0.000005]  ? ret_from_fork_asm+0x1a/0x30
[  +0.000011]  </TASK>

Fixes: 1b04dcc ("drm/amd/display: Introduce overlay cursor mode")
Suggested-by: Leo Li <[email protected]>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3693
Signed-off-by: Melissa Wen <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 1c86c81)
Cc: [email protected]
xingxg2022 pushed a commit that referenced this issue Jan 17, 2025
commit 7de8d5c upstream.
