
Why does accessing memory cause numerous pgfaults? #300

Open
xfan1024 opened this issue Dec 21, 2024 · 1 comment

Comments


xfan1024 commented Dec 21, 2024

Description

I've noticed that on the SG2042, extensive memory access operations in user mode result in significant time spent in kernel mode. It seems that many page faults are occurring in the kernel.

Typically, page faults happen during the initial memory access or if the system has swap enabled. However, even with swap disabled and after the initial access, there are still numerous page faults. Are these page faults necessary? If not, can they be optimized?
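One way to probe whether first-touch faults are avoidable is to pre-populate a mapping with MAP_POPULATE, which asks the kernel to fault all pages in at mmap time. Below is a minimal Python sketch of that idea (Linux-only; mmap.MAP_POPULATE was only exposed to Python in 3.9, hence the getattr fallback). Whether pre-population would also avoid the later, post-warm-up faults seen on the SG2042 is exactly the open question here.

```python
import mmap

SIZE = 16 * 1024 * 1024  # 16 MiB demo buffer

# MAP_POPULATE asks the kernel to fault the pages in at mmap() time, so the
# write below should not trigger one minor fault per page. Linux-only; the
# constant is missing on other platforms and older Pythons, so fall back to 0.
flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | getattr(mmap, "MAP_POPULATE", 0)
buf = mmap.mmap(-1, SIZE, flags=flags)

buf[:] = b"\x00" * SIZE  # touch every page
buf.close()
```

This only addresses the expected initial faults; it does not explain faults recurring on already-touched memory.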

Steps to reproduce

pgfault.py

This script monitors the system-wide page-fault count (the pgfault counter in /proc/vmstat), printing the per-second delta.

import time

def read_pgfault():
    with open("/proc/vmstat", "r") as f:
        for line in f:
            if line.startswith("pgfault "):  # trailing space: match the exact key, not e.g. pgmajfault
                return int(line.split()[1])
    return 0

def main():
    previous_pgfault = None
    while True:
        current_pgfault = read_pgfault()
        if previous_pgfault is not None:
            diff = current_pgfault - previous_pgfault
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} Current pgfault: {current_pgfault}, Diff: {diff}")
        previous_pgfault = current_pgfault
        time.sleep(1)

if __name__ == "__main__":
    main()
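Since swap is disabled, it may help to confirm these are minor faults: /proc/vmstat also exposes pgmajfault, the subset of faults that required I/O. A small variant of read_pgfault that returns several counters at once (read_counters is a hypothetical helper name, not part of the script above; it returns zeros where /proc/vmstat is unavailable):

```python
def read_counters(names=("pgfault", "pgmajfault")):
    """Return the requested /proc/vmstat counters as a dict (0 if absent)."""
    counters = dict.fromkeys(names, 0)
    try:
        with open("/proc/vmstat") as f:
            for line in f:
                # each line is "<key> <value>"
                key, _, value = line.partition(" ")
                if key in counters:
                    counters[key] = int(value)
    except FileNotFoundError:  # not on Linux
        pass
    return counters
```

If pgmajfault stays flat while pgfault climbs during the test stage, the extra faults are minor faults resolved entirely in memory.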

memtest.c

This is the test program: 64 threads each repeatedly rewrite their slice of a 1 GiB buffer. Compile with: gcc -O2 -pthread memtest.c -o memtest

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <pthread.h>

#define NUM_THREADS 64
#define NUM_ELEMENTS ((size_t)((1ull * 1024 * 1024 * 1024) / sizeof(uint64_t) / NUM_THREADS))
#define NUM_ITERATIONS 128

struct thread_data
{
    uint64_t *data;
    size_t elements;
    size_t iterations;
};

void memtest(uint64_t *data, size_t elements)
{
    for (size_t i = 0; i < elements; i++)
        data[i] = (uint64_t)i;
}

void *thread_memtest(void *arg)
{
    struct thread_data *data = (struct thread_data *)arg;
    for (size_t i = 0; i < data->iterations; i++)
        memtest(data->data, data->elements);
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t threads[NUM_THREADS];
    struct thread_data thread_data[NUM_THREADS];
    
    for (size_t i = 0; i < NUM_THREADS; i++)
    {
        thread_data[i].data = malloc(NUM_ELEMENTS * sizeof(uint64_t));
        if (!thread_data[i].data)
        {
            perror("malloc");
            return 1;
        }
        thread_data[i].elements = NUM_ELEMENTS;
        thread_data[i].iterations = NUM_ITERATIONS;
    }

    printf("press enter to warm up");
    fflush(stdout); /* prompt has no trailing newline, so flush before blocking */
    getchar();
    for (size_t i = 0; i < NUM_THREADS; i++)
        memtest(thread_data[i].data, thread_data[i].elements);

    printf("press enter to start test");
    fflush(stdout); /* flush the unterminated prompt before blocking */
    getchar();
    for (size_t i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, thread_memtest, &thread_data[i]);

    for (size_t i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

Test Results

Test on SG2042 (linux 6.6)

warm up stage

2024-12-21 12:25:30 Current pgfault: 747897, Diff: 0
2024-12-21 12:25:31 Current pgfault: 762836, Diff: 14939
2024-12-21 12:25:32 Current pgfault: 781113, Diff: 18277
2024-12-21 12:25:33 Current pgfault: 781113, Diff: 0

test stage

A large number of pgfaults occur here, even though every page was already touched during the warm-up stage.

2024-12-21 12:25:34 Current pgfault: 781113, Diff: 0
2024-12-21 12:25:35 Current pgfault: 781247, Diff: 134
2024-12-21 12:25:36 Current pgfault: 781247, Diff: 0
2024-12-21 12:25:37 Current pgfault: 781247, Diff: 0
2024-12-21 12:25:38 Current pgfault: 785357, Diff: 4110
2024-12-21 12:25:39 Current pgfault: 800029, Diff: 14672
2024-12-21 12:25:40 Current pgfault: 817000, Diff: 16971
2024-12-21 12:25:41 Current pgfault: 834280, Diff: 17280
2024-12-21 12:25:43 Current pgfault: 836192, Diff: 1912
2024-12-21 12:25:44 Current pgfault: 836320, Diff: 128
2024-12-21 12:25:45 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:46 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:47 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:48 Current pgfault: 836362, Diff: 42
2024-12-21 12:25:49 Current pgfault: 836362, Diff: 0

Test on x86_64

Only the warm-up stage causes page faults; the test stage causes essentially none.

warm up stage

2024-12-22 01:39:34 Current pgfault: 1235160, Diff: 0
2024-12-22 01:39:35 Current pgfault: 1268376, Diff: 33216
2024-12-22 01:39:36 Current pgfault: 1268376, Diff: 0

test stage

These 134 pgfaults are presumably caused by starting the threads (e.g. allocating their stacks), not by accessing the data array.

2024-12-22 01:39:38 Current pgfault: 1268376, Diff: 0
2024-12-22 01:39:39 Current pgfault: 1268510, Diff: 134
2024-12-22 01:39:40 Current pgfault: 1268510, Diff: 0


xfan1024 commented Dec 21, 2024

On the earlier linux-6.1.55, the aggregate throughput of 64 threads accessing memory concurrently was sometimes even below 10 MB/s (the combined speed of all threads, not per-thread).
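For reference, an aggregate write-throughput figure can be estimated in a few lines. This is a rough single-threaded Python sketch, not equivalent to the 64-thread C test; the number it prints is only a ballpark, but it makes a sub-10 MB/s result easy to spot.

```python
import time

SIZE = 64 * 1024 * 1024          # 64 MiB buffer
CHUNK = b"\xff" * (1 << 20)      # write in 1 MiB chunks

buf = bytearray(SIZE)            # zero-filled, so pages are touched during allocation

start = time.perf_counter()
for off in range(0, SIZE, len(CHUNK)):
    buf[off:off + len(CHUNK)] = CHUNK
elapsed = time.perf_counter() - start

print(f"wrote {SIZE / 1e6:.0f} MB in {elapsed:.3f}s "
      f"({SIZE / elapsed / 1e6:.0f} MB/s)")
```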

unicornx pushed a commit to unicornx/linux-riscv that referenced this issue Jan 15, 2025
DC driver is using two different values to define the maximum number of
surfaces: MAX_SURFACES and MAX_SURFACE_NUM. Consolidate MAX_SURFACES as
the unique definition for surface updates across DC.

It fixes page fault faced by Cosmic users on AMD display versions that
support two overlay planes, since the introduction of cursor overlay
mode.

[Nov26 21:33] BUG: unable to handle page fault for address: 0000000051d0f08b
[  +0.000015] #PF: supervisor read access in kernel mode
[  +0.000006] #PF: error_code(0x0000) - not-present page
[  +0.000005] PGD 0 P4D 0
[  +0.000007] Oops: Oops: 0000 [sophgo#1] PREEMPT SMP NOPTI
[  +0.000006] CPU: 4 PID: 71 Comm: kworker/u32:6 Not tainted 6.10.0+ sophgo#300
[  +0.000006] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0131 01/30/2024
[  +0.000007] Workqueue: events_unbound commit_work [drm_kms_helper]
[  +0.000040] RIP: 0010:copy_stream_update_to_stream.isra.0+0x30d/0x750 [amdgpu]
[  +0.000847] Code: 8b 10 49 89 94 24 f8 00 00 00 48 8b 50 08 49 89 94 24 00 01 00 00 8b 40 10 41 89 84 24 08 01 00 00 49 8b 45 78 48 85 c0 74 0b <0f> b6 00 41 88 84 24 90 64 00 00 49 8b 45 60 48 85 c0 74 3b 48 8b
[  +0.000010] RSP: 0018:ffffc203802f79a0 EFLAGS: 00010206
[  +0.000009] RAX: 0000000051d0f08b RBX: 0000000000000004 RCX: ffff9f964f0a8070
[  +0.000004] RDX: ffff9f9710f90e40 RSI: ffff9f96600c8000 RDI: ffff9f964f000000
[  +0.000004] RBP: ffffc203802f79f8 R08: 0000000000000000 R09: 0000000000000000
[  +0.000005] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f96600c8000
[  +0.000004] R13: ffff9f9710f90e40 R14: ffff9f964f000000 R15: ffff9f96600c8000
[  +0.000004] FS:  0000000000000000(0000) GS:ffff9f9970000000(0000) knlGS:0000000000000000
[  +0.000005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000005] CR2: 0000000051d0f08b CR3: 00000002e6a20000 CR4: 0000000000350ef0
[  +0.000005] Call Trace:
[  +0.000011]  <TASK>
[  +0.000010]  ? __die_body.cold+0x19/0x27
[  +0.000012]  ? page_fault_oops+0x15a/0x2d0
[  +0.000014]  ? exc_page_fault+0x7e/0x180
[  +0.000009]  ? asm_exc_page_fault+0x26/0x30
[  +0.000013]  ? copy_stream_update_to_stream.isra.0+0x30d/0x750 [amdgpu]
[  +0.000739]  ? dc_commit_state_no_check+0xd6c/0xe70 [amdgpu]
[  +0.000470]  update_planes_and_stream_state+0x49b/0x4f0 [amdgpu]
[  +0.000450]  ? srso_return_thunk+0x5/0x5f
[  +0.000009]  ? commit_minimal_transition_state+0x239/0x3d0 [amdgpu]
[  +0.000446]  update_planes_and_stream_v2+0x24a/0x590 [amdgpu]
[  +0.000464]  ? srso_return_thunk+0x5/0x5f
[  +0.000009]  ? sort+0x31/0x50
[  +0.000007]  ? amdgpu_dm_atomic_commit_tail+0x159f/0x3a30 [amdgpu]
[  +0.000508]  ? srso_return_thunk+0x5/0x5f
[  +0.000009]  ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu]
[  +0.000377]  ? srso_return_thunk+0x5/0x5f
[  +0.000009]  ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x160/0x390 [drm]
[  +0.000058]  ? srso_return_thunk+0x5/0x5f
[  +0.000005]  ? dma_fence_default_wait+0x8c/0x260
[  +0.000010]  ? srso_return_thunk+0x5/0x5f
[  +0.000005]  ? wait_for_completion_timeout+0x13b/0x170
[  +0.000006]  ? srso_return_thunk+0x5/0x5f
[  +0.000005]  ? dma_fence_wait_timeout+0x108/0x140
[  +0.000010]  ? commit_tail+0x94/0x130 [drm_kms_helper]
[  +0.000024]  ? process_one_work+0x177/0x330
[  +0.000008]  ? worker_thread+0x266/0x3a0
[  +0.000006]  ? __pfx_worker_thread+0x10/0x10
[  +0.000004]  ? kthread+0xd2/0x100
[  +0.000006]  ? __pfx_kthread+0x10/0x10
[  +0.000006]  ? ret_from_fork+0x34/0x50
[  +0.000004]  ? __pfx_kthread+0x10/0x10
[  +0.000005]  ? ret_from_fork_asm+0x1a/0x30
[  +0.000011]  </TASK>

Fixes: 1b04dcc ("drm/amd/display: Introduce overlay cursor mode")
Suggested-by: Leo Li <[email protected]>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3693
Signed-off-by: Melissa Wen <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 1c86c81)
Cc: [email protected]
xingxg2022 pushed a commit that referenced this issue Jan 17, 2025
commit 7de8d5c upstream.
