
[HMSDK v3.0] Issue of Tiering Memories #5

Open
JongminKim-KU opened this issue Dec 29, 2024 · 3 comments

Comments


JongminKim-KU commented Dec 29, 2024

Hello, I'm Jongmin Kim from Korea University.

I am pleased to hear about the release of HMSDK 3.0 and that key functionalities of "bandwidth expansion" and "capacity expansion" have been merged into the main branches of Linux and DAMON, respectively.

However, I am encountering an issue with memory tiering between DIMMs and two CXL memory devices from SK hynix.
The output of the following commands indicates that the Linux kernel detects only one of the two CXL devices:

- numactl -H shows NUMA nodes 0, 1, and 2, but not node 3.
- cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist: 0-1
- cat /sys/devices/virtual/memory_tiering/memory_tier35/nodelist: 2
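The same information can be gathered programmatically when comparing several machines or kernels. The following is a minimal Python sketch, not part of HMSDK, that assumes the sysfs layout shown above (memory_tierN/nodelist) plus the standard /sys/devices/system/node/online file, and expands kernel node lists such as "0-1" into node numbers.

```python
from pathlib import Path
from typing import Dict, List

# Assumed sysfs locations (the tiering path is the one used in this issue).
TIER_ROOT = Path("/sys/devices/virtual/memory_tiering")
NODE_ONLINE = Path("/sys/devices/system/node/online")


def parse_nodelist(text: str) -> List[int]:
    """Expand a kernel node list such as "0-1" or "0,2-3" into a list of ints."""
    nodes: List[int] = []
    text = text.strip()
    if not text:
        return nodes
    for part in text.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            nodes.extend(range(int(lo), int(hi) + 1))
        else:
            nodes.append(int(part))
    return nodes


def read_tiers(root: Path = TIER_ROOT) -> Dict[str, List[int]]:
    """Map each memory_tierN directory to its NUMA nodes, ordered by tier number."""
    paths = sorted(
        root.glob("memory_tier*/nodelist"),
        key=lambda p: int(p.parent.name[len("memory_tier"):]),
    )
    return {p.parent.name: parse_nodelist(p.read_text()) for p in paths}


if __name__ == "__main__":
    if NODE_ONLINE.exists():
        print("online NUMA nodes:", parse_nodelist(NODE_ONLINE.read_text()))
    for tier, nodes in read_tiers().items():
        print(f"{tier}: {nodes}")
```

Reading sysfs directly avoids depending on the numactl output format; a node the kernel never onlined (node 3 in this report) simply will not appear in either listing.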

The solution I previously tried with HMSDK 2.0 appears to be incompatible because of the kernel version difference (v6.6 vs. v6.12), so a new approach seems to be needed.

Thank you so much for your hard work and dedication.

JongminKim-KU changed the title from [HMSDK-3.0] Issue of Tiering Memories to [HMSDK v3.0] Issue of Tiering Memories on Dec 29, 2024
Member

honggyukim commented Dec 29, 2024

Hi @JongminKim-KU, thanks for the report and your kind appreciation.

> I am encountering an issue with tiering memories between DIMMs and two CXL devices from SK hynix.
> The output of the following commands indicates that the Linux kernel is detecting only one CXL device out of two.

Could you tell us your kernel version (the output of uname -r)? If you're using a custom-built kernel, please make sure you passed INSTALL_MOD_STRIP=1 when installing the kernel modules, as described in our kernel build guide at https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion#building-kernel.

$ sudo make INSTALL_MOD_STRIP=1 modules_install

We saw the same problem when the kernel was built without INSTALL_MOD_STRIP=1, and we found that it happens because, when the module binaries aren't stripped, the physical address of the second CXL memory falls outside the detectable range.

> cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist: 0-1
> cat /sys/devices/virtual/memory_tiering/memory_tier35/nodelist: 2

Starting with HMSDK v3.0, the memory tiering information above is no longer used. Instead, the gen_migpol.py script takes the migration source and destination nodes explicitly as arguments.

$ ./tools/gen_migpol.py --help
usage: gen_migpol.py [-h] [-d SRC DEST] [-p SRC DEST] [-g] [-o OUTPUT]
        ...
  -d SRC DEST, --demote SRC DEST
                        source and destination NUMA nodes for demotion.
  -p SRC DEST, --promote SRC DEST
                        source and destination NUMA nodes for promotion.
        ...

https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion#gen_migpolpy
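With two CXL devices, each DRAM node needs its own demotion and promotion pair. The Python sketch below only builds such a command line. It assumes the node layout from this issue (DRAM on nodes 0-1, CXL on nodes 2-3), a hypothetical pairing of node 0 with 2 and node 1 with 3, that -d and -p may be repeated, and a placeholder output name hmsdk.yaml. Please check the wiki page above for the actual usage.

```python
from typing import Dict, List


def build_migpol_args(demote: Dict[int, int], output: str = "hmsdk.yaml") -> List[str]:
    """Build a gen_migpol.py argument list from a {DRAM node: CXL node} mapping.

    Each demotion pair SRC -> DEST gets a matching promotion pair DEST -> SRC.
    Assumption: -d/-p may be given multiple times; "hmsdk.yaml" is a placeholder.
    """
    args = ["./tools/gen_migpol.py"]
    for src, dest in sorted(demote.items()):
        args += ["-d", str(src), str(dest)]  # demote hot DRAM node -> CXL node
    for src, dest in sorted(demote.items()):
        args += ["-p", str(dest), str(src)]  # promote CXL node -> DRAM node
    args += ["-o", output]
    return args


if __name__ == "__main__":
    # Hypothetical pairing based on this issue: node 0 <-> 2, node 1 <-> 3.
    print(" ".join(build_migpol_args({0: 2, 1: 3})))
```

Deriving the promotion pairs from the demotion mapping keeps the two directions symmetric, so a node cannot be demoted to one device and promoted back from another by mistake.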

Author

JongminKim-KU commented Dec 29, 2024

Thank you for your prompt reply!

You were right. I had copied the config file of a custom Linux kernel.
The issue was resolved after I reinstalled HMSDK 3.0 following the instructions.
Both CXL devices are now identified successfully.
I had missed that the installation instructions were recently updated.

Thank you again for your assistance.
Wishing you a happy and prosperous New Year!

Member

I'm glad to hear that you've fixed the problem with INSTALL_MOD_STRIP=1. The solution was previously found by @kyet.

Thanks, and happy new year to you too!
