My name is Jungmin Choi, and I'm a memory system architect at SK Hynix. Today I'm going to talk about our CXL disaggregated memory solution.
All right, so here is our agenda for this presentation. First, I'm going to talk about the motivation, in particular some challenges in today's data centers and why we should focus on CXL disaggregated memory systems. Then I'm going to talk about Niagara, which Charles mentioned in his presentation; it is our CXL disaggregated memory prototype. I will also explain two powerful use cases of Niagara that can mitigate those challenges. Next, I will briefly explain our research items for the future of Niagara, and I will end this presentation with an introduction to our next steps.
All right, so let's talk about the challenges ahead when we expand system memory. The volume of data grows exponentially, so we need more cores, and that requires a continued increase in memory bandwidth and capacity. But for various reasons, including implementation cost and signal integrity issues, the gap between this requirement and platform capability is growing. We need a new approach, and we believe that CXL-based memory expansion is a solid solution to address these issues. Beyond memory expansion, CXL can also enable memory disaggregation.
From now on, I'm going to talk about the challenges that today's data centers are facing. The first challenge is that the memory utilization of a compute cluster varies over time based on the working set size. Due to the inflexible nature of cloud hardware platforms, as shown in the left figure, some memory resources go underutilized, leading to memory stranding. Memory is a costly resource in today's data center, so stranding due to overprovisioning causes significant resource waste. There can also be a temporary skew in memory usage, causing data to spill to storage, as shown on the right side, and swapping to storage degrades system performance.
OK, so this is the second challenge. A distributed computing system can be a powerful solution for handling large-scale AI applications, especially with the emergence of generative AI. But it is important to note that there can be data transfer overhead between multiple nodes. There is also duplication of shared data across nodes, which increases local memory pressure on each node. So up to this point, we have looked at two issues in today's data centers. Now, let's explore a solution to address these challenges.
All right, so we propose a CXL-based disaggregated memory solution that supports both memory pooling and memory sharing. Memory pooling can mitigate the memory stranding and data spill issues by sharing disaggregated memory resources between multiple nodes; as I will explain later, we support a solution that can dynamically allocate memory resources to each node at runtime. Second, memory sharing can eliminate data transfer overhead and data duplication by sharing data objects between multiple nodes.
So yes, as I said, Niagara is a 4U, FPGA-based CXL disaggregated memory prototype. Up to four host servers can be connected, and Niagara supports up to four channels of DDR4 DIMMs, for a maximum memory capacity of 1 terabyte. You can see the Niagara specification in the table on the left. Because this is an FPGA-based prototype, its performance is a bit lower than an ASIC or product-level device, but even so, we were able to achieve meaningful results in our benchmark experiments, and I will show those results in the following slides. Niagara supports memory pooling and sharing capabilities and also supports some hardware-assisted features, which I will discuss later. The figure on the right side shows our rack-scale system built around the Niagara platform; as I said, Niagara can connect up to four host servers. I'd like to invite you to our demo booth to see the real rack-scale system in action.
OK, so from now on, I will introduce two powerful use cases of CXL disaggregated memory that can overcome the challenges we discussed. First, let's dive into memory pooling. Niagara supports Dynamic Capacity Service (DCS), a hardware-software integrated solution for memory pooling that can dynamically allocate memory resources to a host at runtime, that is, without any system reset or reboot. As you may know, DCS is very similar to DCD, the Dynamic Capacity Device feature described in the CXL 3.1 specification. One of the differences is that DCD is fabric-manager-driven, whereas DCS is host-driven. DCS is our initial version of memory pooling support, and as I will mention later, we are also working on a DCD design that complies with the CXL specification. In any case, we have demonstrated that memory utilization can be improved with our DCS solution.

At the bottom, you can see the hardware and software architecture of DCS. On the left is the software architecture. On each node, the memory pooling daemon monitors the free memory space of the VM. When free memory runs low, the daemon requests a memory allocation from the VM scheduler located on the controller node, or master node. The VM scheduler then forwards the allocation request to the VM manager of the node where the VM is located, and the VM manager provides the allocated CXL memory to the VM. If the available CXL memory is insufficient, the VM manager can also request a memory allocation from the hardware device, our Niagara platform. The deallocation process works in much the same way. On the right is the hardware architecture for DCS. The pooled memory manager (PMM) communicates with the host through a mailbox to allocate or deallocate CXL memory regions based on host requests. When a memory region is allocated to a specific host, the PMM records the host ID, the ownership information for that region, in the memory section table, so that the memory protection unit (MPU) can detect and drop unauthorized CXL.mem traffic to that region. We also have a secure eraser that can zero or randomize a specific memory region on user request, for security.
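To make the allocation flow above concrete, here is a minimal sketch of the per-node daemon logic, written against stand-in classes. All names, thresholds, and chunk sizes are hypothetical illustrations of the described flow, not the actual SK Hynix or MemVerge implementation.

```python
# Sketch of the DCS pooling flow: a per-node daemon watches a VM's
# free memory and asks the controller-node VM scheduler to grow or
# shrink its CXL allocation at runtime. All names and thresholds
# are assumptions made for illustration.
from dataclasses import dataclass

CHUNK = 1 << 30            # grow/shrink in 1 GiB chunks (assumed)
LOW, HIGH = 0.10, 0.40     # free-memory watermarks (assumed)

@dataclass
class VmStats:
    vm_id: str
    total_bytes: int
    free_bytes: int

class VmScheduler:
    """Stand-in for the controller-node VM scheduler. In the system
    described in the talk, it forwards requests to the VM manager on
    the node hosting the VM, which in turn asks the Niagara device
    for more memory when local CXL capacity is insufficient."""
    def request_allocation(self, vm_id: str, nbytes: int) -> None:
        print(f"allocate {nbytes} bytes of pooled CXL memory to {vm_id}")

    def request_deallocation(self, vm_id: str, nbytes: int) -> None:
        print(f"return {nbytes} bytes from {vm_id} to the pool")

def pooling_daemon_step(stats: VmStats, sched: VmScheduler) -> None:
    """One polling iteration of the memory pooling daemon."""
    ratio = stats.free_bytes / stats.total_bytes
    if ratio < LOW:                       # free memory running low
        sched.request_allocation(stats.vm_id, CHUNK)
    elif ratio > HIGH:                    # plenty free: give some back
        sched.request_deallocation(stats.vm_id, CHUNK)

if __name__ == "__main__":
    sched = VmScheduler()
    pooling_daemon_step(VmStats("vm-0", total_bytes=64 << 30,
                                free_bytes=2 << 30), sched)
```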
Here you can see the evaluation results for memory pooling. We collaborated with MemVerge to develop this solution and to evaluate the performance of the memory pooling capability using the CloudSuite benchmark. We demonstrated that even an FPGA-based memory system can mitigate the performance loss caused by swapping when it is used in place of NVMe: the bottom line is that spilling 20% of the data to Niagara outperforms spilling to NVMe by up to 2.5 times. In other words, if Niagara can provide 20% of memory as CXL memory to the host, we can dramatically improve system performance in this benchmark scenario. The table on the right side shows the improvement in memory utilization when DCS is enabled in Kubernetes, compared to the baseline. Both the baseline and the DCS-enabled Kubernetes have the same local and CXL memory capacity, but when the PageRank workload is running, the DCS-enabled Kubernetes improves memory utilization by about 35% compared to the baseline, as the disaggregated memory resources are dynamically allocated.
OK, this is our second use case, memory sharing. Multiple hosts can access Niagara as a shared memory region. If some data should be shared between multiple nodes, it can be written into the CXL shared memory region by the writer node, and then the reader nodes can read that shared data or shared object from the same region. It is very similar to a single-writer, multiple-reader (SWMR) mechanism. This enables data sharing between multiple nodes without network data transfer overhead, and data duplication can be prevented because there is no need to copy the shared object or data to each node.
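As a rough illustration of the write-then-read pattern just described, the sketch below uses POSIX shared memory as a stand-in for a Niagara CXL shared region; in the real system the hosts would map the device's shared memory range instead, and the region name here is made up for the example.

```python
# SWMR-style sharing sketch: one writer places an object into a
# shared region; readers attach by name and read it in place, with
# no network transfer and no per-node copy. POSIX shared memory
# stands in for the CXL shared memory region here.
from multiprocessing import shared_memory

# Writer node: create the shared region and write the object into it.
region = shared_memory.SharedMemory(name="cxl_shared", create=True,
                                    size=4096)
payload = b"shared-object-bytes"
region.buf[:len(payload)] = payload

# Reader node: attach to the same region by name and read in place.
view = shared_memory.SharedMemory(name="cxl_shared")
data = bytes(view.buf[:len(payload)])
assert data == payload

# Cleanup: the creator unlinks the region when it is no longer needed.
view.close()
region.close()
region.unlink()
```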
You can also see the evaluation results for memory sharing. Again we collaborated with MemVerge to evaluate the performance of the memory sharing capability. The left graph shows our benchmark results for Ray, an open-source distributed computing framework. By applying Niagara's memory sharing capability to Ray, we can eliminate node-to-node data transfer overhead, resulting in a performance improvement of up to 5.9 times compared to native Ray. The graph on the right side shows the performance of a Niagara-based Spark join that we designed: we demonstrated that performance can be improved by up to 1.8 times compared to the baseline by eliminating the shuffle operation between multiple nodes. In addition to these two use cases, we are actively investigating other hardware-assisted features to enhance the efficiency of the CXL disaggregated memory system.
All right, so as you can see here, Niagara supports the memory pooling and sharing functions as well as other hardware-assisted features. First is block data management, which can reduce data migration overhead by moving or copying data within CXL memory; something like VM live migration could be a useful application, but we are still looking at other use cases. Second is snapshot, a function that directly saves or restores data between CXL memory and a storage device. We have seen that the host CPU's I/O handling burden can be reduced by offloading this snapshot function to the memory device. Last but not least is memory failure prediction. A CXL disaggregated memory system connects multiple hosts, which means a memory failure can have a significant negative impact on the whole system, so we believe that memory failure prediction can improve the reliability of a CXL disaggregated memory system. Apart from the research items introduced here, we are also exploring various other hardware-assisted features, and if you have any interest in these items, we are always open to collaborating with you.
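To show what offloading block data management to the device might look like at the interface level, here is a sketch of a host-side helper that encodes a copy request. The opcode and command layout are invented for illustration; they are not taken from the CXL specification or from the Niagara design.

```python
# Hypothetical mailbox command for device-side block copy: the host
# issues a single command and the memory device moves the data
# internally, instead of the CPU reading and rewriting every line.
# Opcode and field layout are assumptions for this sketch.
import struct

BLOCK_COPY_OPCODE = 0x5100  # made-up vendor-specific opcode

def encode_block_copy(src_dpa: int, dst_dpa: int, length: int) -> bytes:
    """Pack a copy request: opcode, source and destination device
    physical addresses, and a byte count, all little-endian."""
    return struct.pack("<HQQQ", BLOCK_COPY_OPCODE, src_dpa, dst_dpa, length)

# Example: copy a 2 MiB block within the pooled memory, e.g. as one
# step of a VM live migration.
cmd = encode_block_copy(src_dpa=0x1000_0000, dst_dpa=0x2000_0000,
                        length=2 << 20)
print(cmd.hex())
```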
OK, so our next step is Niagara 2.0, which will be available by the end of this year. Niagara 2.0 is a 2U CXL disaggregated memory prototype that can connect up to eight host servers, and, as I said, we plan to support the DCD feature described in the CXL specification. If devices support the DCD capability, the server cluster configuration can be changed as shown on the right side. Overall, we are very much looking forward to collaborating with our industry partners to enable the CXL hardware-assisted ecosystem.
All right, so lastly, I'd like to introduce our live demo. In the live demo, we are showcasing dynamic memory allocation for VMs across four different host servers. Unfortunately, today is the last day, but if you visit our demo booth, number A8, you can see the Niagara live demo in more detail, with a real rack-scale system as shown on the right side. That's all I have for this presentation. Thank you for your attention.