Hi. My name is Poorna Kale. I'm a system architect on the CXL team at Micron. Today I'm going to talk a little bit about memory expansion with CXL and its use cases, which you've probably heard about several times today and yesterday and are very familiar with, but I'll focus most of my time on what Micron is doing to enable the ecosystem.
Okay. So the pyramid on the left, everyone is familiar with that by now, and you all understand where CXL falls in it in terms of capacity and bandwidth expansion. It is definitely a low-latency, high-bandwidth solution.

On the CXL use cases, I'm showing a few sample data center workloads here, and which ones benefit from CXL capacity versus CXL bandwidth. In AI/ML, training definitely benefits from both capacity and bandwidth. Inference used to be mostly about bandwidth, but with large language models and DLRM we are seeing the value of capacity expansion as well, so overall AI/ML benefits from both the capacity expansion and the bandwidth that CXL offers. In-memory databases that can tolerate a little higher latency than DRAM benefit from capacity expansion; we're talking about workloads like Redis and SAP HANA. Data analytics that can tolerate a little higher latency also benefits from capacity expansion; here I'm talking about OLAP and TPC-H, those kinds of examples. For general purpose compute, think of virtual machines: capacity expansion lets you run more applications, more VMs. Last is high performance computing; these are highly compute-intensive applications, and bandwidth is the key for them. So those are the sample workloads, showing how CXL capacity and bandwidth can help them.
And how do you deploy CXL? If you're looking for capacity expansion, take the direct-attached memory tiering approach. There are several ways you can do it. One is application-transparent memory tiering, where the tiering is managed for you: some vendors have SDK capabilities using a user-space library, and Intel has the 2LM mode that was developed for persistent memory. These are OS-level, application-transparent ways of expanding capacity. Then there is application-managed, where applications are aware of the memory tiers, the NUMA nodes and their characteristics, and can use those characteristics when allocating pages; one library I'd mention as an example is libnuma. The next level is application-modified, where the application actively manages the pages, the NUMA characteristics, and the allocation; the application needs to be modified to get the capacity expansion. Then the next level after that is the switch. Several vendors talked about switches; this takes the tiering and adds another tier with higher latencies. This is an area where we are working in the software community: to enable sharing data in fabric-attached memory, we are adding patches in the Linux community.

For bandwidth expansion, the first option is heterogeneous hardware interleaving, where you interleave between the local DRAM nodes and CXL. Second is a combined software and hardware interleaved approach, where you take the local NUMA nodes, divide them into sub-cluster NUMA nodes, and interleave between those sub-cluster NUMA nodes and CXL. The last one is software-based, with a fixed allocation between local NUMA and the tier 2 memory. What we are trying to do there is a patch we introduced into Linux for ratio-based allocation, where pages are allocated according to the bandwidth and latency characteristics: you can allocate M pages to the local NUMA node and N pages to CXL memory. A minimal sketch of the application-managed approach follows.
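As an illustration of the application-managed tier, here is a minimal sketch using libnuma, the library named above. The CXL node id (node 1 here) and the allocation sizes are assumptions for illustration; on a real system you would first discover which NUMA node the CXL expander enumerates as.

/* Hypothetical sketch of application-managed tiering with libnuma:
 * hot data stays on the local DRAM node, colder capacity-bound data
 * is placed on a CXL-backed node, and bandwidth-bound data is
 * interleaved across both. Build with: gcc tiering.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }
    const int cxl_node = 1;       /* assumed node id of the CXL expander */
    const size_t sz = 1UL << 30;  /* 1 GiB per region, for illustration */

    /* Hot working set: allocate on the local node of the calling CPU. */
    void *hot = numa_alloc_local(sz);

    /* Cold, capacity-bound data: place it explicitly on the CXL node. */
    void *cold = numa_alloc_onnode(sz, cxl_node);

    /* Bandwidth-bound data: interleave pages across DRAM node 0 and
     * the (assumed) CXL node 1. */
    struct bitmask *nodes = numa_parse_nodestring("0,1");
    if (!nodes) {
        fprintf(stderr, "bad node list\n");
        return 1;
    }
    void *wide = numa_alloc_interleaved_subset(sz, nodes);

    if (!hot || !cold || !wide) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(hot, 0, sz);           /* touch pages so they actually fault in */
    memset(cold, 0, sz);
    memset(wide, 0, sz);

    numa_free(hot, sz);
    numa_free(cold, sz);
    numa_free(wide, sz);
    numa_bitmask_free(nodes);
    return 0;
}

The interleaved allocation at the end is the software analogue of the interleaving modes described above; the ratio-based kernel patch generalizes it from an even split to a weighted M:N split without the application having to change.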
So, this is the product we announced a couple of months ago: the CZ120. It's available in 128 and 256 gigabyte capacities, so if you attach four CXL devices, you can get up to 2 terabytes of incremental capacity. Using MLC, we characterized the module and were able to get 36 gigabytes per second of bandwidth over its eight CXL lanes. That increases your read/write bandwidth by 24% relative to twelve 64-gigabyte DDR5-4800 RDIMMs. It's available in the E3.S 2T form factor, PCIe Gen 5, x8 lanes. We ran a Microsoft SQL Server database with TPC-H on a fourth-generation AMD EPYC platform, and what we noticed was that when we added another terabyte of CXL memory to the 768 gigabytes of local DDR, we were able to get 96% more queries per day. So it's a scale-up story: instead of going to two servers, with one server plus CXL memory you can do more. Okay.
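To see that incremental capacity from the operating system's point of view, here is another small sketch using libnuma that walks the NUMA nodes and reports each node's size. CXL expanders typically enumerate as CPU-less memory nodes, but the exact numbering is platform-specific, so the labeling below is an assumption.

/* Minimal sketch: list NUMA nodes and flag CPU-less ones, which are
 * candidates for CXL-backed memory. Build with: gcc nodes.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }
    for (int node = 0; node <= numa_max_node(); node++) {
        long long free_bytes;
        long long size = numa_node_size64(node, &free_bytes);
        if (size < 0)
            continue;  /* node not present */

        /* A node with no CPUs in its cpumask is likely memory-only,
         * e.g. a CXL expander. */
        struct bitmask *cpus = numa_allocate_cpumask();
        numa_node_to_cpus(node, cpus);
        int has_cpus = numa_bitmask_weight(cpus) > 0;

        printf("node %d: %lld MiB total, %lld MiB free%s\n",
               node, size >> 20, free_bytes >> 20,
               has_cpus ? "" : " (CPU-less, possibly CXL)");
        numa_free_cpumask(cpus);
    }
    return 0;
}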
So, what are we doing to enable the ecosystem with the CZ120? I've listed some of the important ecosystem players here. We have given out up to 300 CZ120 samples for free to enable the ecosystem, and I'll touch on what we are doing with each of them.
First are the enablers: the CPU vendors, Intel and AMD. Our goal is that by the time they launch a CPU, our product is thoroughly validated and qualified. So we do a lot of joint testing and joint qualification with them, and compatibility testing, very rigorously, to ensure interoperability. We have shown demos with their CPUs at the Intel Innovation forum as well as at FMS.
Then it's the OEMs. Again, the goal is to actively get joint qualification done by the time their systems launch. So we do a lot of activities with them on interoperability and optimization, and we also work on what the ideal configurations and DDR-to-CXL ratios should be for different applications and workloads. The slide here is from an FMS presentation on a Supermicro platform, but we work with all tier 1 vendors on joint qualification.
Next are the ASIC vendors. Our product launched with the Microchip CXL controller, but we work with all the major ASIC vendors, and the goal here is to make sure the industry has options for CXL deployment. We validate, we do joint qualifications, and we work on optimizing power and performance with our DDR4 and DDR5 DIMMs. The idea is that the industry has options: not only several different CXL controller vendors, but also different form factors. Ours is E3.S, and you can also get AIC CEM cards from some of these ASIC vendors. On OS software, we are very active in the Linux community, working on several patches for ideal utilization of CXL, and we also work with hypervisors and other software vendors like MemVerge.
This is looking at in-memory database ISVs and VMs. Again, we are doing a lot of validation, proof points, POCs, and joint customer engagements. Essentially, the goal is making sure their applications run smoothly and in an optimized way with CXL, and getting them certified for production with the OEMs.
And the last one in the ecosystem is the test vendors. We are one of the memory vendors on the CXL Integrators List, and we work actively with the software and hardware test vendors. The goal, again, is to have a mature hardware and software test ecosystem, so we are in deep collaboration with them on early validation, compliance testing, and joint validation.
Oh, one other ecosystem partner: the switch vendors. You've probably seen the XConn demo out there with the Micron CZ120. We work with switch vendors and anyone working on composable memory solutions, figuring out proof points and POCs with them and looking at pooling and sharing as future use cases.
This is the full Micron portfolio. Thank you very much. If you are interested in partnering with Micron, go to micron.com/cxl. There's a technical enablement program you can join.