Hi everyone, thank you all for joining us today for the CXL Consortium webinar, Compute Express Link Supporting Persistent Memory. Today's webinar will be presented by Mahesh Natu, Data Center Platform Architect at Intel, and Thomas Won Ha Choi, Director of DRAM Product Planning and Enabling at SK Hynix. We'll set aside approximately 15 minutes at the end of the presentation for Q&A, so please enter any questions you have into the chat window at any time. Now I'll hand it off to Mahesh to begin.
Thank you, Kevin, and thanks everyone for attending. Before we get into the main topic, let me set up some context. As everyone here is probably aware, there have been significant changes in the industry over the last few years. Cloud computing is everywhere. There's tremendous growth happening in the areas of AI, machine learning, and analytics in terms of processing more and more data. And now we are seeing that the network and the edge, which used to be standalone entities, are becoming more like clouds. So with these and other changes, the industry is looking to solve a number of key problems. Next slide, please.
And that's what the CXL Consortium is trying to address. If you look at the consortium as a whole, we have great participation from all the major industry players. The board of directors is shown above, with very good representation from all types of vendors: processor vendors, IHVs, software vendors, system vendors, and all the CSPs. CXL is basically an open industry standard that provides a high-speed communication path. And because everybody understands the problems and everybody believes that CXL is the right solution for them, we have about 150 members right now and growing. So we have great momentum behind CXL. And the reason why CXL is so exciting to everyone is on the next slide.
So if you look at some of the key challenges that folks are facing in order to address the market forces we just saw, there's increasing demand for faster data processing. There's just so much data to be processed, which means it needs to be processed faster to get more insights, and that drives higher next-gen data center performance. There's a demand for computing to be heterogeneous: instead of homogeneous computing with one type of computing engine, folks are looking at mixing and matching different types of compute engines and different types of memory to optimize for workloads. There's disaggregation happening through the cloud. We're already seeing evidence of I/O pooling, where I/O is being disaggregated, and there's great interest in memory also being separated out, so that the compute node and the memory node sit in separate boxes in a chassis and you can mix and match and pool memory. There's obviously a need for more memory capacity and bandwidth as the data needs go up: you need more memory to feed the data and store the data. And the problem everyone was struggling with is that there was no great industry standard, open for everyone to participate in, that defines an interconnect meeting all the challenges above. That's where CXL comes in. Like I mentioned, CXL is an open standard. It provides a cache-coherent interconnect that allows processors, memory expansion, and accelerators to share memory as well as caches and get us to the next level of computing. I like to think that CXL is going to change the way we compute, and these are some of the reasons why. Just to summarize what CXL is, for folks who are not familiar: it's a coherent interface. It is built on top of PCI Express, so we can leverage the PCI Express ecosystem in terms of the form factors, the physical layer, and all of that. And we provide strong backwards compatibility: devices designed to the CXL 1.1 specification will work with a 2.0 system and vice versa, which is great for the industry in terms of protecting investment. The major selling point of CXL is low latency. It is really targeted at heterogeneous computing, where memory and cache accesses across the CPU, accelerators, and other agents are almost seamless. Another design decision that CXL made, which is helping us in a big way, is that it's a somewhat asymmetric protocol. The cache coherency is asymmetric in the sense that the CPU or the host is responsible for the heavy lifting, whereas the device has it a little easier in terms of designing for cache coherency. The burden on the device is deliberately less, which allows vendors to build devices quickly, plug them in, and support new and exciting workloads. So that's where CXL is: it's essentially the right features and the right architecture for the key problems the industry is facing, and that's really why we see so much momentum behind it. Next slide, please.
So where does persistent memory fit into the whole scheme? Persistent memory is one of the key usages of CXL. If you have attended the previous webinars (and if you haven't, I would encourage you to go look at them, either on the CXL Consortium site or on BrightTalk), CXL has three key usages: type 1, type 2, and type 3. Type 1 and type 2, shown on the left side, are accelerators. Type 1 has no memory; type 2 has memory that is cache coherent with the system memory. Type 3 is where we'll focus today, and specifically on persistent memory devices among the type 3 devices. Type 3 devices are basically memory expanders that communicate with the processor using two sub-protocols of CXL: CXL.io, which is used for enumeration, and CXL.mem, which is a memory protocol that's transactional in nature. The processor, or other CXL agents like type 1 or type 2 devices, can use these interfaces to communicate with the memory expander device just as if it were natively attached memory. Memory expanders, or type 3 devices, can be used for a couple of reasons. It's a good way to add more bandwidth to the system; it's also a good way to expand capacity. But the other key use that is generating a lot of interest is being able to attach persistent memory devices to the CXL interface. And I'll get into why persistent memory and CXL are really a good match on the next slide.
Just before we go there, a little bit of background on persistent memory for folks who may not be totally familiar with it. Some of the key aspects of persistent memory, or PMEM as it's called, are: it's byte-addressable (contrast that with NVMe, which is block-addressable); it generally has low latencies compared to SSDs; and it is cacheable when attached via CXL, whereas NVMe, which is PCI Express attached, is uncached. Persistency means the data has to stick around after power loss, unlike DRAM, which loses its contents when you lose power. And these generally have larger capacities than DRAM, driven mostly by the use of different media. Those are the general characteristics and benefits of PMEM. And the industry is finding there are a number of workloads that benefit greatly from PMEM. Good examples are the standard databases. Some of the bottlenecks that databases have are logging and journaling, and PMEM allows speeding up those operations, thereby speeding up the entire process. Recovery can be near-instantaneous because there's no need to go back to disk and fix up any errors that may be there. Analytics, AI, and ML, again, are data-hungry applications. They need real-time access to large data sets, and PMEM helps them greatly. They also need to do checkpointing, which can be a bottleneck in the overall performance, and PMEM helps there too. In storage, PMEM is used for caching and also for adding a new tier to the storage hierarchy, again improving performance and flexibility. HPC also has exciting usages: HPC tends to use checkpointing a lot to make sure data is not lost, and that overhead can be significant, so PMEM helps there too. And there are more applications that the industry is developing and realizing can benefit from this exciting breakthrough technology. Just to summarize, the right-hand side shows a picture where a CXL PMEM device is attached to the processor through the CXL interface, and you can see the green block indicating that the processor is now able to cache the PMEM content, unlike NVMe, where the device is connected over PCI Express and there is no caching. That's a fundamental difference that CXL brings to the table. Next slide, please.
Now let's get into why I'm arguing that CXL is an ideal protocol for attaching PMEM. First of all, it's a transactional protocol. You can see the picture below: it shows a home agent on the host side, which would typically be the processor, and a memory controller device on the right-hand side, which would be the PMEM device. If you look at the transactions, they are request and response type: the home agent, the processor, sends a request to the device and the device responds. These can be read requests or write requests (those are the ones shown with data), and the device may respond with or without data. The key thing is that this is transactional, which means the controller on the device is able to insert delays. And that's important, because PMEM media often have longer latencies, or they may have variable latencies because of media management operations that sometimes have to happen. So the transactional nature of the protocol helps greatly. The other key aspect is that the abstraction level we've chosen for CXL.mem is higher than what you would typically see for, say, DDR. The memory controller and the media are abstracted from CXL.mem. As you can see, the processor or the other agents can send a memory request and just get data back; there is no notion of the type of media that's used. It's completely hidden behind the type 3 device. Again, that allows the industry to innovate and build new media types, experiment with them, mix and match, and try different memory controller architectures. All of that is completely contained in the type 3 device, and the vendor can design based on the product needs. One of the things we added in CXL 2.0 that also helps with PMEM is called memory QoS. One of the challenges in a heterogeneous memory configuration, for example a system that has both DRAM and persistent memory, is that there could be a performance difference between the two. Like I mentioned, it's often the case that PMEM has higher latency and potentially lower bandwidth compared to DRAM. That may lead to head-of-line blocking, meaning PMEM accesses can block transactions going to DRAM and slow down the entire system. The memory QoS feature in CXL 2.0 allows the device to synchronously report how busy it is. A type 3 device can say, "I'm really, really busy right now, please don't send me any more transactions; it'll just back things up." Using that feedback mechanism, the processor or the other requesting agents can make intelligent decisions about when to send more requests to the PMEM device and when not to, allowing the DRAM accesses to make forward progress in parallel. Again, some of these things are described very well in other webinars that have aired before, so I encourage you to go look at those. A few other things were added in CXL 2.0 based on industry feedback: the ability to interleave memory, configuration of the memory device through a standard register interface, and the third thing I'll talk about, global persistent flush, often called GPF. So let's get into memory interleaving and take it from there. Next slide, please.
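As a rough illustration of the QoS feedback loop Mahesh just described, here is a minimal, self-contained C sketch. The load levels loosely mirror the DevLoad levels defined by CXL 2.0, but the enum names and the throttle policy itself are illustrative assumptions, not anything taken from the specification.

```c
#include <stdbool.h>
#include <stdio.h>

/* Sketch of a host-side QoS feedback loop: each response carries the
 * device's reported load level, and the host throttles new requests to
 * a busy PMEM device so its slower traffic does not block DRAM
 * accesses behind it. Names and policy here are illustrative only.    */
enum dev_load { LOAD_LIGHT, LOAD_OPTIMAL, LOAD_MODERATE, LOAD_SEVERE };

/* Decide whether to issue one more request to this device right now.  */
static bool may_issue_request(enum dev_load last_reported, int inflight)
{
    switch (last_reported) {
    case LOAD_LIGHT:
    case LOAD_OPTIMAL:
        return true;              /* keep the device busy               */
    case LOAD_MODERATE:
        return inflight < 4;      /* back off a little                  */
    case LOAD_SEVERE:
    default:
        return inflight == 0;     /* let it drain before sending more   */
    }
}

int main(void)
{
    printf("%d\n", may_issue_request(LOAD_SEVERE, 2));   /* prints 0    */
    return 0;
}
```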
This picture shows how CXL 2.0 allows memory interleaving. Again, this is a feature that was added in CXL 2.0 but can be leveraged by CXL 1.1 devices as well. Performance inherently requires high bandwidth, and interleaving is one way to get effectively higher bandwidth by spreading accesses across multiple devices. This is a common technique used with DRAM, where memory DIMMs can be interleaved to get effectively higher bandwidth. In the CXL case, a similar thing can be done. The example on the right shows an eight-way interleaved set of CXL devices. All the way at the bottom I'm showing two devices, but imagine there is a total of eight devices at that level. Using decoders in the processor as well as in the CXL switch, these eight devices can be interleaved, in this example at 1K granularity, to get effectively better performance. You should probably refer to the CXL specification for the details of how this works, but at a high level, both the processor root port and the switch select a target device using certain address bits in the host address space. The device, when it receives the address, converts the host physical address, or HPA, into its local address and then responds to the request. From the device perspective it is eight-way interleaved at 1K granularity, which means the first 1K of the address range will go to the first device, the next 1K chunk will go to the next device, and so on; after eight of these, at 8K, it wraps back to the first device and keeps going. This helps performance greatly and is expected to be widely used. Next slide, please.
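To make the address math concrete, here is a small, self-contained C sketch of that eight-way, 1 KiB-granularity mapping. Real CXL decoders are programmed through HDM decoder registers and select the target with specific HPA bits; the modulo arithmetic below is only the conceptual equivalent for a power-of-two configuration, and the base address is made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative 8-way interleave at 1 KiB granularity. The constants
 * and base address are example values, not real decoder programming. */
#define IW 8u       /* interleave ways                */
#define IG 1024u    /* interleave granularity (bytes) */

/* Map a host physical address (HPA) inside the interleaved range to a
 * target device index and a device-local address.                     */
static void decode(uint64_t hpa, uint64_t base,
                   unsigned *dev, uint64_t *dpa)
{
    uint64_t off   = hpa - base;      /* offset within the range       */
    uint64_t chunk = off / IG;        /* which 1 KiB chunk             */

    *dev = (unsigned)(chunk % IW);    /* chunk N lands on device N % 8 */
    /* each device sees only every 8th chunk, packed back to back      */
    *dpa = (chunk / IW) * IG + (off % IG);
}

int main(void)
{
    unsigned dev;
    uint64_t dpa;

    /* the ninth 1 KiB chunk (offset 8 KiB) wraps back to device 0 */
    decode(0x100000000ull + 8 * 1024, 0x100000000ull, &dev, &dpa);
    printf("device %u, device-local address 0x%llx\n",
           dev, (unsigned long long)dpa);   /* device 0, 0x400 */
    return 0;
}
```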
The other key aspect, which sometimes gets overlooked but is really critical, is management; let me take some time to explain it. Persistent memory in particular requires significant management and provisioning support from system software. One example: a persistent memory device may need to be partitioned, and one may want to create volumes or file systems on it; after all, it's a persistent memory device that may store persistent data. The device may report its health through SMART or similar mechanisms, and the system software, the OS, may want to keep track of that. The device may have bad locations, and the system software needs to stay away from those so it doesn't keep running into problems. One of the challenges we faced when we wanted to support PMEM on CXL was that there was no standard interface by which the OS could do all this. The problem with that is that each vendor building a PMEM device would need to create its own driver and make sure every OS distribution has that driver included, and that can be a big challenge for the ecosystem and get in the way of enabling the devices. So in the CXL specification we defined a standard register interface so that OS software can manage a CXL-attached memory device, including PMEM, in a standard way. Volatile devices don't require as much management, but they do require some; for example, you may want to do a firmware update of the device. But PMEM is where a lot of the focus is. What that gives us is the ability to create a generic memory device driver, so every OS can have a standard memory driver that works for every memory vendor's design as long as it follows the spec. Vendors are still allowed to build vendor-specific features and produce a vendor-specific driver; we're not preventing that, and that's obviously a good thing. But having the standard driver essentially speeds up the enabling, the ecosystem, the software distribution, all of that, and I think that's a great benefit that CXL brings in terms of persistent memory enabling. As you'll see later in Thomas' slides, it's still a new technology and there are challenges in getting everyone to use it effectively, and anything we can do in terms of ease of use or deployment helps greatly. In terms of the interface, it is organized as a number of features that can be discovered independently, so the software can look at those and enable them as it sees fit. The device reports status in terms of health, and most of the device features are accessed through mailboxes, where the host CPU software issues a request and the device provides a response. These commands generally handle things like reporting errors, managing health, getting alerts, partitioning (I mentioned creating file systems), and data-at-rest security like passphrases. We defined commands for doing those things in a standard way. And like I mentioned, vendor extensions are allowed for vendor-specific additions, but most of the basic features someone needs to get a system working are already part of the spec. Shown on the right side is a generic CXL memory driver at the top that talks to a bus driver in the OS through some OS-proprietary mechanism, but the interface between the PMEM device and the OS is the standard CXL 2.0 memory device register interface. Next slide, please.
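For a flavor of what such a mailbox exchange looks like from the host's point of view, here is a minimal, self-contained C sketch. The register indices, the doorbell bit, the placeholder opcode value, and the simulated MMIO array are all assumptions made for illustration; the real register layout and command set are defined in sections 8.2.8 and 8.2.9 of the CXL 2.0 specification.

```c
#include <stdint.h>
#include <stdio.h>

/* Simulated mailbox registers; real hardware exposes these via MMIO. */
#define MBOX_CMD    0   /* hypothetical command register index        */
#define MBOX_CTRL   1   /* hypothetical control (doorbell) register   */
#define MBOX_STATUS 2   /* hypothetical status register               */
#define DOORBELL    (1u << 0)

static uint32_t regs[3];

/* Pretend to be the device: complete the command, clear the doorbell. */
static void device_service_mailbox(void)
{
    regs[MBOX_STATUS] = 0;             /* 0 = success                  */
    regs[MBOX_CTRL] &= ~DOORBELL;
}

/* Host side: issue one command and wait for completion.               */
static int mbox_exec(uint16_t opcode)
{
    if (regs[MBOX_CTRL] & DOORBELL)
        return -1;                     /* mailbox busy                 */
    regs[MBOX_CMD]   = opcode;         /* opcode, no payload shown     */
    regs[MBOX_CTRL] |= DOORBELL;       /* ring the doorbell            */
    device_service_mailbox();          /* real hardware completes async */
    return (int)regs[MBOX_STATUS];     /* command return code          */
}

int main(void)
{
    /* 0x4200 is used purely as a placeholder opcode value here.       */
    printf("status = %d\n", mbox_exec(0x4200));
    return 0;
}
```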
Now that we've talked about interleaving and the standard interface, let's get into the next concept, called GPF. To understand GPF, global persistent flush, it's important to realize that the PMEM-aware applications that have already been written expect that when they issue a write and it's completed from their perspective, it will be made persistent. But in reality, if the system had to make sure every write from a PMEM application goes all the way to the PMEM device and is committed before it is marked completed, the system performance could be terrible. So what system designers do is cache the data, often in the processor cache, possibly in a CXL device cache, and memory vendors may build write buffers or staging buffers on the device itself to store the data temporarily and write it to the media at a later point. These things are typically done for performance reasons, to essentially get good performance from PMEM. The problem then is that on events like a sudden power loss, the system has to make sure the data sitting in those caches or intermediate buffers is committed to the persistent domain, because we have made a promise to the PMEM application that a completed write is persistent, and we need to keep that promise. GPF is the CXL mechanism for enabling that. Like the name suggests, it's a global event: it goes across the entire cache coherency domain, so if you have a full processor system with, let's say, seven CXL-attached memory devices, all of those are covered by the GPF event. It is controlled and initiated by the host, meaning the CPU, which allows coordination between the flush that happens within the processor domain and the CXL domain. And there are effectively two phases, which the right-hand picture goes into. In the first phase, the processor asks each CXL device to stop injecting new traffic and flush its caches. During that phase, all the devices, including the processor, flush their caches, and the CXL memory devices end up holding the data they need to commit. At the end of phase one, the processor inserts a barrier, making sure that phase one is complete before going to phase two. Phase two is relatively easy, because now each CXL device has the data it needs to push to its local media; the processor requests each device to do that, and when the device acknowledges, the processor moves on. If there are errors or timeouts in phase one, the processor propagates a flag to each device, so the device knows something happened and can log a dirty shutdown event. The handling of dirty shutdown events and these errors is better explained on the next slide, so I'll hand off to Thomas.
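Here is a conceptual, self-contained C sketch of that two-phase flow from the host's side. The cxl_dev structure and the phase helpers stand in for hardware actions and are illustrative assumptions, not real driver or specification APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model of a CXL memory device for the GPF walkthrough.           */
struct cxl_dev {
    bool cache_dirty;      /* lines still cached on the device side    */
    bool buffered_writes;  /* data sitting in on-device write buffers  */
    bool dirty_shutdown;   /* set if phase 1 did not complete cleanly  */
};

/* Phase 1: device stops injecting new traffic and flushes its caches. */
static bool gpf_phase1(struct cxl_dev *d)
{
    d->cache_dirty = false;            /* flushed toward the device    */
    d->buffered_writes = true;         /* ...now pending in buffers    */
    return true;                       /* false would mean timeout/error */
}

/* Phase 2: device commits buffered data to its persistent media.      */
static void gpf_phase2(struct cxl_dev *d, bool phase1_error)
{
    d->buffered_writes = false;
    d->dirty_shutdown = phase1_error;  /* log it if phase 1 failed     */
}

static void global_persistent_flush(struct cxl_dev *devs, size_t n)
{
    bool phase1_error = false;

    /* Phase 1 for every device; the host flushes its own caches too.  */
    for (size_t i = 0; i < n; i++)
        if (!gpf_phase1(&devs[i]))
            phase1_error = true;

    /* Barrier: phase 2 starts only once phase 1 is complete everywhere,
     * so no dirty line can arrive after a device has flushed its media. */
    for (size_t i = 0; i < n; i++)
        gpf_phase2(&devs[i], phase1_error);
}

int main(void)
{
    struct cxl_dev devs[2] = { { true, false, false }, { true, false, false } };
    global_persistent_flush(devs, 2);
    printf("dirty shutdown flags: %d %d\n",
           devs[0].dirty_shutdown, devs[1].dirty_shutdown);
    return 0;
}
```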
Thank you, Mahesh. I'm Thomas from SK Hynix, and I would like to talk mainly about the failure management features of persistent memory, the form factors used for persistent memory, and the challenges ahead. For any reason, the GPF flow may not be successful, whether due to a timeout, errors, or any case where completion of the flush cannot be confirmed. That is what we call a dirty shutdown state. We keep track of a device-internal state called the shutdown state, which is set dirty when the GPF flow is not successful, and we keep track of a dirty shutdown count, DSC, which is incremented when the shutdown state is found to be dirty. This dirty shutdown count is exposed by Get Health Info, which is a mailbox command. We must account for the DSC from all the devices in a persistent memory interleave set, to make sure that every device within that interleave set is in the clean state, or if anything is dirty, we need to keep track of it. Below, I've tried to organize how the DSC and the shutdown state are managed under three different scenarios. When the device is not in use and is waking up from reset, you check whether the shutdown state is dirty from the previous run; if it is dirty, you increment the dirty shutdown count, let the host know, and change the shutdown state back to clean. If it was clean anyway, you do nothing. When the device is in use, you execute the GPF flow and then launch the Set Shutdown State mailbox command: if the GPF flow has been successful, the state remains clean, but if it was not, for any reason, the shutdown state is set dirty. In the case where the flow is successful and you go to a normal shutdown, you run your normal shutdown flow, launch the Set Shutdown State mailbox command, and the shutdown state remains clean. So when you wake up from reset the next time, you can consistently check whether the shutdown state is dirty and keep incrementing the dirty shutdown count, which is a monotonic counter. This is a simple mechanism that allows you to handle GPF failures. Next slide, please.
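A minimal C sketch of that bookkeeping follows, assuming illustrative field and function names; on a real device this state is driven through the Set Shutdown State and Get Health Info mailbox commands rather than plain function calls.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Persisted shutdown state and monotonic dirty shutdown count (DSC).  */
struct pmem_state {
    bool     shutdown_dirty;
    uint32_t dsc;
};

/* Waking up from reset: a dirty state left over from the previous run
 * bumps the counter, then the state is returned to clean.              */
static void on_wake_from_reset(struct pmem_state *s)
{
    if (s->shutdown_dirty) {
        s->dsc++;                  /* host reads this via Get Health Info */
        s->shutdown_dirty = false;
    }
}

/* Device in use, shutting down: clean only if the GPF flow succeeded.  */
static void on_shutdown(struct pmem_state *s, bool gpf_ok)
{
    s->shutdown_dirty = !gpf_ok;
}

int main(void)
{
    struct pmem_state s = { false, 0 };

    on_shutdown(&s, false);        /* e.g. the GPF flow timed out       */
    on_wake_from_reset(&s);        /* next boot: DSC becomes 1          */
    printf("dirty shutdown count = %u\n", s.dsc);
    return 0;
}
```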
Another major feature for failure management is internal poison list retrieval and scan media. Let me give a little background on why this poison management is needed. To preserve RAS, we want to keep track of any non-fatal detectable-but-uncorrectable error, or DUE, as poison. The example shown on the right side is a poison notification example: this is a case where the host reads and the device responds, and when the CXL memory discovers a DUE, it generates poison for the specific physical address as a result. This is not the only poisoning scenario. There is another case, not shown in the diagram, where the host generates poison: a DUE is found in a dirty cache line in the host's cache as it is being evicted to the CXL memory, so the poison indication comes from the source of the transaction. And there is another case where the CXL memory encounters a DUE at a specific address independent of any host access; there it generates a log and poison and informs the host. So there are multiple scenarios where you need to keep track of poison to avoid failures from these uncorrectable errors. There is a mailbox command called Get Poison List to perform the internal poison list retrieval: you obtain a complete list of poison locations on the memory device, so the host knows where these locations are and avoids accessing them. Whenever new poison locations are added to the poison list, the device notifies the host by MSI or VDM notifications. And when the host has handled the poison and it is deemed okay to clear a poison location, the host can issue the Clear Poison mailbox command. But there are cases where you have so many poisoned locations to keep track of that the poison list may overflow, or for some reason the device may ask the host to execute a complete scan of the memory device. That's where you use Scan Media. The update of the scan outcome is similar to internal poison list retrieval: you are notified by MSI or VDM notifications, and based on the outcome, the host records the poisoned media ranges and updates the poison list if uncorrectable errors are found. This is a very slow background operation, so the host will try to avoid doing scan media when there are lots of runtime operations and lots of memory requests being handled by the CXL memory. And you should be aware that if the number of errors is so large that the mailbox is full, the scan media operation may need to stop. But our expectation is that this will rarely happen, so there shouldn't be much worry about the mailbox filling up unless there are significant problems found in the device. Next slide, please.
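As a rough sketch of how a host might act on that list, here is a small, self-contained C example. The record layout, the sample data returned, and the mark_offline() helper are illustrative assumptions, not the exact structures defined by the CXL 2.0 specification.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* One poisoned range as the host might track it.                      */
struct poison_record {
    uint64_t dpa;      /* device physical address of the poisoned data */
    uint64_t length;   /* extent of the poisoned range, in cache lines */
};

/* Stand-in for the Get Poison List mailbox command.                   */
static size_t get_poison_list(struct poison_record *out, size_t max)
{
    static const struct poison_record sample = { 0x4400, 1 };
    if (max == 0)
        return 0;
    out[0] = sample;
    return 1;
}

/* The host keeps these ranges off-limits so later accesses cannot trip
 * over the uncorrectable (poisoned) data.                              */
static void mark_offline(uint64_t dpa, uint64_t len)
{
    printf("avoid DPA 0x%llx, %llu cache line(s)\n",
           (unsigned long long)dpa, (unsigned long long)len);
}

int main(void)
{
    struct poison_record recs[64];
    size_t n = get_poison_list(recs, 64);

    for (size_t i = 0; i < n; i++)
        mark_offline(recs[i].dpa, recs[i].length);
    return 0;
}
```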
Now I would like to explain more about the form factors for CXL persistent memory. For CXL persistent memory, we plan to use the form factors supporting PCIe, and I've listed some major candidates. The first is EDSFF E1.S, the second is EDSFF E3.S or E3.L, and the third is the add-in card, AIC. I would like to describe these form factors by comparing them with the DDR server DIMM, since in terms of memory most people are most comfortable with the DIMM perspective, along with their expected maximum power ranges. First, E1.S is friendly for 1U servers. It's fairly smaller than the DDR RDIMMs used in servers, and the expected maximum power range is around 12 to 25 watts; depending on the thickness it may vary quite a lot. That is a similar range to the DIMM, where the maximum power allowed is usually under 15 watts these days. Then there are E3.S and E3.L, which are larger than the server DIMM. The power budget for E3.S will be 25 to 40 watts: 25 watts for the 1T thickness and 40 watts for the 2T thickness, where 1T is 7.5 millimeters and 2T is 16.8 millimeters. So E3.S is 25 to 40 watts, and E3.L is 40 to 70 watts. And there's the AIC, which is even larger than E3.S or E3.L, and it is expected to have a maximum power range similar to E3.S or E3.L. Next page, please.
I would like to explain why we are considering these kinds of form factors instead of the DIMM form factor. Basically, these CXL memory form factors allow better capacity scaling on a separate expansion memory channel. First, some background on the capacity scaling trend with current DIMMs, to explain why we had to think about a separate expansion memory channel using PCIe. Across the DDR generations, the maximum number of DIMMs per channel has changed as we progressed. In DDR3, mainstream DIMM speeds ranged from 1333 to 1866, and three DIMMs per channel were allowed. But in DDR4, going from 2133 to 3200, and especially starting with 2400, we started to have signal integrity problems putting three DIMMs on the same channel, so we moved to two DIMMs per channel; these days you typically see only two DIMMs per channel. Now, as we start the DDR5 era very soon, with 4400 to 5600 Mbps, you should still be able to see two DIMMs per channel. But our forecast, based on signal integrity analysis, is that beginning at 6400 Mbps, the one-DIMM-per-channel era may be inevitable. I believe this was also addressed in the previous memory webinar last year. When it comes to one DIMM per channel, the high speed restricts both capacity scaling and the flexibility to allow persistent memory with relaxed bandwidth. If you are only allowed one DIMM per channel, then putting any memory with more latency than, say, DDR on that channel, such as persistent memory, may be very risky for performance, because you are putting a slower memory on the channel and you don't have any complementary DIMMs to back up the throughput. That's where you will have problems using the DIMM channel for persistent memory or storage class memory. The right side explains how EDSFF or AIC offers more advantages. On the DIMM side, you have the DDR channel with x64 I/Os. DIMMs have a less tolerable power budget, 15 to 18 watts; even at high speed, 18 watts may be feasible with some changes to the memory system and the CPU, but going over 20 watts is probably very risky under the current memory system architecture, where the DIMM attaches directly, very close to the CPU. DIMMs are also less flexible in scaling up the form factor and more restricted in stacking options; what I'm saying is that it is very hard to stack without TSVs, and when it comes to TSVs you have to worry about power internally, and cost may also be an issue. With CXL and PCIe, using x4, x8, or x16 links, you have a more tolerable power budget: I expect more than 25 watts will be allowed for persistent memory. You have more flexibility in scaling up the form factor, and less restriction in stacking options, meaning more room to explore non-TSV options with more area and more power allowed. And hot plug is more feasible. That's why we see much better potential for capacity scaling on a CXL and PCIe memory channel. Of course, DIMMs will remain the interface that provides throughput, bandwidth, and performance, and they will provide a moderate amount of capacity scaling. But there are many emerging applications in memory and compute that require a much higher scaling rate than ever before. So in terms of capacity scaling, the CXL interface, alongside these CXL memory form factors, will provide you better value in capacity scaling.
Next slide, please.
And I would like to offer some opinions, based on a lot of customer and partner feedback on persistent memory, especially about the challenges ahead. I've been hearing many questions about when persistent memory will thrive. But this is a completely new paradigm in terms of memory, and more experience is still needed in enabling these new features. Enabling persistent memory, I believe, is still at an early stage, where some features can reference the existing literature, especially DRAM-based literature, but others need a completely new paradigm. Examples of new features applied to PMEM hardware include power management, RAS, and security. Especially for security, there has never been good literature for the memory itself; security management has been done in the CPU for a very long time. So although there have been many efforts to enable these features over the last four to five years, more work is still needed to make sure the existing literature is appreciated and the new paradigm is respected. The second point I want to make is infrastructure readiness, both hardware development and software infrastructure. Hardware development will still be a challenge in order to address throughput scaling. Although CXL memory may be focused on capacity scaling in terms of persistent memory or storage class memory, you still need a moderate amount of throughput for, say, tiered memory, and considering power and thermal restrictions, throughput scaling can be a big challenge. Hopefully, by using these CXL memory form factors, we will be able to alleviate this challenge. In terms of software infrastructure, the groundwork is done with the PMEM libraries and all the software infrastructure worked on over the last few years, but more exploration is still needed for general-purpose applications. Of course, the definition of general-purpose applications may start to vary in the future with the emergence of new applications, but since this memory behaves differently from existing DRAM, more work is needed on general-purpose applications to understand how to use persistent memory well. Another point I want to make is that in order to really use persistent memory well, you have to allow more time for user experience readiness, meaning users need to become comfortable with how to utilize PMEM. Even though the infrastructure is ready, programmers need to learn how to use persistent memory well, so we expect a few more years are still needed for users to learn how to utilize it. These are some of the points I wanted to make, because there are so many questions about how and when to enable persistent memory. This is a very big change, and this effort has been going on for a long time, so with more experience and more practice with the infrastructure, I think you will see a much clearer picture in the near future. Next slide, please.
So in summary, the CXL Consortium momentum continues to grow: 150-plus members right now and still growing, responding to industry needs and challenges. And CXL is ideal for attaching persistent memory. The protocol is designed with PMEM in mind and is media agnostic, so you don't have to worry about specific media attributes, and you get a lot more flexibility. The generic driver model eases software enabling, there are robust RAS and reliability features, and a variety of form factors enable innovative system designs. One more thing I wanted to mention: as we get into more concepts in CXL memory, including the pooling and switching concepts, and the memory moves further from the CPU, I think more opportunities will come for persistent memory, maybe with relaxed latency or with more flexibility in how software handles persistence. Even though challenges are ahead, I think we will have more answers about how this will be enabled in the future. So join the CXL Consortium and follow us on YouTube, Twitter and LinkedIn for more updates.
Thank you so much, Mahesh and Thomas for sharing your expertise today. We'd like to encourage everyone interested in learning more about persistent memory in CXL to visit our website, computeexpresslink.org, to view the educational resources, white papers and CXL technical trainings. So we'll now go ahead and begin the Q&A portion of the webinar. If any of our viewers have questions, please submit those via the BrightTalk chat window. So our first question is, persistent memory includes NVDIMMs as media, not just Intel's Optane memory. Is this correct?
Yeah, I'm a little confused by the question, so let me try to define persistent memory. It can use different types of media; it's not specifically tied to the media used on NVDIMMs, which tends to be flash, and it's not tied to the media that Intel uses in Optane. So when we say CXL PMEM, it can be different types of media and different types of controller designs, all attached to CXL, but in the end providing persistent data storage with some of the other characteristics I mentioned earlier. So yes, it covers a lot more than what you're mentioning here.
Great, so the next question is, what is the standard register interface in CXL? Can you expand upon that?
So if you're familiar with the NVM Express register interface, it's very similar in nature. Again, it's a register interface that a CXL device implements, which the software, the UEFI firmware, the OS, a driver, can use to manage the device through a standard mechanism. I think the best way to learn more about that is to look into sections 8.2.8 and 8.2.9 of the CXL 2.0 specification. You can go to the consortium site that Kevin just mentioned, download the spec, and look at all the details.
So our next question is, what is the preferred form factor for CXL persistent memory?
So to be fair, CXL as such does not have a preferred form factor. The CXL spec would essentially allow anything that fits into the form factors that PCI Express, for example, allows. So folks are looking at form factors like EDSFF, with different options there, and there's also the potential to use the standard PCIe CEM form factor, carrier cards and everything. CXL for the most part defines a protocol, so it's largely agnostic to the form factor. And that again is one key benefit that CXL brings: it allows the system vendors and the device vendors to innovate and pick the right form factor based on their needs. I think Thomas mentioned that some of the form factors provide more leeway in terms of thermals and power, so devices that are high performance could use those form factors, and devices that don't really have such high power needs could use the ones that are more optimized for their use model.
Okay, the next question is in reference to the poison list that Thomas had mentioned. They're asking: shall I take it as a kind of bad block list, where the host prepares the list and avoids using those addresses? The second part is: is there open source driver support, Linux inbox, for persistent memory, and any sample reference driver? And the third part is: how will persistent memory work with respect to SLD or MLD?
Okay, so let me take it one by one. To answer the first question: yes, your analogy is correct. It's like a bad block list, except that because PMEM is byte addressable, the granularity tends to be much finer than a block. A block is typically 512 bytes or 4K bytes; for PMEM, the granularity could be a cache line. The software will know that and will stay away from it, so in that sense it looks like a bad block list. To answer the second question: yes, there is support being added to Linux for both CXL bus enumeration and memory enumeration. There are mailing lists you can subscribe to, to keep track of the development and also contribute. A lot of work is yet to come, but there's some initial code out there that you can go look at right now. On the third question: both SLDs and MLDs can have PMEM, meaning it's possible for a PMEM device to be an SLD, a single logical device, or it can be an MLD device that exports PMEM and allows pooling of PMEM memory. Those work together to provide new system design options.
Okay, thank you. The next question here is what are typical latencies when persistent memory is used through a CXL switch when interleaving?
So again, a lot of it would depend on the individual switch design and the latency it introduces, and to a great extent on the PMEM media, because the different media used for persistent memory tend to have very different latency characteristics as well as bandwidth characteristics. So it's hard to put a number down. But like I mentioned, the CXL protocol has a transactional nature, so it can tolerate longer latencies as well as more variability in latency, a longer tail for example. Generally, PMEM latencies would be higher than volatile memory. And because the media latencies tend to be high already, moving them behind CXL, with the latency adders of the switch and of CXL itself, doesn't really seem to change the performance that much. So it would be pretty comparable to today's PMEM designs, in my mind.
Great. I think we have time for one more question. You mentioned non-fatal errors, but not fatal errors. Can you explain what happens if there are fatal errors?
So if the device has fatal errors that are exposed to the system, the CXL spec defines a capability called viral. It is for handling fatal errors and creating error containment. A device that is experiencing a fatal error can communicate to the rest of the system that it has a fatal error, and if this is enabled, the entire system can go into a mode where it stops committing any writes to the persistent media, in order to prevent further damage or further corruption. We did not talk about that today, but there's a RAS white paper on the CXL Consortium site that provides more details on how this works and how it could be used in real systems, so I encourage you to take a look at that.
Great. Well, thank you so much, Mahesh. And being mindful of time, we'll go ahead and wrap up today's webinar. The recording of this presentation will be available on CXL Consortium's YouTube channel soon and we'll also be uploading the slides to the website. Once again, thank you so much for attending the CXL Supporting Persistent Memory webinar. Have a great rest of your day.