Good morning and good afternoon. Thank you for attending the CXL Consortium's "Introducing the CXL 3.1 Specification" webinar. Today's webinar will be led by CXL Consortium Technical Task Force Co-Chair Mahesh Wagh and Protocol Working Group Co-Chair Rob Blankenship. I will now hand it off to the presenters to begin the webinar.
The CXL ecosystem started with direct-connect use cases and has really grown over the last four years. The CXL specification sits at its center, but it addresses a lot of needs. It addresses CPU and GPU interconnects. It meets the needs of accelerators and FPGAs, where programmable compute and accelerators can participate in providing a solution. It opens up a lot of opportunities for CXL IP: as the specification develops, there is room for IP vendors to provide building blocks that can be used in CPUs, GPUs, accelerators, or devices, which is a significant business opportunity. It meets the needs of memory expanders. What CXL fundamentally changes is that it completes the memory hierarchy the industry has been working on for many years, which was missing a key piece: an interconnect that allows memory expansion. That has enabled the memory expanders, quite a few of which you can see in the ecosystem right now. From a scaling perspective, it creates business opportunities for switches, because, as we'll cover, CXL increases fan-out and enables new and interesting use cases built around switching. And as with any interconnect, you need analyzers and traffic generators so that validation and debug infrastructure can be set up, and CXL enables that as well. Riding on top of all of this are the software solutions, whether accelerator solutions or new and interesting memory use cases that run over CXL. So overall, the specification is meeting the needs of all of these components and their interconnection with each other. From a specification perspective, there are a couple of things we make sure we address. We work very hard to protect your investment in the protocol and your products so that you can recoup that investment, so fully backwards compatible solutions are a key requirement for all the development we do. We also keep overall system cost in mind: you can do a lot of innovation, but we continue to make sure solutions balance cost and performance. And along with that goes comprehensive support for compliance and testing. Writing a specification is one thing, but for the technology to succeed you need comprehensive compliance and testing support alongside it, so that is already planned and provided, and the specification has sections on how to design for compliance and testing as well. So from an overall perspective, it covers all the aspects needed for a very strong ecosystem.
We'll cover the scope of CXL: where it started, as an overview, and where it is scaling. It started with CXL 1.1, focusing on the single node and the interconnect to the CPU, enabling use cases for accelerator direct attach or memory expansion, and we'll cover those in more detail. It introduced the notion of Type 1, Type 2, and Type 3 profiles for devices and their use cases. That's how it started with CXL 1.1. In that time frame, the CXL link itself was not visible to legacy PCI bus drivers, although there were ways to discover those devices. Having said that, it enabled the use case of directly attaching accelerators and memory expanders to a processor interconnect.
CXL 2.0 built on that success: having defined the ability to attach accelerators and memory expansion devices, the need was then to provide scaling for those devices. So CXL 2.0 brought in a couple of new features. One is support for hot-plug capability, which CXL 1.1 did not have, so you can add and remove devices. We extended the support so that the link itself is discoverable by a PCI bus driver, which lets you take advantage of all of the development that was done in PCIe for hot plug, for example. On the memory and accelerator side, it added CXL 2.0 switching capability, which at a first level provides fan-out: you can fan out from one link to many CXL devices, as an example. It also introduced the concept of memory pooling, where you take a particular memory resource, divide it up into distinct segments, and assign those segments to a given host image. You can allocate an entire device through a CXL 2.0 switch to one host image, or you can pool a device between multiple host images. That is supported through CXL 2.0 switch capabilities, and an example is shown here: the colors on the hosts H1 through H4 denote host images, D1, D2, and so on at the bottom are devices, and a color belongs to a given host image, showing the mapping of resources to that host image. One additional thing CXL 2.0 did was define in the spec the notion of multi-headed CXL 2.0 switches. This has always been possible in PCI Express, for example, but CXL went in and defined the switching architecture that supports multi-headed CXL switches as well as multi-headed CXL devices. As you can see, device D2 can be connected to and pool its resources across host 1 and host 3, just as an example. So that's what CXL 2.0 enabled at a high level. One additional thing from a security perspective: CXL 2.0 brought in support for link encryption, which you can enable across direct-attach links or across CXL 2.0 switches.
When we get to CXL 3.0 and 3.1, it is really about growing the scale of what we can do with switching. It introduced the concept of multi-level switching, as well as the notion of a composable fabric, in which many devices and host nodes are interconnected with each other, providing a composable fabric that allows both disaggregation and pooling of accelerator and memory resources. One additional thing we will cover is that CXL 3.0 and 3.1 bring in the concept of sharing of resources, meaning a given memory resource can now be directly shared, accessed, and mapped by multiple host images. You could always support that with software coherency mechanisms, but if you want the coherency handled in hardware, a new protocol channel was needed, so one was introduced in CXL 3.0 to enable the back-invalidate flows. We'll cover some of those details today. At a high level, this shows the scope of CXL as it has expanded: node-level properties in CXL 1.1, developed further through fan-out switches and pooling in CXL 2.0, and extended in CXL 3.0 and 3.1. As we do this development, we continue to keep CXL backwards compatible, so the investments people have made in CXL 1.1 continue to be useful: you can take a CXL 1.1 device and still play in the CXL 3.0 ecosystem.
With that, we'll do a quick recap of the representative CXL use cases. It starts with what's called CXL Type 1: caching devices or accelerators. For those familiar with CXL, it runs on the PCI Express physical layer but brings in three protocols that can be interleaved at a much finer granularity: CXL.io, CXL.cache, and CXL.mem. A Type 1 device makes use of only the CXL.io and CXL.cache protocols. The use case is accelerators that need only CXL.cache, a good example being a PGAS NIC or atomics, where you can bring a cache line into the NIC and then apply atomics or do source-based ordering if that is what you intend to do with the Type 1 device. Type 2 devices have memory associated with an accelerator that is mapped into the system memory address space. To enable that, you need all three protocols: CXL.io, CXL.cache, and CXL.mem. In all of these examples, the CXL.io protocol is required for device discovery, configuration, and enabling the device; you then use CXL.cache and CXL.mem depending on your use case. A Type 2 device uses both CXL.cache and CXL.mem, which allows the processor to cache device-attached memory and allows the accelerator to cache both processor-attached and device-attached memory. Good examples are GPGPU use cases and accelerators with a high touch rate on memory. Finally, on the right side are memory buffers, which provide memory expansion capability with CXL. In that case you use CXL.io for discovery and CXL.mem for accessing the memory. You have a memory controller, so you can use it for memory bandwidth expansion, expanding beyond your processor-attached memory by going over the CXL link, or for memory capacity expansion. In addition, you can also think about storage-class memory being supported. The way to think about these memory buffers is that they give you access to a CXL.mem controller and abstract out the media connected behind it, whether DDR4, DDR5, or storage-class memory, which creates opportunities for the memory controller ecosystem to innovate with CXL Type 3. On top of CXL Type 3, you can then start to make use of the extensions I talked about that support memory pooling or memory sharing; as you get more advanced, you can extend those capabilities to those use cases.
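For readers who find it easier to see the device-type taxonomy in code, here is a minimal C sketch of the protocol mix per device type described above. The enum values and function names are purely illustrative and are not defined by the CXL specification.

```c
#include <stdio.h>

/* Illustrative only: the three CXL protocols multiplexed on one link. */
enum cxl_protocol { CXL_IO = 1 << 0, CXL_CACHE = 1 << 1, CXL_MEM = 1 << 2 };

/* Protocol mix per device type, as described in the talk. */
static unsigned protocols_for_type(int type)
{
    switch (type) {
    case 1: return CXL_IO | CXL_CACHE;            /* caching accelerator, e.g. PGAS NIC   */
    case 2: return CXL_IO | CXL_CACHE | CXL_MEM;  /* accelerator with host-mapped memory  */
    case 3: return CXL_IO | CXL_MEM;              /* memory buffer / expander             */
    default: return 0;
    }
}

int main(void)
{
    for (int t = 1; t <= 3; t++) {
        unsigned p = protocols_for_type(t);
        printf("Type %d uses: %s%s%s\n", t,
               (p & CXL_IO)    ? "CXL.io "    : "",
               (p & CXL_CACHE) ? "CXL.cache " : "",
               (p & CXL_MEM)   ? "CXL.mem"    : "");
    }
    return 0;
}
```

In every case CXL.io is present, since it carries discovery and configuration; the cache and mem protocols are added or dropped according to the device's role.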
A quick look at the timeline. I talked about the expansion, but from a specification perspective, the 1.0 spec was released in March 2019. Then in September there was a revision, the consortium was officially incorporated, and a CXL 1.1 specification was released based on initial feedback. In November 2020 we released the 2.0 specification, addressing some of the things I talked about. August 2022 was the CXL 3.0 specification, and then last year, in November 2023, we released the CXL 3.1 specification. Today we'll cover aspects of the CXL 3.1 spec.
From a feature-summary perspective, if you're looking at how the progression works and what is covered in each spec: with 1.1, the data rate is 32 gigatransfers per second, it supports what we now call flit mode with a 68-byte flit, and Type 1, Type 2, and Type 3 devices are supported. CXL 2.0 added more, as I touched on. It stayed at 32 gigatransfers per second but added the ability to do memory pooling with multi-logical devices, support for persistent memory with global persistent flush, security, and switching. With CXL 3.0, a couple of things change. We moved the data rate to 64 gigatransfers per second, so you get a doubling of bandwidth, and along with that we introduced a new flit type, the 256-byte flit. This was done carefully so that moving to 256-byte flits does not take a hit on latency; there were several enhancements to make sure we kept the focus on performance. From a use-case perspective, 3.0 broadened the mechanisms to address multi-level switching, direct memory access for peer to peer, enhanced coherency support, and the ability to support multiple Type 1 and Type 2 devices per root port; CXL 2.0 had limitations on how many Type 1 and Type 2 devices you could support, and that was extended in CXL 3.0. Fabric capabilities were also defined, and they were further enhanced in CXL 3.1, which Rob will cover in the next few slides.
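To put the data-rate doubling in perspective, here is a minimal sketch of the raw-bandwidth arithmetic, assuming a x16 link and ignoring flit, encoding, and protocol overhead (so these are ceiling numbers, not delivered throughput).

```c
#include <stdio.h>

/* Raw unidirectional link bandwidth, ignoring flit and protocol overhead. */
static double raw_gbytes_per_s(double gt_per_s, int lanes)
{
    /* Each transfer carries 1 bit per lane; divide by 8 for bytes. */
    return gt_per_s * lanes / 8.0;
}

int main(void)
{
    printf("CXL 1.1/2.0 x16 @ 32 GT/s: %.0f GB/s per direction\n",
           raw_gbytes_per_s(32.0, 16));   /* 64 GB/s  */
    printf("CXL 3.x     x16 @ 64 GT/s: %.0f GB/s per direction\n",
           raw_gbytes_per_s(64.0, 16));   /* 128 GB/s */
    return 0;
}
```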
We'll quickly recap the CXL 3.0 view of the difference between pooling and sharing. I've talked about pooling before: a good example is device D2, which is shown in two different colors. Those colors denote the host images where the device's resources are mapped. Looking at the host images up there, D2's resources are partitioned: its orange resources are allocated to host 4 and its magenta resources are allocated to host 3, as an example. The specification supports doing this as a static allocation, or dynamically, meaning you can change those allocations at runtime. All of the support for dynamic allocation of memory is in the CXL 3.0 specification, and there is a lot of development happening in the ecosystem to enable dynamic pooling of those memory resources. Now, with sharing: whereas pooling takes your resources, partitions them, and allocates a given partition to one host, sharing lets you map the same device memory into multiple address spaces. As shown here, the S denotes sharing: on device 1, S2 is the region now shared between host 1 and host 2, so both of them have that memory region mapped in their address space and can access it simultaneously. Coherency for that region, depending on the scheme you choose, can be done through software-based coherency or through hardware-based coherency, and the specification provides all the mechanisms you need to enable that hardware-based coherence. A key thing in all of these memory sharing and pooling use cases is that you need a fabric manager to set up, deploy, and manage this memory. There are specific APIs defined that allow a fabric manager to understand and allocate those resources across the multiple host images so you can enable a memory sharing or pooling use case; those APIs and the commands you need for the fabric manager are also in the specification.
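As a hypothetical sketch of that management-plane flow, the C program below contrasts pooling (partition a device, bind each partition to exactly one host) with sharing (map one region into several hosts' address spaces). None of the function names here are the spec-defined Fabric Manager API commands; they are stand-ins used only to illustrate the flow.

```c
#include <stdio.h>

typedef int host_id_t;
typedef int region_id_t;

static region_id_t next_region = 0;

/* Stub stand-ins for fabric-manager commands (assumed, for illustration). */
static region_id_t fm_partition_device(const char *dev, unsigned gib)
{
    printf("partition %u GiB on %s -> region %d\n", gib, dev, next_region);
    return next_region++;
}

static void fm_bind_region(region_id_t r, host_id_t h)          /* pooling */
{
    printf("bind region %d to host H%d\n", r, h);
}

static void fm_map_shared_region(region_id_t r, const host_id_t *hs, int n) /* sharing */
{
    printf("share region %d with", r);
    for (int i = 0; i < n; i++)
        printf(" H%d", hs[i]);
    printf("\n");
}

int main(void)
{
    /* Pooling: each partition of D2 belongs to a single host image. */
    fm_bind_region(fm_partition_device("D2", 256), 3);
    fm_bind_region(fm_partition_device("D2", 256), 4);

    /* Sharing: region S2 of D1 is mapped by hosts H1 and H2 at the same time;
     * coherency is then software-managed or uses the back-invalidate flows. */
    host_id_t sharers[] = { 1, 2 };
    fm_map_shared_region(fm_partition_device("D1", 128), sharers, 2);
    return 0;
}
```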
As we go into CXL 3.1, the feature enhancements were basically addressing new usage models. One was the ability to scale out CXL fabric resources, which calls for CXL fabric improvements: at the scale we were looking at, you need new ways of routing requests, or flits, across the fabric. So the notion of CXL fabrics with port-based routing was defined, which supports the scale-out needs of the ecosystem. The additional thing we looked into is the ability to support CXL-attached memory in confidential-computing environments, which requires supporting TSP, the Trusted Execution Environment (TEE) Security Protocol. We had memory link encryption as a baseline, but you need more than that to allow confidential compute, so TSP was added to the CXL 3.1 specification. There were also some memory expander improvements that increase the metadata as well as the RAS capabilities in the CXL 3.1 specification, and Rob will go through and cover those.
So with that, I'll hand over to Rob to go through the next level of detail on what is new in CXL 3.1.
Thanks, Mahesh. On the fabric improvements: CXL 3.0 defined the basic PBR fabric, and we've enhanced that by defining more details on fabric decode and routing rules. We added a feature we call Global Integrated Memory, which allows for host-to-host communication. We added the ability to do direct peer-to-peer CXL.mem; the precursor to that is PBR switches. This effectively makes the CXL.mem protocol symmetric from an accelerator device's point of view: the accelerator can be both a destination of CXL.mem and an initiator targeting a peer device. 3.0 defined the ability to do unordered I/O to CXL memory, but this enables using the CXL.mem protocol natively, which allows the accelerator to directly cache that memory without having to go through the host. And lastly, we added the full definition of the Fabric Manager API for PBR switches. 3.0 defined the PBR switch construct but left the API open, so 3.1 closes that hole and defines the API for the fabric manager.
Okay. Talking about the details of the fabric, let's start with the picture on the left side of the slide, which shows HBR switches. HBR switches were part of the CXL 2.0 definition. These switches are built on the idea of a tree-based hierarchy, and every link has an upstream and a downstream direction, so from the hierarchy's point of view things flow top to bottom and bottom to top. There is some peer-to-peer traffic within that, but the CXL.cache and CXL.mem protocols specifically are directional in a hierarchy-based switch topology. In the picture on the right we have PBR switches, and you'll see that each switch in this picture has both hosts and devices. There is still a concept of a logical tree-based topology, but the links between the switches can carry traffic in both directions, meaning a host on one switch can talk to a device on the other switch while a host on that other switch talks to a device on the first one. So the traffic on those links between the switches really flows in both directions; there is no longer a directional nature to the protocols on those links. We break down the basic constraint we had with hierarchy-based switches in the past, and this enables a much more flexible topology.
Now, talking about Global Integrated Memory: the idea is that each host in this picture can expose a memory region onto the fabric, which allows other hosts or requesters to use the unordered I/O protocol to directly access that host's memory. It's basically a window into that host, and it enables host-to-host communication models. It's built on top of the basic fabric address mechanism that PBR introduced. It is similar to, though distinct from, fabric-attached memory. You can see the device at the bottom, the Type 3 memory device; think of that as a fabric-attached memory device that's exposed to many hosts. There is a decode that happens as messages enter the fabric, decoding the fabric address space, and it can map to either that fabric-attached memory device or the Global Integrated Memory in a peer host. The one important difference is that the Global Integrated Memory in a peer host is accessed only through the unordered I/O protocol, not through CXL.mem, but the decode mechanisms are similar.
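Here is a hypothetical decode sketch of that idea in C. It is not the spec's actual decoder; the addresses, range sizes, and port IDs are invented purely to show how a fabric address entering the fabric might select either a fabric-attached (GFAM) device reachable with CXL.mem, or a peer host's GIM window reachable only with unordered I/O.

```c
#include <stdint.h>
#include <stdio.h>

enum target_kind { TGT_GFAM_MEM, TGT_PEER_HOST_GIM, TGT_NONE };

struct fabric_range {
    uint64_t base, size;
    enum target_kind kind;
    int dest_port;              /* PBR destination port ID (illustrative) */
};

/* Example fabric address map: one GFAM device and one peer-host GIM window. */
static const struct fabric_range fabric_map[] = {
    { 0x100000000000ull, 1ull << 40, TGT_GFAM_MEM,      7 },  /* Type 3 GFAM  */
    { 0x200000000000ull, 1ull << 38, TGT_PEER_HOST_GIM, 2 },  /* host H2 GIM  */
};

static enum target_kind decode(uint64_t addr, int *port)
{
    for (unsigned i = 0; i < sizeof fabric_map / sizeof fabric_map[0]; i++) {
        if (addr >= fabric_map[i].base &&
            addr <  fabric_map[i].base + fabric_map[i].size) {
            *port = fabric_map[i].dest_port;
            return fabric_map[i].kind;
        }
    }
    return TGT_NONE;
}

int main(void)
{
    int port;
    if (decode(0x200000001000ull, &port) == TGT_PEER_HOST_GIM)
        printf("route via UIO to peer-host GIM on port %d\n", port);
    if (decode(0x100000002000ull, &port) == TGT_GFAM_MEM)
        printf("route via CXL.mem to GFAM device on port %d\n", port);
    return 0;
}
```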
Next slide. Talking about the peer-to-peer CXL.mem feature enabled in 3.1: this enables an accelerator, shown in the diagram as the purple device. It can again be the target of a CXL.mem access from the host, and it can also initiate peer-to-peer CXL.mem accesses to the peer Type 3 device. This utilizes port-based routing features, so it relies on a PBR switch. The Type 3 memory device it targets can either be dedicated to the accelerator, so that the memory exposed by that device is accessible only from the accelerator, or, if it is a multi-logical device, that memory can be shared between the accelerator and a host. And as Mahesh was saying, that shared memory can also be shared across multiple hosts.
So, Rob, there's one question on this one as you were talking about. It's from Chandraprakash. Question is, what do you mean by CXL type 1 device supporting CXL.mem protocol peer to peer? Will CXL type 1 device in this case have CXL memory inside it? I think what you were saying is it's a type 1 accelerator that is accessing the type 3 memory.
Yeah. And in the use models you were showing, Mahesh, it did show the Type 1 device being able to initiate CXL.mem to a peer device. So in that context, the device is a Type 1 device, meaning it can initiate CXL.cache requests to the host and it may not expose memory to the host, but it can still initiate peer-to-peer CXL.mem to the peer Type 3 device.
Right, it's accessing memory on the peer device that's mapped to CXL.mem.
All right. Mahesh went through some of this: the fabric manager is the entity that configures the PBR switches and even the CXL 2.0 switches. It is a necessary entity for configuring these resources, which may be assigned to different hosts or shared between multiple hosts, and also for general configuration of the PBR switch, because the links between the switches aren't exposed to individual hosts. The host really sees the PBR fabric as a single level of switching, even though there are intermediate links, and all of that is controlled through the fabric manager. 3.1 really defines that fabric manager API for PBR switches.
Switching now to the TSP definition. We have two acronyms here: the trusted execution environment is TEE, and the TEE Security Protocol is TSP.
Just to recap, in 2.0 we added the link IDE feature, which Mahesh also discussed. This provides both encryption and integrity protection against hardware adversaries to protect the links: between host and device, host and switch, and switch and device. TSP builds on top of this feature, extending further into the protocol level.
Specifically, we're enabling virtualized environments to support trusted virtual machines, where a host may be running trusted virtual machines alongside regular virtual machines. The TSP protocol is intended to provide memory isolation. On the left side, we have a host with its own directly attached memory, with regions of trusted memory for particular trusted VMs as well as regular host memory; the two colors indicate that. We're extending that into CXL, so on the right we have a CXL-capable device that has both the blue trusted memory and untrusted memory for regular VMs. And again, we build on top of the link IDE feature to provide security across the link, protecting against hardware adversaries there. We made sure the features added meet the needs of that virtualized environment: different hosts have different particular requirements, but we abstracted the TEE security protocol so that a device can meet the needs of different hosts running their trusted VMs and conform to best practices in security, so we can meet the needs of customers running these trusted VMs.
Talking about specific elements of TSP: we have trusted execution state and access control, configuration of those different elements, attestation and authentication, and locking. We have memory-at-rest encryption, covering how data stored in the device gets encrypted, as enabled through the spec, as well as transport security for data in flight, which is primarily the IDE part. Currently, the TSP definition supports HDM-H, the directly connected memory expander type, but we do intend to extend it in the future for coherent memory, the HDM-DB memory type, as well as for switches. How access to memory is controlled is the first part. Configuration is basically how we determine the security features in the device, enable those features, and lock that configuration to ensure it doesn't change while the trusted application is running. Attestation and authentication is about trusting who you're talking to on the device side. The data-at-rest part allows either the device or the host to do the data-at-rest encryption, so this is configurable depending on the needs of the host and of the use model in the device. And then, again, the transport security is primarily that link IDE.
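The ordering of those elements matters: configuration must be locked before trusted workloads run. Below is a hypothetical C sketch of that ordering as a small state machine. The state names and functions are invented for illustration and are not TSP messages from the specification.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative ordering: attest/authenticate the device, configure its
 * security features (including who does data-at-rest encryption), lock the
 * configuration, and only then admit trusted-VM traffic. */

enum tsp_state { TSP_IDLE, TSP_ATTESTED, TSP_CONFIGURED, TSP_LOCKED };

static enum tsp_state state = TSP_IDLE;

static bool tsp_attest(void) { state = TSP_ATTESTED; return true; }

static bool tsp_configure(bool device_encrypts_at_rest)
{
    if (state != TSP_ATTESTED) return false;
    (void)device_encrypts_at_rest;   /* host-side vs device-side encryption choice */
    state = TSP_CONFIGURED;
    return true;
}

static bool tsp_lock(void)
{
    if (state != TSP_CONFIGURED) return false;
    state = TSP_LOCKED;              /* configuration may no longer change */
    return true;
}

static bool tsp_admit_trusted_vm(void) { return state == TSP_LOCKED; }

int main(void)
{
    tsp_attest();
    tsp_configure(true);
    tsp_lock();
    printf("trusted VM admitted: %s\n", tsp_admit_trusted_vm() ? "yes" : "no");
    return 0;
}
```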
So last topic is memory expander improvements.
For 3.1, we added an extension to provide 32 bits of metadata per cache line. The spec originally defined two bits of metadata: for non-coherent memory they can be put to host-specific use, but for coherent memory they carry the coherence state, so the host uses those bits for the coherence definition and they can't be used for host-specific purposes. These new 32 bits are intended to work for both coherent memory and HDM-H memory, the non-coherent memory type, so they can be used for either, allowing host-specific use models to extend to both coherent and non-coherent memory expansion. And with up to 32 bits now, there is more data available for more advanced use cases. Some use cases have been talked about; the spec doesn't define them, leaving that up to the host or platform design. Possible use cases include access control, where data is tagged and the requester must present the right tag to access it, or memory-tiering algorithms that rely on this extra storage to manage the tiering of memory. DDR6 is also proposed to align to having up to 16 to 32 bits of metadata per cache line, so this follows other DDR memory technology in that regard. We also included some new spec-defined APIs for managing memory devices: you can manage the correctable error limits on the device, there is the ability to expose more information about the source and type of errors that happen in the device, and there is more control over memory RAS, memory sparing, patrol scrubbing, and those sorts of features. And lastly, we added direct peer-to-peer CXL.mem; we covered that in the fabric discussion, so I won't talk about it more, but it is also relevant to memory expansion devices.
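As an illustrative layout only (the spec deliberately does not define how the 32 bits are used), here is a small C sketch of a 64-byte cache line carried with 32 bits of platform-defined metadata, split into a hypothetical access-control tag and a tiering hint.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct cacheline_with_meta {
    uint8_t  data[64];     /* the cache line itself                    */
    uint32_t metadata;     /* 32-bit extended metadata (CXL 3.1)       */
};

/* Hypothetical platform-defined encoding of the 32 metadata bits. */
#define META_ACCESS_TAG(m)   ((m) & 0xFFFFu)          /* requester must match */
#define META_HOTNESS(m)      (((m) >> 16) & 0xFFu)    /* tiering hint         */

int main(void)
{
    struct cacheline_with_meta line;
    memset(&line, 0, sizeof line);
    line.metadata = (42u << 16) | 0x1234u;   /* hotness 42, tag 0x1234 */
    printf("tag=0x%04x hotness=%u\n",
           META_ACCESS_TAG(line.metadata), META_HOTNESS(line.metadata));
    return 0;
}
```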
All right. This is the feature list again. I just went over the first three. Highlighting again, the four new features shown in this table for 3.1 are: closing out the PBR definition with the Fabric Manager API, host-to-host communication through Global Integrated Memory for port-based routing, the TSP definition for the security protocol, and the memory expander enhancements, meaning the RAS capabilities and the 32 bits of metadata.
All right. To summarize, the spec is continuing to evolve to meet new use cases. We have the three areas of new features introduced in 3.1: fabric improvements and extensions, the TSP security protocol, and the memory expander improvements. To support future spec developments, please join the consortium; we have many member companies and many contributors today. The 3.1 spec is available for download on the consortium website, and you can follow us on social media for additional updates.
All right. Q&A.
Thanks, Rob. There are about four or five questions and more coming in. Let's start with the first one from Ankar from NVIDIA. His question is, what is the difference between the symmetrical CXL.mem and UIO usage for P2P?
Yeah. So UIO is a new capability that's part of PCI Express now, envisioned originally in the 3.0 definition, that allows both PCI Express devices and CXL accelerators to directly access memory in a CXL Type 2 accelerator or a Type 3 device. Symmetric CXL.mem is really focused on a CXL accelerator directly accessing a memory expander; it's not intended for targeting a CXL Type 2 device, for example, it's really targeting pure memory expanders. And there's some reasons that... the value of that, potentially, is that the CXL accelerator can now use basic, simple HDM-H memory expanders and get direct access to them, and those targets don't need the unordered I/O protocol. So it is a more specific use case for the CXL.mem protocol, where UIO is more general-purpose and maybe more flexible depending on the use.
Thanks, Rob. This next one is from Vishnu from Micron. Hey, Vishnu. His question was: what solutions is the CXL spec offering to get around the blast-radius issue in a fabric? Customers are concerned about a single device failure impacting multiple servers in a pooling environment. I can take a stab at that, and Rob, you can add to it. My take is, from a pooling perspective, yes, you should look at device and media types of errors, and all of that really needs to be handled from a device perspective. When you see errors on the media while your link is still functional and the device controller is functional, you have means of mapping that to data poison or viral, and means of notifying the host, and the host then handles it the way hosts typically handle memory errors today. The next level is when the device is non-responsive: how do you handle that? The CXL 2.0 specification added an ECN for error isolation, which makes sure that when a link goes down or a device is not responsive, it is taken care of at the host interface: the host takes ownership and creates responses on behalf of that device. So all of those mechanisms are built in. In addition, as you look at pooling, there is the DCD device, and there are mechanisms a device vendor can build on their side to differentiate their capabilities, monitoring the media and the device for health and taking certain actions a priori depending on what they see. So there are various ways you can address it; the spec provides the means to handle these situations. But of course, as new and interesting use cases come up, if there is something you see that needs to be addressed in the spec, please bring it through your representative into the work group, or if you have a proposal, bring that into the work group so we can understand the problem statement and come up with ways to solve it. Rob, anything you want to add to that?
That was good. Maybe just one thing. We do want to be careful in CXL not to go overboard on adding very heavyweight features, so say like end-to-end protocol retry or something that would compromise kind of the lightweight low latency attributes that we get in CXL. So we have to be discriminating to make sure we address the right problems and we don't want to do Ethernet style reliability because that just is probably overboard for a CXL environment.
All right. There are lots of interesting questions on peer-to-peer and those use cases. Maybe we go with the one from Mohan from HPE. The question is: does GIM need to be CXL memory, or can it be DDR5 memory directly attached to the host?
It can be either. So, yeah, GIM is a region of memory that's owned by a particular host and that memory could be native, DDR connected, or CXL attached below that host.
Okay. I'm going to group a bunch of UIO-related questions, Rob, and maybe we can address those together. One is from Zaman from Rambus: what is the difference in benefit between host-to-host with GIM using UIO versus multiple hosts sharing a Type 3 device?
Yeah, I mean, it's a different case. You have fabric-attached memory, or GFAM, which is a memory expansion device that may be shared by multiple hosts. That's a very flexible approach that I would expect to address many high-bandwidth use cases, including ones where the host also wants to cache that data directly. In the GIM use case with UIO, the requester uses unordered I/O, so it wouldn't be expecting to cache that data; it's more like a memory-mapped I/O region from the requester's point of view, and it's coherent at the destination. So it's a different, asymmetric data flow versus fabric-attached memory, which can be made more symmetric across multiple hosts.
All right. Thanks, Rob. The next question is from Mahmoud from Siemens. Do we need UIO to support the host-to-host communication use case, or device peer-to-peer communication, or switch-to-switch communication?
Yeah, I largely addressed that in the last response. GIM is UIO-only for that host-to-host flow. Device peer-to-peer today can use traditional PCIe ordered flows, or it can use unordered I/O, and unordered I/O specifically for peer-to-peer flows lets you access a CXL memory region without going through the host. So within your switch hierarchy, you can use UIO to directly access another accelerator's memory.
And then you can also address that through symmetric CXL.mem. You can still get peer-to-peer, but the intended use case there is only reaching out to memory expansion that's connected, right?
So, yeah.
There's a question from Randy Bright from Intel. Hey, Randy. For CXL switches, are there any unique needs besides the various types of command packets to be moved around? And like other switches, is there a sensitivity to a certain performance like latency?
Yeah, I mean, the CXL.cache and CXL.mem protocols are extremely focused on latency and cache-line-size accesses, whereas UIO, which we were talking about, has variable length; it flows through the CXL.io stack, or PCIe stack, and would tend to have higher latency because of that compared to the CXL.cache and CXL.mem protocols. So we generally use the CXL.cache and CXL.mem protocols, or focus on them, for latency optimization.
Yeah. And I would like to add to that: we're trying to keep and maintain node-level properties as we scale. From a use-model and system perspective, you need to take into account what you expect from a given solution in terms of performance, because at the end of the day a lot of these protocols depend on the resources you have available to sustain the latency and deliver a certain performance, and that doesn't change. So you have to carefully weigh your use case and your performance requirements. CXL is bringing a lot of solutions, but it will depend on what problem you're solving and what your performance requirements are, and then you can see whether you have adequately resourced the system to enable those use cases. We have about four minutes, so maybe we can take a couple more questions. There's one from Sanjay Goyal from Rambus: from memory read fill to sending the DRS back, which credit needs to be available, S2M DRS or S2M NDR?
Yeah, maybe I'll give a little more context here. Memory fill is a new command introduced for TSP, intended to enable the host to do data merging when there's host-based encryption. It flows on the same channel that writes flow on, the RWD channel, as the request does. It is unique in that it is a request that flows on the RWD channel but returns a DRS from the memory expander, and it uses standard DRS credits. The DRS is the data response, and it uses the same data response that any normal read would use to return data.
Okay, thanks. There was one question from Kulwinder Singh from Microchip, from when you were talking about peer-to-peer: will the fabric manager take all the control? Part of it is that the fabric manager is responsible for enabling the setup and the path, but control is very different; the fabric manager is more of the management plane, and that's the way to think about it. Rob, do you want to add more to that?
No, I agree.
Yeah. Okay, I think there was one last question, maybe in a minute. It was just reference that was from Donald from Red Hat. He was just asking what is an MC? It was in one of the slides. Probably it was a memory controller, but I could be wrong without the context.
All right. Well, thank you, Rob and Mahesh for sharing your expertise. Being mindful of the time here, we will wrap up today's webinar. We weren't able to address all the questions we received today, but we will be addressing them in a future blog. So please follow the CXL Consortium on social media for updates. Once again, I'd like to thank you for attending the CXL Consortium Introducing the CXL 3.1 Specification Webinar. Good day.
Thank you.
Thank you.