Good morning. Thank you for joining us for the CXL Consortium introducing the Compute Express Link 2.0 Specification Webinar. Today's webinar will be led by Debendra Das Sharma, Intel Fellow, Director, I/O Technology Standards, and CXL Board Technical Task Force Co-Chair, and Ahmad Danesh, Manager, Product Marketing and Strategy, Data Center Solutions for Microchip Technology, and the CXL Marketing Workgroup Member. Now I will hand it off to Ahmad to begin the webinar.
Thank you. And hello everyone and thank you for joining today. My name is Ahmad and I'm excited to be here today with Debendra to introduce the Compute Express Link 2.0 specification. So just to touch on the CXL timeline a little bit, CXL 1.0 was first announced back in March of 2019. Very quickly the consortium was incorporated, CXL 1.1 was announced in September 2019, and we're excited that CXL 2.0 was announced exactly a month ago to the day, on November 10th. And a quick plug for the previous webinars that we've been putting on. There have been three so far: an introduction to CXL; the second, exploring coherent memory and innovation; and the third, memory challenges and CXL solutions. If you have not seen those webinars, please go to computeexpresslink.org for more information. In today's webinar we're going to discuss a bit about the industry landscape and what's driving us to continually improve the CXL spec, learn more about the CXL Consortium, touch on some background about the CXL 1.1 specification, and then get into the meat of the conversation, a discussion of what's new in the CXL 2.0 specification.
So let's take a look at the industry landscape and the market drivers that are requiring us to improve this technology. We have the proliferation of cloud computing, the growth of AI and analytics, and the cloudification of the network and edge, and these all drive a need for higher performance solutions to process and analyze the larger data sets generated by these applications. CXL, as a low-latency interconnect connecting compute, memory, and accelerator resources, is here to solve these challenging problems.
So we have wide adoption across the ecosystem and significant backing from quite a number of member companies. The board of directors is composed of leading cloud providers, OEMs, and CPU providers, as well as the silicon infrastructure providers that enable the back-end technology to deploy these types of solutions. Now, it's not just the directors that are driving this specification. We have over 130 member companies and growing.
We have two different types of membership. We have adopter members, for whom membership is free to join and comes with IP rights along with access to all the released specifications, as well as contributor members. The advantage of being a contributor member is that you get access to the draft specifications and really get to shape the future of the CXL specification. And with CXL 2.0 being completed, we're already working on the CXL 3.0 specification and moving that forward as well. So a big thank you to all the contributor member companies for helping to shape the CXL 2.0 spec and enabling us to launch it last month.
So when all these member companies come together, we're really doing so to deliver the right features and architecture to address these challenges. The challenges imposed by the market drivers I touched on earlier really drive the need for faster data processing, heterogeneous computing, memory bandwidth and capacity expansion, as well as tuning of the attached compute and memory resources for a specific targeted application workload.
CXL comes together to solve those challenges as an open industry standard providing a high-performance, cache-coherent interconnect between processors, memory, and accelerators. And it does that in three ways. One is by being a coherent interface that leverages the PCIe physical layer, with three mix-and-match sub-protocols; we'll touch a little bit more on these sub-protocols in a follow-on slide. It's a low-latency interconnect with near-CPU cache-coherent latencies. And it reduces complexity through an asymmetric protocol, so that CPUs and endpoints can evolve independently and are not locked together for a specific generation.
And while CXL continues to evolve, it's very important that we keep it backward compatible with previous generations to ensure interoperability with existing devices, so that you can continue to use the investment that you've been putting into the technology.
Next slide.
So when we take a look at a data center, what really is the scope of CXL 2.0 beyond CXL 1.1?
On the bottom left there, we can see that CXL 1.1 was really focused on single-node connectivity, right within the processor interconnect. As we expand into CXL 2.0, one of the major changes and the focus for CXL 2.0 was the introduction of switching. This was done to enable rack-level expansion, so that we can get beyond a single node and have multi-node connectivity for pooling applications, and so that a single node can branch out into fan-out applications, with larger and larger sets of data being processed by a single processor.
We'll touch a little bit more on all the different features that are going into CXL 2.0, of course, but it is good to know the main reason we're driving this improvement: to expand out and enable expansion within a rack.
Next slide please.
So before we jump into all the CXL 2.0 enhancements, it's important to have some background and touch on the CXL representative usages. So we have three mix and match sub-protocols.
There's CXL.io, .cache, and .memory, or sometimes called .mem for short. CXL.io is essentially for management and is used by all three CXL device types. .cache provides access to the processor's memory, so the device accesses the processor's memory, while with .memory the processor gets access to the device's memory. So when we take a look at Type 1 on the far left there, that's a device, something like a smart NIC, where the device is sharing the processor's memory. If we go all the way to the far right, we have Type 3, which uses the .memory sub-protocol, where the processor is getting access to the memory buffer's memory. And then in the middle is Type 2, which actually uses both of those sub-protocols, .cache and .mem; that's a device like an accelerator, where the processor and the accelerator each access the other's memory.
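To make the mix-and-match of sub-protocols concrete, here is a minimal Python sketch of the mapping just described. It is illustrative only; the names and structure are not from the CXL specification.

```python
from enum import Flag, auto

class Protocol(Flag):
    """The three CXL sub-protocols that are multiplexed on one link."""
    IO = auto()     # CXL.io: discovery, config, interrupts, DMA (mandatory)
    CACHE = auto()  # CXL.cache: device caches the processor's memory
    MEM = auto()    # CXL.mem: processor accesses device-attached memory

# Which sub-protocols each device type uses, per the usage models above.
DEVICE_TYPES = {
    "Type 1 (e.g. smart NIC)":              Protocol.IO | Protocol.CACHE,
    "Type 2 (e.g. accelerator with memory)": Protocol.IO | Protocol.CACHE | Protocol.MEM,
    "Type 3 (memory buffer / expander)":     Protocol.IO | Protocol.MEM,
}

if __name__ == "__main__":
    for dev, protos in DEVICE_TYPES.items():
        used = [p.name for p in Protocol if p in protos]
        print(f"{dev}: {', '.join(used)}")
```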
Next slide. So we're going to spend most of our conversation today looking at all the new features and usage models that are coming into CXL 2.0. First, it's fully backward compatible with CXL 1.1 and 1.0, and Debendra will walk us through a good table showing exactly all the combinations of how you can use 1.1 devices in a 2.0 environment.
We're going to touch on switching and pooling applications, which address a very big need in the CXL 2.0 spec. This is extremely important for being able to provide fan-out, to connect a larger number of devices to a single processor, as well as for efficiency in a system, so that you can easily allocate different memory pools to different CPUs.
Hot plug support came in with CXL 2.0 as well. This is extremely important, especially when we're talking about pooling applications, where we're migrating resources from one processor to another, or physically unplugging them from a switch and adding them back in, depending on the application. And Debendra will walk us through a nice, easy flow of exactly how that hot plug capability works.
A fabric manager API was also added to the CXL 2.0 spec. This is extremely important because now we have an industry-standard way of managing a fabric. Those of you who have worked with fabric protocols in the past and have fabric solutions know that one of the biggest challenges is making sure you have the software available to manage that fabric. With CXL 2.0 introducing a defined fabric manager API, we can now have software that's easily used across platforms.
With CXL 1.1, we obviously introduced the capability of attaching memory, and now we're expanding this to non-volatile memory as well by adding persistent memory support. And of course, within a data center, security is always important, so we've made important changes to expand security by adding link encryption capabilities. CXL 2.0 is also adding a compliance and interop program, which is of course very important for a new specification, to make sure that we have all the right vendors working together to deploy solutions to the market.
From here, I'll pass it off to Debendra to walk you through the specification in more detail.
>> Thank you, Ahmad. Next slide, please. Greetings. My name is Debendra Das Sharma. I will delve deeper into the CXL 2.0 specification, but before we get there, it will be good to recap some of the protocol aspects of CXL 1.0 and 1.1. As Ahmad was describing just now, Compute Express Link has been defined, ground up basically, to address the challenges in the evolving compute landscape. And we do that by making both heterogeneous computing and different types of memory efficient, to sustain the needs of the industry for many years to come.
So Compute Express Link is built on top of PCI Express infrastructure and leverages PCI Express. What it does is overlay caching and memory protocols, which Ahmad referred to as CXL.cache and CXL.mem, on top of the existing PCI Express protocol, which is called CXL.io. All three of these protocols run on the PCIe PHY and PCIe channels.
So let's look at them one by one. There is a picture here. On the left side, you have a CXL device, which may be attached to some optional memory; some of them have it, some of them don't, as we saw in the picture before. And on the right side, you have a representative host processor. This could be a single processor, or it could be a multiprocessor system. That's what is on the right side, and you've got the host memory there. CXL.io is the I/O part of the stack. This is identical to PCI Express, so we use it for device discovery, register access, interrupts, virtualization, and most importantly, bulk DMA transfers with the producer-consumer semantics. It's mandatory in CXL. Notice that all three protocols run on a common PCI Express/CXL logical PHY layer and, most importantly, they run on the PCIe PHY. The three of them are dynamically multiplexed.

The CXL.cache protocol is optional for the device. It allows the device to cache the system memory, as shown in the purple color here; we're representing the host memory with the purple color. So if you're a device like a Type 1 or Type 2 device, you would have a caching infrastructure where you want to effectively bring the contents of the host memory into your local cache, be able to access it quickly, especially if you have locality of reference, and then proceed. The CXL.mem protocol is also optional for the device. So for example, if you have a Type 1 device, you won't have CXL.mem, but if you have a Type 2 or Type 3 device, you will have the CXL.mem protocol. What this allows is for the processor, as well as other CXL devices that have CXL.cache semantics, to access the device-attached memory, shown in the blue color here, and it is able to do that coherently. As mentioned, Compute Express Link is a low-latency interconnect. Latency is a critical component to ensure system performance.
We expect CPU-to-CPU SMP protocol link-like latency for the CXL.cache and CXL.mem semantics.
And the reason behind that is we do not want users to have a bad experience; latency is of paramount importance, especially when you are caching something or accessing memory as coherent memory. Compute Express Link is an asymmetric protocol. What that means is that the protocol flows and message classes are different between the host processor and the devices. It's a conscious decision to keep the protocols simple and the implementation easy for the devices. Now, we have experience with enabling the industry with symmetric cache coherency protocols. What happens is that invariably a vast majority of them shy away from making it to the finish line, because the complexity is huge, the design effort is huge, the validation effort is huge, and the frequency with which symmetric cache coherency protocols change over time in a non-backward-compatible manner, all for very good technical reasons, makes it very difficult to build an ecosystem.

Let's look into each of these components and see why that is the case. If you're looking at the host processor, it can be multiple processors connected to locally attached DRAM memory. The host processor has a mechanism to orchestrate cache coherency between multiple caching agents; it is known as home agent functionality. These caching agents can be your cores, or a root port supporting PCI Express or any kind of I/O device, and the processor could be connected to other CPU sockets. Typically, home agent functionality involves taking requests from multiple sources, resolving conflicts that arise due to different caching requests in flight at different times, and tracking the cache line states. The home agent tends to be very tied to the individual microarchitectural choices, and it tends to be, as I said, different between different generations of CPUs, even from the same company, and definitely very different across different companies. Personally, I have not seen many examples of multiple generations of CPUs from the same company working across their own symmetric cache coherency protocol links, and as I said before, there are very good technical reasons not to do so.

On the other hand, a device that needs to cache the contents of the system memory needs to work with multiple CPU architectures. It doesn't really need to get bogged down in the job of orchestrating cache coherency between different CPUs and different caching agents. So you want to get the benefits of cache coherency, but you do not want to pay the heavy burden of home agent functionality. So effectively what CXL has done is say: OK, the host processor needs home agent functionality anyway; let's define a simple set of mechanisms so that you can do the caching agent functionality in a standardized way. You can make a cache line request in a shared manner or an exclusive manner, you might want a snapshot, you may do a writeback, all of that. And because that is a simple set that doesn't change (effectively, the MESI protocol hasn't changed), CXL uses that MESI protocol mechanism.
It incorporates a few sets of commands on the link, and now the device can cache the system memory without having to orchestrate cache coherency. So this allows us to advance CXL in a backward compatible manner, and as we will see, we have done that with CXL 2.0, fully backward compatible with CXL 1.0 and 1.1, and we have also retained the low latency characteristics. Next slide, please.
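As a rough illustration of that asymmetry, the sketch below separates what a device-side caching agent asks for from the conflict resolution and state tracking that stay in the host's home agent. The request names and the HomeAgent class are hypothetical stand-ins, not actual CXL.cache opcodes or host hardware.

```python
from enum import Enum

class CacheRequest(Enum):
    """Intents a device caching agent can express over the link (illustrative)."""
    READ_SHARED = "request the line in a shared state"
    READ_EXCLUSIVE = "request the line in an exclusive state"
    READ_SNAPSHOT = "non-caching snapshot of the line"
    WRITEBACK = "write a modified line back to host memory"

class HomeAgent:
    """Host-side responsibilities: these stay in the CPU, not in the device."""
    def handle(self, requester: str, req: CacheRequest, addr: int) -> str:
        # Resolve conflicts between in-flight requests, snoop other caching
        # agents, track the line's MESI state, then grant the request.
        return f"grant {req.name} of 0x{addr:x} to {requester}"

# The device only issues simple requests; the home agent does the heavy lifting.
host = HomeAgent()
print(host.handle("accelerator-0", CacheRequest.READ_EXCLUSIVE, 0x1000))
```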
So this is the list of features that we have added for CXL 2.0. Ahmad talked about it, and we'll go through more details in subsequent slides. Effectively, I would characterize it as support for hot plug, support for persistent memory, support for switching, and support for disaggregation or pooling. In order to do that, we had to implement a bunch of features.
One of them was making a CXL endpoint that looks like a true PCIe-equivalent endpoint. That way, the CXL device can be discovered as an endpoint, while support for CXL 1.1 devices, which can be directly connected to the root port or downstream port, is still preserved, and we'll see more of that.
Switching, as Ahmad talked about, is a single level of switching for fan-out. We also have a single level of switching with multiple virtual hierarchies in order to support pooling of resources.
We support CXL memory fan-out and pooling with interleaving, and we'll see some more of that. CXL.cache in the switching context is a direct route between the CPU and the device; in other words, we do not want the switches to act as intermediaries reinterpreting cache coherency actions. So it's a direct connect for CXL.cache, whereas CXL.mem is easy, you can always interleave directly, and it's just a routing decision: address X belongs to this device, address Y belongs to that device. And the downstream port on the switch must be capable of being a PCI Express port.
Resource pooling supports memory pooling for Type 3 devices. So you've got multiple logical devices, where a single device can be pooled across up to 16 virtual hierarchies, and we will see pictures of that. We also have CXL.cache and CXL.mem enhancements for additional performance. We have added support for persistence. We have managed hot plug support. There are additional function level reset scope enhancements for CXL.cache and CXL.mem.
And most importantly, we have a memory error reporting and also quality of service telemetry built into CXL.
From a security point of view, we have added authentication and encryption; we'll see more of that. And last but definitely not least, software infrastructure and API, where we have ACPI and UEFI ECNs to cover the notification and management of CXL ports and devices, and we'll see more of that. Most importantly, CXL 2.0 is fully backward compatible with CXL 1.0 and 1.1. We also want to have a predictable spec release cadence. As Ahmad was mentioning, we released the 2.0 spec in response to membership needs, and again, 2.0 is not the end of the journey; we are working on 3.0. So we take the right amount of scope and we deliver things within the time that our membership wants, so that they can plan their products better.
Next slide.
So from a backward compatibility perspective, we've got two tables. The one on the top is from a CPU and device perspective, and the table at the bottom is from a switch perspective. Looking at CPU-to-device connectivity: if your CPU is a CXL 1.0 or 1.1 CPU and you connect to a CXL 1.0 or 1.1 device, you'll work as CXL 1.0/1.1. If you connect to a CXL 2.0 device, you are still going to work as CXL 1.0/1.1. If you connect to a PCIe endpoint or a switch, you will be PCI Express. So what it means is, if you are doing a CXL 2.0 device or endpoint, you need to support both the root complex integrated endpoint mode, so that you will work with a CXL 1.1 CPU, and the regular endpoint mode, so that you can work with a CXL 2.0 CPU.
If you had a CXL 2.0 CPU, which is the next row down, if you are connected to a CXL 1.1 endpoint, you are going to run it as a CXL 1.1. If you are connected to a CXL 2.0 endpoint, you will of course run as a CXL 2.0. If you are connected to a PCI Express endpoint or a switch, it is PCI Express. So 2.0 CPU also needs to be bimodal for backward compatibility.
Now let's look into the switch connectivity. A switch will of course have an upstream port, which connects to the CPU. That connection has to work as CXL 2.0, because there is no 1.0 or 1.1 definition of a switch. And that's fine, because if you are a platform provider putting down a switch, you need to make sure that you are connecting your switch to a CPU that is capable of switching.
Now if you are a downstream CXL switch port and you are connected to a CXL device, you need to work as either CXL 1.1 or CXL 2.0, depending on what kind of device is connected. So if you take a CXL 1.0 or 1.1 device and put it downstream of a switch, it's going to just work; it's full plug and play. Same thing on the next row, which is PCIe; it will also just work, because again it's full plug and play. And right now CXL 2.0 defines switching to be a single level, so a downstream port connecting to another CXL switch is not something we have, and I'll talk a little bit more about that.
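The two compatibility tables can be summarized with a small helper like the one below, a sketch of the negotiation outcomes just described. The string labels and the function itself are illustrative, not spec pseudocode.

```python
def link_mode(host: str, device: str) -> str:
    """Operating mode for a host (or downstream switch port) paired with a device.

    host:   "cxl1.1-cpu", "cxl2.0-cpu", or "cxl2.0-switch-dsp" (downstream switch port)
    device: "cxl1.1", "cxl2.0", or "pcie"
    """
    if device == "pcie":
        return "PCIe"            # PCIe endpoints and switches always run as PCI Express
    if host == "cxl1.1-cpu":
        return "CXL 1.1"         # a 2.0 device drops back to 1.1 (RCiEP mode)
    if host in ("cxl2.0-cpu", "cxl2.0-switch-dsp"):
        return "CXL 1.1" if device == "cxl1.1" else "CXL 2.0"
    raise ValueError(f"unknown host type: {host}")

assert link_mode("cxl1.1-cpu", "cxl2.0") == "CXL 1.1"
assert link_mode("cxl2.0-cpu", "cxl1.1") == "CXL 1.1"
assert link_mode("cxl2.0-switch-dsp", "pcie") == "PCIe"
```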
Next slide, please.
So let's talk about switching here. So as we said, CXL 2.0 introduces the notion of switching and switching is needed for fan out. So that way you can enable multiple CXL or even PCI Express devices to connect to one root port. This allows us to increase the number of CXL devices in a platform. CXL 2.0, as I said before, supports a single level of switching. This is done basically to minimize the complexity as well as the latency impact.
Now one can have multiple switches, as we have shown in the picture on the right, each connecting to the host to connect multiple devices. Now, switching does increase the latency, but it provides the tradeoffs for a system designer to balance the needs between bandwidth, connectivity, and latency. So for example, a switch may be a great way to aggregate bandwidth over multiple persistent memory devices, where an increase in access latency may be acceptable. But on the other hand, if you have a very latency-sensitive Type 3 device connecting to DRAM for memory bandwidth expansion, then you need to hook it up directly. So like anything, it's a tradeoff, right? One size doesn't fit all, and the job of Compute Express Link is to ensure that all the usage models are accommodated. Next slide, please.
The next major attraction, and this is one of the major attractions which is enabled by the switching functionality, is pooling, also known as disaggregating resources. So if you look at the picture on the left, you see multiple hosts represented as H1, H2, H3, H4. Each of these hosts basically represents an independent server, also known as a node.
Now each server, represented as a host here, can be a single-socket server, a two-socket server, a four-socket server, or a larger SMP system that has its own dedicated memory and I/O devices. So you've got some number of servers on the north side of the switch. On the other side of the switch, you have multiple CXL devices, represented as D1, D2, D3, D4, and so on. And these devices are in a pool that can be assigned to any host depending on the need. So at any given instance you can have multiple permutations of resource assignments. Let's take a look at the current set, where device D1 is assigned to host H2, D2 is assigned to H1, and D3 is also assigned to H1. These are represented by the blue and green colors, so you can see who is assigned to what; that's just a pictorial representation.
Now, as we'll see in a couple of slides, D2 can be offlined from H1, for example, and become available to any host, and at some later point in time, let's say H3 needs D2, you can assign D2 to H3. In that case, D2 is assigned to H3. Switching with pooling in CXL enables us to do that. Now, in this example on the left side, each of the devices is what we call a single logical device. What that means is that it can be assigned to at most one host at a time. In addition to pooling of single logical devices, CXL 2.0 also enables the pooling of multiple logical Type 3 devices to multiple hosts, as you can see in the picture on the right. So a Type 3 device, which is again a memory expansion kind of device with CXL.io and CXL.mem, can be accessed simultaneously by up to 16 hosts; the spec allows for that. For example, if you look at the picture on the right, D1 is completely allotted to H1. D2, on the other hand, is allotted to both H1 and H3, as you can see in the color combination there: part of that memory is assigned to H1, the other part is assigned to H3, and you might still have some memory left over that you can assign to another host in the future. D3 is assigned to H2 and H3. Now, when a device is assigned to multiple hosts, the amount of memory assigned to each host can be different; it's not necessary that they will be identical amounts of memory. And just as you can change the amount of memory over time, you can also change the hosts that are assigned to a device over time, just like you could in the picture on the left side. The same physical link is used by all the hosts to access the device. All of these resource migrations follow the hot plug flows of hot add and hot remove.
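A hedged sketch of how such a multi-logical device (MLD) pool might be tracked: per-host capacity carve-outs, the 16-host limit, and assignments that can change over time. The class, method names, and capacities are made up for illustration; the real interface is the fabric manager API.

```python
class MultiLogicalDevice:
    """Illustrative model of a Type 3 pooled memory device (MLD)."""

    MAX_HOSTS = 16  # an MLD can be accessed simultaneously by up to 16 hosts

    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.assignments = {}  # host -> GB of this device's memory assigned to it

    def assign(self, host, gb):
        free = self.capacity_gb - sum(self.assignments.values())
        if gb > free:
            raise ValueError(f"{self.name}: only {free} GB left unassigned")
        if host not in self.assignments and len(self.assignments) >= self.MAX_HOSTS:
            raise ValueError(f"{self.name}: already shared by {self.MAX_HOSTS} hosts")
        # Different hosts may receive different amounts, and this can change later.
        self.assignments[host] = self.assignments.get(host, 0) + gb

    def release(self, host):
        # Returning capacity to the pool follows the managed hot-remove flow.
        self.assignments.pop(host, None)

# D2 from the slide: part assigned to H1, part to H3, with capacity left over.
d2 = MultiLogicalDevice("D2", capacity_gb=512)  # capacity is a made-up example
d2.assign("H1", 128)
d2.assign("H3", 256)
print(d2.assignments)  # {'H1': 128, 'H3': 256}
```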
Now the device as well as the switch, given that they are supporting multiple hosts, they are responsible for ensuring quality of service as well as isolation between different hosts.
CXL 2.0 basically uses the same flit format as CXL 1.1. What we have done is use some of the reserved bits in CXL 1.1 to disambiguate between the different hosts, up to 16 hosts accessing a given resource.
Philosophically, pooling enables disaggregation of resources. These resources can be any type of I/O; they can be accelerators, they can be memory. The benefit of disaggregation is that we do not have stranded and unused resources in the platform. You can have a pool of memory, and that way you are not putting plenty of memory in each host just in case you need it; every host can have some reasonable amount of memory, and you can put the rest of the memory in the pool. This is great from a performance point of view and from a power efficiency point of view, it allows you to have denser compute, and it gives you better flexibility. Overall, it results in a significant reduction in total cost of ownership, also known as TCO.
Now while the notion of pooling is important and the hardware solution is there, we also have architected a standardized fabric manager as well as a manager for the memory so that customers can interface with a standard software API that has a well-defined software hardware interface and we'll talk more about that.
This slide is basically the same kind of concept as the previous one. The picture on the left is what we saw in the previous slide. The only thing I want to bring out here is that pooling doesn't necessarily have to go through a CXL switch and incur the latency penalty. You could also connect devices directly to the hosts, as shown in the picture on the right. Of course, in this case your reach is not as big as with a switch, and you don't really have fan-out; switches give you the fan-out. But you can have multiple hosts connected to multiple devices through independent links, and that enables the lowest latency and highest bandwidth for DRAM kinds of devices. Again, it's the flexibility of choice that we can provide to the ecosystem that's important.
So we wanted to make sure that all kinds of usage models are comprehended.
Next slide, please. Here we'll talk a little bit about the hot plug flows, where a device can be offlined from one node and later onlined to a different node. If you look at the picture on the left side, there is a managed hot remove, which is the offlining, and then you've got the managed hot add, which is the onlining. In the offlining case, we request the host to remove the device D3 from H1. So the device D3, which is green in color, will be removed from the host H1, which is also green in color. Host H1 will do the quiesce flow: it will quiesce the traffic to device D3 and indicate that it is safe to remove that device. These are all well-known, standard hot plug flows. At that point, device D3 is removed, not physically removed, just offlined from H1, and added to the pool of available resources. On the right side, continuing from the picture on the left, at some later point in time host H2 says, "Hey, I can use a device because my compute need has gone up." The fabric manager will look and say, "Ah, I've got device D3 that can be given to H2." It then asks H2 to initiate the hot add flow; device D3 is added to H2, and normal traffic starts between H2 and D3.
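The offline/online sequence just described could be orchestrated roughly as in the sketch below. All of the classes and method names here are hypothetical stand-ins, not the CXL fabric manager API.

```python
class Host:
    """Minimal stand-in for a host/node; the method names are hypothetical."""
    def __init__(self, name):
        self.name = name
    def quiesce(self, dev):
        print(f"{self.name}: quiescing traffic to {dev}")
    def safe_to_remove(self, dev):
        print(f"{self.name}: {dev} is safe to remove")
    def hot_add(self, dev):
        print(f"{self.name}: hot add of {dev} complete, traffic started")

class FabricManager:
    """Minimal stand-in for the fabric manager; not the actual FM API."""
    def __init__(self):
        self.free_pool = set()
    def managed_hot_remove(self, host, dev):
        # Host quiesces traffic and signals safe-to-remove; device joins the free pool.
        host.quiesce(dev)
        host.safe_to_remove(dev)
        self.free_pool.add(dev)
    def managed_hot_add(self, host, dev):
        # Device leaves the pool and the target host initiates the hot add flow.
        self.free_pool.discard(dev)
        host.hot_add(dev)

# Offline D3 from H1, then later online it to H2 when H2 needs more resources.
fm, h1, h2 = FabricManager(), Host("H1"), Host("H2")
fm.managed_hot_remove(h1, "D3")
fm.managed_hot_add(h2, "D3")
```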
Next slide, please.
So now we'll walk through a memory access that goes through a switch. If you look at the picture on the left, this represents what is known as a single virtual hierarchy, or things as seen by a single node. In a subsequent picture, we'll bring in multiple nodes to go through the concept of pooling; this is just introducing what it looks like from a single host's point of view. CXL 2.0 defines the memory decode mechanism, which, as we said, supports memory expansion and fundamentally any kind of I/O expansion, and switches support interleaving across devices. A switch can interleave two-way, four-way, or eight-way, depending on what the user wants to do. The decode determines the downstream port, what we call a virtual PCI-to-PCI bridge, or vPPB. The virtual CXL switch is the VCS, and architecturally this looks like a PCI-to-PCI bridge, except that it's virtual. A request comes in from the root port, the switch looks at it, sees that it is CXL.mem, does the memory decode, and says, "Ah, this belongs to device D2." It sends it to vPPB2, which goes to D2. Then D2 gives a response, and that goes back to the root port. For CXL.io accesses, it's very similar to PCI decode, which is very well known: you've got all the BAR registers, you do the mapping, and then figure out which MMIO accesses go to which port. For CXL.cache, recall that there is no fan-out that the switch provides, so software must enable a single device with CXL.cache in that virtual hierarchy. A switch does the address lookup and forwards the requests and responses between the downstream ports and the root port for CXL.io and CXL.mem.
And if it is CXL.cache, of course, there is one port and you know where to send it to.
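A rough sketch of the address decode a switch performs for CXL.mem: given an interleave set (two-, four-, or eight-way) and an interleave granularity, pick which downstream vPPB/device a host physical address routes to. The function, the default granularity, and the device labels are assumptions for illustration, not the spec's decode registers.

```python
def decode_target(addr: int, base: int, targets: list, granularity: int = 4096) -> str:
    """Pick the downstream device for a CXL.mem address in an interleaved range.

    targets: devices in the interleave set (2-, 4-, or 8-way per CXL 2.0 switching)
    granularity: bytes per interleave chunk (illustrative default, not from the spec)
    """
    ways = len(targets)
    assert ways in (1, 2, 4, 8), "switch interleave is 2-, 4-, or 8-way"
    offset = addr - base
    return targets[(offset // granularity) % ways]

# Two-way interleave across D1 and D2 behind one virtual CXL switch (VCS).
devices = ["D1 (vPPB1)", "D2 (vPPB2)"]
for a in (0x0, 0x1000, 0x2000, 0x3000):
    print(hex(a), "->", decode_target(a, base=0x0, targets=devices))
# 0x0 -> D1, 0x1000 -> D2, 0x2000 -> D1, 0x3000 -> D2
```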
Next question. Next slide, please.
This is a fabric manager view of a multi-host system. Here, VCS, just to recap, stands for Virtual CXL Switch. There are multiple root ports here; each root port connects to a physical port in the switch, but underneath that you've got a virtual PCI-to-PCI bridge, just like you would have on a PCI Express switch with its PCI-to-PCI bridge hierarchy. And that one is owned, in air quotes, by the respective host. The fabric manager, or FM for short, the box on the right that you see there, works through a fabric manager endpoint in the switch, and it basically does the magic behind the scenes. It binds the vPPB inside the VCS to a physical PPB, a physical PCI-to-PCI bridge, which then connects to an endpoint. Multiple vPPBs from different hosts can be mapped to one PPB. For example, PPB2 has been mapped to two different vPPBs, which represent two different root ports, one colored in blue, one colored in yellow. The fabric manager orchestrates that binding. The host doesn't see the actual PPB2; that's really owned by the fabric manager. Given that two sets of hosts are accessing the same physical link, it's critical that neither gets direct access to that infrastructure. What each host sees is its virtual PPB, and underneath the hood the fabric manager orchestrates all the assignments, the error reporting, and everything else. On the other hand, if it's a single-ported device, like the first one you see there, it's a direct assignment; you can just access it directly. The fabric manager configures the pooled device, in this case the second device, which is called an MLD, the Type 3 pooled memory. It sets up things like memory allocation and performs a bind command so that vPPB2 in VCS0 is bound to LD15; that's shown in the blue color. The switch does the yellow-colored binding to LD0 the same way. The switch performs the virtual-to-physical translation such that all CXL.io and CXL.mem transactions targeting vPPB2 in VCS0 are routed to the MLD port with the logical device ID set to 15. So whenever the switch sends a transaction from root port two to the MLD, it assigns logical device ID 15 on the CXL link. That way the device knows that it is the blue-colored assignment, does the accesses, and any response goes back with LD-ID 15 and gets routed back to the right root port. Now, as I said, the physical link cannot be impacted by the binding and unbinding of a logical device within an MLD component. So things like hot reset or link disable, if you happen to get those from the root port, will be terminated at the vPPB. It's only the fabric manager that is directly in charge of that physical link.
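To make the vPPB-to-LD binding concrete, here is a small sketch of the translation the switch performs: the fabric manager binds a host's vPPB to a logical device ID (LD-ID) on the shared MLD link, and the switch tags downstream traffic with that LD-ID. The class and method names are illustrative; the actual bind command belongs to the fabric manager API.

```python
class PooledSwitchPort:
    """Illustrative model of a switch PPB in front of a multi-logical device."""
    def __init__(self):
        self.bindings = {}  # (vcs, vppb) -> LD-ID used on the shared physical link

    def bind(self, vcs: int, vppb: int, ld_id: int) -> None:
        # Fabric manager bind command: e.g. VCS0/vPPB2 -> LD15, VCS1/vPPB2 -> LD0.
        self.bindings[(vcs, vppb)] = ld_id

    def route_downstream(self, vcs: int, vppb: int, transaction: str) -> str:
        # The switch translates the host-visible vPPB into the LD-ID the device sees.
        ld_id = self.bindings[(vcs, vppb)]
        return f"{transaction} sent on the MLD link with LD-ID {ld_id}"

ppb2 = PooledSwitchPort()
ppb2.bind(vcs=0, vppb=2, ld_id=15)  # the blue binding from the slide
ppb2.bind(vcs=1, vppb=2, ld_id=0)   # the yellow binding from the slide
print(ppb2.route_downstream(0, 2, "CXL.mem read"))  # ... LD-ID 15
```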
Next slide, please.
Switching gears a little bit and talking about persistent memory: in CXL 1.0 and 1.1, we provided the connectivity for DRAM types of memory. We could also use the same architectural constructs to access persistent memory, but what we have done in 2.0 is take into account the persistence aspect. So we are enabling that segment and the market needs around persistent memory, where we are seeing a lot of innovation. While you could use the CXL.io and CXL.mem transactions defined in 1.0 and 1.1 to access persistent memory, and you could also use the SRAT and HMAT tables to convey its characteristics from CXL 1.1, what we have done in addition in CXL 2.0 is add the architected persistence flows: how do you know that something has actually made it to persistent memory? Those architected persistence flows have been added in CXL 2.0. Furthermore, a software API mechanism has been defined for standardized management of the memory and its interface to ensure a seamless user experience. Persistent memory is also supported in a variety of industry-wide form factors, which is an advantage of being on an interconnect that's supported by a wide range of form factors, so users have a lot of choice. Next slide.
So CXL 2.0 enhances the security mechanisms from CXL 1.0 and 1.1. We still retain the protections provided by the I/O TLB, the device TLB. What we have added is link encryption across all three protocols, and CXL switches are also covered under those security enhancements of 2.0. CXL.io leverages the PCI Express link encryption mechanism that has come out, and CXL.cache and CXL.mem use similar mechanisms. All three protocols use the Security Protocol and Data Model, SPDM for short, from DMTF, the Distributed Management Task Force; both PCI Express and USB use it. So what we are doing is providing similar mechanisms across different types of I/O, so that you get the same look and feel on the I/O side of things. Again, we try to innovate wherever it is needed and leverage the goodness of the rest of the ecosystem to all of our benefit. Next slide, please.
On to you, Ahmad.
>> Thank you, Debendra, for walking us through all those details. So to recap what we have walked through today: we have large ecosystem momentum behind the CXL Consortium, with leading member companies collaborating to really respond to the industry needs and the challenges that we are facing today as compute and memory bandwidth needs continue to grow. And as consortium members, we are responding to those needs. We saw today CXL adding some new features. We have switching for system expansion; memory pooling for increased efficiency; a fabric manager API to make sure we have a standards-based way to manage those larger pools; hot plug to either physically add and remove devices or to virtually migrate resources within a pool; persistent memory support to continue expanding beyond just volatile memory; and the enhanced security features that Debendra walked us through. And all of this while preserving the industry investment by supporting backward compatibility with CXL 1.1 and 1.0. Now, while we didn't go into a lot of depth today about it, CXL does include a compliance and interop program, with the goal of course being to improve interoperability, making sure that we have all the providers working together and that we have a smooth ramp into production. We expect compliance really to pick up in earnest next year as devices and systems become available and the ecosystem starts to ramp into production. The call to action here is: join the CXL Consortium. With these new capabilities, we're certain that the ecosystem is going to continue innovating and presenting new and interesting problems and usage models that we collectively as a consortium can solve together. As I said earlier, adopter members can join for free and get IP rights, but they don't get access to draft specs or drive the changes for the next generation of the specification. And this is certainly not the end of the journey for us. We've already started discussions on CXL 3.0 requirements, so please do join us as a contributor member to shape the future of the CXL specification.
So with that, I think we'll pass it on and go through some Q&A.
I think there are a few questions coming up here. Let's see. There's a question that is, "Is CXL 2.0 data rate and channel reach the same as CXL 1.1?" Yeah, I'll take it.
The CXL 2.0 data rate, that's a good question. The data rate is the same as 1.0 and 1.1. To recap, the data rate that we started off with was 32 gigatransfers per second. We would still link train, you know, it's link trained as an alternate protocol on PCI Express, starting with 2.5 GT/s and 8b/10b encoding. Very early during that link training, you would know that you are a CXL device, and then and there you would determine whether you are CXL 1.0/1.1 or 2.0. Channel reach is the same. So 32 GT/s with the same channel reach as what you have on PCIe Gen 5 today or what you have on CXL 1.1 today.
The next question here is, "Does a CXL 2.0 switch always have a single upstream switch port like a PCI switch?" Maybe take that one as well, Debendra?
Sure. If you are doing pooling, then you will have multiple upstream ports, right? So that was the difference between using the switches as a fan-out device. So while traditionally switches have always had a single upstream switch port, given the notion of supporting pooling, we are allowing for multiple upstream switch ports, but those are two different what we call virtual hierarchies or different nodes or different root ports, whatever is the terminology that you want. But it's fundamentally to multiple different servers, but for each server, for things that belong to a given hierarchy or a given node, there is a single upstream switch port. You can have multiple ones, but each of them going to a separate node. The next one is, "Can you have different sets of links from a node pool to a different set of resources?" Can you have different sets of links from a node pool to a different – I see. So just like we showed in the picture on the switch side, you had a link that came through from a given host, and then you had the fan-out. You can have multiple links come out, and each of them can – you can draw the same picture that we had with the pooling and then just do multiple sets of pooling in parallel. So for example, if I have the host zero, it can give a x16 link to one set in the picture, and that gets pooled with a set of resources. You can have a second x16 that goes to a different switching infrastructure, and you can pool there.
Want to take the next one? "Is security optional?" Sure, yeah. "Is security optional?" Yes, security is optional. The link encryption isn't necessarily needed by all applications. In some cases, to keep line rates, link encryption can potentially increase the power of the device, and so it's going to be important for an end application to have the control to enable or disable it, depending on whether they really want to consume that additional device power, or to disable it if they don't need it for that application.
A lot of data centers, recall, already have security from a data center perspective as well.
There are a lot of applications where the actual CXL lanes are contained within the box, and it's not really practical to snoop those anyway, so those applications may not require link encryption either.
The next one here is: is a 32-lane CXL configuration, as shown in the direct connect example, supported, or is it showing a total of 32 lanes per host? I think even the next one is related, which is: does CXL 2.0 support degraded mode speeds, 8 gigatransfers for example?
So the question here is referring to the direct connect example, where I showed a memory device connected to multiple hosts and the total number of lanes was 32. The reason I chose 32 was not because we have a 32-lane monolithic CXL configuration; we don't. CXL link widths are x4, x8, and x16 on a given port. What I was showing in that example was effectively multiple ports, and if you think about it, if each is connected as a x8, they are fundamentally different ports. So if you connect, let's say, four devices as x8 to each host, then you've got 32 lanes, but those are all independent lanes. It's an aggregate of 32, but there is no correlation between that set of links. In other words, it's not a total of 32 lanes per host; you can have as many lanes as you want. The key is that the link widths CXL defines are x4, x8, and x16. You can have a x16 CXL port and bifurcate it into up to four x4 links, but there is no notion of clubbing 32 lanes together so that they work as a single entity. On the degraded speeds, it's the same concept as in 1.0 and 1.1: the data rate is 32 gigatransfers per second, but if there is a link issue, you can degrade to 16 GT/s, and you can degrade to 8 GT/s, and it will still work as CXL. So that part hasn't changed. Maybe you should take the next one around the CXL switch: is the CXL switch the same as a PCIe Gen 5 switch?
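As a quick summary of that answer, a tiny sketch of the combinations described: CXL link widths of x4/x8/x16 per port (32 lanes are only reachable as multiple independent links) and data rates of 32, 16, or 8 GT/s when degraded. Purely illustrative.

```python
# Allowed CXL link widths per port and data rates (GT/s), per the answer above.
CXL_LINK_WIDTHS = (4, 8, 16)   # no single 32-lane CXL link; 32 lanes = multiple ports
CXL_DATA_RATES = (32, 16, 8)   # 32 GT/s native; may degrade to 16 or 8 and stay CXL

def valid_link(width: int, rate: int) -> bool:
    """True if a single CXL link can operate at this width and data rate."""
    return width in CXL_LINK_WIDTHS and rate in CXL_DATA_RATES

print(valid_link(16, 32))  # True: a x16 port at 32 GT/s
print(valid_link(32, 32))  # False: 32 lanes only as multiple independent links
print(valid_link(8, 8))    # True: degraded to 8 GT/s, still operating as CXL
```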
Who will be the vendors for pooling – yeah. Will that be an external PCIe Gen 5 switch?
Yeah, that's a good question. So CXL does leverage the PCIe physical layer, but when we take a look at the protocol layer, things are different. And so you can potentially have a PCIe 5 switch that also supports the CXL protocols as well, but that's not necessarily the case. And so you'd have to take a look at vendor by vendor of whether or not their PCIe 5 switch supports CXL as well. Who will be the vendors? We're here really as a consortium to discuss the CXL 2.0 specification. We can't comment on individual vendors. As of right now, I'm not aware of any vendors that have publicly announced the availability of a CXL switch, so we'll have to keep an eye on that. And then the next question is for pooling.
Will that be an external PCIe 5 switch or an external CXL switch? There's nothing within the spec that would limit you for either having an internal enclosure with internal cable to address a pooling application or requiring that to be an external enclosure for switching.
So that really comes down to the final infrastructure and how it's developed. There are also re-timers that can help address longer reach challenges if you wanted to go outside the box and go further with cables as well. But from an implementation perspective, it can be an internal enclosure or an external enclosure with either internal or external cabling.
Do you want to take the next one, Debendra?
Sure. Hot plug removal of a memory device: how does the data get wiped out for other servers or users, especially when persistent memory is used for pooling? So data at rest needs to be protected, and that's a well-known thing, right? If you have, for example, an SSD today and you remove it and plug it in elsewhere, you cannot access that data. So we expect similar mechanisms to be adopted. Did you want to add anything there, Ahmad?
No, I think you've covered that well. Perhaps one thing to add: when we take a look at the endpoint types of devices as well as the host, it really comes down to a fabric type of solution for being able to migrate those. You've touched on a very important need there: as you move resources from one host to the next, it will be important to be able to wipe that data out. In some cases, especially with persistent memory, data-at-rest encryption capabilities have historically helped in different types of fabric solutions. When we take a look at fabric-attached NVMe applications today, by simply removing the NVMe drive, because it's encrypted, you don't necessarily need to actually wipe out that data. The same type of capability can be applied here with a persistent memory CXL device as well; there's nothing within the CXL specification precluding support for that.
Okay. Next question is...
Any physical layer changes for CXL 2.0? Nothing beyond the negotiation of the capabilities; no real changes to the physical layer. Of course, you have to negotiate whether you're CXL 1.1 or CXL 2.0, and some of those capabilities get negotiated as part of the alternate protocol negotiation, but really no physical layer changes. It's the same physical layer. Want to take the next one, Ahmad?
I can read that. I'm not sure I know the answer, but let's work through it. I understand that CXL 2.0 does not specify cascaded switch support, but does it inhibit such a configuration?
No, we don't inhibit anything. As a spec, we provide support for certain configurations, and of course, people always get creative and do things, right? As long as you can guarantee interoperability, people can do certain things. So we don't inhibit. It's a wire protocol, so you can do what you want as long as you're sticking to that wire protocol and the usage models that we have defined. If you want to innovate on top of that, sure. In fact, we really welcome feedback on compelling usage cases people might have, so that we can service the requirements of the ecosystem.
Yeah, that's a good point. We'll take a look. Everything that we're doing is being driven by a need, and so certainly seeing those usage cases and if there's a need, then we can go and solve those more difficult challenges.
This looks like the last question on the list, with one minute to go, so that's good timing. Is there an election of a master fabric manager if all the hosts are running the fabric manager? I'll touch on this a little bit, and then Debendra, you can add. When we have a multi-host pooling application, the norm will typically be to have a sideband fabric manager, the key reason being that one host is not stomping on another host. That will likely be the more common approach. But specifically to the question, Debendra, can you expand on the election portion?
I mean, the reason you want to have an election is if the fabric manager itself fails, but I think I agree with you. The hosts running the fabric manager, you don't want that, but as a spec, we define a fabric manager, and if you want to, let's say, have two different fabric managers acting as pair and spare, so that way if one fails, you can have the other one that's really, you can have your own little sideband mechanism of deciding who gets to run that, but we have defined the access mechanism, and the notion is there is a sideband fabric manager independent of the hosts.
Thank you, Ahmad. And with that, I'll give you the floor.
Thank you, Ahmad, thank you, Debendra, for sharing your expertise, being mindful of the time here. We will wrap up today's webinar, and unfortunately, we couldn't address all the questions we received today, but we will be addressing them in a future blog, so please follow us on social media for updates. The presentation recording will be available on CXL Consortium's YouTube channel, and we will also be uploading the slides on CXL's website.
We would like to encourage viewers who are interested in learning more about CXL to join the CXL Consortium, download the evaluation copy of the CXL 2.0 specification, and engage with us on Twitter, LinkedIn, and YouTube. Once again, we would like to thank you all for attending the CXL Consortium's Introducing the Compute Express Link 2.0 Specification webinar.
Thank you.