Good morning and good afternoon. Thank you for attending the CXL Consortium's "An Overview of the Compute Express Link (CXL) 2.0 ECNs" webinar. Today's webinar will be led by CXL Consortium Technical Task Force co-chair Ishwar Agarwal and Software and Systems Workgroup co-chair Mahesh Natu. Now I will hand it off to Ishwar to begin the webinar.
Good morning, good afternoon. Hi, my name is Ishwar Agarwal. I'm the CXL Technical Task Force co-chair. My co-presenter today is Mahesh Natu, the CXL Systems and Software Workgroup co-chair. This webinar is about the CXL Consortium's 2.0 specification ECNs. As you may be aware, the CXL Consortium released the 2.0 specification back in November 2020, which introduced support for switching, memory pooling, and persistent memory, all while preserving industry investments by supporting full backward compatibility. Since then, based on member feedback, the consortium has made a number of significant improvements to the specification in the areas of device management, RAS, security, memory interleaving, and others. This webinar will review the key CXL 2.0 specification ECNs and the new usages they enable. Okay, so with that little introduction, I'll go ahead and get started.
So, a brief introduction to where we are as an industry. The way we think about it is that there was an era of cloud computing proliferation, which I think of as generation one. We are very much past that; it was greatly accelerated by the pandemic. We are now very much in gen two, which is the growth of AI and analytics. And we are very shortly coming up on gen three, the ultimate cloudification of the network and edge. With this kind of trajectory for our industry, we have a number of new challenges as well as a number of new opportunities to scale to the next frontier of technology evolution. And that is really where CXL positions itself.
So what is CXL? CXL is a consortium that is defining an open standard for high-speed communication. It is an open standard with 170-plus member companies today. It includes most of the technology leaders we know and recognize, some of whom are mentioned on this slide, along with a whole lot of others. And it is a very vibrant ecosystem: very active participants in a number of technical work groups in the CXL Consortium are introducing new features and new usages enabled by this open standard for high-speed communication.
So, a quick introduction to what CXL really is. CXL addresses a number of challenges that our industry faces today: the demand for faster data processing, especially in the context of next-generation data center performance, and the challenges associated with heterogeneous computing and server disaggregation. Another very key and fundamental usage of CXL is in the arena of memory, specifically memory bandwidth and memory capacity. All of these translate into the need for a next-generation interconnect that stitches these different usages and challenges together into one coherent standard that can help scale all of these opportunities. So CXL defines an open, standard, cache-coherent interconnect that can connect CPUs, memory expanders, and accelerators. It leverages PCIe as the base and introduces three mix-and-match protocols that provide low-latency access to cache and memory, as well as removing some of the complexity associated with traditional coherency and memory management.
So when we think about a CXL device, we really think about it in the context of three separate usages. These are essentially a high-level way of binning devices into the three nomenclatures we use within the CXL Consortium: Type 1 devices, Type 2 devices, and Type 3 devices. A Type 1 device is an accelerator with coherent caches. Key usages for this kind of accelerator are things such as partitioned global address space (PGAS) NICs, or devices that need to perform remote atomics on host-attached memory. In this context, the device uses the CXL.io and CXL.cache protocols. The second type is the Type 2 device, which has coherent memory on the device itself. Typical usages for such devices are GPGPUs, FPGAs, or any accelerator that wants to do dense computation, and these devices use CXL.io, CXL.cache, and CXL.mem. The third type, last but certainly not least, is the Type 3 device, which is really a memory buffer. Memory buffers are a way of extending system memory, whether for memory bandwidth, memory capacity, or memory capability, for example adding new types of media or media with different characteristics such as persistence. This type of device uses CXL.io and CXL.mem.
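To summarize the three device types, here is a small illustrative mapping of each type to the protocols it uses and the example usages mentioned above; the labels are informal summaries, not spec-defined identifiers.

```python
# Illustrative summary of the three CXL device types and the protocols each uses,
# as described above. The example usages are informal labels, not spec text.
CXL_DEVICE_TYPES = {
    "Type 1": {
        "protocols": ["CXL.io", "CXL.cache"],
        "examples": ["PGAS NICs", "accelerators doing remote atomics on host memory"],
    },
    "Type 2": {
        "protocols": ["CXL.io", "CXL.cache", "CXL.mem"],
        "examples": ["GPGPUs", "FPGAs", "dense-compute accelerators with local memory"],
    },
    "Type 3": {
        "protocols": ["CXL.io", "CXL.mem"],
        "examples": ["memory expanders / buffers (bandwidth, capacity, new media)"],
    },
}

if __name__ == "__main__":
    for dev_type, info in CXL_DEVICE_TYPES.items():
        print(f'{dev_type}: {", ".join(info["protocols"])} -- e.g. {"; ".join(info["examples"])}')
```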
So, CXL 2.0 enabled a number of very advanced, very cool new usages, including pooling, switching, and so on. But as we worked through the work groups, the member companies brought forth a number of additional usages and features they would like to enable with CXL 2.0. To address that, CXL has a very robust ECN process. ECN in this context stands for engineering change notification, and it is a process to make changes to a released specification through review and approval, after which those changes get directly integrated into the baseline specification. This flowchart briefly outlines that process. The key takeaway is that an ECN is a change proposal that can be brought forth by any member company or any of the board-of-directors companies. It goes through a number of review steps, including the technical work groups, and an overall review by an overarching body called the Technical Task Force. Once it has been reviewed by the work groups and the Technical Task Force, it is sent to the board of directors for approval. Once the board of directors has approved it, there is a review period for IP rights review as well as member review. Only after all of this thorough due diligence is done is an ECN ratified and made part of the CXL specification. This robust process ensures that there is oversight and review at every step of the ECN process and that we are able to take member feedback into account. Okay. With that process having been set, we will now go through a number of the key ECNs that were brought forth for CXL 2.0 to enable different types of usages.
So the first ECN we're going to talk about is called CXL error isolation. Error isolation is a particularly interesting and important topic for CXL. The context here is how to make the standard more robust, especially in the event of unforeseen failures. As we in the technology industry recognize, failures are inevitable, especially at data center scale. This is something we have to live with, but there needs to be a way to contain failures to as small a granularity as possible in order to impact the fewest tenants that co-reside on a machine. This ECN addresses exactly that problem statement. The error isolation ECN defines a way to gracefully handle uncorrectable errors at or below a CXL root port. The CXL 1.0 and 2.0 specifications already defined two mechanisms for error containment: data poisoning and viral indication. However, through the review and feedback process, we found that poisoning and viral did not cover all the cases we would like to guard against. Specifically, they did not cover error types that lead to events such as surprise link down. A surprise link down could happen for any reason; for example, say you have a memory buffer connected in a data center through a cable and somebody accidentally kicks the cable out. Or what happens if the device encounters a device-specific fatal error? Or if a pooling device detects a security violation at a CXL memory controller? In the baseline specification, unfortunately, such errors may have propagated all the way up to the CXL host bridge, and that would have led to a full system crash. The reason is that CXL is a coherent interconnect, and it is very hard to deal with errors in a coherent system because containment is very hard to guarantee. CXL error isolation allows such errors on the CXL.cache and CXL.mem protocols to be contained at the CXL root port so they do not propagate throughout the system. It also defines a mechanism for software to opportunistically recover from such an error without a full system reset.
So the solution the consortium came up with is a mechanism where error isolation gets triggered at a CXL root port when a request times out or the link goes down. CXL.mem transactions are tracked at the CXL root port if the root port implements CXL error isolation, and if, say, a CXL memory request (or request with data) times out, the root port triggers CXL error isolation. Once error isolation is triggered, the CXL root port synthesizes responses for all new as well as pending requests on CXL.mem; CXL.mem is just an example, and the same thing happens on CXL.cache. When we say CXL.mem transactions are handled gracefully by the root port, what that really means is that all pending and new reads get synchronous error responses returned, and writes get dropped and completed. That is very important because it allows the rest of the system upstream from the CXL root port to continue functioning even though the root port and the entire hierarchy below it has died. It also allows other CXL root ports that are co-resident on the same physical machine to keep functioning and not be impacted by an error that occurred on a device below a different root port. Once the error has been handled at the root port, we also have mechanisms defined for logging the errors so that software can come in and investigate exactly what happened, and mechanisms for software to then trigger a link reset and bus reinitialization to try to recover from the error. The other nice thing about the way this ECN was defined is that it only impacts the host: all of these mechanisms can be implemented inside the CPU, and no change is needed in the CXL switch or the CXL device. At this point, I'm going to ask Mahesh to take over and walk us through some of the other ECNs. Mahesh, do you want to take over from here?
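To make the error-isolation behavior above concrete, here is a minimal, hypothetical software model of a root port in isolation mode: pending and new reads receive synthesized synchronous error responses, while writes are dropped and completed so upstream agents do not stall. The class and method names are illustrative only, not anything defined by the specification.

```python
# Hypothetical software model of the error-isolation behavior described above.
# Once isolation is triggered (e.g., on a request timeout or surprise link down),
# reads receive synthesized error completions and writes are dropped-and-completed,
# so the rest of the system above the root port keeps running.

class IsolatedRootPortModel:
    def __init__(self):
        self.isolation_active = False
        self.error_log = []              # software can inspect this log later

    def trigger_isolation(self, reason):
        """Enter error isolation, e.g. on a CXL.mem timeout or link-down."""
        self.isolation_active = True
        self.error_log.append(reason)

    def mem_read(self, hpa):
        if self.isolation_active:
            # Synthesize a synchronous error response instead of hanging.
            return {"status": "error", "hpa": hpa}
        raise NotImplementedError("forward to the device below the root port")

    def mem_write(self, hpa, data):
        if self.isolation_active:
            # Drop the write but report it complete so upstream agents don't stall.
            return {"status": "completed", "dropped": True}
        raise NotImplementedError("forward to the device below the root port")

rp = IsolatedRootPortModel()
rp.trigger_isolation("CXL.mem request timeout")
print(rp.mem_read(0x1000))            # {'status': 'error', 'hpa': 4096}
print(rp.mem_write(0x1000, b"\x00"))  # {'status': 'completed', 'dropped': True}
```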
Hey, thanks Ishwar and thanks for painting the big picture of what CXL is and the ECN process. That definitely helps with the technical details that I'm going to go into starting the next slide.
So actually before we get into my material, there was one question on the chat that I wonder if you want to address. I think the feedback is the flowchart is not readable and I believe they're referring to the ECN flowchart.
Okay. Yeah. Thank you for that particular question. We can try and make that flowchart available in a more readable format after the webinar.
Okay. There are no other questions right now on your material, so I'll jump into memory interleaving and some other topics. As you can see from the picture on the right, a system can have a number of CXL devices, and as part of the CXL 2.0 spec work we defined the ability to interleave memory devices. It's a pretty common capability with, say, natively attached DDR memory, where the different DDR DIMMs are interleaved together to improve performance. So as part of the 2.0 spec, we defined methods to implement two-, four-, and eight-way interleaving options. We didn't want to go overboard and define too many options to begin with, and we thought we would add new options as needed. As Ishwar mentioned, there is just a lot of interest in the Type 3, memory-expansion usage model; phenomenal interest. So after we finished the 2.0 specification, we realized this particular area needed to be beefed up a little bit, and that's how this ECN came about. It adds options for three-, six-, 12-, and 16-way interleaving, which means we can take up to 12 or 16 CXL devices and make them part of a single interleave set. The key reason is that these options are available today with native DDR, so when CXL is used to expand native memory bandwidth or capacity, it makes sense for CXL to have similar capabilities. We also noticed that system designers often want to match memory capacity with compute capabilities such as the number of cores, or with other aspects like the I/O channels. Going from four to eight ways was a big jump for many of them, given the cost of the memory devices and the number of I/O lanes they consume, so six-way was considered a sweet spot between four and eight that folks were interested in. This was defined in a way that has no impact on switches and only impacts the devices and the host, which makes it easier for the ecosystem to swallow the change as an ECN. The three-, six-, and 12-way interleaving does require mod-3 math, which means the host has to implement a mod-3-like function to select one out of three, six, or 12 targets. Previously, with two-, four-, and eight-way, life was simple: the host could just look at one, two, or three address bits and select the target. With three, six, and 12 ways, we have the added complexity of doing a mod 3, which is extra work and not entirely trivial. The device also needs to do the matching computation, which is a divide-by-three (or six, or 12) operation on the HPA to compute the local address. So again, it affects both hosts and devices. And in order to limit the complexity, scope, and some aliasing possibilities when mixing two-way and three-way math together, we also limited the combinations that are supported. The ECN describes what interleave options are available at different levels and how they can be combined to get three-, six-, 12-, or 16-way interleaved device options. The base requirement we had previously in the spec, namely that you must use contiguous address bits to select the interleave, remains in place. This is probably best illustrated with an example. On the right-hand side, I show an example where the host has six root ports, each root port is attached to a CXL Type 3 device, and all six devices are interleaved together as a single interleave set.
Again, I'm not showing all six devices, but you get the idea at the bottom. The way this is done is that the host-specific logic, all the way at the top, implements a three-way interleave, let's say at a one-kilobyte granularity. This affects the address range from 16 to 22 terabytes, so remember there are six terabytes of address range, 16 to 22. Each device is one terabyte in size, and so six devices total give you six times one, six terabytes of address space. The host-specific logic is going to pick one out of three host bridges at 1 KB granularity using the mod-3 math I talked about. When the transaction comes to a CXL host bridge, each host bridge has two root ports, so it uses the standard two-way math to calculate which of the two root ports the transaction should go to; it goes to either the left one or the right one in each case. So that gives you three times two, a six-way interleave. When the device gets the address, it effectively sees a six-way interleave at a 512-byte granularity, which means every 512-byte address chunk goes to a different device. So 0 to 512 goes to the first device, the next 512 bytes go to the second device, and so on and so forth. After all six devices have been covered, it wraps back around to the first device. Standard interleaving.
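To make the address math concrete, here is a minimal sketch of the six-way example just described: a mod-3 pick of the host bridge on 1 KB chunks, a two-way pick of the root port on 512-byte chunks, and the divide-by-six calculation the device uses to turn a host physical address into its local address. This is a simplified illustration of the routing on the slide, not the spec's exact HDM decoder algorithm.

```python
# Simplified model of the six-way example above: six 1 TiB Type 3 devices
# interleaved over the 16-22 TiB host physical address (HPA) range.
# The real HDM decoder definition is more involved; this just illustrates the math.

TIB = 1 << 40
BASE = 16 * TIB          # start of the interleaved range
GRAN = 512               # interleave granularity seen by the device
WAYS = 6                 # total interleave ways (3 host bridges x 2 root ports)

def route(hpa):
    offset = hpa - BASE
    host_bridge = (offset // 1024) % 3      # mod-3 pick at 1 KiB granularity
    root_port   = (offset // GRAN) % 2      # two-way pick at 512 B granularity
    device      = host_bridge * 2 + root_port
    # Device-side "divide by six": collapse the HPA into a local device address.
    dpa = (offset // (WAYS * GRAN)) * GRAN + (offset % GRAN)
    return device, dpa

for off in range(0, 7 * GRAN, GRAN):
    dev, dpa = route(BASE + off)
    print(f"HPA offset {off:5d} -> device {dev}, device address {dpa}")
```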
Now I will show a slightly more complicated picture. This is an example of a 12-way interleave and involves multiple levels. Again, at the host logic we're doing a three-way interleave at 1 KB, just like we did previously, over the same address range, 16 to 22 terabytes. In this case, each device is half a terabyte in capacity, so half a terabyte times 12 gives you a total of six terabytes for the address range. Again, the CXL host bridge selects one of its two root ports, same logic as before. The additional step is that each root port has a switch below it, so there is a total of six switches that are not shown in the picture, but you can imagine them. Each switch gets the same address range, 16 to 22 terabytes, and does a two-way selection, meaning it selects one of its two downstream switch ports based on one of the address bits, at 256-byte granularity; that will be address bit 8, selecting the device below. So again, three times two times two, and you end up with a 12-way interleave. The device sees a 12-way interleave at a 256-byte granularity, and it effectively does divide-by-12 math to figure out the local address, because it is getting one out of every twelve 256-byte chunks. Hopefully these pictures clarify how this works better than the spec language does.
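The twelve-way case composes the same way, with one more two-way selection at the switch on address bit 8 (256-byte chunks); again, a simplified illustration rather than the spec's decoder pseudocode.

```python
# Simplified model of the twelve-way example: 3 (host logic, mod-3 at 1 KiB)
# x 2 (host bridge -> root port, 512 B) x 2 (switch -> downstream port, bit 8 / 256 B).
TIB = 1 << 40
BASE = 16 * TIB

def route_12way(hpa):
    offset = hpa - BASE
    host_bridge = (offset // 1024) % 3
    root_port   = (offset // 512) % 2
    switch_port = (offset // 256) % 2       # selected by address bit 8
    device = host_bridge * 4 + root_port * 2 + switch_port
    dpa = (offset // (12 * 256)) * 256 + (offset % 256)   # device's divide-by-12
    return device, dpa

for off in range(0, 13 * 256, 256):
    print(off, route_12way(BASE + off))
```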
The next one is another good example of how we were able to get member feedback almost in real time and react to it. For background, the CXL 2.0 spec states that a 2.0 device is required to interoperate with a CXL 1.1 host, and the addressing scheme for the device changed when we went from the 1.1 spec to the 2.0 spec, for scalability and for the addition of switches. In the 2.0 connection scenario, the picture on the left shows what the register layout looks like; it shows a simple device with a single PCI function for simplicity. The registers look pretty much like standard PCI registers: you have the config space with a BAR that points to the CXL.cache/CXL.mem registers, and those registers include things like the RAS, link, and HDM decoder registers, standard stuff defined in the CXL 2.0 spec. The 2.0 spec required that when the same device is connected to a 1.1 host, the register layout looks quite different. If you remember, 1.1 devices have the notion of a memory-mapped RCRB space in the downstream port, which points to the CXL.cache/CXL.mem registers, which in turn point to the other RAS, link, and HDM registers, while the config space only has a BAR and some DVSEC registers. So if you compare left versus right, some registers that appear in the config space on one side need to be moved up into the memory space on the other. The feedback from the members implementing this, the IP vendors and device implementers, was that this morphing was possible but particularly tricky and error prone; thanks for that feedback. So what we did was make the morphing very, very simple. This ECN, called 1.1 mode operation without RCRB, essentially eliminated the requirement that a 2.0 device implement an RCRB in order to interoperate with a 1.1 host. The ECN allows the device to maintain the same register map that it has in 2.0 mode and map it the same way in 1.1 mode. What we added was that the device needs to return all 1s when software attempts to read the RCRB space. That is how software determines that this device is operating as a 1.1 device but implements the ECN, and it then knows to look for all the other registers in the config space, like a 2.0 device. Again, this was a simplification for the device designers. It required a change to the software, but we were able to intercept this in a timely manner, before a lot of software was written, which is what made it possible. If we had waited a year or a couple of years for this, there would have been a lot of legacy software around that assumed the 2.0 spec behavior, and this would not have been possible. So it's really important for us to be able to react to member feedback in real time and provide a solution as soon as possible.
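A legacy-aware enumeration flow could use that all-1s behavior roughly as sketched below; the helper functions and the way the RCRB region is read here are illustrative placeholders, not spec-defined interfaces.

```python
# Illustrative sketch of how system software might detect a device implementing the
# "1.1 mode operation without RCRB" ECN: a read of the (would-be) RCRB region returns
# all 1s, telling software to look for the CXL registers via the config-space BAR and
# DVSECs instead, just as it would for a 2.0 device. The helpers passed in are
# hypothetical placeholders for real MMIO / config-space accessors.

ALL_ONES_32 = 0xFFFF_FFFF

def uses_rcrb_less_mode(read_rcrb_dword) -> bool:
    """read_rcrb_dword() stands in for a 32-bit MMIO read at the RCRB base."""
    return read_rcrb_dword() == ALL_ONES_32

def locate_cxl_registers(read_rcrb_dword, enumerate_via_config_space, enumerate_via_rcrb):
    if uses_rcrb_less_mode(read_rcrb_dword):
        # ECN behavior: register layout matches the 2.0-style map in config space.
        return enumerate_via_config_space()
    # Pre-ECN 1.1 behavior: walk the memory-mapped RCRB to find the registers.
    return enumerate_via_rcrb()
```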
The other ECN.
Mahesh, there were some questions on the bridge. Do you want to take them now or at the end?
Sure. Let me take them; I can read them out and you can see them. Okay, one of the questions was: does no RCRB requirement also mean no RCEC requirement? The answer is no, that's not what it means. An RCEC is required if the device is exposed as an RCiEP, a Root Complex Integrated Endpoint, and in this picture that aspect hasn't changed. The device on the right-hand side still needs to expose itself as an RCiEP, and the link is still not visible to legacy software, because the downstream port above it still has the same layout as 1.1. Therefore the host still needs to implement an RCEC, and the errors from this device still go to the RCEC; there's no difference there. I hope that answered the question; if not, please post a follow-up and I'll try to answer it. The second one is on the previous slide, the three-way interleave. The question is: is the three-way interleave only supported at the proprietary cross-host-bridge level, or is it supported at each level (host bridge, USP, device)? Since we didn't want to affect switches, and we didn't want them to do the awkward mod-3 operation, which adds latency and complexity, we limited the three-way math to the host-proprietary logic in the host and to the device. So there is no support for three-way interleaving at the USP, for example, but obviously there is three-way interleaving support in the device, so it can translate the host physical address to a device physical address. Hopefully that answered the question; if not, please repost. So let me continue with the next ECN. This is also a very interesting one. As we saw, there has been a lot of interest in Type 3 devices, and as a number of customers looked into this option, a few things became very clear: in order for the ecosystem to use these devices effectively, they needed a standards-based and symmetric in-band and out-of-band management interface. The in-band interface is important for the OS to manage the device; the out-of-band interface is commonly used by the baseboard management controller to manage the device when the OS is not running, or in a manner that is OS independent. Both are standard expectations in the data center these days. It is also important for these devices to meet time-to-market goals, which means they needed simpler design options, and there was a lot of interest in the ecosystem in taking current BMC firmware designs and quickly spinning them to manage these devices. When we looked at this feedback, we realized we needed a way for these devices to be managed out-of-band. If you're familiar with the CXL 2.0 spec work, we defined an interface for in-band management of the device; chapter 8 of the CXL 2.0 spec has pretty extensive details on how the OS can manage devices. It covers things like media management for different media types, handling memory errors, firmware update, and data security, just to name a few. So it's a pretty extensive set of capabilities the OS can use to manage the device. In order to meet these needs, we did something very simple: we mapped the same messages, the same command set the OS uses, onto an out-of-band interface, and the most widely used transport in the industry today is MCTP. So we essentially mapped the same messages over MCTP.
This allows the BMC to issue the same commands over the MCTP transport without having to redefine the message set. If you look at the picture on the right, what we had originally is on the left; this is what the 2.0 spec had. We had a Type 3 device using CXL.io, and we defined a mailbox interface that allowed the OS to send and receive messages to the device for things like managing the media; a generic Type 3 OS driver could use those messages and mailboxes to manage these devices. What we added was the ability for the BMC, the baseboard management controller, which is typically a microcontroller on the motherboard handling all of the out-of-band, OS-independent management, to send the same messages over MCTP. MCTP is a DMTF standard, and we collaborate very closely with DMTF on a number of items; this is one good example of collaboration between the two standards bodies. By mapping these messages onto MCTP, they can be sent over different interconnects, including PCI Express and SMBus, so we get all of those options pretty much for free, allowing the BMC to use existing methods to manage the Type 3 devices in the system. I think that was a pretty efficient way to address the problem, and the member companies came together to build a solution in a relatively quick time frame.
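Conceptually, the same command payload goes out either through the in-band mailbox or wrapped in an MCTP message. The sketch below shows only that idea; the opcode framing, field layout, and transport helpers are made up for illustration and do not reflect the actual binding defined with DMTF.

```python
# Conceptual sketch only: the same device command is carried either over the in-band
# mailbox (OS driver) or wrapped in an MCTP message (BMC, out-of-band). The opcode,
# framing, and transport helpers below are hypothetical, not the real binding.
from dataclasses import dataclass

@dataclass
class DeviceCommand:
    opcode: int          # hypothetical opcode value, for illustration only
    payload: bytes

def send_in_band(cmd: DeviceCommand, mailbox_write, mailbox_doorbell):
    """OS path: write the command into the device mailbox and ring the doorbell."""
    mailbox_write(cmd.opcode, cmd.payload)
    mailbox_doorbell()

def send_out_of_band(cmd: DeviceCommand, mctp_send, endpoint_id: int):
    """BMC path: the very same command bytes, carried as an MCTP message payload."""
    message = cmd.opcode.to_bytes(2, "little") + cmd.payload   # illustrative framing
    mctp_send(endpoint_id, message)
```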
Moving forward, this was another interesting one. As part of the 2.0 specification, we introduced a feature called IDE, which stands for Integrity and Data Encryption. It allows the CXL.cache and CXL.mem traffic flowing on the link to be encrypted and integrity protected, which means any modification to that traffic can be detected, and the data cannot be snooped by, say, an interposer that is trying to observe the bus traffic and potentially steal user data. IDE guards against that. What we could not do, simply because of time limits, was define how software or firmware configures IDE. As you can imagine, in order to encrypt and integrity-protect the traffic, both sides need a set of symmetric keys to process and encrypt the data, and we did not have the flow defined for how that is done. This ECN defines that flow by adding a set of messages we call CXL_IDE_KM. These messages are sent by host software, or it could be a BMC, for example, to a device that supports IDE. The requests are used to provision and configure the keys and then ask the component to either start an IDE session or tear down an existing IDE session, so all of the session management is performed through these messages. The requests and responses are themselves protected; it doesn't help you much if the keys are sent in plain text, because whoever can observe the keys can also observe the traffic that is going to be encrypted with those keys. So these messages are protected as well, using another DMTF standard called SPDM. The right-hand side shows a pretty impressive software stack, but it really shows that we were able to leverage most of the existing standards. For example, all of the boxes colored white are DMTF standards, and the light blue ones are PCIe standards; we were able to leverage all of that. The item on the far right is the IDE protocol itself, the changes to the protocol for encryption, which were part of the 2.0 spec. The only piece we added was this new set of messages for configuring it; everything else leveraged the existing industry ecosystem. It is really good to be able to leverage other standards, because you benefit from the architecture and all the effort that has gone into making sure those specs and interfaces are stable and maintained. So there was a lot of cross-industry collaboration, across different standards bodies including DMTF and PCI-SIG, to make this possible.
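The overall key-management flow is roughly: discover what the component supports, provision and configure a key set, tell the component to start the IDE session, and later tear it down, all inside an SPDM-secured session. The sketch below captures that sequence with hypothetical message and helper names; the actual CXL_IDE_KM message definitions are in the ECN itself.

```python
# Hypothetical sketch of the IDE key-management sequence described above. The message
# names and helpers are placeholders; the real CXL_IDE_KM messages (carried as SPDM
# vendor-defined requests/responses) are defined in the ECN.

def establish_cxl_ide(secure_send):
    """secure_send(name, **fields) stands in for sending one key-management request
    inside an already-established SPDM secure session and returning the response."""
    secure_send("QUERY")                                    # discover IDE capabilities
    placeholder_key = bytes(32)                             # illustrative key material only
    secure_send("PROGRAM_KEYS", key_set=0, key=placeholder_key)
    secure_send("START_SESSION", key_set=0)                 # component begins protecting traffic
    return lambda: secure_send("STOP_SESSION", key_set=0)   # call later to tear the session down
```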
Moving on: most of you are probably familiar with the fact that compliance is an essential part of any standard. The main purpose in life of a standard is to make sure components from different vendors can interoperate, and they can only interoperate if they follow the spec as intended. Compliance tests allow a component to be tested against the spec requirements and are a key piece of enabling interoperability. Some of the things added to the compliance tests with dedicated ECNs include an option to inject memory errors. CXL defines pretty robust RAS capabilities; Ishwar went over poison and viral earlier. But we didn't really have a way for someone to compliance-test those and make sure a device behaves the way the spec intends. By allowing the compliance test to inject errors, things like errors in the data region, errors in the metadata areas, or being able to spoof health-status changes, the device can be tested against those requirements. Similarly, a new test was added so the device can be put into viral mode; viral mode was already part of the 2.0 spec, but there wasn't a robust way to test it. Some of these things we realize as we go along: certain features in the CXL spec are pretty important, but at the same time they can be hard to get right. That is where the compliance work group focused, identifying areas where the current compliance requirements needed to be bolstered to cover these additional requirements. There were also changes to how the compliance Data Object Exchange (DOE) is handled, to provide more scalability. Previously, it wasn't possible to query some of the capabilities; the third ECN, on DOE return values, allows the compliance test to query what the device supports and then use that to drive other tests. There were also some simplifications made to the existing compliance tests. We identified certain areas where we had defined requirements for the device that were really hard to implement, so some of this was about allowing devices an easier design option and getting to market quicker. CXL 2.0 also added a feature called QoS telemetry rather late in the cycle, so we didn't have time to develop compliance requirements around it; this was sort of a diving catch to capture that requirement in the compliance test cases. And one thing I want to make sure is clear: the other ECNs that Ishwar and I talked about all have relevant compliance requirements added. For example, when we introduced the interleaving or the MCTP ECNs earlier, relevant sections were added to the compliance tests to make sure those ECNs will be tested as we see devices implementing them.
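As a rough illustration of how an error-injection compliance check might be structured (the injection and status-checking hooks here are hypothetical placeholders, not the actual compliance commands):

```python
# Hypothetical compliance-style test sketch: inject a poison/data error at a device
# address, then verify the device reports it the way the spec intends. The injection
# and status interfaces are placeholders, not the actual compliance DOE commands.

def test_poison_injection(inject_poison, read_and_check, dpa=0x0):
    inject_poison(dpa)             # compliance hook: corrupt data at this device address
    result = read_and_check(dpa)   # platform-specific read plus error-status check (dict-like)
    assert result.get("poison_detected"), "device failed to signal the injected error"
```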
Mahesh, there are a couple of questions on the chat window if you want to answer them now.
Sure. I think I already answered the first question, which was: what is BMC? BMC stands for Baseboard Management Controller, which is often used to manage the system in an OS-independent manner; it is generally a microcontroller with its own memory, its own firmware, and its own subsystem that is isolated from the OS. The second question is: why are they indicated as SPDM messages in the IDE establishment ECN? They are actually carried as payloads of SPDM vendor-defined request and response messages. This could be a matter of semantics: SPDM vendor-defined requests and responses are defined by the SPDM specification, DSP0274, for vendor extensions, and that is really what we mean there. If it's still confusing, please elaborate on what specifically is causing the confusion and we'd be happy to provide clarification, either here or through the spec. Thank you. Okay.
So I think we're coming to the end, and I want to just summarize a whole bunch of other ECNs. That's not to say these are not important; we just have a large number of ECNs, and it was not practical to go through all of them in the time allowed. So here is a summary of things we didn't cover today, maybe for a future time. All of these are also posted on the CXL Consortium site; we'll provide the link, and I recommend that folks who are interested visit the site, download the ECNs, look at the details, and provide feedback if there is any. The first one in the list is what we call the CEDT CFMWS and QTG_DSM ECN, lots of acronyms here. What this means in English is that CFMWS is an extension to the ACPI tables that allows software to discover how the host address decoders are programmed. Going back to the earlier picture, CFMWS allows the firmware to notify the software that a particular address range, say 16 to 22 terabytes, is configured in a certain way in a given decoder. So let's say at boot time nothing is plugged in, but devices show up later via hot-add: the OS knows this range is configured this way and can assign addresses to devices that show up later, because it knows which addresses will be targeted to this range. That's what CFMWS does. The QTG piece provides software primitives that allow the CXL 2.0 QoS telemetry feature to be enabled; it lets software figure out which throttling group a particular device address range should be mapped to, based on platform policy. There were also a couple of things we did to add design flexibility as members started to design, or think about how to implement, CXL devices or components. They were looking for areas where the spec was too restrictive and caused design complexity that wasn't really needed. The mailbox ready time ECN allows devices to take longer than the fixed amount of time the spec specifies; I think the spec says two seconds, and some devices, based on their microarchitecture or firmware design, may take longer, so this ECN allows that. The null capability ECN also adds flexibility by allowing a component to skip over one of the capability header entries if it needs to, without having to do complicated shuffling of pointers in the hardware; this was also a simplification. The Register Locator DVSEC ECN allows vendor extensions: there are certain areas where vendors wanted to innovate and add their own capabilities while still leveraging the base DVSEC structure that CXL had defined. And the last one I would like to cover in detail, but don't have time for, is a very important one: being able to collect component state. This is extremely useful for debug. As you can imagine, CXL components are all new; there are a lot of new designs, and the industry is learning. In order to debug any issues discovered during bring-up or in the field, it is important to be able to observe the state of the component when a failure or unexpected behavior occurs.
This defines standard commands by which a BMC or the host can request a component crash log, meaning that if the component were to fail, it would log some critical information about itself so that someone outside can figure out what may have happened. There is also a request for the component to capture its current execution state, which again is useful for debug or for bringing up a new system. I think this last one is a pretty important step, and we found it important to standardize early on, so the devices being designed today can intercept it and not have to invent vendor-specific mechanisms for these things. It allows for uniformity across the industry in debug practices.
So, just summarizing: I think Ishwar did a pretty good job of going over what the CXL Consortium is, and again I want to highlight that the momentum is growing. As you can see, we have more and more members; we're having a hard time keeping the member count straight, and the number here is different from the one Ishwar showed because we have added new members over the last week or so. We have been very responsive to industry needs, and that's really why I think we are seeing this momentum. As you can see from the ECN descriptions, we are able to get quickly to the problems the industry needs us to solve, work through the different work groups, build a proposal, and come out with a solution really quickly, without having to wait for the next spin of the spec. We are finding a lot of excitement around attaching memory, and even persistent memory, via CXL devices, and we have a pretty robust driver and software model. So the call to action today is: if you are not a member, please join the consortium. There are different levels at which you can join and contribute to this movement. If you have problems that need to be solved, or solutions you think you can offer, the best way to do that is by joining the consortium and contributing. Now I think we'll have time for Q&A.
Yes. Thank you, Mahesh. Thank you, Ishwar. So we will now begin the Q&A portion of the webinar. So please do share your questions in the question box.
There is one question there already that we can take, which is: how do we distinguish the original 2.0 spec versus 2.0 with ECNs? Will the spec version be revised to 2.1? Mahesh, do you want to take that?
I was hoping you would take that. But yeah, there's no plan to do a 2.1. I think we have the ECNs posted in the same spot as the 2.0 specification. And when we do the next spec revision, they will be rolled into the next spec revision. But there's no plan to do a 2.1.
So in general, the process with ECN is that once the ECNs have been ratified, they are considered part of the baseline 2.0 specification or whichever specification the ECN has been targeted at. So all of these ECNs that we are talking about today are ECNs to 2.0. So once the ECNs are ratified, they're posted separately. And whenever the document containing the spec is revised, the ECNs are folded into it so that they're not in separate documents anymore.
We do have a question here: is there any OS support, for example in Linux, for IDE?
I would say a lot of the Linux work is happening in the open-source community. I'm not aware of any code that has been upstreamed or delivered right now that enables IDE, but I'm expecting that will happen pretty soon, as we start seeing device designs that take advantage of IDE.
One additional question about where can we find the compliance tests?
The CXL spec has a chapter, chapter 14, that lists the various compliance tests, and these ECNs that we talked about complement that. As for the tests themselves, I'd have to find what the link is, but the content of what they plan to test is in the CXL spec itself. Ishwar, do you know where the compliance tests themselves are posted?
I believe some of the content is under the compliance work group. They have some hyperlinks, but the content of the tests is really spelled out in the chapters, as you correctly said.
And the latest version of the 2.0 specification with the ECNs is available on the CXL Consortium website, computeexpresslink.org. When you go to the download-the-specification page, you will be able to download the 2.0 specification with all the ECNs.
Okay, so once again, we would like to thank you for attending the CXL consortium's overview of the Compute Express Link 2.0 ECN webinar. Thank you, good day.