Good morning and good afternoon. Thank you for attending the CXL Consortium's "An Overview of the Compute Express Link (CXL) 2.0 ECNs" webinar. Today's webinar will be led by CXL Consortium Technical Task Force co-chair Ishwar Agarwal and Software and Systems Workgroup co-chair Mahesh Natu. Now I will hand it off to Ishwar to begin the webinar.
Good morning, good afternoon. Hi, my name is Ishwar Agarwal. I'm the CXL Technical Task Force co-chair. My co-presenter today is Mahesh Natu, the CXL Systems and Software Workgroup co-chair. This webinar is about the CXL Consortium's 2.0 specification ECNs. As you may be aware, the CXL Consortium released the 2.0 specification back in November 2020, which introduced support for switching, memory pooling, and persistent memory, all while preserving industry investments by supporting full backward compatibility. Since then, based on member feedback, the consortium has made a number of significant improvements to the specification in the areas of device management, RAS, security, memory interleaving, and others. This webinar will review the key CXL 2.0 specification ECNs and the new usages they enable. Okay, so with that little introduction, I'll go ahead and get started.
So, a brief introduction to where we are as an industry. The way we think about it is that there was an era of cloud computing proliferation, which I think of as generation one. We are very much past that; it was greatly accelerated by the pandemic. We are now very much in gen two, which is the growth of AI and analytics. And we are very shortly coming up on gen three, the ultimate cloudification of the network and edge. With this kind of trajectory for our industry, we have a number of new challenges as well as a number of new opportunities to scale to the next frontier of technology evolution. And that is really where CXL positions itself.
So what is CXL? CXL is a consortium that is defining an open standard for high-speed communication. It is an open standard with 170-plus member companies today. It includes most of the technology leaders we know and recognize, some of whom are mentioned on this slide, along with a whole lot of others. And it is a very vibrant ecosystem: very active participants in a number of technical work groups in the CXL Consortium are introducing new features and new usages enabled by this open standard for high-speed communication.
So, a quick introduction to what CXL really is. CXL addresses a number of challenges that our industry faces today: the demand for faster data processing, especially in the context of next-generation data center performance, and the challenges associated with heterogeneous computing and server disaggregation. Another very key and fundamental usage of CXL is in the arena of memory, specifically memory bandwidth and memory capacity. All of these translate into the need for a next-generation interconnect that stitches these different usages and challenges together into one coherent standard that can help scale all of these opportunities. So CXL defines an open, standard, cache-coherent interconnect that can connect CPUs, memory expanders, and accelerators. It leverages PCIe as the base and introduces three mix-and-match protocols that provide low-latency access to cache and memory, as well as removing some of the complexity associated with traditional coherency and memory management.
So when we think about a CXL device, we really think about it in the context of three separate usages. These are essentially a high-level way of binning devices into the three nomenclatures we use within the CXL Consortium: Type 1 devices, Type 2 devices, and Type 3 devices. A Type 1 device is an accelerator with coherent caches. Key usages for this kind of accelerator are things such as partitioned global address space (PGAS) NICs, or devices that need to perform remote atomics on host-attached memory. In this context, the device uses the CXL.io and CXL.cache protocols. The second type is the Type 2 device, which has coherent memory on the device itself. Typical usages for such devices are GPGPUs, FPGAs, or any accelerator that wants to do dense computation, and these devices use CXL.io, CXL.cache, and CXL.mem. The third type, last but certainly not least, is the Type 3 device, which is really a memory buffer. Memory buffers are a way of extending system memory, whether for memory bandwidth, memory capacity, or memory capability, for example adding new types of media or media with different characteristics such as persistence. This type of device uses CXL.io and CXL.mem.
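To summarize the three device types, here is a small illustrative mapping of each type to the protocols it uses and the example usages mentioned above; the labels are informal summaries, not spec-defined identifiers.

```python
# Illustrative summary of the three CXL device types and the protocols each uses,
# as described above. The example usages are informal labels, not spec text.
CXL_DEVICE_TYPES = {
    "Type 1": {
        "protocols": ["CXL.io", "CXL.cache"],
        "examples": ["PGAS NICs", "accelerators doing remote atomics on host memory"],
    },
    "Type 2": {
        "protocols": ["CXL.io", "CXL.cache", "CXL.mem"],
        "examples": ["GPGPUs", "FPGAs", "dense-compute accelerators with local memory"],
    },
    "Type 3": {
        "protocols": ["CXL.io", "CXL.mem"],
        "examples": ["memory expanders / buffers (bandwidth, capacity, new media)"],
    },
}

if __name__ == "__main__":
    for dev_type, info in CXL_DEVICE_TYPES.items():
        print(f'{dev_type}: {", ".join(info["protocols"])} -- e.g. {"; ".join(info["examples"])}')
```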
So, CXL 2.0 enabled a number of very advanced, very cool new usages, including pooling, switching, and so on. But as we worked through the work groups, the member companies brought forth a number of additional usages and features they would like to enable with CXL 2.0. To address that, CXL has a very robust ECN process. ECN in this context stands for engineering change notification, and it is a process to make changes to a released specification through review and approval, after which those changes get directly integrated into the baseline specification. This flowchart briefly outlines that process. The key takeaway is that an ECN is a change proposal that can be brought forth by any member company or any of the board-of-directors companies. It goes through a number of review steps, including the technical work groups, and an overall review by an overarching body called the Technical Task Force. Once it has been reviewed by the work groups and the Technical Task Force, it is sent to the board of directors for approval. Once the board of directors has approved it, there is a review period for IP rights review as well as member review. Only after all of this thorough due diligence is done is an ECN ratified and made part of the CXL specification. This robust process ensures that there is oversight and review at every step of the ECN process and that we are able to take member feedback into account. Okay. With that process having been set, we will now go through a number of the key ECNs that were brought forth for CXL 2.0 to enable different types of usages.
So the first ECN we're going to talk about is called CXL error isolation. Error isolation is a particularly interesting and important topic for CXL. The context here is how to make the standard more robust, especially in the event of unforeseen failures. As we in the technology industry recognize, failures are inevitable, especially at data center scale. This is something we have to live with, but there needs to be a way to contain failures to as small a granularity as possible in order to impact the fewest tenants that co-reside on a machine. This ECN addresses exactly that problem statement. The error isolation ECN defines a way to gracefully handle uncorrectable errors at or below a CXL root port. The CXL 1.0 and 2.0 specifications already defined two mechanisms for error containment: data poisoning and viral indication. However, through the review and feedback process, we found that poisoning and viral did not cover all the cases we would like to guard against. Specifically, they did not cover error types that lead to events such as surprise link down. A surprise link down could happen for any reason; for example, say you have a memory buffer connected in a data center through a cable and somebody accidentally kicks the cable out. Or what happens if the device encounters a device-specific fatal error? Or if a pooling device detects a security violation at a CXL memory controller? In the baseline specification, unfortunately, such errors may have propagated all the way up to the CXL host bridge, and that would have led to a full system crash. The reason is that CXL is a coherent interconnect, and it is very hard to deal with errors in a coherent system because containment is very hard to guarantee. CXL error isolation allows such errors on the CXL.cache and CXL.mem protocols to be contained at the CXL root port so they do not propagate throughout the system. It also defines a mechanism for software to opportunistically recover from such an error without a full system reset.
So the solution the consortium came up with is a mechanism where error isolation gets triggered at a CXL root port when a request times out or the link goes down. CXL.mem transactions are tracked at the CXL root port if the root port implements CXL error isolation, and if, say, a CXL memory request (or request with data) times out, the root port triggers CXL error isolation. Once error isolation is triggered, the CXL root port synthesizes responses for all new as well as pending requests on CXL.mem; CXL.mem is just an example, and the same thing happens on CXL.cache. When we say CXL.mem transactions are handled gracefully by the root port, what that really means is that all pending and new reads get synchronous error responses returned, and writes get dropped and completed. That is very important because it allows the rest of the system upstream from the CXL root port to continue functioning even though the root port and the entire hierarchy below it has died. It also allows other CXL root ports that are co-resident on the same physical machine to keep functioning and not be impacted by an error that occurred on a device below a different root port. Once the error has been handled at the root port, we also have mechanisms defined for logging the errors so that software can come in and investigate exactly what happened, and mechanisms for software to then trigger a link reset and bus reinitialization to try to recover from the error. The other nice thing about the way this ECN was defined is that it only impacts the host: all of these mechanisms can be implemented inside the CPU, and no change is needed in the CXL switch or the CXL device. At this point, I'm going to ask Mahesh to take over and walk us through some of the other ECNs. Mahesh, do you want to take over from here?
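To make the error-isolation behavior above concrete, here is a minimal, hypothetical software model of a root port in isolation mode: pending and new reads receive synthesized synchronous error responses, while writes are dropped and completed so upstream agents do not stall. The class and method names are illustrative only, not anything defined by the specification.

```python
# Hypothetical software model of the error-isolation behavior described above.
# Once isolation is triggered (e.g., on a request timeout or surprise link down),
# reads receive synthesized error completions and writes are dropped-and-completed,
# so the rest of the system above the root port keeps running.

class IsolatedRootPortModel:
    def __init__(self):
        self.isolation_active = False
        self.error_log = []              # software can inspect this log later

    def trigger_isolation(self, reason):
        """Enter error isolation, e.g. on a CXL.mem timeout or link-down."""
        self.isolation_active = True
        self.error_log.append(reason)

    def mem_read(self, hpa):
        if self.isolation_active:
            # Synthesize a synchronous error response instead of hanging.
            return {"status": "error", "hpa": hpa}
        raise NotImplementedError("forward to the device below the root port")

    def mem_write(self, hpa, data):
        if self.isolation_active:
            # Drop the write but report it complete so upstream agents don't stall.
            return {"status": "completed", "dropped": True}
        raise NotImplementedError("forward to the device below the root port")

rp = IsolatedRootPortModel()
rp.trigger_isolation("CXL.mem request timeout")
print(rp.mem_read(0x1000))            # {'status': 'error', 'hpa': 4096}
print(rp.mem_write(0x1000, b"\x00"))  # {'status': 'completed', 'dropped': True}
```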
Hey, thanks Ishwar and thanks for painting the big picture of what CXL is and the ECN process. That definitely helps with the technical details that I'm going to go into starting the next slide.
So actually before we get into my material, there was one question on the chat that I wonder if you want to address. I think the feedback is the flowchart is not readable and I believe they're referring to the ECN flowchart.
Okay. Yeah. Thank you for that particular question. We can try and make that flowchart available in a more readable format after the webinar.
Okay. There are no other questions right now on your material, so I'll jump into memory interleaving and some other topics. As you can see from the picture on the right, a system can have a number of CXL devices, and as part of the CXL 2.0 spec work we defined the ability to interleave memory devices. It's a pretty common capability with, say, natively attached DDR memory, where the different DDR DIMMs are interleaved together to improve performance. So as part of the 2.0 spec, we defined methods to implement two-, four-, and eight-way interleaving options. We didn't want to go overboard and define too many options to begin with, and we thought we would add new options as needed. As Ishwar mentioned, there is just a lot of interest in the Type 3, memory-expansion usage model; phenomenal interest. So after we finished the 2.0 specification, we realized this particular area needed to be beefed up a little bit, and that's how this ECN came about. It adds options for three-, six-, 12-, and 16-way interleaving, which means we can take up to 12 or 16 CXL devices and make them part of a single interleave set. The key reason is that these options are available today with native DDR, so when CXL is used to expand native memory bandwidth or capacity, it makes sense for CXL to have similar capabilities. We also noticed that system designers often want to match memory capacity with compute capabilities such as the number of cores, or with other aspects like the I/O channels. Going from four to eight ways was a big jump for many of them, given the cost of the memory devices and the number of I/O lanes they consume, so six-way was considered a sweet spot between four and eight that folks were interested in. This was defined in a way that has no impact on switches and only impacts the devices and the host, which makes it easier for the ecosystem to swallow the change as an ECN. The three-, six-, and 12-way interleaving does require mod-3 math, which means the host has to implement a mod-3-like function to select one out of three, six, or 12 targets. Previously, with two-, four-, and eight-way, life was simple: the host could just look at one, two, or three address bits and select the target. With three, six, and 12 ways, we have the added complexity of doing a mod 3, which is extra work and not entirely trivial. The device also needs to do the matching computation, which is a divide-by-three (or six, or 12) operation on the HPA to compute the local address. So again, it affects both hosts and devices. And in order to limit the complexity, scope, and some aliasing possibilities when mixing two-way and three-way math together, we also limited the combinations that are supported. The ECN describes what interleave options are available at different levels and how they can be combined to get three-, six-, 12-, or 16-way interleaved device options. The base requirement we had previously in the spec, namely that you must use contiguous address bits to select the interleave, remains in place. This is probably best illustrated with an example. On the right-hand side, I show an example where the host has six root ports, each root port is attached to a CXL Type 3 device, and all six devices are interleaved together as a single interleave set.
Again, I'm not showing all six devices, but you get the idea at the bottom. The way this is done is that the host-specific logic, all the way at the top, implements a three-way interleave, let's say at a one-kilobyte granularity. This affects the address range from 16 to 22 terabytes, so remember there are six terabytes of address range, 16 to 22. Each device is one terabyte in size, and so six devices total give you six times one, six terabytes of address space. The host-specific logic is going to pick one out of three host bridges at 1 KB granularity using the mod-3 math I talked about. When the transaction comes to a CXL host bridge, each host bridge has two root ports, so it uses the standard two-way math to calculate which of the two root ports the transaction should go to; it goes to either the left one or the right one in each case. So that gives you three times two, a six-way interleave. When the device gets the address, it effectively sees a six-way interleave at a 512-byte granularity, which means every 512-byte address chunk goes to a different device. So 0 to 512 goes to the first device, the next 512 bytes go to the second device, and so on and so forth. After all six devices have been covered, it wraps back around to the first device. Standard interleaving.
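To make the address math concrete, here is a minimal sketch of the six-way example just described: a mod-3 pick of the host bridge on 1 KB chunks, a two-way pick of the root port on 512-byte chunks, and the divide-by-six calculation the device uses to turn a host physical address into its local address. This is a simplified illustration of the routing on the slide, not the spec's exact HDM decoder algorithm.

```python
# Simplified model of the six-way example above: six 1 TiB Type 3 devices
# interleaved over the 16-22 TiB host physical address (HPA) range.
# The real HDM decoder definition is more involved; this just illustrates the math.

TIB = 1 << 40
BASE = 16 * TIB          # start of the interleaved range
GRAN = 512               # interleave granularity seen by the device
WAYS = 6                 # total interleave ways (3 host bridges x 2 root ports)

def route(hpa):
    offset = hpa - BASE
    host_bridge = (offset // 1024) % 3      # mod-3 pick at 1 KiB granularity
    root_port   = (offset // GRAN) % 2      # two-way pick at 512 B granularity
    device      = host_bridge * 2 + root_port
    # Device-side "divide by six": collapse the HPA into a local device address.
    dpa = (offset // (WAYS * GRAN)) * GRAN + (offset % GRAN)
    return device, dpa

for off in range(0, 7 * GRAN, GRAN):
    dev, dpa = route(BASE + off)
    print(f"HPA offset {off:5d} -> device {dev}, device address {dpa}")
```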
Now I will show a slightly more complicated picture. This is an example of a 12-way interleave and involves multiple levels. Again, at the host logic we're doing a three-way interleave at 1 KB, just like we did previously, over the same address range, 16 to 22 terabytes. In this case, each device is half a terabyte in capacity, so half a terabyte times 12 gives you a total of six terabytes for the address range. Again, the CXL host bridge selects one of its two root ports, same logic as before. The additional step is that each root port has a switch below it, so there is a total of six switches that are not shown in the picture, but you can imagine them. Each switch gets the same address range, 16 to 22 terabytes, and does a two-way selection, meaning it selects one of its two downstream switch ports based on one of the address bits, at 256-byte granularity; that will be address bit 8, selecting the device below. So again, three times two times two, and you end up with a 12-way interleave. The device sees a 12-way interleave at a 256-byte granularity, and it effectively does divide-by-12 math to figure out the local address, because it is getting one out of every twelve 256-byte chunks. Hopefully these pictures clarify how this works better than the spec language does.
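The twelve-way case composes the same way, with one more two-way selection at the switch on address bit 8 (256-byte chunks); again, a simplified illustration rather than the spec's decoder pseudocode.

```python
# Simplified model of the twelve-way example: 3 (host logic, mod-3 at 1 KiB)
# x 2 (host bridge -> root port, 512 B) x 2 (switch -> downstream port, bit 8 / 256 B).
TIB = 1 << 40
BASE = 16 * TIB

def route_12way(hpa):
    offset = hpa - BASE
    host_bridge = (offset // 1024) % 3
    root_port   = (offset // 512) % 2
    switch_port = (offset // 256) % 2       # selected by address bit 8
    device = host_bridge * 4 + root_port * 2 + switch_port
    dpa = (offset // (12 * 256)) * 256 + (offset % 256)   # device's divide-by-12
    return device, dpa

for off in range(0, 13 * 256, 256):
    print(off, route_12way(BASE + off))
```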
The next one is another good example of how we were able to get member feedback almost in real time and react to it. For background, the CXL 2.0 spec states that a 2.0 device is required to interoperate with a CXL 1.1 host, and the addressing scheme for the device changed when we went from the 1.1 spec to the 2.0 spec, for scalability and for the addition of switches. In the 2.0 connection scenario, the picture on the left shows what the register layout looks like; it shows a simple device with a single PCI function for simplicity. The registers look pretty much like standard PCI registers: you have the config space with a BAR that points to the CXL.cache/CXL.mem registers, and those registers include things like the RAS, link, and HDM decoder registers, standard stuff defined in the CXL 2.0 spec. The 2.0 spec required that when the same device is connected to a 1.1 host, the register layout looks quite different. If you remember, 1.1 devices have the notion of a memory-mapped RCRB space in the downstream port, which points to the CXL.cache/CXL.mem registers, which in turn point to the other RAS, link, and HDM registers, while the config space only has a BAR and some DVSEC registers. So if you compare left versus right, some registers that appear in the config space on one side need to be moved up into the memory space on the other. The feedback from the members implementing this, the IP vendors and device implementers, was that this morphing was possible but particularly tricky and error prone; thanks for that feedback. So what we did was make the morphing very, very simple. This ECN, called 1.1 mode operation without RCRB, essentially eliminated the requirement that a 2.0 device implement an RCRB in order to interoperate with a 1.1 host. The ECN allows the device to maintain the same register map that it has in 2.0 mode and map it the same way in 1.1 mode. What we added was that the device needs to return all 1s when software attempts to read the RCRB space. That is how software determines that this device is operating as a 1.1 device but implements the ECN, and it then knows to look for all the other registers in the config space, like a 2.0 device. Again, this was a simplification for the device designers. It required a change to the software, but we were able to intercept this in a timely manner, before a lot of software was written, which is what made it possible. If we had waited a year or a couple of years for this, there would have been a lot of legacy software around that assumed the 2.0 spec behavior, and this would not have been possible. So it's really important for us to be able to react to member feedback in real time and provide a solution as soon as possible.
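A legacy-aware enumeration flow could use that all-1s behavior roughly as sketched below; the helper functions and the way the RCRB region is read here are illustrative placeholders, not spec-defined interfaces.

```python
# Illustrative sketch of how system software might detect a device implementing the
# "1.1 mode operation without RCRB" ECN: a read of the (would-be) RCRB region returns
# all 1s, telling software to look for the CXL registers via the config-space BAR and
# DVSECs instead, just as it would for a 2.0 device. The helpers passed in are
# hypothetical placeholders for real MMIO / config-space accessors.

ALL_ONES_32 = 0xFFFF_FFFF

def uses_rcrb_less_mode(read_rcrb_dword) -> bool:
    """read_rcrb_dword() stands in for a 32-bit MMIO read at the RCRB base."""
    return read_rcrb_dword() == ALL_ONES_32

def locate_cxl_registers(read_rcrb_dword, enumerate_via_config_space, enumerate_via_rcrb):
    if uses_rcrb_less_mode(read_rcrb_dword):
        # ECN behavior: register layout matches the 2.0-style map in config space.
        return enumerate_via_config_space()
    # Pre-ECN 1.1 behavior: walk the memory-mapped RCRB to find the registers.
    return enumerate_via_rcrb()
```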
The other ECN.
Mahesh, there were some questions on the bridge. Do you want to take them now or at the end?
Sure. Let me take them; I can read them out and you can see them. Okay, one of the questions was: does no RCRB requirement also mean no RCEC requirement? The answer is no, that's not what it means. An RCEC is required if the device is exposed as an RCiEP, a Root Complex Integrated Endpoint, and in this picture that aspect hasn't changed. The device on the right-hand side still needs to expose itself as an RCiEP, and the link is still not visible to legacy software, because the downstream port above it still has the same layout as 1.1. Therefore the host still needs to implement an RCEC, and the errors from this device still go to the RCEC; there's no difference there. I hope that answered the question; if not, please post a follow-up and I'll try to answer it. The second one is on the previous slide, the three-way interleave. The question is: is the three-way interleave only supported at the proprietary cross-host-bridge level, or is it supported at each level (host bridge, USP, device)? Since we didn't want to affect switches, and we didn't want them to do the awkward mod-3 operation, which adds latency and complexity, we limited the three-way math to the host-proprietary logic in the host and to the device. So there is no support for three-way interleaving at the USP, for example, but obviously there is three-way interleaving support in the device, so it can translate the host physical address to a device physical address. Hopefully that answered the question; if not, please repost. So let me continue with the next ECN. This is also a very interesting one. As we saw, there has been a lot of interest in Type 3 devices, and as a number of customers looked into this option, a few things became very clear: in order for the ecosystem to use these devices effectively, they needed a standards-based and symmetric in-band and out-of-band management interface. The in-band interface is important for the OS to manage the device; the out-of-band interface is commonly used by the baseboard management controller to manage the device when the OS is not running, or in a manner that is OS independent. Both are standard expectations in the data center these days. It is also important for these devices to meet time-to-market goals, which means they needed simpler design options, and there was a lot of interest in the ecosystem in taking current BMC firmware designs and quickly spinning them to manage these devices. When we looked at this feedback, we realized we needed a way for these devices to be managed out-of-band. If you're familiar with the CXL 2.0 spec work, we defined an interface for in-band management of the device; chapter 8 of the CXL 2.0 spec has pretty extensive details on how the OS can manage devices. It covers things like media management for different media types, handling memory errors, firmware update, and data security, just to name a few. So it's a pretty extensive set of capabilities the OS can use to manage the device. In order to meet these needs, we did something very simple: we mapped the same messages, the same command set the OS uses, onto an out-of-band interface, and the most widely used transport in the industry today is MCTP. So we essentially mapped the same messages over MCTP.
This allows the BMC to issue the same commands over the MCTP transport without having to redefine the message set. If you look at the picture on the right, what we had originally is on the left; this is what the 2.0 spec had. We had a Type 3 device using CXL.io, and we defined a mailbox interface that allowed the OS to send and receive messages to the device for things like managing the media; a generic Type 3 OS driver could use those messages and mailboxes to manage these devices. What we added was the ability for the BMC, the baseboard management controller, which is typically a microcontroller on the motherboard handling all of the out-of-band, OS-independent management, to send the same messages over MCTP. MCTP is a DMTF standard, and we collaborate very closely with DMTF on a number of items; this is one good example of collaboration between the two standards bodies. By mapping these messages onto MCTP, they can be sent over different interconnects, including PCI Express and SMBus, so we get all of those options pretty much for free, allowing the BMC to use existing methods to manage the Type 3 devices in the system. I think that was a pretty efficient way to address the problem, and the member companies came together to build a solution in a relatively quick time frame.
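Conceptually, the same command payload goes out either through the in-band mailbox or wrapped in an MCTP message. The sketch below shows only that idea; the opcode framing, field layout, and transport helpers are made up for illustration and do not reflect the actual binding defined with DMTF.

```python
# Conceptual sketch only: the same device command is carried either over the in-band
# mailbox (OS driver) or wrapped in an MCTP message (BMC, out-of-band). The opcode,
# framing, and transport helpers below are hypothetical, not the real binding.
from dataclasses import dataclass

@dataclass
class DeviceCommand:
    opcode: int          # hypothetical opcode value, for illustration only
    payload: bytes

def send_in_band(cmd: DeviceCommand, mailbox_write, mailbox_doorbell):
    """OS path: write the command into the device mailbox and ring the doorbell."""
    mailbox_write(cmd.opcode, cmd.payload)
    mailbox_doorbell()

def send_out_of_band(cmd: DeviceCommand, mctp_send, endpoint_id: int):
    """BMC path: the very same command bytes, carried as an MCTP message payload."""
    message = cmd.opcode.to_bytes(2, "little") + cmd.payload   # illustrative framing
    mctp_send(endpoint_id, message)
```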
Moving forward, this was another interesting one. As part of the 2.0 specification, we introduced a feature called IDE, which stands for Integrity and Data Encryption. It allows the CXL.cache and CXL.mem traffic flowing on the link to be encrypted and integrity protected, which means any modification to that traffic can be detected, and the data cannot be snooped by, say, an interposer that is trying to observe the bus traffic and potentially steal user data. IDE guards against that. What we could not do, simply because of time limits, was define how software or firmware configures IDE. As you can imagine, in order to encrypt and integrity-protect the traffic, both sides need a set of symmetric keys to process and encrypt the data, and we did not have the flow defined for how that is done. This ECN defines that flow by adding a set of messages we call CXL_IDE_KM. These messages are sent by host software, or it could be a BMC, for example, to a device that supports IDE. The requests are used to provision and configure the keys and then ask the component to either start an IDE session or tear down an existing IDE session, so all of the session management is performed through these messages. The requests and responses are themselves protected; it doesn't help you much if the keys are sent in plain text, because whoever can observe the keys can also observe the traffic that is going to be encrypted with those keys. So these messages are protected as well, using another DMTF standard called SPDM. The right-hand side shows a pretty impressive software stack, but it really shows that we were able to leverage most of the existing standards. For example, all of the boxes colored white are DMTF standards, and the light blue ones are PCIe standards; we were able to leverage all of that. The item on the far right is the IDE protocol itself, the changes to the protocol for encryption, which were part of the 2.0 spec. The only piece we added was this new set of messages for configuring it; everything else leveraged the existing industry ecosystem. It is really good to be able to leverage other standards, because you benefit from the architecture and all the effort that has gone into making sure those specs and interfaces are stable and maintained. So there was a lot of cross-industry collaboration, across different standards bodies including DMTF and PCI-SIG, to make this possible.
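The overall key-management flow is roughly: discover what the component supports, provision and configure a key set, tell the component to start the IDE session, and later tear it down, all inside an SPDM-secured session. The sketch below captures that sequence with hypothetical message and helper names; the actual CXL_IDE_KM message definitions are in the ECN itself.

```python
# Hypothetical sketch of the IDE key-management sequence described above. The message
# names and helpers are placeholders; the real CXL_IDE_KM messages (carried as SPDM
# vendor-defined requests/responses) are defined in the ECN.

def establish_cxl_ide(secure_send):
    """secure_send(name, **fields) stands in for sending one key-management request
    inside an already-established SPDM secure session and returning the response."""
    secure_send("QUERY")                                    # discover IDE capabilities
    placeholder_key = bytes(32)                             # illustrative key material only
    secure_send("PROGRAM_KEYS", key_set=0, key=placeholder_key)
    secure_send("START_SESSION", key_set=0)                 # component begins protecting traffic
    return lambda: secure_send("STOP_SESSION", key_set=0)   # call later to tear the session down
```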
Moving on: most of you are probably familiar with the fact that compliance is an essential part of any standard. The main purpose in life of a standard is to make sure components from different vendors can interoperate, and they can only interoperate if they follow the spec as intended. Compliance tests allow a component to be tested against the spec requirements and are a key piece of enabling interoperability. Some of the things added to the compliance tests with dedicated ECNs include an option to inject memory errors. CXL defines pretty robust RAS capabilities; Ishwar went over poison and viral earlier. But we didn't really have a way for someone to compliance-test those and make sure a device behaves the way the spec intends. By allowing the compliance test to inject errors, things like errors in the data region, errors in the metadata areas, or being able to spoof health-status changes, the device can be tested against those requirements. Similarly, a new test was added so the device can be put into viral mode; viral mode was already part of the 2.0 spec, but there wasn't a robust way to test it. Some of these things we realize as we go along: certain features in the CXL spec are pretty important, but at the same time they can be hard to get right. That is where the compliance work group focused, identifying areas where the current compliance requirements needed to be bolstered to cover these additional requirements. There were also changes to how the compliance Data Object Exchange (DOE) is handled, to provide more scalability. Previously, it wasn't possible to query some of the capabilities; the third ECN, on DOE return values, allows the compliance test to query what the device supports and then use that to drive other tests. There were also some simplifications made to the existing compliance tests. We identified certain areas where we had defined requirements for the device that were really hard to implement, so some of this was about allowing devices an easier design option and getting to market quicker. CXL 2.0 also added a feature called QoS telemetry rather late in the cycle, so we didn't have time to develop compliance requirements around it; this was sort of a diving catch to capture that requirement in the compliance test cases. And one thing I want to make sure is clear: the other ECNs that Ishwar and I talked about all have relevant compliance requirements added. For example, when we introduced the interleaving or the MCTP ECNs earlier, relevant sections were added to the compliance tests to make sure those ECNs will be tested as we see devices implementing them.
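As a rough illustration of how an error-injection compliance check might be structured (the injection and status-checking hooks here are hypothetical placeholders, not the actual compliance commands):

```python
# Hypothetical compliance-style test sketch: inject a poison/data error at a device
# address, then verify the device reports it the way the spec intends. The injection
# and status interfaces are placeholders, not the actual compliance DOE commands.

def test_poison_injection(inject_poison, read_and_check, dpa=0x0):
    inject_poison(dpa)             # compliance hook: corrupt data at this device address
    result = read_and_check(dpa)   # platform-specific read plus error-status check (dict-like)
    assert result.get("poison_detected"), "device failed to signal the injected error"
```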
Mahesh, there are a couple of questions on the chat window if you want to answer them now.
Sure. I think I already answered the first question, which was: what is BMC? BMC stands for Baseboard Management Controller, which is often used to manage the system in an OS-independent manner; it is generally a microcontroller with its own memory, its own firmware, and its own subsystem that is isolated from the OS. The second question is: why are they indicated as SPDM messages in the IDE establishment ECN? They are actually carried as payloads of SPDM vendor-defined request and response messages. This could be a matter of semantics: SPDM vendor-defined requests and responses are defined by the SPDM specification, DSP0274, for vendor extensions, and that is really what we mean there. If it's still confusing, please elaborate on what specifically is causing the confusion and we'd be happy to provide clarification, either here or through the spec. Thank you. Okay.
So I think we're coming to the end, and I want to just summarize a whole bunch of other ECNs. That's not to say these are not important; we just have a large number of ECNs, and it was not practical to go through all of them in the time allowed. So here is a summary of things we didn't cover today, maybe for a future time. All of these are also posted on the CXL Consortium site; we'll provide the link, and I recommend that folks who are interested visit the site, download the ECNs, look at the details, and provide feedback if there is any. The first one in the list is what we call the CEDT CFMWS and QTG_DSM ECN, lots of acronyms here. What this means in English is that CFMWS is an extension to the ACPI tables that allows software to discover how the host address decoders are programmed. Going back to the earlier picture, CFMWS allows the firmware to notify the software that a particular address range, say 16 to 22 terabytes, is configured in a certain way in a given decoder. So let's say at boot time nothing is plugged in, but devices show up later via hot-add: the OS knows this range is configured this way and can assign addresses to devices that show up later, because it knows which addresses will be targeted to this range. That's what CFMWS does. The QTG piece provides software primitives that allow the CXL 2.0 QoS telemetry feature to be enabled; it lets software figure out which throttling group a particular device address range should be mapped to, based on platform policy. There were also a couple of things we did to add design flexibility as members started to design, or think about how to implement, CXL devices or components. They were looking for areas where the spec was too restrictive and caused design complexity that wasn't really needed. The mailbox ready time ECN allows devices to take longer than the fixed amount of time the spec specifies; I think the spec says two seconds, and some devices, based on their microarchitecture or firmware design, may take longer, so this ECN allows that. The null capability ECN also adds flexibility by allowing a component to skip over one of the capability header entries if it needs to, without having to do complicated shuffling of pointers in the hardware; this was also a simplification. The Register Locator DVSEC ECN allows vendor extensions: there are certain areas where vendors wanted to innovate and add their own capabilities while still leveraging the base DVSEC structure that CXL had defined. And the last one I would like to cover in detail, but don't have time for, is a very important one: being able to collect component state. This is extremely useful for debug. As you can imagine, CXL components are all new; there are a lot of new designs, and the industry is learning. In order to debug any issues discovered during bring-up or in the field, it is important to be able to observe the state of the component when a failure or unexpected behavior occurs.
This defines standard commands by which a BMC or the host can request a component crash log, meaning that if the component were to fail, it would log some critical information about itself so that someone outside can figure out what may have happened. There is also a request for the component to capture its current execution state, which again is useful for debug or for bringing up a new system. I think this last one is a pretty important step, and we found it important to standardize early on, so the devices being designed today can intercept it and not have to invent vendor-specific mechanisms for these things. It allows for uniformity across the industry in debug practices.
So, just summarizing: I think Ishwar did a pretty good job of going over what the CXL Consortium is, and again I want to highlight that the momentum is growing. As you can see, we have more and more members; we're having a hard time keeping the member count straight, and the number here is different from the one Ishwar showed because we have added new members over the last week or so. We have been very responsive to industry needs, and that's really why I think we are seeing this momentum. As you can see from the ECN descriptions, we are able to get quickly to the problems the industry needs us to solve, work through the different work groups, build a proposal, and come out with a solution really quickly, without having to wait for the next spin of the spec. We are finding a lot of excitement around attaching memory, and even persistent memory, via CXL devices, and we have a pretty robust driver and software model. So the call to action today is: if you are not a member, please join the consortium. There are different levels at which you can join and contribute to this movement. If you have problems that need to be solved, or solutions you think you can offer, the best way to do that is by joining the consortium and contributing. Now I think we'll have time for Q&A.
Yes. Thank you, Mahesh. Thank you, Ishwar. So we will now begin the Q&A portion of the webinar. So please do share your questions in the question box.
There is one question there already that we can take, which is: how do we distinguish the original 2.0 spec versus 2.0 with ECNs? Will the spec version be revised to 2.1? Mahesh, do you want to take that?
I was hoping you would take that. But yeah, there's no plan to do a 2.1. I think we have the ECNs posted in the same spot as the 2.0 specification. And when we do the next spec revision, they will be rolled into the next spec revision. But there's no plan to do a 2.1.
So in general, the process with ECN is that once the ECNs have been ratified, they are considered part of the baseline 2.0 specification or whichever specification the ECN has been targeted at. So all of these ECNs that we are talking about today are ECNs to 2.0. So once the ECNs are ratified, they're posted separately. And whenever the document containing the spec is revised, the ECNs are folded into it so that they're not in separate documents anymore.
We do have a question here: is there any OS support, for example in Linux, for IDE?
I would say a lot of the Linux work is happening in the open-source community. I'm not aware of any code that has been upstreamed or delivered right now that enables IDE, but I'm expecting that will happen pretty soon, as we start seeing device designs that take advantage of IDE.
One additional question about where can we find the compliance tests?
The CXL spec has a chapter, chapter 14, that lists the various compliance tests, and these ECNs that we talked about complement that. As for the tests themselves, I'd have to find what the link is, but the content of what they plan to test is in the CXL spec itself. Ishwar, do you know where the compliance tests themselves are posted?
I believe some of the content is under the compliance work group. They have some hyperlinks, but the content of the tests is really spelled out in the chapters, as you correctly said.
And the latest version of the 2.0 specification with the ECNs is available on the CXL Consortium website, computeexpresslink.org. When you go to the download-the-specification page, you will be able to download the 2.0 specification with all the ECNs.
Okay, so once again, we would like to thank you for attending the CXL consortium's overview of the Compute Express Link 2.0 ECN webinar. Thank you, good day.