Hello, welcome. Good morning, good afternoon—wherever we are in the world. So, I am Kurt Lender. I would like to welcome you to the Q1 CXL webinar. We are going to take a look at the CXL ecosystem and some evolution of use cases. We're gonna try something different, also. I am co-chairman of the marketing work group. Kurtis Bowman will come on in a few slides and set up and talk about some of these cases and demos. The key here is, we are going to show demos that we did in SC22 at the end of last year. And we want to show that CXL, in three short years or so since its conception, has gone from sort of PowerPoint and specifications to real hardware. And you can actually start developments and go from there. So, welcome, and let me start.
You should be seeing slides now; I hope. The other thing is, we actually will be answering questions at the end, so you can submit questions. We will actually have panelists from about nine of the 12 different companies, too. So again, if you have questions about CXL or the demos, please submit those. So let's talk about the organization real quickly. We continue to be a vibrant organization. We are represented by all the CPU manufacturers, by the large cloud providers, OEMs. We have memory vendors in here. The one thing, if you notice, if you compare this to last year, is that we actually have a few floating at-large contributor seats. They rotate, I think, on a two-year basis. So that's where some of the differences are when you look at the board. CXL is up to 240 members, and that's been growing at a very steady clip. And again, I expect that to continue to grow as we migrate from sort of add-in card providers to system-type companies, as you'll see later in the demos. The other thing I'll mention is, again, we continue to have about six different technical work groups that you can join if you join the consortium as a contributor. Contributor membership is about $10K, and contributors get to participate in the work groups. And then the last tier is adopters. Adopters basically pay zero dollars, and you get the IP rights. So again, please look at joining CXL. And again, contributors can actually contribute to the specification moving forward.
The other thing about CXL that is key, in the sense of being a focal point in the industry, is this: if you think back about three years ago, there wasn't a coherent I/O standard; there were a few that were emerging and starting to be looked at by the industry. CXL has emerged as the focal point. Last year, we had Gen-Z, early in the year, give their assets to CXL. Late in the year, OpenCAPI did the same. So, again, if you're looking at, and wondering what will be the coherent interface moving forward, it is CXL. Those specifications—meaning Gen-Z and OpenCAPI—will be available on the CXL website for a while, but expect a lot of that work to be incorporated into CXL. And as we talk about CXL 3.0, you'll see that fabric capabilities and other things like that have been incorporated, which is why this merger, or consolidation, I'll say, actually happened.
The other thing that CXL is doing is reaching out to other organizations. We do have some technical liaisons going on; this one is more in the marketing sense. So, DMTF for manageability-type standards, SNIA for some of the storage-type work, and PCI SIG, with whom we're obviously locked hand in hand, since we layer on top of them. We will be doing different marketing and expect to see more co-marketing here as we get into the solution space and start pulling in a lot of these different consortiums and other SIGs into the industry. So again, our reach is expanding, and you'll start seeing us do co-marketing with these organizations.
Real quickly, I'm not going to go into a lot of detail about what CXL is. That's not the intent of this presentation. I hope you know what it is already. If you don't, go to our website. This is basically just the one slide on the challenges that we were facing and still are facing in the industry. You know, the disaggregation of the servers is starting to happen. We need increased memory capacity and bandwidth. You'll see these as we talk about CXL. And then, on the right side, really, what CXL is—it's three different, mix-and-match protocols. We layer them on top of PCIe, right now PCIe 5.0, and use all of that ecosystem in the sense of electricals. And then, the real key to CXL is latency—low latency. You'll hear us continue to say that. We set it for 2.0. We set it for 3.0. We'll continue to drive that. And then, the other key here is that we will be doing backward compatibility, much like PCIe. So again, you'll be able to migrate your solutions forward in that sense.
So, real quickly, I'm going to build this out. CXL 1.1 really was designed for single-node, in-node type computing. CXL 2.0 added switching and sort of expanded the reach of CXL. And, it also added the concept of pooling, where different devices could be placed into different segments. Those segments could be attached to different hosts. Sharing—and Kurtis will go into that a little bit more—but again, it expanded the reach of CXL.
And then, at FMS at the end of last year—in the second half of last year—we announced CXL 3.0. CXL 3.0 introduced fabric-type connections, now allowing for fabrics of up to 4,096 nodes. And again, that's the growth of CXL.
And then, lastly, again, I'm not going to go through each of these. If you want to do your comparison of the versions, this is that checklist of some of the major features. A lot of what you're going to see today with the demos is centered around CXL 1.1 and 2.0. We're up to single-level switching. Again, 3.0 was just announced. That will be a couple of years out in the normal sort of cadence of specification, then work, and then devices coming from the ecosystem. So, if you want more on these, there are webinars on 1.1, 2.0, and 3.0, and I'm sure we'll be doing more technical deep dives as we move forward. So, with that said, I'm going to hand the floor over to Kurtis, my co-chair, so he can talk about the demos and get to the panelists.
Hey, thank you, Kurt. Really appreciate that. As he mentioned, my name is Kurtis Bowman. I work for AMD and co-chair the marketing work group with him. What I want to do is really go over the ecosystem and how we've grown. So as Kurt mentioned, in a short period of time, we've been able to really increase what CXL can do. And we're able to have our members now showing that off.
Kind of the key here is, "What's the ecosystem of CXL?" And what you can see in this simple slide is, we want to allow CXL the ability to touch many different areas. So, you see at the top the compute elements: GPUs, CPUs, accelerators, FPGAs. Then we get to people who provide IP, and you'll see some of that in our demos. One of the key pieces right now is memory expanders, and you'll see that through our demos. Switches—as Kurt mentioned, as we grow the ecosystem, we want to move to the switches. In order to do the proper work to debug and to analyze our solutions, we need those analyzers and traffic generators. We need the software solutions, and then we need the memory itself. And we've been able to put all that together, and we'll show you that; we have the people from the companies who've done the work here to talk about that. And that's where we really encourage questions to come in. One of the things that you see here on the right-hand side that's really important to us is full backward compatibility. So, if you invest in the early CXL—the 1.1 and 2.0 that Kurt showed—it's also compatible with 3.0 and forward-looking specs. Our goal is to help reduce the overall cost of the system, and then make sure that everything works well together. So, we have a nice compliance program that allows vendors to test against each other. And we actually had our first version of that take place in the middle of January, so just a week ago.
And then, as you talk about the use cases, we start off on the left-hand side, where it's fairly simple. You've got a host, and then you want to add more memory to it. And so, you're able to add memory. And when you do that, you can increase your memory capacity, if that's what's important to you, but you also increase the memory bandwidth going into that host. In a lot of cases, we see the number of cores increasing, or the number of compute elements increasing, and this gives you the ability to feed the beast a little bit better. Another opportunity with that, as we kind of go from the crawl to the walk, is being able to look at different types of memory than what your host would maybe natively use. So, if you look at current CPUs that have been announced, they come out with DDR5. DDR5 is a fantastic memory, but it has a higher price tag on it than some of its counterparts, DDR4 or others. And so, you could actually use CXL to expand your memory with a lower-cost memory type that gives you sufficient performance. And so, you improve your memory bandwidth, you lower your dollars per gigabyte on your memory, and, as we continue to see here, we help to lower that total cost of ownership. Then, as we move out into what's next, you start to look into the CXL 2.0 feature set, and that's where you start to look at these expanders that you see in this middle block, where we can go to pooled memory. And when we get to 3.0, as Kurt mentioned, we can go to shared memory. And I'll talk about those on the next slide. What it does is start to reduce your stranded memory. That's kind of the key; now I can start to move memory around between systems. I can really improve my whole memory utilization and continue to lower the cost of ownership. And then finally, if you think about where our vision is right now with the 3.0 spec, you can see multiple hosts working through a fabric environment, which we call the CXL switch here, and then getting to multiple devices on the other side. Those devices, if they're memory, allow for both pooling and sharing. But they may also be another type of device—an accelerator, for example—that could be put on this and start to be shared across multiple hosts or assigned to a particular host at any given time.
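To make the memory-expansion case concrete: on current Linux systems, a CXL memory expander typically shows up as a CPU-less NUMA node, and an application (or a tiering layer) can place data on it explicitly. Below is a minimal sketch using libnuma; the node number is an assumption—check your own topology (for example with numactl -H) before binding to it.

```c
/* Minimal sketch: place a buffer on a CXL expander exposed as a
 * CPU-less NUMA node. Node 2 is an assumption; check numactl -H.
 * Build with: gcc cxl_alloc.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma: NUMA not available on this system\n");
        return 1;
    }

    int cxl_node = 2;            /* assumed node ID for the expander */
    size_t sz = 1UL << 30;       /* 1 GiB */

    /* Allocate pages bound to the CXL-backed node. */
    void *buf = numa_alloc_onnode(sz, cxl_node);
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", cxl_node);
        return 1;
    }

    memset(buf, 0xA5, sz);       /* touch pages so they are faulted in */
    printf("1 GiB placed on NUMA node %d (CXL-attached memory)\n", cxl_node);

    numa_free(buf, sz);
    return 0;
}
```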
And the graphic is a little funky here, but let me try and explain it anyway. The top is your hosts. So, think of them as CPUs in a standard server. And while the colors have shifted a little bit on the top, what we're trying to show there is that you can have areas that are shared across CPUs. So, that blue S1 is in host one as well as host three. And then, it's also in what we call host #. And so, it's shared across multiple systems. And then, you've got the more salmon color—pink color—in host one and host two. When you look down into the memory that's below it, you can start to see that those are shared. So, there's one S2 copy in device one. There's one S3 copy in device #. Those are shared, and they are coherently shared among those processors that are in that coherency domain. There are other opportunities that you see with the solid colors in the devices. So, purple—you see it in device two, three, four, and device #. That is all memory that's assigned to host three. Compare that to, say, host one, which has dedicated memory in device one and device four. And so, it allows you to assign memory that's just for that system. It's secure in that host two cannot get at the memory that's assigned to host one. And so, that's the power here: I can allocate that memory to a particular host and increase its capacity. I can increase its bandwidth by going across multiple devices. And additionally, if later on I don't need that much memory, I can deallocate it from host one and allocate it to one of the other hosts. And if you have questions there, feel free to type those into the box, and we'll answer those as we get toward the end.
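As an illustration of the pooling and assignment idea described here—not any real fabric manager API—below is a small sketch of the bookkeeping involved: each region of device memory is either dedicated to exactly one host or coherently shared by several, and a pooled region can be reassigned at run time. All names, sizes, and fields are hypothetical; real fabric managers follow the CXL fabric manager API definitions.

```c
/* Illustrative sketch only: one way a fabric manager might track pooled
 * (dedicated) and shared regions across hosts. Everything here is
 * hypothetical and chosen for clarity, not taken from any real product.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_HOSTS 8

enum region_kind { REGION_POOLED, REGION_SHARED };

struct cxl_region {
    uint64_t device_id;              /* which memory device backs this region */
    uint64_t dpa_base;               /* device physical address of the region */
    uint64_t length;                 /* region size in bytes                  */
    enum region_kind kind;
    bool     host_access[MAX_HOSTS]; /* pooled: exactly one true;
                                        shared: several true (coherent)       */
};

/* Reassigning a pooled region is just flipping ownership; the point of
 * pooling is that this can happen at run time, not at purchase time.   */
static void reassign_pooled(struct cxl_region *r, int from_host, int to_host)
{
    if (r->kind != REGION_POOLED)
        return;
    r->host_access[from_host] = false;
    r->host_access[to_host]   = true;
}

int main(void)
{
    struct cxl_region r = {
        .device_id = 1, .dpa_base = 0, .length = 1ULL << 34, /* 16 GiB */
        .kind = REGION_POOLED, .host_access = { [0] = true },
    };

    /* Host 0 no longer needs the capacity; give it to host 3. */
    reassign_pooled(&r, 0, 3);
    printf("region now owned by host 3: %s\n", r.host_access[3] ? "yes" : "no");
    return 0;
}
```

The point of the sketch is simply that reassignment is a metadata operation on the fabric side, which is what makes it possible to reclaim stranded memory without reconfiguring the hardware.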
Now, let's get to the demos. We had a lot of CXL memory solutions that we showed off at SuperCompute. AMD had a demo that showed you can use your CXL memory in the exact same way you use your directly attached memory to do confidential containers. Astera Labs showed expansion using their silicon in multiple systems. Elastics.cloud had the rack-scale memory pooling, so that they could share memory across multiple servers. Microchip had a great solution of being able to share—or, excuse me, being able to expand your memory through CXL. Rambus did the same thing. And then Samsung took that expanded memory that they got with CXL and showed how they could improve their AI/ML application because they had the higher-capacity memory. And then, UnifabriX had a smart memory node that they were able to show off; they demonstrated how they could use CXL to expand their memory as well.
And then, among the other demos that were shown—Intel had a demo using what we call a Type 2 device, or an accelerator. And they were able to generate some traffic with that, and use memory on both sides of the CPU. Synopsys came in and had a demonstration where they showed their IP. And Teledyne LeCroy was there with their analyzer; and so, they showed off why you need that traffic generator and analyzer to be able to test the IP. IntelliProp was on board showing a fabric manager. And then, Xconn came in with a pooling solution. And MemVerge showed off some software that they had, which allows you to really manage an environment where you have this disaggregated memory that you want to be able to attach to various devices. So, that was kind of a high level. We have people on board from each of these who can give you a little bit more detail.
And then, the last thing I wanted to talk about was the fact that we really are improving the ecosystem, as I mentioned earlier. If you go back a year to SuperCompute 21, there were five live demos that we were able to show. And then we came to this year, last November: we had 12 vendors, all squished inside a nice 20-by-20 booth. We didn't expect that large a showing, but we were very pleased to see the membership taking initiative in driving solutions. We were very fortunate that HPCwire came through, looked at the solutions that were there, and their editors voted on who had the best devices at the show. And the editors decided that we had earned the Best Interconnect Product for all of SuperCompute—a very prestigious award; we're very pleased that they picked us. And then, as we go through these demos, we're not actually going to be able to show you the demos live, but we do have this link where you can go and look at the videos for the demos and get a better idea of exactly what they were doing in those demos.
So now, it's time to move on to our Q&A. Our first set of panelists will be joining us here shortly. Let me make sure that they're on. It looks like we've got the group here, and we've got some questions. So, let's start with this group of people and their questions. Let's see.
So, the first question that I see here is, in the example we showed, is interleaving across multiple devices allowed? And if so, is interleaving across devices allowed in a shared memory region that spans multiple devices? And so, yes, interleaving is allowed across those multiple devices, and it's really up to the memory controller provider to decide how they're going to do that interleave.
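For readers who want to see what interleaving across devices means in practice, here is a small worked sketch of the underlying address math: consecutive stripes of a host physical address range are spread round-robin across the devices in an interleave set. The 4 KiB granularity and 4-way set below are assumptions for illustration; actual granularities and set sizes are negotiated per the specification and the controller design.

```c
/* Sketch of the basic math behind interleaving a host address range
 * across N memory devices. Granularity and way count are illustrative.
 */
#include <stdint.h>
#include <stdio.h>

#define INTERLEAVE_WAYS        4          /* number of devices in the set */
#define INTERLEAVE_GRANULARITY 4096ULL    /* bytes per stripe             */

/* Which device in the interleave set services this host address? */
static unsigned target_device(uint64_t hpa)
{
    return (unsigned)((hpa / INTERLEAVE_GRANULARITY) % INTERLEAVE_WAYS);
}

/* Offset within that device's contribution to the region. */
static uint64_t device_offset(uint64_t hpa)
{
    uint64_t stripe = hpa / (INTERLEAVE_GRANULARITY * INTERLEAVE_WAYS);
    return stripe * INTERLEAVE_GRANULARITY + (hpa % INTERLEAVE_GRANULARITY);
}

int main(void)
{
    uint64_t addrs[] = { 0x0000, 0x1000, 0x2000, 0x3000, 0x4000 };
    for (size_t i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++)
        printf("HPA 0x%06llx -> device %u, offset 0x%06llx\n",
               (unsigned long long)addrs[i], target_device(addrs[i]),
               (unsigned long long)device_offset(addrs[i]));
    return 0;
}
```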
OK, there's another one. I'll ask, I guess, someone in the memory space—maybe Astera. The question is, "Assigned memory is statically assigned, correct?"
Yeah. So, it's a shared memory, and it's a pooling concept. So, in the pooling and sharing concept, it doesn't necessarily need to be statically assigned. It can be dynamic in the sense that, when you have a part of the address range of the memory allocated, once a processor is done using it, it can dynamically deallocate it and allow the other processor to use it, to avoid memory-stranding kinds of issues. Whereas if you have statically allocated memory, then you're really not using it fully; you're not utilizing the memory very well. Memory utilization across different processors really comes into the picture when you're able to dynamically allocate and deallocate memory, which is really what CXL enables with its coherency aspects.
Thanks, Sandeep. I have another one; I'll ask Samsung. Can a CXL device retain its persistent nature—they use Optane DC as an example—if it is used as a memory expander?
Yeah, so this is Kapil from Samsung. So, as we know, CXL, when the protocol was designed, was made agnostic to the underlying memory technology. So, yes, a CXL device can, in fact, have a memory type which is persistent, which can retain the data. Having said that, Samsung, particularly, has worked on some solutions. We call it memory semantic SSD, which is a CXL memory expander using SSD technology, which acts as persistent memory.
Thanks. I see one here that might be good for Elastics.cloud to answer. George, it says, "Can shared memory programs use CXL memory and work across systems?"
So yes, that is one of the targets and the intent of CXL memory, and particularly of 3.0, right? With the ability to have the various devices that are connected be able to exchange data and exchange information. A lot of the protocol that will be used to do those types of transfers is currently in development and in discussion in the consortium. But the expectation is that there will be a means for these devices to access the same memory, and also to share that memory across multiple devices. So, it is coming as part of 3.0, and we're trying to enable as much of that as we can as we look at the 1.1 and the 2.0 solutions as well.
Thank you. I'm just gonna throw this one out. I'm not sure who would answer it. Is there work in the Linux community for tiered memory scenarios and software-defined memory use cases?
So, I can take that. This is Danny from UnifabriX. So, yeah, there's a lot of work around CXL being done in the Linux community, actively pushed by multiple vendors, consortium members, and core kernel developers. Some of it has already been merged into the recent kernel releases, and some of it can actually be viewed in the relevant development branches. So, the short answer is yes. And the longer answer is there's still more work to come—a lot of work.
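One concrete place to see that kernel work is the memory-tiering sysfs interface, which groups NUMA nodes (including CXL-attached ones) into performance tiers. A minimal sketch for inspecting it is below; it assumes a recent kernel (roughly 6.1 or newer) and the default sysfs layout, which may differ on older or vendor kernels.

```c
/* Quick check of the kernel's memory-tiering view of the system.
 * Assumes /sys/devices/virtual/memory_tiering/ exists (recent kernels);
 * the path and layout may differ on older kernels or distro builds.
 */
#include <glob.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    glob_t g;
    if (glob("/sys/devices/virtual/memory_tiering/memory_tier*/nodelist",
             0, NULL, &g) != 0) {
        fprintf(stderr, "no memory tiers exposed (older kernel or config off?)\n");
        return 1;
    }

    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        if (!f)
            continue;
        char nodes[256] = "";
        if (fgets(nodes, sizeof(nodes), f))
            nodes[strcspn(nodes, "\n")] = '\0';
        fclose(f);
        /* Each tier groups NUMA nodes of similar performance; CXL-attached
         * memory typically lands in a slower tier than local DRAM. */
        printf("%s -> nodes %s\n", g.gl_pathv[i], nodes);
    }
    globfree(&g);
    return 0;
}
```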
Thank you. Let's see. Rita, let me ask you the question that just came in: It seems that all the current demos, CXL 1.1, 2.0, and current use cases are all dependent on some form of controller, bridge, switch, or endpoint device. Will any of these functions be available natively across CXL from the host platform without adding another chip? So, my takeaway is, what do you need in the path between your host and a CXL memory, if anything?
Yeah, so I can take the example of the AMD demonstration that we showed at SC22 and talk about the feature that was actually natively implemented on the CPU. What we demonstrated in the AMD solution is the security support with CXL. The security features that we support on the direct-attached memory extend to the CXL memory too, which means that we can run SEV, the secure encrypted virtualization, and confidential containers on the CXL-attached memory irrespective of the device implementation, because the support is natively available on the CPU. And so, this implementation is independent of the device side, and there's nothing needed in the switch or controller for this. So, this is one example of having that native support. And, as we continue looking at the CXL memory use cases, the expansion, of course, is when you're looking at the CXL-attached memory being viewed as system memory, and then it can be used in the form of tiered memory. There are going to be certain optimizations coming into the software framework and into the CPU as well, which will help enhance these use cases across the devices. And I would invite others on the panel to add examples if they would like to.
All right. Let's see—a couple of questions that I'd like to ask UnifabriX. Danny, how challenging is it to showcase that end-to-end, real-life use case scenario in front of a live audience, like you did at SC22?
Well, basically, we've been working on our offering for quite a long time, and we work together with terrific partners. So, right from day one, we decided that we would be workload agnostic and move the complexity of the optimization to our engines. And that approach actually served us well when we decided to exit stealth at SC22. We simply had to decide on the workload and then basically take care of the graphical user interface part. And the audience reaction was simply amazing. We had literally hundreds of visitors at our stand, and seeing a live working system running familiar workloads really made the difference. At one point, I actually had a professor from MIT who visited our stand, together with some of his students. And he actually took over in the middle of my explanation and started explaining to his students how this demonstrates the effects of memory bandwidth bottlenecks and how to solve them. We also got a lot of follow-up requests from both media and customers, which actually shows how thirsty the market is for CXL-based solutions.
Thanks. I'm gonna ask Elastics.cloud. George, you demonstrated rack-scale memory pooling at SC22. Can you tell us more about that and some of the use cases?
Absolutely. So, we've been working with a number of companies over the last year or so that are looking at being able to expand memory for these large database-type workloads—everywhere from Wall Street, which is looking at processing days of trading data, to people who are developing solutions around autonomous driving, where they're gathering a significant amount of data every day. And also, if you look at some of the web 3.0 applications that are out there, they are again looking at expanded memory solutions. So, the key with rack scale is that ultimately it will allow the multiple processing elements that we're now starting to see with these heterogeneous architectures—whether CPU, GPU, TPU, or XPU—to access data. And a lot of the time today in data centers is spent moving that data around. And so, what we did was show how, with CXL, we could have these pools of memory that can be accessed by these devices without having to move the data around—really moving the compute to where the data is. This ultimately will result in significantly better performance for all of these large-footprint-type workloads. And we see the footprints of these applications just getting bigger and bigger every year. And so, with more and more research being done in AI and machine learning, we start to see these databases get larger and larger. So, being able to share memory, and being able to have these heterogeneous compute devices access that memory and work coherently to process that data, I think, is going to be a big push for CXL in the future.
Okay, thanks. Astera—actually, Sandeep—can you tell us about the Astera Labs demos that you showed? I think they're in the memory space, also.
Yeah, absolutely. As Kurt mentioned, what we were able to demonstrate was our Leo Smart Memory Controller, which is real silicon running actual workloads on both an Intel-based Lenovo platform and an AMD EPYC-based Supermicro platform. And what we demonstrated is how we can solve performance bottlenecks for AI and ML types of workloads, which is really the problem that CSPs and data centers are trying to solve in today's world. We were running real-world workloads by using an in-memory database application, as well as highlighting interoperability with both Intel and AMD CPUs on actual customer platforms. Basically, the use cases that we were showcasing—expansion, pooling, and sharing—are the kinds of cases that people are really interested in for CXL devices.
Thanks. Yeah, Rita, AMD says they support CXL 1.1 Plus. What does it mean when they talk about a "Plus" after that 1.1?
Great question! So, when we look at the CXL memory solutions that we want to enable, CXL is going to be part of system memory. So, it is crucial to have a comprehensive solution for using it as system memory. What we did was look at the feature set and consider some of the important aspects that would be crucial to enable this product-level support on memory. One important piece was making sure that the advanced RAS support, which is critical for data center memory, is pulled in from the CXL 2.0 definition to our CPU. We support the advanced RAS, which can enable firmware-first and OS-first error handling. Another thing that is of interest is being able to support memory persistency if the CXL endpoint has persistent media behind it. We have hardware support to enable the CXL memory persistent flush flow and enable a CXL persistent media device if such an endpoint exists. And pulling these features from CXL 2.0 into a CXL 1.1 host, which is what AMD EPYC is—that's why we call it CXL 1.1 Plus.
Excellent, thank you. So, Samsung, there's a question that's come in. It says, "For SMDK, there are two modes: one generic and another for Samsung devices. What are the advantages of these modes, and do you see other vendors creating value-add or differentiation?"
So, to give some background, SMDK stands for Scalable Memory Development Kit. It's our memory development kit for heterogeneous memory systems. Basically, it helps users to use natively attached DRAM, like DDR5 or DDR4, and CXL-based memory expansion in a single system. Now, the way SMDK is designed, the goal of SMDK, first of all, is to accelerate the adoption of CXL memory—to accelerate the ecosystem adoption. SMDK actually can work with any kind of memory expander. It doesn't need to be Samsung's memory expander; it could be another vendor's device as well. And I would ask users or end users to go and play with SMDK. It's available on GitHub, and all the documentation is available on GitHub as well. So, please go ahead, download it, use it, and let us know if you have any suggestions and feedback.
Excellent, thank you. I think almost any one of us can answer this next question. I might take a first stab at it, and let Kurtis take a stab at it too. The question is: do you see CXL going mainstream to on-premise, or would it be more of CSPs and cloud folks trying to repurpose old DIMMs? And the answer is yes to all of them. Again, CXL will go mainstream. You're seeing that with the development of this hardware. And I mentioned it before: you're going to see more solution-based offerings. At SC22, the interesting question from the HPC crowd was, "When will I see CXL in my solutions?" And the answer is soon, as all this ripples out. So you'll start seeing a lot of different use cases. The DDR4 reuse, the tiering, and persistence are just a few of them. So that's my opinion. Again, it will be over years here as it rolls out. And when we get to CXL 3.0-type implementations, that will really start expanding. I'll open it up to anyone, I guess, on the panel right now.
Yeah, this is George from Elastics.cloud. We're also seeing what we call a tiered data center model start to appear, right? We have the cloud guys that exist today. But with 5G rolling out, and with the addition of more and more IoT devices generating data, we're starting to see the need to have equipment at the 5G base stations, as well as on-prem where the data is actually being created. And we've got a number of customers who are considering the equipment that needs to go out in those places, and the efficiency of that equipment, both in terms of performance and power, as well as cost. And CXL addresses all of those issues. So, we see CXL really expanding out into the fog and on-prem as well.
All right. So Rita, Sandeep, George, Kapil, and Danny, thank you very much. Really appreciate your insights. Let's move on to the next group and work with them on some of the questions. So welcome Tracy, Steve, Rehan, and Gordon.
Okay, I'll ask Gordon a question. Is CXL 1.1 or 2.0 compliance test software available today?
Hi, yes, it is available already. I guess we hit a major milestone last week in having the first FYI compliance event. So, that software and the compliance program are moving along swiftly—a major milestone, I can say. That software is available partly from ourselves, Teledyne LeCroy, and partly as the CXL CV tool from the consortium directly. So, both of those are part of that. And we're in the process of defining the actual bar for what will be compliance for CXL 1.1. CXL 2.0 is a little further out, but we're starting the preliminary stages of that.
Great, thanks, Gordon. So, I think Teledyne LeCroy and Synopsys had a great demo together, so let me follow up with a question to Synopsys in a moment. Let's see if we've got another question. Let's see—Steve, I'll reach out to you. Yeah, MemVerge actually had kind of a unique demo among all the demos that we had there at SuperCompute. Can you tell us a little bit about what you were showing and the area that your software fills for customers?
Yeah, certainly. So, we had a couple of demos. The most visual one was using our Memory Viewer software. This is the ability to visualize the performance of your main memory—DRAM—as you see it today. And if you happen to have Optane DC Persistent Memory in there, or CXL, we can also show you the bandwidth utilization of that memory. And the intent here is a couple-fold. Currently, in an environment that does not have additional memory—so just DRAM only—you can identify the hot working set size of your applications and use that to design the next system. So, you can understand if you need as much memory as you think you need, or if you can use tiering technologies. And then, once you've made that jump to a CXL environment where you have heterogeneous memory available to you, we can continue to tier with our Memory Machine technology. So, we sit above the operating system in user space. We work with all the hardware vendors, and we will intelligently tier and create in-memory snapshots for fast recovery of your application. We can do remote replication of your applications if you want to do disaster recovery or just do a backup. So yeah, a lot of things going on at the demos.
Thank you. Appreciate that insight. Back to Synopsys. I was asking: how are people using the Synopsys IP for CXL, and what are you seeing as the key reasons people are reaching out to you for that IP?
Sure. So, we're seeing increasing interest in CXL IP from all places in the market. And, in fact, we just put out a press release yesterday showcasing our customer win with Xconn Tech, who, of course, is also here. They're showing off their CXL switch. And in fact, they are using Synopsys CXL 2.0 IP there for their 256-lane switch. So, the key for us, the value proposition that we like to make, is reliability. We've been in this business for quite a long time. We have a lot of experience with PCI Express, which carries over, of course, to CXL. On reliability, we focus on being early with our customers, trying to target people who are doing kind of cutting-edge design. So, reliability and being first to market are kind of the reasons that people like to look at us. And we did the joint demo with Teledyne LeCroy over at SC22. We thought that went quite well. We were showcasing passing tests in the CXL 2.0 specification alongside them. So, we continue to show a lot of development in this field and look forward to fielding more of these workshops, questions, and panels in the future.
Great, thanks. Tracy, IntelliProp, what are the use cases for a memory fabric consisting of thousands of nodes? And I guess, is that years away? Can I get hardware to demo it or look at it today?
Thanks, Kurt. Yeah, so I think we touched on this with some of the previous panelists, but the early visions were composability, improving memory efficiency, and supporting applications—particularly container applications like Kubernetes or Apptainer. I think where the bigger potential lies—and again, this was touched on earlier—is with AI, and specifically with deep learning and machine learning applications. These applications are getting larger and larger databases. If you keep that database—the entirety of the database—in memory rather than swapping to storage, that greatly increases your performance. And then there's the ability to share that memory across thousands of nodes. And when we say thousands of nodes, we don't just mean memory; we mean CPUs, GPUs, DPUs—all those processing units that want to work on that memory. Now, you have the ability to share that memory, have a single dataset, and have all those processing units access that same dataset. And I think, in support of that, you also need the ability to communicate between these processing units, and the ability to run something like MPI through the fabric to communicate who's doing what and synchronize those compute processes. As far as where you can look at that today: at IntelliProp, we've had customers take our solution, and we enable folks to do this in their own labs—stand up their own system, maybe not thousands of nodes, but enough nodes to characterize this idea of tiered latency and how caching and the latency to fabric memory interact. And then, we've also opened up our lab via VPN for folks to come in and test their software on the CXL servers connected to a memory fabric. And then, there's a part of this that we really haven't discussed—we've talked a lot about the hardware, but there's a big component here that supports this, and that's a fabric manager. We've developed a fabric manager that not only lets you inventory everything on the fabric but lets you dynamically compose that memory. It also gives you the information regarding the tiers—the latency tiers—so that your composability manager can make better decisions on where that memory should be composed. So, if you have an application that needs very fast memory, the fabric manager should be able to communicate which tier of memory is close by and which memory is further away. So, to summarize, the applications will be this exploding dataset size in AI, and then probably applications we haven't even thought of, right? This is a paradigm shift in how memory is deployed. And so, you've got a lot of really smart people out there who are going to look at this and say, "Oh, I can do something really cool and new here."
Thanks, Tracy. Great answer! We've got a question in about Intel, Kurt. So, let me ask you: It seems Intel is among the few that are working on type two devices. What's the expected timeline for type three and type two to be in the market in volume?
Yep, thanks, Kurtis. So, let me start with Type 3. I think you can see that by all these different demos and all the different companies engaged in Type 3. I think we're in the near term—a couple of years away from that in volume. And it will be folks really kicking the tires—probably the wrong wording—but really fleshing out solutions, validating them, and putting them into implementation. The Type 1 and Type 2 devices are a little more interesting in the sense that the accelerator space is a little more tied to applications and what companies are doing. So, without going into proprietary or NDA-type information, I know there are companies out there working now; there are FPGAs from us and our competitor that run at Gen 5-type speeds. Folks are looking at those and developing with those. And as you see, we have IP that supports Type 1 and 2. There are others out there that have that also. So, those developments will occur. I think you'll see them more in the sort of—not niche—but company-based solutions first, before they go mass-market in that sense. But there will be SmartNICs or other things that start rolling out from companies also.
Thanks, Kurt. I'm gonna call you a pessimist. I'm gonna say that type three might be there earlier than two years, but maybe I'm just a "glass half full" kind of guy.
Yeah, it's a matter of getting through validation and so on. And I was speaking to mass volume—I guess that was the word that was there. So yeah, Type 3 devices are definitely going to come sooner than Type 1 and 2—at least be more visible in the market; let's put it that way. The other thing I'll say on the Type 1s and 2s, from the CXL marketing group side, is that we have not done a good job of distinguishing where we are different from PCIe, for example. And I know our chairman, Jim, always doesn't want us to compare negatively or put PCIe down. But we need to be succinct on where PCIe can be used, where CXL can be used, what the advantages are, and then roll it up. We'll say low latency, but what does that mean for a user? So we're going to answer those questions as we look more into use cases and solution-based marketing this year and beyond.
Excellent.
Maybe I'll flip this one and ask you, Kurtis. The CXL org has made tremendous progress in scaling out; however, direct-attached use cases are still quite important. In that case, nanoseconds matter. IBM has demonstrated the potential to use the OpenCAPI OMI interface for main memory. What are your thoughts on the CXL org optimizing the direct-attached latency in general, or perhaps using the OpenCAPI contributions?
So, I definitely agree that, as you do direct attach, latency becomes important. And OpenCAPI did a fantastic job of really optimizing that in a point-to-point solution. I don't want to speak for the technical task force that puts all this together, but I do know that they are always looking at the IP that OpenCAPI brings and what's possible with that. But they have to balance that against maybe a fabric environment, where you would go through switches to get to that memory. And so, I think you will see latency optimizations as we move forward in our specification. I think you'll also see a tiered approach, which is really important for the ability to take that pooled and shared topology that I showed and make it a reality. So, there are both use cases. Most of our focus in CXL so far has been on the latter, which is: how do I create the tiered environment that has some advantages in total cost of ownership and the ability to take memory and share it among multiple devices?
Let's see; let me ask a question—maybe to the panel—and see who wants to pick it up. What is CXL planning for fault tolerance or redundancy, to remove single points of failure, as we're moving toward resource sharing and pooling? Native host memory is not hot-swappable, but CXL memory or storage could be—should be. What are the plans that CXL has around fault tolerance?
Yeah, this is Steve from MemVerge. So, I think that—well, the spec certainly adds error handling that is then passed through to the memory controller, and AMD said they already have some of that baked in. There's work currently going on inside the kernel community to add that functionality to the kernel as well. And then, from our perspective, from the software side—since we're in the data plane, and our next version of the product will also be in the control plane—we have the ability to detect those errors and move the data away from the failing device before it fails, and replace it with another device from the memory pool. So, we definitely have solutions for fault tolerance, particularly with our snapshot technology. You can instantiate, in different instances, a complete in-memory copy of your applications, so the application never crashes. So, you can do a lot of things in software for sure. We're certainly leading the charge here.
Thank you. Let me ask Teledyne LeCroy—Gordon. Gordon, can you quickly say what the overlap is between PCIe compliance testing and CXL testing?
Yeah, so in terms of the overlap: the electrical part of the testing for PCIe is fundamentally the same as it is for CXL, given the way CXL rides on the PCI Express physical layer. But the compliance testing for the protocol part of it is really quite different. I mean, the concepts of it are the same—around the logical physical layer, the data link layer, and so on. So those tests were written with that same kind of mindset, but it is a completely different set of tests. So there is the overlap of the electrical layer, but that's as far as it goes. The compliance program for CXL really is its own independent program. And there's quite a number of tests—I think we have probably 70 or 80 tests already for CXL 1.1—so pretty good test coverage.
Thanks, Gordon.
We have a question for Xconn, but Xconn is not here, so we'll table that one. And actually, I guess, as we get close to the end here, any questions that we don't answer, we will be sending out in a blog—a wrap-up blog. We'll try to wrap up all the different answers and add the ones that we didn't get to, also.
Yeah, great call-out, Kurt. I guess, you know, opening up to the panel, is there any last-minute thing you'd like to add or bring to the attention of our listeners?
Kurtis, I think something we should point out is—the consortium has a lot of work groups working on hardware, fabrics, fabric managers, and the more folks that get involved in that, the better the overall solution ends up being.
Absolutely agree! We're always looking for new members—active members—which makes a big difference in both what we can focus on and where we can focus.
All right. Well, let me maybe start the shutdown. I'll say, "Thank you to everyone who signed on; really appreciate your time. Thank you to all the panelists for coming on board, sharing your insights and solutions that you've brought in." And Kurt, any last words?
Just thanks to everybody, and thanks to the panelists. For the audience: look for CXL to be presenting at many shows this year, and we'll be doing demos late in the year at SC again. And like I said, look for solution-based marketing as we start seeing that coming out from different vendors, with all the different panelists' companies as the base hardware. So again, thanks for everything, and we'll close the webinar. So, thank you.