All right, thanks to you guys for coming to OCP again. Last year, I think when I was here, I was the only one talking about CXL switches, and this time we have two more. It's very good to see the industry really shaping up, and the good news for XConn is that we are the only vendor leading this industry wave. So today I will talk about our solution and the roadmap as well. The current state of the CXL switch from XConn is that we started sampling our chip in early March this year. The production version is going to come to market in the middle of next year. Let's dive into it.
So one of the main things about CXL 2.0 and 3.x is that the switch obviously enables composable computing and, of course, AI computing. In terms of the wave, from our understanding by working with our ecosystem, we have more than 20 customers in the ecosystem we've been building over the past two years. Our understanding of this market is that, as this picture shows, there are three phases. My former colleagues, the Broadcom folks and Microchip, are leading the first wave, which uses PCIe to enable pooling; not memory pooling, but mainly SSD, NIC, and GPU pooling and resource sharing. Today we are in a wave where the industry is transitioning to CXL, especially CXL memory transactions, to enable memory pooling and sharing. The leader is XConn; we are working with the CSP providers, the hyperscalers, and also the system vendors building on AMD or Intel processors. Our chip can handle both CXL 1.1 and 2.0. This effort started almost a year ago, and we see a strong trend of companies moving in that direction. So any of you in this market, please, you really have to jump on this wagon. This wave is going to enable composable memory computing. And the next wave, as my industry colleagues from Broadcom and Microchip mentioned, is going to be CXL enabling AI computing through memory pooling over GFAM, mainly because LLMs, ChatGPT, and all these AI applications are very memory hungry. So in order to build these very large AI servers, the only way to go is this kind of composable memory structure.
I think this graph is pretty straightforward. Composable is the only way to go. Today's standalone servers, number one, waste memory resources, and they also consume a lot of power. They are on the way out and are going to be replaced by composable memory systems, nicknamed JBOM, which basically means just a bunch of memory: a memory appliance. Different hosts can be massively connected through the CXL switch, and CXL switches can be cascaded into a much larger fabric. That is the way to give all of these hosts access to a very large amount of memory as needed. Memory is allocated on an as-needed basis, and after the application completes, the memory is released back to the pool. This is what we call composability and disaggregation.
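To make that allocate-on-demand and release-after-use flow concrete, here is a minimal Python sketch of a toy pool manager. The class and method names here are hypothetical illustrations, not an actual CXL fabric-manager or XConn API.

```python
# Illustrative sketch only: a toy pool manager showing the allocate-on-demand /
# release-after-use model described above. All names are hypothetical and do
# not correspond to any real XConn or CXL fabric-manager interface.

class CxlMemoryPool:
    """Toy model of a JBOM-style pool shared by many hosts through a CXL switch."""

    def __init__(self, total_gib):
        self.total_gib = total_gib
        self.allocations = {}  # host_id -> GiB currently allocated

    def available_gib(self):
        return self.total_gib - sum(self.allocations.values())

    def allocate(self, host_id, gib):
        """Grant a slice of the pool to a host on an as-needed basis."""
        if gib > self.available_gib():
            raise MemoryError(f"pool exhausted: {gib} GiB requested, "
                              f"{self.available_gib()} GiB free")
        self.allocations[host_id] = self.allocations.get(host_id, 0) + gib
        return gib

    def release(self, host_id):
        """Return a host's memory to the pool once its application completes."""
        return self.allocations.pop(host_id, 0)


if __name__ == "__main__":
    pool = CxlMemoryPool(total_gib=4096)   # one hypothetical 4 TiB appliance
    pool.allocate("host-a", 1024)          # host A grabs 1 TiB for a job
    pool.allocate("host-b", 2048)          # host B grabs 2 TiB
    print("free after allocation:", pool.available_gib(), "GiB")
    pool.release("host-a")                 # job done, memory goes back
    print("free after release:   ", pool.available_gib(), "GiB")
```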
To achieve this goal today, the industry is obviously on CXL 1.1, because current processors only support CXL 1.1 and 2.0 is around the corner. With the XConn switch, we can enable memory pooling for composability even on CXL 1.1. We are the first to offer this very large capacity, with a higher lane count than our competitors, for both 1.1 and 2.0. One big advantage of the large capacity is that many of our customers use it to build a very large memory pool, because for this first wave of composability, having a very large memory pool is essential. That's why our chip offers such high capacity.
The next wave comes in about two years, because CXL 3.1 is still undergoing final review. CXL 3.1 mainly enables AI memory sharing and memory pooling through Global Fabric Attached Memory (GFAM). For example, GPUs can connect to our chip, and GFAM devices can connect to our chip as well. GFAM is basically a very large global memory that can be shared by the hosts and the GPUs. Since GPUs need a lot of memory and their HBM has only limited capacity to host large models, much of the data ends up going to GFAM. CXL 3.1 also defines a cache coherency protocol called back-invalidate, so the GPUs and the GFAM can use this new CXL mechanism to keep all the shared data consistent. With 3.1 you can build a tree structure, as the Microchip folks mentioned, and, most exciting, you can build a much larger mesh fabric. That opens the door for people to build very large AI systems. Our next-generation chip is going to fully support these functionalities.
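Here is a heavily simplified Python sketch of the back-invalidate idea described above. The message names and data structures are illustrative assumptions, not the actual flows defined in the CXL 3.1 specification.

```python
# Illustrative sketch only: a toy model of a back-invalidate (BI) style flow.
# Names and the level of detail are simplified assumptions, not the protocol
# messages defined in the CXL 3.1 specification.

class GfamLine:
    """One cache line of shared GFAM memory tracked by the device."""

    def __init__(self, addr):
        self.addr = addr
        self.sharers = set()   # hosts/GPUs currently caching this line

    def read(self, requester):
        # A shared read simply records the requester as a sharer.
        self.sharers.add(requester)

    def write(self, requester):
        # Before granting ownership, the device back-invalidates every other
        # cached copy so all agents keep a consistent view of the shared line.
        for other in self.sharers - {requester}:
            print(f"BI snoop -> {other}: invalidate line {hex(self.addr)}")
        self.sharers = {requester}


if __name__ == "__main__":
    line = GfamLine(addr=0x1000)
    line.read("gpu-0")
    line.read("host-1")
    line.write("gpu-1")   # gpu-0 and host-1 receive back-invalidates
```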
This is our product, and our customers are ordering it to develop their software systems, because in this ecosystem software is the key. Some people say hardware is hard; I think software is pretty hard too. That's why our ecosystem partners purchase our SDK to build memory pooling systems based on what the market has today, which is CXL 1.1 servers, and they are able to start developing that software now. We provide the retimer riser card and the baseboard with our switch chip inside, and we have connection expanders as well. We are working with the industry leaders on all the CXL modules to build up this ecosystem.
So how do we accelerate CXL adoption? Obviously XConn is a pioneer in this industry. We started working with the leading CSP providers and memory vendors almost two years ago, and we collected all of their input, which really helped us shape the first revision of our CXL switches. We are also working with memory vendors, retimer vendors, and software vendors to move this ecosystem ahead. Some of our pioneering customers have already started demoing this, almost as a production-ready system, to their own customers as well.
This is our product line. We started customer sampling of our first silicon about six months ago. Our production version with CXL 2.0 is going to be on the market in the middle of next year. That's going to be one of the cornerstone flagship chips enabling our current customers to put their CXL memory pooling systems into production. Our next generation is going to be CXL 3.1 with PCIe Gen 6/6.1, and we're expecting to receive samples in Q1 2025. We are working with our CPU and GPU partners on this Apollo 2 as well. Please feel free to visit our website and order our products. Thank you.
Any questions?
Hi, I have a quick question. You showed how your switch can scale for pooling devices, right? But in March 2023, Microsoft, Google, and CMU came out with a paper saying that multi-headed devices might be a better option when you are sharing and pooling memory, for manageability in a data center. How would you respond to that?
Sure, that's a good question. Basically, the ultimate solution for the industry is the CXL switch, because switches are much easier to cascade. Even our 2.0 switch has a proprietary mode that enables cascading between several of our switches, so our customers can connect many more hosts and a large number of CXL modules to build a very large system. Multi-head feels to us like an intermediate solution, because you cannot scale, you cannot share well across a large number of hosts, and it's also very hard to build a very large memory system that way, like a few hundred terabytes or bigger.
The point is they don't want to build a very large system because the manageability will be very difficult.
So they want to have a smaller set of hosts sharing some devices, so that they have kind of islands of things. They don't want 4K nodes sharing one piece of memory. That was how I took the point of that paper.
Think about all these applications, right? These cloud applications, OpenAI, ChatGPT, the LLMs. They consume so much memory, and they are the ones driving the demand to build much larger memory systems. I agree with you that if you're running small applications at a small scale, multi-head is good, sufficient, and easy to manage. But the challenge, and what the industry is working towards, is solving this manageability so that, ultimately, data centers can build very large AI servers.
Just a quick one. In Samak's talk this morning, he showed a tree-structured topology where, in this pooled device configuration, you can just come in, replace a multi-headed pooled node with a switch, and expand your topology. Is your switch compatible with the idea that Samak showed?
Actually, our switch is pretty simple. We are a CXL/PCIe switch. So for multi-headed devices, obviously you can connect them to our switch as well, since a multi-headed device also has a CXL interface, right?
I'm just clarifying; it's an investment protection question. So if somebody builds a pooled, direct-wired thing right now, and then they take out that pool and replace it with a switch from your company, it will work, right?
That will work right away, yes.
Good answer. A quick question about latency. If we cascade multiple switches, yes, you have a lot of memory for those OpenAI kinds of workloads. Right, right. But for those models, for inference, latency is very sensitive. In that case, if you cascade multiple switches, how do you handle that limitation?
Right, yeah, that's a good question as well. Typically in the CXL world, because memory latency is very sensitive, you don't want to cascade that much. That's why we built this chip with a much higher capacity: if you don't need to cascade, just don't do it. If you use the Broadcom chip, you probably have to cascade two chips with each other, right? Then you add latency. So that's a big advantage of ours.
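As a rough illustration of why this answer favors a single high-radix switch, here is a small back-of-the-envelope Python sketch. The per-hop and memory latency numbers are placeholder assumptions for illustration only, not measured figures for XConn or any other vendor's silicon.

```python
# Illustrative arithmetic only: the per-hop and memory latencies below are
# placeholder assumptions, not measured figures for any vendor's hardware.

SWITCH_HOP_NS = 250    # assumed one-way latency added per switch traversal
CXL_MEMORY_NS = 300    # assumed device-side memory access latency

def round_trip_latency_ns(switch_hops):
    """Rough load-to-use latency: memory access plus two traversals per hop."""
    return CXL_MEMORY_NS + 2 * switch_hops * SWITCH_HOP_NS

for hops in (1, 2):
    print(f"{hops} switch hop(s): ~{round_trip_latency_ns(hops)} ns")
# Under these assumptions, a second cascaded switch adds ~500 ns round trip,
# which is why a single high-capacity switch is preferred for latency-sensitive
# inference workloads.
```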