YouTube: https://www.youtube.com/watch?v=oC2pF9W4-pY
Text:
Hello, everybody. My name is JP Jiang. I'm the SVP and co-founder of Xconn Technology. Xconn is an IC startup focused on connectivity solutions for AI computing and the data center. Our main products are PCIe switches and CXL switches. So today, my topic is the CXL switch as an enabler of a scalable, composable memory pooling and sharing solution.
So the scalable memory pooling and sharing solution is based on the CXL 2.0 spec. People have generally seen this picture, which has been presented by the CXL Consortium. In this picture there are multiple hosts, H1, H2, and so on, and in the middle these hosts are connected to a CXL switch. Xconn has this silicon today, and we are bringing it to production very soon. On the downstream ports, a number of CXL controllers or CXL memory devices are attached to the switch, and since there are many of these controllers and memory devices, they form a large memory pool. On the very left side there is a management host, which runs a function called the fabric manager; this is a function defined in the CXL spec. With this kind of architecture, memory pooling and memory sharing can be enabled. Right now, a single one of our switches can have up to 32 ports. Each port can be x8 or x16, and the current product supports x8 bifurcation. We fully support the CXL fabric manager, and with this product we also support simple fan-out switch cascading to enable a very, very big memory pool. Okay, next slide.
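To make the fabric manager's role a bit more concrete, here is a minimal Python sketch of the bind/unbind flow it drives. The `FabricManager` class, its method names, and the transport address are hypothetical illustrations; the underlying operations (Bind/Unbind vPPB in the fabric manager API) are what the CXL 2.0 spec defines, normally carried as binary CCI commands over MCTP rather than as Python calls.

```python
# Hypothetical sketch of a fabric manager assigning pooled CXL memory to
# hosts. Class, methods, and address format are illustrative only; the
# real CXL 2.0 FM API sends binary commands (e.g., "Bind vPPB") over a
# component command interface such as MCTP.

class FabricManager:
    """Illustrative wrapper around CXL 2.0 fabric-manager operations."""

    def __init__(self, switch_address: str):
        self.switch_address = switch_address  # e.g., an MCTP endpoint (assumed)

    def bind(self, host_port: int, device_port: int, logical_device: int) -> None:
        # Spec operation "Bind vPPB": maps a logical device (LD) on a
        # multi-logical-device (MLD) expander behind a downstream port
        # into the virtual hierarchy of one upstream (host) port.
        print(f"bind LD{logical_device} on DSP{device_port} -> USP{host_port}")

    def unbind(self, host_port: int, logical_device: int) -> None:
        # Spec operation "Unbind vPPB": returns the logical device to
        # the free pool so another host can claim it.
        print(f"unbind LD{logical_device} from USP{host_port}")


# Example: give host H1 (upstream port 0) capacity from the pool by
# binding one logical device attached behind downstream port 8.
fm = FabricManager("mctp:net0,eid=9")  # address is illustrative
fm.bind(host_port=0, device_port=8, logical_device=0)
```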
So let me talk about some of the applications of memory pooling and sharing. CXL is not new anymore: we have been working on our silicon for the past four years, and for the past two years we have been working especially closely with our customers. The applications we mainly see for memory pooling, as supported by CXL 2.0, are the following.

First is the in-memory database. An in-memory database normally occupies a very large memory footprint, and because of the performance requirements, the database has to reside in memory. These databases can run from 10 terabytes to even 100 terabytes, which is very hard to support with the existing server architecture; with a large CXL memory pool, it can be easily supported. Database applications typically need memory sharing as well, because the database can be shared by multiple running applications. Before CXL memory pooling, the usual solution was RDMA-based. As we know, RDMA is a very complicated protocol, and its latency is not very good. In contrast, CXL is a very simple protocol: it just follows load/store programming (there is a short sketch of this below), and from the very beginning CXL was designed to be low latency and high bandwidth. So CXL memory pooling fits this in-memory database application really well. This past March we worked with Samsung on a memory pooling demo at MemCon, and Samsung demonstrated that with CXL-switch-supported memory pooling, a large in-memory database such as SAP HANA can get a great performance improvement over an RDMA-based or other non-CXL in-memory database.

Another application is AI inference. AI is getting to the application stage, where "application" means running AI inference. A lot of AI models are very large, and a very large model requires a huge memory footprint at inference time. However, a lot of AI inference runs on systems that are not top of the line and typically don't have enough HBM or system memory. So we have been working with some pioneering companies, for example MemVerge, to explore how to use CXL memory pooling to enhance AI inference, utilizing the large memory pool and the huge memory expansion that CXL memory provides.

The last one is the original motivation: a solution to the memory wall issue that the computing industry has been facing for many, many years, plus lowering TCO thanks to memory pooling. CXL memory expansion and pooling has already grown to very large pool sizes. The original memory wall issue was that there was simply not enough memory, which this big memory pool resolves; and with multiple hosts sharing a large memory pool, memory utilization can be greatly improved. There are also other use cases people are considering, for example reusing some of the DRAM from outdated systems, such as DDR4, and continuing to utilize it through CXL memory pooling.
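To make the load/store point concrete: on a Linux host, CXL-attached memory typically shows up as a CPU-less NUMA node, so applications reach pooled memory with ordinary loads and stores rather than RDMA verbs and queue pairs. A minimal sketch, assuming a Linux host with the standard sysfs layout (whether a given memory-only node is actually CXL memory depends on the platform):

```python
# Scan sysfs for memory-only NUMA nodes, which is how CXL expander
# memory commonly appears to Linux. Paths are standard Linux sysfs.
from pathlib import Path

def cpuless_numa_nodes():
    nodes = []
    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        cpulist = (node / "cpulist").read_text().strip()
        if not cpulist:  # no CPUs attached -> memory-only node
            nodes.append(node.name)
    return nodes

print("memory-only NUMA nodes (candidate CXL memory):", cpuless_numa_nodes())
# An application would then place data there, for example with
# `numactl --membind=<node> ./in_memory_db`, and access it with plain
# load/store instructions -- no message passing involved.
```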
All right, so I'm going to briefly talk about Xconn's product, the CXL switch from Xconn. We are the first company in the world with working silicon that supports both CXL 2.0 and PCIe Gen 5. The switch has a total of 256 lanes, which can be bifurcated into x16 or x8 ports. The chip was initially designed to address this memory wall issue, so we put a lot of work into optimizing the port-to-port latency, and the power consumption as well, to make sure it's a competitive product. The current product is already working with the shipping CXL 1.1 processors from both Intel and AMD, and we are also getting ready to support the upcoming CXL 2.0 processors. Another thing worth mentioning is that our switch also works in a so-called hybrid mode: the switch can connect to CXL devices, PCIe devices, and hosts at the same time, and all these devices work simultaneously. This hybrid mode provides a good environment because, as you see in the market, a lot of PCIe devices are transitioning to CXL devices.
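One way to picture the hybrid mode from the host side: everything behind the switch enumerates as PCI devices, and CXL Type 3 memory expanders are distinguished by their PCI class code. A minimal sketch, assuming a Linux host (0x0502 is the class code Linux associates with CXL memory devices; the mapping of specific devices is platform-dependent):

```python
# Walk Linux sysfs and flag CXL memory expanders by PCI class code;
# everything else behind the switch is treated as ordinary PCIe here.
from pathlib import Path

CXL_MEMORY_CLASS = 0x0502  # PCI class for a CXL memory device

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    pci_class = int((dev / "class").read_text(), 16) >> 8  # drop prog-if byte
    kind = "CXL memory" if pci_class == CXL_MEMORY_CLASS else "PCIe"
    print(f"{dev.name}: class=0x{pci_class:04x} ({kind})")
```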
So, a typical memory pooling box design would look like this picture. In this chassis or box there is typically a CXL switch, a BMC, and a management host. The management host, which runs the fabric manager, can be either an Arm processor or an x86 processor. The BMC and the management host work in sync to provide all the functions needed to support this system. The switch has many ports: some of them are used to connect to the hosts, and the rest can be used to connect to the downstream devices, whether those are add-in cards or E3.S modules. In this diagram, four ports are used to connect to hosts and the rest connect to CXL memory devices, but this split can change depending on the customer's design.
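As a rough illustration of the port split just described, here is a hypothetical port map; the names and layout are invented for illustration, and a real design would carry this information in the fabric manager's configuration rather than a Python dictionary:

```python
# Illustrative port map matching the diagram's split: 4 upstream (host)
# ports, remaining ports downstream to CXL memory. All names assumed.
port_map = {
    "upstream": {p: f"host{p}" for p in range(4)},           # 4 host ports
    "downstream": {p: "cxl-memory" for p in range(4, 32)},   # 28 device ports
}

lanes_per_port = 8  # x8 bifurcation, per the current product
print("host ports:", list(port_map["upstream"]))
print("device ports:", len(port_map["downstream"]), f"(each x{lanes_per_port})")
```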
An example is the demo Samsung did at MemCon this March. This is a rack-scale memory pooling system with three servers and a memory appliance, which Samsung calls a CMM-B device. The servers connect to the memory box, which provides a large memory pool of up to 16 terabytes, so the servers don't need a huge amount of local DIMMs. When they need memory, they go to the CMM-B memory pool, request the memory resource, and keep the application running. This is the system Samsung used to demo the SAP HANA in-memory database, showing a big improvement because CXL memory pooling provides all the benefits of low-latency, high-bandwidth memory access.
As mentioned, the Xconn switch also supports simple fan-out cascading. In this diagram you see two-level switching: at level one, the switch connects to three level-two switches. The advantage of this kind of cascading is that you can build an even bigger memory pool, because there are more downstream ports to connect CXL memory devices.
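A back-of-the-envelope sketch of how much cascading can grow the pool, assuming a 32-port switch at x8, one memory device per downstream port, an illustrative 256 GB per device, and (in the limit) every level-one downstream port feeding another switch; real capacities and fan-outs vary:

```python
# Rough pool-size comparison: single switch vs. two-level fan-out.
PORTS = 32
HOST_PORTS = 4        # upstream ports reserved for hosts (assumed)
GB_PER_DEVICE = 256   # assumed expander capacity

# Single level: the remaining ports carry memory devices directly.
single = (PORTS - HOST_PORTS) * GB_PER_DEVICE

# Two levels: each remaining level-1 port feeds a level-2 switch whose
# own ports (minus one uplink) all carry memory devices.
two_level = (PORTS - HOST_PORTS) * (PORTS - 1) * GB_PER_DEVICE

print(f"single switch:    {single / 1024:.0f} TB")      # ~7 TB
print(f"two-level fanout: {two_level / 1024:.0f} TB")   # ~217 TB
```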
Okay, I just want to give you some performance results for Xconn's CXL switch. This is a bandwidth test using the switch with a x16 CXL memory device attached to it. In this configuration you see the read/write performance at different ratios: a 1:1 read/write mix goes up to 52 gigabytes per second, and 100% read is close to 30 gigabytes per second of throughput.
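For context on those numbers: a x16 PCIe Gen5 / CXL link runs 32 GT/s per lane with 128b/130b encoding and is full duplex, so each direction carries roughly 63 GB/s raw. A 1:1 read/write mix can use both directions at once, which is presumably why its aggregate (52 GB/s) can exceed the pure-read figure, where a single direction is limited here by the memory device rather than the link:

```python
# Raw link-bandwidth arithmetic for a x16 Gen5 / CXL link.
GTS = 32e9            # transfers per second per lane, Gen5
LANES = 16
ENCODING = 128 / 130  # 128b/130b line coding

per_direction = GTS * LANES * ENCODING / 8 / 1e9  # -> GB/s per direction
print(f"raw per-direction bandwidth: {per_direction:.1f} GB/s")   # ~63 GB/s
print(f"full-duplex aggregate:       {2 * per_direction:.1f} GB/s")
```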
We also measured the latency. In this setup there is a two-socket host with CPU 0 and CPU 1. Each CPU connects to its local DIMMs and can also connect to a directly attached CXL device; here, CPU 0 additionally connects to a CXL switch that has a CXL memory device attached to a downstream port. From CPU 0, going through the switch into the CXL memory, the round-trip read latency is about 500 nanoseconds, whereas the directly attached CXL device is about 260 nanoseconds. And because it's a two-socket system, if the other CPU, CPU 1, goes through the inter-CPU link to CPU 0 to access the switch-attached memory, that hop adds further latency, for a total of around 743 nanoseconds.
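Putting the three numbers side by side makes the overheads easy to read off; all inputs are the figures quoted above, and the subtraction is only a rough decomposition:

```python
# Decompose the reported round-trip read latencies (values from the talk).
direct_attach = 260    # ns, CPU 0 -> directly attached CXL device
through_switch = 500   # ns, CPU 0 -> switch -> CXL device
cross_socket = 743     # ns, CPU 1 -> CPU 0 -> switch -> CXL device

switch_overhead = through_switch - direct_attach  # ~240 ns round trip
inter_cpu_hop = cross_socket - through_switch     # ~243 ns for the socket hop

print(f"switch round-trip overhead: ~{switch_overhead} ns")
print(f"socket-to-socket overhead:  ~{inter_cpu_hop} ns")
```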
All right, I just want to be a little bit forward-looking. Today our product is the CXL 2.0 switch, but Xconn is already working on a future product, a CXL 3.1 and PCIe 6.1 switch. We believe this kind of switch and interconnect solution will fit future AI computing systems really well, because at that stage the fabric will be formed by CXL 3.1, and all the computing components, including the CPUs, memory devices, network cards, and accelerators, can attach over either CXL or PCIe interfaces. This kind of system will give AI computing a bigger fabric-attached memory network and allow the system to do a lot of efficient AI computing.
So that is my last slide. In closing: the CXL switch enables memory pooling and already provides a solution to the memory wall issue, and this solution can be deployed for AI computing as well as traditional HPC and data center workloads. Even at CXL 2.0, software-enabled memory sharing, for example MemVerge's GISMO software, can already enable sharing of CXL memory, and that kind of memory sharing solution has a lot of use cases and applications that can take advantage of it. Xconn is definitely working hard with our ecosystem partners and customers to make this solution deployable in real use cases for memory pooling and sharing. Okay, thank you.