All right, good morning. It's good to start with me, because we'll save the more important content for after breakfast, when the people are here. My presentation is a summary of what has happened so far in the CXL ecosystem, what is happening right now, and what we think will happen next. So: the past, present, and future of CXL and big memory computing.
Let's start with the past. Over the last four years, the CXL Consortium has done some awesome work. We have Siamak here, herding over 200 members, 250 now, and it's not an easy job. The Consortium successfully published the 1.1, 2.0, and 3.0 specifications, with 3.x now coming out, and it really mobilized the entire industry to work on the CPUs, the memory devices, the controllers, now CXL switches, and now software and systems as well, as I'll get into. So a lot of groundwork has been done by the standards body and the companies involved in it.
In 2022, last year, we started to see the first samples and sample devices coming to market: the initial controllers and initial expander cards available for POCs and for testing. This is a report just published by the YOLE Group, and they estimate vendors produced a couple of million dollars' worth of devices last year.
These include processors, memory expander cards, controllers, and some of the initial switches as well. And there is various software work happening in the operating systems, in the hypervisors, and at companies like us at MemVerge, which started developing system-level software for CXL.
So as we exited 2022, we really just had the specification and some initial technology development. Over the last 12 months, we have seen much increased activity in the area. In fact, if you came to OCP last year and again this year, there has been a great increase in the number of sessions and presentations covering CXL technology, as well as its various use cases, in particular in AI and machine learning. At the same time, various collaborations on joint hardware-software solutions have taken place over the last 12 months.
These are moving toward what we call co-engineered concept cars: not quite ready for production, but demonstrating what is possible with CXL technology. And it often requires collaboration between multiple vendors to come up with these joint solutions.
Let me give some examples of such joint efforts. The first is what we call Project Emory, demonstrating the memory expansion capabilities of a CXL expander card. It was a collaboration between Astera Labs, who are here, Samsung, Supermicro, and MemVerge: basically an E3.S Samsung expander card plugged into a Supermicro server, demonstrating acceleration on various benchmarks. What's shown here is TPC-C running on MySQL databases, demonstrating that with memory expansion the database can deliver higher transactions per second at lower CPU utilization. Various other memory expansion joint solutions have been created based on this as well.
Here is another example of a collaboration, this one with Lightelligence, which is creating a photonic solution that can provide longer-distance memory expansion. What they are doing here is offloading GPU memory from the GPU's high-bandwidth memory to CXL memory, effectively improving the performance of AI models. These demos, by the way, are available on the cxlforum.com website, which I'll come back to.
The next one is an emulated environment we created using QEMU virtual machines, to emulate a pooled memory environment. Actual physical memory pool hardware is still very difficult to get; it is in the early stages, and there will be more of it next year. Today it is difficult to get access to real hardware that allows multiple hosts to access the same memory device or memory pool, and at the same time, memory pooling and memory sharing is an area of very high customer interest. So this environment is designed so that, without the hardware, developers can develop or modify their software for a pooled memory environment. It is now available for you to use, and it is good for software development and testing before you have access to the physical environment.
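For reference, a minimal sketch of what such a QEMU setup can look like, based on QEMU's upstream CXL emulation support. The exact device options vary by QEMU version, and the sizes and IDs here are illustrative placeholders, not the actual configuration used in this demo:

```shell
# Boot a guest with one emulated CXL type-3 volatile memory device.
# Requires a recent QEMU built with CXL support; see QEMU's
# docs/system/devices/cxl documentation for the full option set.
qemu-system-x86_64 \
  -machine q35,cxl=on -m 4G -smp 2 \
  -object memory-backend-ram,id=cxl-mem0,size=256M \
  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
  -device cxl-rp,port=0,bus=cxl.1,id=rp0,chassis=0,slot=2 \
  -device cxl-type3,bus=rp0,volatile-memdev=cxl-mem0,id=cxl-vmem0 \
  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
```

Inside the guest, the device can then be brought up with the standard cxl/ndctl tooling and onlined as system RAM; emulating a pool means wiring several such guests to a shared backing store.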
Now some early hardware is becoming available, and one example is the Niagara system from SK Hynix; I think you will hear about it later today from Jungmin here. The Niagara system is a memory appliance that can be connected to four servers, and in fact there is a next-generation Niagara 2 coming that can be connected to eight servers. It allows memory to be dynamically allocated using the dynamic capacity device (DCD) mechanism, which will be fully supported by Niagara 2. We did some early integration with the first Niagara system, and we were able to run an in-memory analytics benchmark and demonstrate a performance improvement by allocating more memory to the servers in real time.
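To make the dynamic capacity idea concrete, here is a toy sketch of a pool granting and reclaiming capacity across hosts. This is purely illustrative of the concept; the class and method names are invented, and it is not the Niagara interface or the actual CXL DCD command set.

```python
# Illustrative only: a shared memory pool that grants capacity to hosts
# on demand and reclaims it on release, in the spirit of CXL dynamic
# capacity devices. Real DCDs work in hardware-defined extents, not GB ints.

class MemoryPool:
    def __init__(self, total_gb):
        self.total_gb = total_gb
        self.grants = {}  # host -> GB currently granted to that host

    def free_gb(self):
        return self.total_gb - sum(self.grants.values())

    def grant(self, host, gb):
        """Grant `gb` more capacity to `host` if the pool has room."""
        if gb > self.free_gb():
            return False  # pool exhausted; host keeps what it already has
        self.grants[host] = self.grants.get(host, 0) + gb
        return True

    def release(self, host, gb):
        """Host hands capacity back to the pool."""
        self.grants[host] = max(0, self.grants.get(host, 0) - gb)

pool = MemoryPool(total_gb=512)
pool.grant("server-1", 128)
pool.grant("server-2", 256)
print(pool.free_gb())   # 128
pool.release("server-1", 64)
print(pool.free_gb())   # 192
```

The point of the model is that capacity moves between hosts in seconds, under software control, rather than by physically re-slotting DIMMs.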
Last but not least, there is a project called Project Gismo. Gismo stands for Global I/O-Free Shared Memory Objects, and it is a method that allows applications on multiple servers to access the same memory region on top of CXL 2.0, not 3.0, protocols. This works because our software handles the necessary cache coherency between the CPU caches of the different nodes. Doing this for the general case would be very expensive, so Gismo presents an object API where you can create and read objects, and seal and unseal objects. It ensures that at any one time there is only a single writer to a memory region, so that cache coherency and coordination are simpler and less expensive to achieve, and good performance is attainable.

We have done some initial integration with other software, and here is an example from our collaboration with the Ray community; Ray is an AI framework that's gaining popularity among AI developers. We ran the shuffle benchmark on Ray integrated with Gismo, which replaces the single-node in-memory object store with a multi-node shared memory object store on CXL, and we were able to demonstrate higher performance. When an object is accessed locally, the performance is the same as before, even though the object is on shared memory rather than local memory. When the object is on a neighboring node, the baseline without Gismo is much slower, because the data must be copied from the other node before it can be accessed. With CXL shared memory plus Gismo, it is the same direct memory access to that shared CXL memory, so it's much faster, about seven times faster there. And when we ran the shuffle benchmark on about 40 to 50 gigabytes, it was also 280% faster for that benchmark.
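The single-writer create/seal/unseal model described above can be sketched in a few lines. All names here are invented for illustration, not the actual Gismo API, and POSIX shared memory from Python's standard library stands in for a CXL shared-memory region:

```python
# Conceptual sketch of a single-writer shared-memory object store:
# an object is writable by its creator until sealed, then read-only
# for all readers until explicitly unsealed again.
from multiprocessing import shared_memory

class SharedObjectStore:
    def __init__(self):
        self._objects = {}  # name -> [SharedMemory, requested size, sealed?]

    def create(self, name, size):
        shm = shared_memory.SharedMemory(create=True, size=size, name=name)
        self._objects[name] = [shm, size, False]
        return shm.buf  # writable view for the single owner

    def seal(self, name):
        # After sealing, any node may map and read the object; no writes.
        self._objects[name][2] = True

    def read(self, name):
        shm, size, sealed = self._objects[name]
        if not sealed:
            raise RuntimeError("object not sealed; reads not yet allowed")
        return bytes(shm.buf[:size])  # copy out (backing may be page-rounded)

    def unseal(self, name):
        # Hand write access back to a single owner.
        self._objects[name][2] = False

    def close(self, name):
        shm, _, _ = self._objects.pop(name)
        shm.close()
        shm.unlink()

store = SharedObjectStore()
buf = store.create("demo", 4)
buf[:4] = b"gism"
store.seal("demo")
print(store.read("demo"))  # b'gism'
store.close("demo")
```

Because only one writer can touch an unsealed object, the store never has to reconcile concurrent writes from different CPU caches, which is exactly what keeps coherency cheap in this model.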
This is an initial demonstration of the performance improvement that CXL-enabled shared memory can bring. We are continuing to work with Ray, as well as a number of other software projects, to explore the power and benefits that shared memory could bring.
So now, in the fourth quarter of 2023, we have some initial technology and initial POCs, and we are getting close to creating actual working systems that can exploit the memory expansion, memory pooling, and memory sharing use cases.
Now let's look at what could happen in the coming years. Clearly, there has been a boom in demand for larger memory bandwidth and memory capacity, driven by AI, in particular generative AI and large language models, where the number of parameters is increasing dramatically and continues to do so. Computing is becoming more heterogeneous, with GPUs really dominating but other types of AI accelerators emerging as well.
This, we believe, will be among the big drivers of a large market for CXL technologies. This is from the same YOLE Group report, which gives a very positive forecast of the CXL market size over the next five years, projecting that by 2028 the market will reach $15.8 billion. It consists of the actual memory devices that will be connected through the CXL protocol, the controllers for the memory expanders, and the switches and fabric interconnecting these memory and computing devices.
Here it gives some other breakdowns as the YOLE Group analysts see it: a portion of the market will be drives and the rest will be add-in cards, with a flip from today, where the majority is add-in cards, toward the E3.S drive form factor. It also predicts that over the next five years, direct-attached expanders will give way to fabric-attached memory expansion and memory pooling enabled by a switching fabric.
What this will enable, I think, is a more disaggregated, composable architecture in which memory becomes a first-class citizen, or a system of its own, similar to what happened to storage 30 years ago. Then new software, in this case memory software, will be created that enables memory systems with data services built specifically for memory, enabling interesting capabilities that were not possible before. This will also enable better utilization of memory resources, and potentially make the data on those memory devices more available and better protected as well.
So I think, as the YOLE Group says here, this change of paradigm could give birth to a new industry of CXL memory fabric software, systems, and services. That is an area MemVerge is working on with all of our ecosystem partners, trying to enable memory as a service and the software necessary to create it.
This memory as a service certainly cannot be built by us alone. It needs layers of capabilities enabled by all the partners in the ecosystem: the standards bodies; the partners creating the devices and the systems; the operating systems, the hypervisors, and the Kubernetes schedulers; and the specialized software vendors developing the specific data services.
In particular, at MemVerge we have been working on a number of technologies here. There is Memory Viewer, which provides better visualization of your memory infrastructure; it is available today as a free download for you to evaluate internally, and it can be a very good sales tool as well as a very good utility for end customers. Our flagship product has been Memory Machine, which runs on the hosts and can do intelligent placement and mapping between data and memory, and intelligent tiering to maximize bandwidth or minimize latency. We have also developed interesting data services, such as in-memory snapshots that can run on all types of memory, including CXL memory. And on this side of the slide, we are developing a new component we call Memory Composer, which can manage the fabric and the switch and enable intelligent dynamic capacity management of a memory pool. These combined, plus software such as Gismo, can enable a truly fabric-attached memory appliance for pooling and for sharing.
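The tiering idea mentioned above can be illustrated with a toy policy: keep the hottest pages in fast local DRAM and spill colder ones to a larger, slower CXL tier. This is a hypothetical sketch for intuition only, not MemVerge's actual algorithm, and the access counts stand in for real hardware telemetry.

```python
# Illustrative hotness-based tiering: the N hottest pages stay in DRAM,
# everything else is placed on the CXL expansion tier.

def place_pages(access_counts, dram_capacity):
    """Return a {page: tier} placement keeping the hottest pages in DRAM.

    access_counts: {page_id: accesses observed in the last epoch}
    dram_capacity: number of pages the DRAM tier can hold
    """
    hottest_first = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for i, page in enumerate(hottest_first):
        placement[page] = "dram" if i < dram_capacity else "cxl"
    return placement

counts = {"a": 90, "b": 5, "c": 40, "d": 1}
print(place_pages(counts, dram_capacity=2))
# {'a': 'dram', 'c': 'dram', 'b': 'cxl', 'd': 'cxl'}
```

A real tiering engine would re-run a policy like this every epoch and migrate pages whose placement changed, trading migration cost against the latency saved on future accesses.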
So here are some links you can act on. The first is Memory Viewer; as I mentioned, it's software available today, and anyone is welcome to download it. We can also discuss how we can collaborate on it to help sell and troubleshoot your CXL devices with customers.
And CXL Forum, as Frank alluded to, is expanding from just an event into a community that we just launched at cxlforum.com, where a lot of content is available and more is coming. It includes an academy, with various tutorials contributed by and available to the community, to educate the user base about the technologies that are coming. It has blogs, and it has information about the various joint projects. We will also launch a Discord channel, which allows early adopters and vendors to socialize and discuss topics related to CXL.
So if you go to cxlforum.com, you'll see this; just click subscribe and you'll get into those channels, with access to the various projects, solutions, the academy, and so on. We'll continue to have events next year as well, so that's a place for all of us to hang out.
In this presentation, I referred to a research report for the market sizes and other information; YOLE has published this report, and it can be accessed through this link as well. So that's all I have: the past, the present, and the great future of CXL. I look forward to the rest of today's presentations. Thank you.