Thanks to OCP and all the team members here for attending this one. There have been a lot of talks on CMS from the software side, the API side, and the protocol side, so this is a little bit of hardware now: how do people see CMS in terms of a memory form factor? We want to talk about a couple of things. The introduction has already been done, so I won't spend time on that.
First, why the need for this CXL memory expansion? Second, why do you really need a new form factor, and what are the use cases that can be put onto it? Then, the electromechanicals for it. This is a joint presentation between Kiran and myself. He's from the OCP side; I'm coming from the SNIA side, but we want to contribute a lot to OCP.
If you look at it, the computing paradigm is completely shifting. Everything used to be CPU-centric, with memory, storage, and peripherals all hanging off the CPU. But the pieces of the server are getting decomposed, and every piece is becoming extremely intelligent in its own right. If you look at the memory tiers, they are becoming intelligent because of confidential computing and a lot of other functionality that can migrate into the memory behind a local controller, which acts very much like a memory-servicing broker. So the form factors we can think of for memory that is not attached directly to the CPU are what this talk will focus on.
This is why we see that memory expansion beyond the CPU attachment is required. Every new generation, as the core count goes up, you go through this cycle where memory bandwidth per core goes up and down, up and down. But the pace of innovation on direct-attached memory bandwidth is slowing as we approach 6400 MT/s and beyond, so the capacity tier becomes very important. A fast tier attached to the CPU will remain, but capacity tiers will come into the picture, where additional bandwidth can be provisioned as the core count increases. One reason is spatial restriction: you cannot grow beyond the CPU's channel count, and CPU packages are already fundamentally at 6,000 pins and above. Those are the two important constraints, and we really need memory beyond what is attached to the CPU. That is one of the driving factors for looking at alternate form factors that can provide more bandwidth.
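As a rough back-of-the-envelope illustration of the bandwidth-per-core squeeze described above (the channel and core counts below are illustrative assumptions, not figures from the talk): one DDR5-6400 channel moves about 6400 MT/s x 8 bytes, roughly 51.2 GB/s, so a 12-channel socket tops out near 614 GB/s, and dividing that across ever-larger core counts shrinks the per-core share.

```python
# Back-of-the-envelope bandwidth-per-core estimate (illustrative numbers only).
MTS = 6400            # DDR5-6400: 6400 megatransfers per second per channel
BYTES_PER_XFER = 8    # 64-bit data bus per channel
CHANNELS = 12         # assumed channels per socket; varies by CPU

per_channel_gbs = MTS * BYTES_PER_XFER / 1000   # ~51.2 GB/s
socket_gbs = per_channel_gbs * CHANNELS         # ~614 GB/s

for cores in (64, 96, 128, 192):
    print(f"{cores:4d} cores -> {socket_gbs / cores:5.2f} GB/s per core")
```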
These are the tiers we are talking about. You have a zero-protocol-overhead tier, which is directly attached: HBM and the direct-attached DRAM channels. Then we slowly migrate away from the CPU, taking on a little latency overhead. Tier zero, tier one, tier two, tier three: each tier number adds a little more latency on top. But this is where a lot of interesting work is going to come, on how to hide the latency of the interconnect. The controller, the sequencer, has to get better at caching, and that's where the hot-page algorithms come in from the software side. Those migrations will happen, and from the software perspective this will become transparent to a lot of applications.
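For context on that software transparency: on Linux today, a CXL memory expander typically shows up as a CPU-less NUMA node, which is what lets the hot-page and tiering machinery treat it as a slower tier without application changes. Below is a minimal sketch, assuming the standard sysfs layout, for spotting such nodes; which node is actually the CXL tier on a given system is something you would have to verify.

```python
#!/usr/bin/env python3
"""List NUMA nodes and flag CPU-less ones, which is how CXL expanders
commonly appear to the OS (sketch; verify on your own platform)."""
import glob
import os
import re

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node = os.path.basename(node_dir)
    cpulist = open(os.path.join(node_dir, "cpulist")).read().strip()
    meminfo = open(os.path.join(node_dir, "meminfo")).read()
    total_kb = int(re.search(r"MemTotal:\s+(\d+) kB", meminfo).group(1))
    kind = "CPU-less (possibly CXL/far-memory tier)" if not cpulist else "CPU-attached"
    print(f"{node}: {total_kb // 1024} MiB, cpus=[{cpulist or 'none'}] -> {kind}")
```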
So, why a new form factor? If you look at the existing form factors, most of them were designed in the PCI world, which is non-pluggable: the card sits inside the server, so servicing is extremely difficult. And there is a lot of waste on the PCI side in how the electromechanicals were designed. Then came the need for E3, which is storage-centric. E3 does solve a lot of interesting problems on the storage side, from a thermal, capacity, and bandwidth point of view, but it is not that great a form factor from a capacity-tier perspective. Of course, you can scale by adding more E3s, but then you are increasing the total cost of ownership: to reach a certain capacity you need more and more E3s. The mezzanine form factors will always remain, but those are proprietary. And there is a missing element here: edge computing is going to happen, so what are the form factors for that? We looked at all of this and said, hey, server disaggregation requires memory to be treated as one of the pillars, and a new form factor should be created around that.
We put together four use cases. The third one is the memory cassette, where you could expand the memory beyond the typical 256 gigabytes to, say, two or four terabytes. That is the driving use case for what comes next.
These are the use cases. These are all CAD drawings, just to show how much area is required for the controller and how much is required for the standard DIMMs. The designs are driven by standard DIMMs; we are not asking people to put memory chips down on the board. You can use the standard RDIMM ecosystem. The reason is that DIMMs do fail, so replacement is extremely important. With E3 or anything else where a chip on the board fails, you have to throw the whole module away. The idea here is to use the RDIMM as the basic component, and if anything goes wrong with it, you can just replace it.
So, what is a PMM? PMM is a pluggable multipurpose module, and the definition is being developed in SNIA. What were the tenets? One, as I already mentioned, it is driven by standard DIMMs. A standard DIMM is about six inches long, so the module has to be at least six inches deep. Second, on the networking side, a QSFP connection should also be available. Then the ASIC: we are looking at around 50 millimeters of ASIC width. Given the connector footprints, the ASIC footprint, and the standard DIMMs, we arrived at a formula for the volumetric space required, which can address the next generation of form-factor requirements.
With that, these are the form factors. We have defined three. The one on the left-hand side is the largest, the long-depth one, called 2T, at 230 millimeters, about nine inches. It should accommodate a standard RDIMM, the ASIC, and the power required, and it should be able to provide capacities from 2 terabytes to 4 terabytes, depending on the module density. In addition, we defined smaller form factors for use cases that don't require 2T. We created 1T, a smaller form factor for the accelerator market. You could define another short-depth form factor, based on the server mechanicals, for edge computing and so on. So there is a lot of flexibility in choosing among these form factors. The biggest change from E3 and the others is the power requirement: the three profiles are 200 watts, 400 watts, and 600 watts. A memory cassette may not require 600 watts, but 200 watts is definitely the starting point for most of the memory use cases.
This is the host-side connector. Amphenol — I don't know if Amphenol is here — has actually created the form factors and the host-side connectors. You can take a look; I'll pass it around after this. So this is real; I'm not just doing a PowerPoint exercise. Hopefully there will be systems next year that we can build around this and use. If you look at it, this takes the SFF-TA-1002-style 4C connector and extends it to 4C+: an additional x16 PCIe connection has been added. And we expect that PCIe or CXL connection to go to Gen 6 and Gen 7.
This is the server front view. You can mix and match E3s with PMM 2Ts and 1Ts, depending on the server configuration. We want to coexist with the existing development that has happened on the E3 side.
This is the card form-factor comparison. These are the new form factors we are introducing in between. PCIe cards will remain, and E3 will remain for its role. But the biggest advantage of this one is that you can use standard RDIMMs, and it is a pluggable form factor, provided we write the software that lets you dynamically add and remove memory. Hardware-wise, everything is designed for that.
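The dynamic add and remove mentioned here maps onto the OS's memory-hotplug support. Below is a minimal sketch, assuming the standard Linux memory-hotplug sysfs interface, of bringing hot-added memory blocks online; whether a given platform auto-onlines blocks, and which blocks belong to the CXL device, are assumptions you would have to verify.

```python
#!/usr/bin/env python3
"""Bring offline memory blocks online via the Linux memory-hotplug sysfs
interface (sketch; requires root and a kernel with memory hotplug enabled)."""
import glob
import os

for state_path in sorted(glob.glob("/sys/devices/system/memory/memory*/state")):
    with open(state_path) as f:
        state = f.read().strip()
    if state == "offline":
        block = os.path.basename(os.path.dirname(state_path))
        print(f"onlining {block} as movable so it can be offlined again later")
        with open(state_path, "w") as f:
            f.write("online_movable")
```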
With that, I want to pass it on to my colleague Kiran. He has fantastic use cases for this type of memory expansion, and he's going to talk about them.
Thanks, Anant. I don't know if this mic is working — yeah. We have seen all the fancy CXL use cases with fabrics, multiple hosts, and multiple memory devices, and I have a lot of hope for the future. But to have something we can adopt quickly: we already have CXL memory expanders in the market, and we have the servers. What I'm showing is a very simple case for early adoption. We have the direct-attached memory, and most new CPUs support only DDR5. If you still have DDR4 DIMMs you want to use, or if you just want to expand capacity with DDR5, you can use a CXL memory expander connected directly to the server, and you get a simple memory increment with CXL.
This is the first form factor, where we are able to use existing DIMMs and build a memory expansion. We tested this last year, and we were able to show the performance to be on par when using the kernel's transparent page placement. With the edge connection we can go to PCIe Gen 5, and that's about where it saturates. After that, we started looking at reducing the size to pack more CXL expansion into the front of the server.
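The transparent page placement referred to here is exposed in upstream Linux through a couple of tunables. A minimal sketch for checking and enabling them is below; the knob names are the upstream kernel's, but their availability depends on kernel version, which is an assumption.

```python
#!/usr/bin/env python3
"""Check/enable the Linux tiering knobs commonly used with CXL memory
(sketch; knob availability depends on kernel version)."""

DEMOTION = "/sys/kernel/mm/numa/demotion_enabled"   # reclaim demotes cold pages to the slow tier
BALANCING = "/proc/sys/kernel/numa_balancing"       # 2 = memory-tiering mode (promote hot pages)

def show(path):
    try:
        print(path, "=", open(path).read().strip())
    except FileNotFoundError:
        print(path, "not present on this kernel")

show(DEMOTION)
show(BALANCING)

# To enable (needs root), uncomment:
# open(DEMOTION, "w").write("1")
# open(BALANCING, "w").write("2")
```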
That brings us to the next one, which we just powered on last week. One change is that we moved from the edge connection to an MCIO multi-track connection. With that, we are compatible with the DC-MHS connectors. And by reducing the size, we can now pack in more while retaining the four DIMMs in the CXL expansion. So these are a couple of form factors that let people start using CXL right away. In due course we'll have pooling and all that, which will enable even more savings, but this is an early saving we can realize right now.
This is another form factor, where we're able to pack two CXL ASICs next to each other and also add a storage device. Again, it's about quickly adding CXL memory and realizing the savings it brings. And again, using MCIO connections we can go up to Gen 6 speeds, so in a way it's future-proofing.
Since CXL is so pervasive now at OCP, obviously all the OCP tenets are being met, but just to illustrate on the form factors: the PMM form factor is already at SNIA, and both that and the form factors I presented will be submitted — the specs will be submitted so others can leverage and build on them. In terms of efficiency, there are certain use cases where memory capacity is the bottleneck. There, by adding memory, we can realize a lot of savings and gain efficiency, simply because you're not bringing up a whole new system just for a little more memory. Depending on the application, there could be huge TCO gains. For those who saw yesterday's sessions, especially on inference, we are seeing a huge bottleneck in memory capacity, so by adding CXL we can increase capacity using the existing PCIe interfaces. In terms of scale, we are looking at deploying at large scale; for Meta, anything deployed in a data center usually runs at pretty high volumes. From a sustainability perspective, as Samir introduced at the beginning, we are decoupling the CPU from the memory, which means CPUs can advance with their own memory technology and server systems can have their own memory technology via CXL. What that means is that memory available today, like DDR4 DIMMs, could be reused; wherever there is excess memory, it can be put back into use. That leads to a lot of savings in terms of power and hardware.
I think that brings us to the call to action. Form factors are what enable this option, so we can collaborate on using the same or similar-looking ones to have more leverage in the industry. The expansion board spec should be available, the product should be available in 2024, and the information will be available through the CMS group. All right, I think we have some time for questions. Any questions?
Sorry, I'm not ready to ask a question, but I just wanted to let you know that Amphenol has a booth at A20. If you want to get a touch and feel of those connectors, the PMM connectors, you can visit the booth at A20. Thank you.
Sure, go ahead.
I'm trying to figure out what problem this is solving that E3 2T CXL memory is not solving.
Yeah, so fundamentally this is about reuse of RDIMMs, first of all. Second, the capacity and bandwidth tier is what CXL is targeting, so you can scale horizontally or vertically. Those are the two answers. E3 was defined from a storage perspective, not a memory perspective. DRAM power is going up big time, no question, so if you're talking about four to five channels at 25 watts and above, E3 with a fan is probably not the right choice, is my thinking; it was defined more from the storage side. Of course, you could put the chips on the board, no question in my mind. But this is another form factor that can be used to reuse RDIMMs from the existing ecosystem.
Its motivation is to reuse the RDIMMs.
Yeah.
Also, your form factor, the one you showed with the gold fingers on the edge, seems to be different from what was shown earlier. So how many new form factors have you suggested here?
Yes. What Anant presented is more like a pluggable multipurpose module. What I was showing was not a pluggable one, but one that's built in. So they're not exactly compatible with each other, but they show the multiple options that are out there for adopters.
Right.
Let me add on top of that. If you look at the DC-MHS connector from the cable perspective, all of these form factors are designed with a panel mount on the cable side as well, so you could actually use a DC-MHS cable to connect to this form factor too; it doesn't prevent you from that. What we are talking about is a more holistic approach: defining a form factor for the future generation, rather than shoehorning an existing form factor to fit the use case. I think that's probably a better answer for you.
My concern is that a new form factor is one thing, but five new form factors is another.
I personally don't think there are five different form factors. I mean, you could say PCIe is another form factor, but it's not a pluggable one. The need for a front-pluggable form factor is definitely there today; I don't think anybody is shying away from that. If it existed today, a lot of people's accelerators would be migrating toward it.
One more question.
What do you envision for mid-plane and backplane designs? If you look at DC-MHS and MSIF, they call out blade-like architectures. I'm trying to understand, because you showed some MCIO connectors, and I think HTM is starting to define ExaMAX-like connectors. I just want to get your take on what that's going to look like in the next year.
You want to take that, Kiran?
Sure.
Yeah, so the question was — is the question more about using an ExaMAX-like connector?
It's kind of open-ended. It's not just the connector, but also backplane, logic, management. What do you envision there?
Yeah. If you're looking at it from the backplane perspective, the backplane is typically where the server connects. Here we are looking more at a mid-plane: if it is a PMM module, it connects into the server front or the mid-plane, and if it is the cable version I was showing, that would be positioned in front of the server as a cable option. In terms of management, if it is direct attach, the BMC that manages the server can also manage the memory modules. If it is pooled memory, a pooled system would typically also have a management controller that manages the switch and all the modules. There would typically be a low-speed connection, like I2C or USB, to all the modules. All the controllers have I2C, and some of them also have USB, so we have two options for managing the CXL controllers. Thank you.
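For context on that low-speed sideband, here is a minimal sketch of what a read over I2C can look like from Linux userspace through the generic /dev/i2c-* interface. The bus number, the 0x50 device address, and the register pointer are purely hypothetical illustrations; a real CXL controller would be managed through its vendor- or MCTP/SMBus-defined protocol rather than raw byte reads.

```python
#!/usr/bin/env python3
"""Hypothetical sideband read from a memory-module controller over I2C,
using the Linux userspace i2c-dev interface (sketch only)."""
import fcntl
import os

I2C_SLAVE = 0x0703        # ioctl number from <linux/i2c-dev.h>
BUS = "/dev/i2c-1"        # hypothetical bus the module sideband hangs off
ADDR = 0x50               # hypothetical 7-bit device address

fd = os.open(BUS, os.O_RDWR)
try:
    fcntl.ioctl(fd, I2C_SLAVE, ADDR)   # point subsequent reads/writes at ADDR
    os.write(fd, bytes([0x00]))        # set a hypothetical register pointer
    data = os.read(fd, 4)              # read 4 bytes back
    print("raw sideband bytes:", data.hex())
finally:
    os.close(fd)
```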