Hello, thank you very much. This is Mohamad El-Batal. I'm with Seagate, but I'm also chairing the PCIe Extended Connectivity work stream within the OCP Server group. As Brian and others showed, there are a lot of competing connectors out there that are still in play. OCP did not try to specify a connector. Our task was to define the requirements for those connectors and for external, or extended, connectivity. We specifically use the word extended so as to also include internal connectors within OCP enclosures.
So what we looked at is the ecosystem requirements for Gen 5 and Gen 6. We will also extend the team's work to look at Gen 7 as time goes on. But at this point, the view is that memory pooling will potentially start in the industry in 2024, and we will see quite a bit of adoption by the 2026 timeframe.
So again, within the work stream our task was to look at the various potential pooled solutions with CXL as well as NVMe. The connectivity will have a PCIe physical layer underneath both of those protocols to connect compute nodes, GPU nodes for acceleration, memory pooling appliances that will start showing up in the data center, as well as networking connectivity and storage. Encompassing all of that, ideally we would like to see the same connectors and the same type of cables used for the PCIe physical layer across all those protocols.
What we did not want to do, again, is specify a particular connector, because the industry will have to prove that it can meet the OCP specifications. We're not about to pick one yet, simply because it's hard to predict what can be done better. At the same time, we're looking for the best TCO in the market, leveraging economies of scale and hoping that the right solution will present itself with the help of the PCI-SIG as well as the SNIA SFF.
So the critical aspects, the premise that we started from: we asked the PCI-SIG to drive an optical solution as well as a copper solution, which they are already driving. There is a group within the SIG that has started working on the optical specification requirements without significantly impacting the existing PCIe layers. The hope is that for Gen 5 and Gen 6 there will be no impact; Gen 7 could improve certain things if needed. We looked at 19-inch racks and 21-inch racks specifically to satisfy some of the enterprise situations. We looked at stacking within the rack and at adjacent-rack connectivity, so not only within the rack but also to an adjacent rack. And we set latency and power requirements that are restrictive, to avoid having to fix the problem later. If we put out the requirements correctly, hopefully we don't have to go back and try to reduce power and latency afterward.

With that in mind, for PCIe Gen 5 and Gen 6 we started with roughly a 180 to 200 nanosecond limit. That's from the root port all the way to the endpoint and back, so it's a round-trip latency, and 200 nanoseconds is what is typically understood to be an acceptable latency adder for a NUMA node. Looking at the components of that latency: you have the root port round-trip latency, the endpoint round-trip latency, the cable time of flight at roughly 5 nanoseconds per meter per direction, and then, in the PAM4 timeframe, the FEC we're going to have to add, which adds latency. We believe that with PCIe Gen 5, if we don't add retimers at this time, we can actually meet the 200 nanoseconds given the components that could be made available; that puts us right around 200 nanoseconds with Gen 5. With Gen 6 we're going to exceed it a little bit. So is that OK? We're asking the CPU vendors to keep improving the latency coming out of the CPU, and the endpoint latency to be reduced. You can't reduce the time of flight.

We came up with 7 meters as the minimum distance we're going to need to go between racks. That really came from an inquiry to about 15 major companies who are going to use these external cables, and the analysis led us to about 7 meters as the minimum for CXL and NVMe at this point. Optical will play a role in that, because trying to do more than 3 to 4 meters, as Molex and Brian just showed, is going to be hard. So we're going to need active electrical with retimers, and active optical with the optics built into the connector.
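To make that budget concrete, here is a minimal sketch in Python. Only the roughly 200 nanosecond target and the 5 ns per meter per direction time of flight come from the talk; the per-component root port, endpoint, and FEC latencies below are placeholder assumptions for illustration, not figures from the requirements document.

```python
# Hypothetical round-trip latency budget for a pooled CXL/NVMe link.
# Only the ~200 ns target and ~5 ns/m/direction flight time are from the
# talk; the per-component values below are illustrative assumptions.

TARGET_NS = 200.0          # acceptable NUMA-like round-trip latency adder
FLIGHT_NS_PER_M = 5.0      # time of flight, per direction, per meter

def round_trip_ns(cable_m, root_port_rt=50.0, endpoint_rt=50.0, fec_rt=0.0):
    """Sum the round-trip contributions for a given cable length in meters."""
    flight_rt = 2 * FLIGHT_NS_PER_M * cable_m   # both directions
    return root_port_rt + endpoint_rt + fec_rt + flight_rt

for length in (1, 3, 7):
    total = round_trip_ns(length)
    status = "within" if total <= TARGET_NS else "exceeds"
    print(f"{length} m cable: {total:.0f} ns ({status} {TARGET_NS:.0f} ns budget)")
```

The point of the exercise is visible in the numbers: the cable's time of flight is fixed by physics, so the only knobs left are the root port, endpoint, and FEC contributions.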
Why are we looking at the rack details in this picture? There are aspects of the rack that matter, such as whether the front-panel pluggable is at the near end, next to the rack cable routes, or in the middle. So we did quite a bit of analysis across the various enclosure sizes of servers, potential memory appliance blades, and potential storage systems with arms that allow the storage to be extended. All of that was taken into account for both NVMe and CXL.
We're focused on CXL today, and as you look at expansion within the rack or across racks, the calculations we came up with are that storage needs about 10 meters maximum, including the actuating arm that allows you to extend the storage out, and memory appliances need about 7 meters.
So the team's effort concluded that 7 meters would be a good target for CXL. At this point, different cable widths were also looked at in terms of DAC, AEC, AOC, and DOC: direct attach cable, active electrical cable, and then active optical or direct optical, depending on where your optics are. You would use either a passive optical connection, which is just a simple fiber link, or an active one where the optics are within the connector.
So we did quite a bit of channel analysis on that, adding the loss parameters, the jitter parameters, all the details, and we did a worst-case scenario. As I was asking Brian, did it include this, did it include that? With a DAC we cannot assume that a retimer is always there. There's always the possibility of somebody wanting to save money, not adding retimers, and going all the way from the root port to the endpoint, and that adds quite a bit of loss on both ends.
So what we did is worst case, and we came up with one meter being comfortable in the analysis, while two meters with DAC is pushing it. We did not want to set a very aggressive target for DAC. We said one to two meters, but one meter is what we've shown to be doable under worst-case conditions, at least for PCIe 5.0 and 6.0.
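To show the shape of that worst-case bookkeeping, here is a minimal sketch assuming a simple additive insertion-loss model for an unretimed DAC channel. The total channel budget and the per-element losses are placeholder assumptions for illustration only; they are not numbers from the OCP requirements document or the PCIe specification.

```python
# Hypothetical worst-case insertion-loss tally for an unretimed DAC channel
# from root port to endpoint. All dB values are illustrative assumptions.

CHANNEL_BUDGET_DB = 36.0       # assumed end-to-end channel budget at Nyquist

host_side_db   = 9.0           # root-port package + host board traces + connector
device_side_db = 9.0           # endpoint package + device board traces + connector
cable_db_per_m = 8.0           # assumed raw DAC cable loss per meter

def dac_margin_db(cable_m):
    """Remaining margin after summing worst-case losses for a cable length."""
    total = host_side_db + device_side_db + cable_db_per_m * cable_m
    return CHANNEL_BUDGET_DB - total

for length in (1.0, 2.0):
    margin = dac_margin_db(length)
    verdict = "OK" if margin >= 0 else "over budget"
    print(f"{length:.0f} m DAC: {verdict}, {margin:+.1f} dB margin")
```

With numbers of this kind, one meter leaves comfortable margin while two meters eats most of it, which mirrors the "one meter comfortable, two meters pushing it" conclusion above.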
We looked at the connector types in terms of width: one x4, one x8, one x16, and then being able to bifurcate to different end types on both ends of the cable.
So you can have a x16 bifurcating to four x4 or two x8.
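As a simple illustration of those width options, here is a minimal sketch; only the x16 into four x4 or two x8 splits come from the talk, and treating every equal power-of-two split down to x4 as valid is an assumption, as is the helper name.

```python
# Hypothetical helper that enumerates the equal lane splits mentioned above.
# Only x16 -> 4 x x4 and x16 -> 2 x x8 are from the talk; allowing any
# equal split into power-of-two widths down to x4 is an assumption.

def bifurcations(width):
    """Return (link_count, lanes_per_link) pairs for equal splits of a width."""
    splits = []
    part = width // 2
    while part >= 4:                    # x4 is the narrowest width discussed
        splits.append((width // part, part))
        part //= 2
    return splits

for count, lanes in bifurcations(16):
    print(f"x16 -> {count} x x{lanes}")   # prints 2 x x8 and 4 x x4
```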
So we added all of those to the requirements document that we contributed last August.
We also looked at putting the retimer outside of the connector, which is the case with CDFP, for example. Retimers need a heat sink, additional space, and a single board to sit on, and CDFP has two boards, so it's problematic for us to put the retimer or the optics within that connector.
But we looked at other form factors that could potentially house the retimer or the optics within the connector itself, and came up with three to four meters. We could potentially push it to four meters in the analysis, but we kept three meters for now as the limit for a cable that has retimers on both ends, included within the casing of the connector. That is a theoretical limit, assuming all the cable parameters are worst case, the board parameters are worst case, and we add up the loss parameters to arrive at those numbers.
We also came up with AEC, active electrical solutions, that could have distributed connections, like a fan-out where you have a x16 going to four x4, and so on, in that requirement.
Then we looked at direct optical connectivity, where you have either co-packaged optics within the root port and the endpoint, or near-package optics, where somebody has a copper-to-optics converter built into a chip that sits very close to the connector, not inside it. So you can also use standard, available DOC connectors like SN or MPO connectors, and we have some examples in the requirements, hoping that the manufacturers of those cables and connectors will come in and participate in the specification within the SIG effort around optical connectivity for PCIe.

So again, this is an example of an MPO connector. The beauty of this solution is that you can actually have a parallel interface; these connectors today support anywhere from 12 to 24 fibers. So if you have near-package or co-packaged optics with 64 lanes coming out of a PCIe Gen 6 root port and you want to reduce the number of cables, you can put in that copper-to-optical conversion chip and take the 64 lanes of copper PCIe into the optical world. You can create multi-wavelength fiber connections; the industry has shown examples of eight wavelengths per single fiber. So now you need eight actual fiber connections, each running eight wavelengths, and you end up with a single pluggable MPO connector that has eight fibers bundled together for 64 lanes of Gen 6. So you can reduce the number of plugs dramatically. Now, there is a cost associated with those chips, but if you compare it to the number of passive cables you would need to run 64 lanes, they make a lot of sense. A lot of these things we tried to put in the requirements to figure out the best TCO equation that we can actually come up with.
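To spell out that fiber-count arithmetic, here is a minimal sketch. The 64 lanes, eight wavelengths per fiber, and the 12 to 24 fiber MPO capacity come from the talk; mapping one lane to one wavelength and ignoring separate transmit and receive fibers are simplifying assumptions that follow the talk's simple count.

```python
# Worked version of the MPO example above: 64 PCIe Gen 6 lanes over
# wavelength-multiplexed fibers. One lane per wavelength is assumed, and
# directionality (separate TX/RX fibers) is ignored, as in the talk.

import math

def fibers_needed(lanes=64, wavelengths_per_fiber=8):
    """Fibers required to carry `lanes` at `wavelengths_per_fiber` lanes each."""
    return math.ceil(lanes / wavelengths_per_fiber)

fibers = fibers_needed()
print(f"64 lanes at 8 wavelengths/fiber -> {fibers} fibers")       # 8 fibers
print(f"Fits a single 12- to 24-fiber MPO connector: {fibers <= 12}")
```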
From there we also looked at AOC, where somebody puts the active optical device within the connector, because if you put it outside the connector, you change the equation: with AOC the optics cost is no longer bundled into the endpoint or the root port; it becomes part of the cable. It's a trade-off, but at this point we have both options in the requirements. Similarly, we included the fan-out in the requirements.
Again, this is still changing. CDFP is still the official SIG connector for copper, but for optical it requires everything to sit outside the connector, and for active electrical it also needs to be outside. So the requirement we put out there is that we need to look for a connectorization solution that serves all three modes: DAC, AEC, and we want AOC as well. CDFP is a good solution for today, though.
We need to look at other solutions out there like QSFP and QSFP-DD. I agree with Brian's comments: they're long and they're bulky, but they're in production. They're being used in Ethernet and produced in significant volume for 800 gigabit and 1.6 terabit. My hope is that we can get to the point where we unify on a common infrastructure that meets the requirements for both optical and copper, passive and active.
We've done some analysis within the work stream just to see how big this market is. If you really look at storage and memory pooling, you could be at a billion dollars a year of potential market size for cables. So we're talking about a large market, and we encourage people to look into how we can make this more amenable to the OCP community and to the overall enterprise.
The call to action is: please join the group if you haven't already. The work stream is still ongoing. We've already contributed version 0.7 of the specification, and we are working on releasing the 1.0 spec, but we're waiting for everybody's input. It's been three months since we released 0.7, and we're hoping to get input from everybody before we're done. Anyone within the OCP community who has a need for pooled CXL connectivity, please follow the link and add to our requirements. Any questions? Thank you very much.