We have the dreaded right-after-lunch session, so I'm sure people will continue to trickle in. We'll continue the discussion that we left off before lunch with respect to expanding connectivity for PCI interfaces, and specifically the focus of this presentation is to do that with optical solutions. I'm Sam Kocsis, a director of standards and technology with the Amphenol Corporation. My co-author is Chris Cole, an advisor with Coherent. Chris doesn't usually concede the mic too often, so, if any of you know Chris, I took the opportunity to present on behalf of both of us. This has been an effort that we've been working on for several months now, really since the beginning of 2023.
Today, we'll talk a little bit about the motivation for why we're looking to expand PCI links to support optical interfaces, talk about some of the limitations of copper-based solutions, expanding on some of the presentations that happened this morning, and talk a little bit about the trends for interconnect in PCIe links and what's going on in PCI-SIG and other standards bodies. We're trying to build on the momentum that Mohamed talked about with the workstream that was started in OCP to extend PCI connectivity by releasing a requirements document. And if the Open Compute Project is the shiny Ferrari, PCI-SIG is often looked at, I think, like the BMW: it moves quite slowly. But it's absolutely integral, if we're going to be developing PCIe solutions, that we work with the PCI-SIG, and Chris and I have found that to be one of the key cornerstones of the work we've been doing. We'll talk a little bit about where we are in that process as well. We'll also spend a few minutes talking about the basic framework for how we will continue to move forward, establishing technical feasibility and enabling solutions that will hopefully lead to broad adoption of optics within PCIe. And then we'll leave with a little bit of discussion on implementation considerations. There were some form factors discussed before the break. In addition to that, I think it's important that we establish the use cases, the types of architectures, how that works with modular designs, and how that works with management: management of the modules, and management from the host to the endpoints. All of that, I think, is important to enable a situation where people will adopt this in high volumes.
So to back up a little bit, and this is covered somewhat by what Mohamed talked about in the OCP requirements document on extending connectivity for PCIe links: it's not just about PCIe, it's about the application of these links. Where PCIe comes in is largely at the physical layer, so we use terms like lanes and links to define the interfaces we'll use. Oftentimes, the actual requirements for what those links need to achieve on the physical side depend on the application. A CXL environment is very different from an NVM Express environment. I think the presentation from the TE folks earlier highlighted the major use cases for these types of solutions. Putting all of them together, we recognize that there is a swell of momentum to transition these solutions into PCIe interconnectivity that we haven't seen at this scale before.
So from a fundamentals perspective, a link can be established across a number of lanes, and generally this can happen within a common form factor. It is also possible, and sometimes necessary, to have form factors that are specific to particular lane widths. But generally you can support anywhere from 1 to 16 lanes in a PCIe link, and those devices are expected to support data rates all the way back to the beginning of the 1.0 base specification. So that means all the way down to 2.5 gigatransfers per second, up through what we see as a key intersection point at 6.0 and 64 gigatransfers per second. There really isn't an allowance to skip rates; backwards compatibility must be maintained all the way back to the slowest data rate. The other aspect to think about is clocking. Generally, these systems were not meant to go outside of the box and connect things within a rack or rack to rack, so they were supporting things like common clock and independent reference clock either way. Having to do that for external cabling in a copper sense has led to fairly large modules and fairly large form factors to carry all those sidebands alongside the signals. We're hoping to be able to streamline that; I think that's absolutely necessary if we're going to do something to support optical interfaces. Just as a frame of reference, these slides will get posted and show the bit error rate targets as you transition from the NRZ data rates to the PAM4 data rate at 6.0. And I think most important is the last bullet: these links are really defined as end to end, but it's important to know that they could contain as many as two retimers. I have a picture to graphically show how that link budget is apportioned and how it's interpreted in many different implementations. That's going to be important as we try to optimize, one, the need for those retimers, and two, how those play into the architectural decisions for form factor or device capabilities as we look to bring this out.
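As a reference for the discussion that follows, here is a minimal sketch (not from the talk) tabulating the PCIe base-specification rates and the commonly cited per-generation error rate targets, along with the backwards-compatibility requirement that a device must also train at every slower rate. The 1e-12 figures for the NRZ generations and the helper function are illustrative assumptions; the PCI-SIG base specification is the authoritative source.

```python
# Illustrative sketch: PCIe generations, per-lane rates, and commonly cited
# BER targets (1e-12 for the NRZ rates, 1e-6 for 64 GT/s PAM4 as cited in
# this talk). Verify against the PCI-SIG base specification before relying
# on these numbers.

PCIE_GENERATIONS = {
    # gen: (data rate in GT/s per lane, modulation, target BER)
    "1.0": (2.5,  "NRZ",  1e-12),
    "2.0": (5.0,  "NRZ",  1e-12),
    "3.0": (8.0,  "NRZ",  1e-12),
    "4.0": (16.0, "NRZ",  1e-12),
    "5.0": (32.0, "NRZ",  1e-12),
    "6.0": (64.0, "PAM4", 1e-6),
}

def supported_rates(max_gen: str) -> list[float]:
    """Backwards compatibility: a device claiming a given generation must
    also be able to negotiate every slower rate, down to 2.5 GT/s."""
    ceiling = PCIE_GENERATIONS[max_gen][0]
    return [rate for rate, _, _ in PCIE_GENERATIONS.values() if rate <= ceiling]

if __name__ == "__main__":
    # A Gen 6 link still has to be able to train at every legacy rate.
    print(supported_rates("6.0"))  # [2.5, 5.0, 8.0, 16.0, 32.0, 64.0]
```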
So from my experience with PCI-SIG, I also chair the cabling workgroup, which has been focusing on the MiniSAS-HD form factor. Some have interpreted what Chris and I are doing as a proposal to bring OSFP-XD into the PCIe space. That's certainly a part of our proposal, but it's really not the only thing we're doing. My experience with MiniSAS-HD is that we've had a spec within PCI-SIG that has enabled capability and compatibility with active cabling and optical solutions, but how those implementations are brought to market hasn't really been standardized. One of the reasons we felt it was important to take on this effort is that many have looked at that form factor and the challenges that go into implementing optical solutions and said it simply won't work: it's two paddle cards, it's a press-fit connector, and you have a number of challenges with a module that sits 50% outside of the chassis and with cooling a module like that. This picture shows exactly where PCIe interconnect fits into system architectures today. What we see is a trend to go from the root complex, or the CPU, traversing through a baseboard to an add-in card, and it's on that add-in card that you generally see the retimers placed or connectivity start to extend outside the box. As we look at what could be done as we transition to these optical links, these architectures may evolve; we expect them to evolve. I think it's important that we acknowledge the implementations and topologies that have been used in the past, but also be open to evolving architectures that would hopefully lead us to something more energy efficient, with lower TCO, and that will ultimately help bring down the cost and the burden we've seen in implementing these types of solutions in the past.
So one of the things we spoke about very early on was this disaggregation: moving resources around within the rack and outside of racks, pooling things together, CXL memory pools, transitions away from traditional SAS storage architectures, connectivity of accelerators. I think the last two or three presenters have all touched on this in some way or another, and we obviously have to acknowledge that as well. We looked at this as sort of a boon for the expanding capabilities of interconnect. The picture on the right is something that came out of the PCI connectivity specification that Mohamed talked about. We've used the picture on the left to generically show what we'll be connecting between and to do some exercises similar to what Mohamed has done in terms of physical reach and what we're trying to enable as we go from rack to rack.
But it's also important to remember that the devices and systems we're connecting to often aren't as simple as having a device in a module or right outside of the module. There is a burden of designing the interconnect on the electrical portion to get to the optical transition. These pictures first show the generations and progression of PCIe data rates, but also break down the budget for what goes into a PCIe link. If we look at the transition even from 5.0 to 6.0, the Nyquist frequency obviously stays the same, while the budget is slightly different as we transition from NRZ to PAM4. But the add-in card and the root complex package associated with these devices still create quite a big burden on the total link budget. As we look at moving toward something that could support optics, we may have to revisit exactly where these allocations are defined so that we can significantly optimize the overall end-to-end implementation and try to standardize on form factors and topologies that would work for most, if not all, of these target applications.
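To make the budget discussion concrete, here is a small illustrative sketch, not from the talk or any specification. The 32 dB die-to-die figure matches the Gen 6 number cited later in this talk; the individual segment allocations below are hypothetical placeholders and simply show how the sum of electrical segments on the way to the optical conversion point eats into that total.

```python
# Illustrative only: hypothetical insertion-loss allocations (dB at Nyquist)
# summed against the 32 dB die-to-die budget cited for 64 GT/s later in this
# talk. None of the per-segment numbers come from the PCI-SIG specifications.

DIE_TO_DIE_BUDGET_DB = 32.0  # Gen 6 electrical budget cited in the talk

def remaining_budget(segments: dict[str, float]) -> float:
    """Return the loss margin left after summing all electrical segments."""
    return DIE_TO_DIE_BUDGET_DB - sum(segments.values())

# Hypothetical topology: root-complex package -> baseboard -> CEM connector
# -> add-in card -> pluggable cage, with the optical conversion in the module.
example_segments = {
    "root_complex_package": 9.0,   # placeholder value
    "baseboard_trace":      10.0,  # placeholder value
    "cem_connector":        1.5,   # placeholder value
    "add_in_card_trace":    6.0,   # placeholder value
    "module_cage_and_host": 3.0,   # placeholder value
}

print(f"margin: {remaining_budget(example_segments):.1f} dB")
# A negative margin would mean a retimer, or a shorter electrical path to the
# optical conversion point, is needed somewhere in the topology.
```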
So from an optical link perspective, one of the key things a number of people have talked about today is that interoperability is key: the ability to transition between passive copper solutions where they're needed and active electrical or active optical solutions. You don't want an architectural limitation that forces your hand in terms of platform architectures. Sometimes that leads to a discussion of whether a link would be retimed or unretimed. As I mentioned, PCIe links today allow for up to two retimers, but depending on where one retimer is placed, you are then limited as to where you can place an additional retimer. For example, you wouldn't want a retimer on an add-in card and then another retimer on a module that you're plugging into that add-in card; that would waste both of those interfaces. So as we think about how that relates to modular system architectural design, we will probably need to think about breaking these channels into topologies that can be reused across different platforms, in the CXL space as well as the NVMe space, and I think that's really going to help us move forward. What I show here in the picture is somewhat of a chip-to-module type of topology that has been used in Ethernet. That would be a large leap from what PCI-SIG has done in the past, for sure. But I think somewhere in there we can compromise and start to look at where it makes the most sense to put a retimer if one is necessary, what the absolute limitations are if we don't have a retimer, and how to take best advantage of that electrical-to-optical conversion to maximize the longer physical reach and, hopefully, the lower power that those systems might offer.
So I mentioned the target BER is 1e-6 for a 64 gigatransfer, Gen 6, channel. Die to die you have 32 dB of electrical budget, and from a BER perspective you have 1e-6. The picture below shows where you might put a retimer. It's not drawn to cover every case, but you could have a situation where your retimer is on the baseboard and then you transition through a pluggable module and have an electrical-to-optical conversion inside the module. You could put the retimer in the module. If the retimer is not on the baseboard, it could also be on the add-in card. All of these architectural decisions are going to limit what you can achieve in terms of implementation flexibility and the ability to transition between passive copper and optical solutions. But certainly, if we look at that bit error rate, the addition of a retimer resets the bit error rate as well as the link budget. So if we add the retimers in the picture below, I have 1e-6 from the CPU to the first retimer, 1e-6 between the two retimers, and 1e-6 from the second retimer to the endpoint. If you add all of those up, you're not going to get an end-to-end BER of 1e-6. I think it's flexibility, or ambiguity, in the PCI-SIG specification and the way it's implemented that allows for this flexibility in implementations, but it isn't clear to everybody working on these solutions exactly how to best take advantage of it. We've talked about this for a number of sessions now inside PCI-SIG, inside OCP, and in the connectivity group that Mohamed started. What we found really needs to happen next, and this is part of our call to action later on, is to dig in and start doing some of the link evaluation work: figure out exactly where retimers are needed and what we can actually do with this 1e-6 bit error rate. Is it even feasible? One of the things we don't want to do is go back to PCI-SIG and say we need to completely change, or offer a completely different proposal for, the protocol and the error correction mechanisms they've already put in place for Gen 6. That would be a last resort. The first step is really to figure out what we can do living within that budget and how best to take advantage of the different architectural possibilities.
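To illustrate the point about segment error rates compounding, here is a short sketch, not from any specification, that computes the end-to-end error ratio for a link split into retimed segments. It assumes errors in the segments are independent, which is a simplification.

```python
# Illustration of the point above: if each retimed segment independently meets
# a 1e-6 error ratio, the end-to-end link is worse than 1e-6. Assumes
# independent errors per segment (a simplifying assumption).

def end_to_end_ber(segment_bers: list[float]) -> float:
    """Probability that a bit is corrupted in at least one segment."""
    p_ok = 1.0
    for ber in segment_bers:
        p_ok *= (1.0 - ber)
    return 1.0 - p_ok

# CPU -> retimer 1 -> retimer 2 -> endpoint: three segments at 1e-6 each.
print(end_to_end_ber([1e-6, 1e-6, 1e-6]))  # ~3e-6, roughly 3x the per-segment target
```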
Now obviously that also leads to talking a little bit about form factors. A couple of presentations ago we saw a slide very similar to this. I go back even further because of my history, I guess, with MiniSAS-HD. If you look at MiniSAS-HD and CDFP, the two solutions side by side are very, very similar: two paddle cards and a press fit, although now both are offering a surface-mount option. You have a number of sidebands that may or may not be used, and it's very difficult, if you have a large number of sidebands, to make everybody happy in all types of implementations. So a very strong management interface is required to identify what those capabilities are and make sure that your endpoint is working well with your host. Not to say that it couldn't be done, but at this point, as we look for a better, more efficient way of doing things, it certainly makes sense to think about limited sideband use and some in-band discovery mechanisms. And it's been very helpful to see the response within PCI-SIG in terms of enabling support from the existing workgroups: the protocol workgroup and the PHY logical workgroup can look at these things and help offer us ways to take advantage of all that PCI-SIG has built, without completely disrupting everything, to enable optical solutions.
If we look at these form factors, the three most commonly talked about are probably CDFP, QSFP-DD, and OSFP-XD, the slightly lower-profile version with the riding heat sink. Here we look at this in a PCI-SIG add-in card application and show the different profiles. I think it's important to look at not only the volume consumed within the card space and the faceplate, but also the volume of the module outside of the faceplate. If you have a very thick module that protrudes on both sides, it limits your ability to stack these add-in cards next to each other, and it also blocks airflow channels if you're trying to implement better cooling mechanisms for these modules. So it's important to think about all of this as we think about how we want to connect from these devices and these boxes as they connect to each other inside a rack and rack to rack.
Some of that is going to lend itself to different form factors. OCP has created a number of different NIC form factors, and they've done a great job of putting those out there for everybody to take advantage of, and I think we expect to see that continue to evolve. I know there are a number of discussions now within OCP NIC to make a wider form factor, and some of that is, I think, going to help lend itself to these larger ports and larger modules that would go in to support greater connectivity.
So I'll go through this rather quickly. These are statistics used to compare the form factors against one another. At this point, both Chris and I are in agreement that we're not committed to any one form factor; everybody will have a chance to make the case for their form factor and why it makes sense to keep that implementation on the table. But it's really important that we think about the architectural paradigm we're in. As we think about front-panel pluggable solutions, there's obviously a big push toward co-packaged or near-packaged optics, and some of that may actually lend itself to doing things even more readily without retimers. So as we look at where the add-in card sits in today's architectures, it makes a lot of sense to continue with front-panel pluggable options. But as we look into the future, as we see these shifts happening at the protocol level and at the PHY logical level within PCI-SIG, we're hopeful some of that will lend itself to supporting co-packaged optics as well. On top of that, we've done a lot of work within PCI-SIG on the existing form factors, making way for these new solutions to be compatible with CMIS. CMIS has taken off, the OIF is doing a great job managing that specification, and I think that will really help in terms of deployment, since there's a common framework and format for these modules to identify themselves and their capabilities and to interact with the different boxes within the deployments.
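As a rough illustration of the kind of in-band identification CMIS provides, here is a sketch, not part of the talk, that decodes the identifier and revision bytes from the start of a module's lower memory page. The read_lower_page callable is a hypothetical host-side accessor, and the byte offsets follow the commonly documented CMIS layout; verify against the OIF CMIS specification before using this in a real implementation.

```python
# Hypothetical host-side helper: decode the first bytes of a CMIS module's
# lower memory page (byte 0: SFF-8024 identifier code, byte 1: CMIS revision).
# Offsets are as commonly documented; confirm against the OIF CMIS spec.

from typing import Callable

def describe_module(read_lower_page: Callable[[int, int], bytes]) -> dict:
    """read_lower_page(offset, length) -> bytes is an assumed accessor over
    the module's two-wire management interface."""
    raw = read_lower_page(0, 2)
    identifier = raw[0]                          # SFF-8024 identifier code
    revision = f"{raw[1] >> 4}.{raw[1] & 0x0F}"  # e.g. 0x50 -> "5.0"
    return {"identifier_code": identifier, "cmis_revision": revision}

# Example with a stubbed accessor (values are illustrative only).
fake_page = bytes([0x19, 0x50]) + bytes(126)
print(describe_module(lambda off, n: fake_page[off:off + n]))
```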
So Chris and I had really hoped, or I should say we thought, when we came here to talk that we would be asking for support to bring this to PCI-SIG and make it a formal workgroup. A mere four to five weeks after we started this effort, PCI-SIG said they would create the workgroup. So the optical workgroup has started; it's about a month old. There's going to be an optical sub-team that Chris and I will lead to dig through some of these details and focus on the link analysis and link evaluation, and we'll present those solutions to the larger optical workgroup, so the meetings will go back and forth. If you have any questions or are interested in participating in the PCI-SIG activities, you can certainly reach out to me or Chris. The PCI Extended Connectivity Workgroup is going to continue to operate within OCP as well, and I think it's important that we use all of the discussions happening in these groups. There are a number of energy-efficient interface discussions happening in a different wing of this building this week, and a lot of that work is built on the framework and discussions coming out of OIF. All of these activities are interrelated, and the call to action is really to get involved and bring your feedback and implementation perspectives, not only to OCP but also to the other groups working on this. As I mentioned, I think OCP is the most flexible and adaptable in terms of taking a solution and carrying it over the line, so we're very encouraged to continue working within OCP to help bring these solutions to market. With that, any questions or feedback?
Hey Sam, great presentation. One thing I wanted to clarify here for the larger audience is that we are not talking about pluggable optics here. An optical link is a bookended link, at least the way it's being discussed in the PCI-SIG. You agree, right?
Yeah, exactly. The key thing is to make the protocol for PCIe more optical friendly. Whether that manifests itself as pluggable solutions or co-packaged solutions, I think, is very much--
Let me--
No, I think the clarification--
I think I know what you mean, which is we're not going to write an optical specification. We're not going to expose the optical specification. It's an AOC model.
Yeah, we're going to specify an electrical interface specification with these three segments, and the optical implementation is left to the implementer's choice. Just wanted to clarify.
Yeah, and I think the board was very clear: they don't want to preclude any optical solution. So whether it's VCSEL, silicon photonics, plasmonics, whatever is out there, as long as you meet the end-to-end specs, that's fine. But there will be no optical spec in this iteration. By the way, that doesn't mean you can't have a pluggable module with fiber. It just means you're on your own; we're not going to help you with that. You can go do that.
Yeah, it has to be on both ends, the same--
That's right. It's the guy that's--
We're not talking interoperable optics at the optics level. That's the key.
That's exactly right. Yeah. Thank you. That's a very important point.