Hi, everyone. I'm here to talk to you today about how we are enabling multi-vendor interoperability at 200 gigabits per second, driving towards 1.6T optics in the data center. I'm Michael Klempa from AlphaWave Semi. I'm in product marketing, and I'm active in forums like the IEEE and the OIF, where I am the PLL (Physical and Link Layer) Interoperability working group chair. AlphaWave Semi has a strong and growing presence in these groups to drive, define, and implement leading-edge technology like 200 gig links. And the forums are doing everything they can to bring 200 gig to market as quickly as possible, because we all recognize the need for optical connectivity in use cases like AI clusters, especially as data rate demands continue to increase.
So in this presentation, I'll touch on modulation formats, error correction schemes, and the bandwidth requirements of the applications being defined to usher in the era of 1.6T Ethernet connectivity. I also want to talk about the potential generations of optical technologies, from pluggable optics to near-packaged to ultimately co-packaged optics, because as data rates increase, we are moving towards more optical I/O. And finally, since it's fresh on my mind, I also want to share my experiences of multi-vendor interoperability from ECOC 2023 a couple weeks ago. So the focus of this presentation is to give a snapshot of where we are in the evolution towards 1.6T interoperability and where we can go. There are a lot of new tools coming out of the standards bodies that I want to make people aware of, whether your focus is low latency, low power, or high performance. And 200 gig isn't as far away as some might think. I want to show you why, and leave you with how to get involved and where to find any more information you might need.
So, a brief history of interoperability. I don't think I need to convince anyone here of the needs and benefits of open standards, but I just want to remind everyone who isn't working directly on them that interoperability is not easy. It requires robust definitions of every aspect of the link. Trust me, I know: I am the chair of the OIF's interoperability group and used to run a lot of Ethernet plugfests at the UNH IOL, so I have seen the good, the bad, and the ugly. The story of Ethernet has always been that you just plug it in and it works, and it has that aura of simplicity to it. But in reality, a lot of work goes into making it that simple, and that simplicity wasn't always the case. The University of New Hampshire InterOperability Laboratory got its start because two companies developed products to the same standard, just different drafts of the spec, and when they tried to connect them, nothing happened. That lesson is fundamental: we all have to follow the same set of requirements to make integration and operation at scale happen cost-effectively. All this effort creates robust products, solid system performance, and a healthy multi-source supply chain. Ethernet has provided this for 50 years, and the OIF just celebrated its 25th anniversary as well, so those are two very healthy ecosystems. These groups are also defining software and management interfaces to control the hardware, like Nathan mentioned earlier. And just as a side note, I am a byproduct of the UNH IOL, where they nurture, test, and document interoperability across many different technologies. It's great for the industry, but also great for developing the next generation of engineers and getting them active in communities like the OCP or the OIF or the IEEE. So support it if you can.
And so, the evolution towards 1.6T technologies. The switch ASIC is aggregating 50 or 100 terabits of I/O and driving these 1.6T optics. We have been at the reticle limit for more than five years. Before that, we just increased ASIC size to add more capability and get more bandwidth. But more die area means more pins and more integrated functions, and that's just a recipe for ballooning product costs. So now the main way to keep increasing total I/O is to just go faster.
But one SerDes core is not able to efficiently cover multiple applications, from the shortest die-to-die reaches to the longer chip-to-chip reaches. For short-reach applications, like the switch ASIC to optics, simpler and lower-power equalization is desired. This is typically because the modules just retime the signal inside before the optoelectronic conversion. And so there are some strategies being deployed to increase that reach at 200 gig, whether it's cabled hosts or retimers, et cetera, to limit that complexity on both sides of the channel. Multiple forums, the IEEE and the OIF, are pushing the evolution towards 1.6T. In this diagram, I've tried to demystify the acronym soup of these standards with the application spaces they're trying to solve, whether it's chip-to-module or VSR, or chip-to-chip or MR. Both groups' goals are broad market interoperability, and they have proven successful in the past. There are other projects defining 200 gig, but I'm focusing here on those that will enable 1.6T optics on the line side.
So both the IEEE and the OIF are on accelerated timelines to deliver interoperable solutions to the marketplace. The pace is something like 20% quicker than the 100 gig projects. And both groups are working on an explicit 1.6T optical technology, coherent and IMDD, respectively. But there are plenty more ways to achieve 1.6T, like scaling out other 800 gig technologies. The optical objectives in these groups are addressing different reaches, with ZR being in the 60 kilometer range, though not nailed down yet, and DR being up to 2 kilometers, which should address the most pressing applications first. There will likely be follow-on applications evolving those 800 gig optical technologies, like FR over multiple wavelengths, or LR1, which is coherent-lite. But that remains to be seen. What's evident is that at these increasing data rates, coherent becomes a solution that provides greater margin than the alternatives, and it will permeate into the data center sooner rather than later.
So, for modulation formats at 200 gig, there was a question mark as to which PAM modulation scheme the IMDD projects would adopt. Should they stay at PAM4, or make the jump to PAM6 or PAM8, or something else? But the forums have largely decided to continue with PAM4. And advanced modulation formats, like 16QAM, enable higher data rates with improved spectral efficiency, which is what's used in coherent optics. The success of 400ZR has built up a terrific amount of excitement for 800ZR, and then 1600ZR. And we are figuring out ways to make and standardize cost-effective coherent optics because of the spectral efficiency and more forgiving SNR operation.
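To put some rough numbers on that choice, here is a minimal sketch of the baud-rate arithmetic, assuming the usual KP4-style FEC overhead on a 200 gig lane; the exact line rates depend on the FEC each project adopts, so treat these as illustrative.

```python
from math import log2

def baud_gbd(line_rate_gbps: float, levels: int) -> float:
    """Symbol rate (GBd) needed to carry a line rate with PAM-N signaling."""
    return line_rate_gbps / log2(levels)

# Illustrative: ~212.5 Gb/s on the wire for a 200 Gb/s payload with
# KP4-style overhead (a concatenated inner FEC would add a few percent more).
line_rate = 212.5
for n in (4, 6, 8):
    print(f"PAM{n}: ~{baud_gbd(line_rate, n):.0f} GBd")

# PAM4 lands around 106 GBd, doubling the analog bandwidth versus 100 gig.
# PAM6/PAM8 lower the baud rate (practical PAM6 schemes carry about 2.5
# bits/symbol rather than the ideal log2(6)), but the tighter eye openings
# cost SNR, which is a big part of why the forums stayed with PAM4.
```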
Error correction schemes for 200 gig: doubling the data rate from 100 gig to 200 gig degrades signal integrity, resulting in less operating margin and a worse pre-FEC BER. And despite the progress being made on improving channel materials, transceiver architectures, designs, and system-level innovations, stronger FECs are needed to loosen the pre-FEC BER requirements for 200 gig. The standards forums have adopted a concatenated FEC to protect the AUI within the optics' BER budget at little cost in latency or overhead. That includes a strong outer FEC to correct burst errors, and a simple binary code is used as the inner code to provide another layer of protection against random bit errors. So the inner code, which is closest to the channel, is well tuned for the channel, and the outer code cleans up the errors left over by the inner code. The concatenated FEC also keeps complexity low for next-gen 200 gig per lambda optics, because we don't need to terminate that host FEC or regenerate it inside the module. And, I should also mention, to support lower-latency operation at 200 gig, an operating mode with no inner FEC is currently being explored in the IEEE.
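As a hedged back-of-the-envelope example, here is the rate arithmetic for one commonly cited concatenated scheme, an RS(544,514) outer code paired with a Hamming(128,120)-style inner code; treat the exact parameters as an assumption and check the current IEEE 802.3dj drafts for the adopted values.

```python
# Outer RS(544,514): the familiar KP4 Reed-Solomon, good against bursts.
# Inner Hamming(128,120)-style binary code: cheap protection against
# random bit errors closest to the channel. Parameters are assumptions.
outer_rate = 514 / 544
inner_rate = 120 / 128
concat_rate = outer_rate * inner_rate

payload_gbps = 200.0
line_gbps = payload_gbps / concat_rate
print(f"concatenated code rate: {concat_rate:.4f}")        # ~0.886
print(f"total overhead: {(1/concat_rate - 1)*100:.1f}%")   # ~12.9%
print(f"line rate per lane: {line_gbps:.1f} Gb/s")         # ~225.8 Gb/s

# The outer RS code can run end-to-end across the AUI and the optics,
# so the module only adds and strips the lightweight inner code instead
# of terminating and regenerating the host FEC.
```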
And so, bandwidth requirements for these chip-to-module channels. We need to deliver that 50 or 100 terabits of aggregated I/O off the ASIC, and the first and last 12 inches of that link are eating up more and more of the link budget, driven by package losses and interconnect losses. This has created a pivot to optics to enable that next generation of bandwidth, because the more of the link budget that goes to the AUI, the more it chews away from the channel, which has typically been passive copper cables. This is creating a recipe for a new outlook on I/O and on interoperability, where companies are investigating new ways to get compute or I/O or memory out of the system and into the network. And the increase in loss and noise means the SerDes needs to get more complex, with enough DSP to continue delivering a quality signal.
So how complex does that DSP need to be? Here are some sample I/O channels contributed to the forums. They range in loss from 7 dB all the way up to about 20 dB at Nyquist. And as you can see, there are diminishing returns in achieved SNR and BER when you take a modeled 112 gig SerDes analog front end and iteratively add more bandwidth to send a 200 gig signal. There are obvious improvements at first, but after about a 60% increase in bandwidth over the 112 gig SerDes, we see little improvement in SNR and BER. So that just means that making the jump to 200 gig might not be as big of a hurdle as some might think; it's not just a doubling. But this study assumes extensive DSP in the receiver, dozens of FFE taps combined with a DFE. So it's more about co-optimization of the entire system rather than just throwing more bandwidth at the problem. And we have other complementary ways of eking out more margin to get to where we can successfully run Ethernet and enable interoperability.
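To make the FFE-plus-DFE idea concrete, here is a minimal, self-contained sketch of LMS-adapted equalization on a toy PAM4 channel; the tap counts, channel response, and noise level are illustrative stand-ins, not the contributed channels or a real 200 gig receiver.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 40000
levels = np.array([-3.0, -1.0, 1.0, 3.0])
tx = rng.choice(levels, size=N)

# Toy channel: unit main cursor, one pre-cursor, two post-cursors, plus noise.
h = np.array([0.1, 1.0, 0.45, 0.2])
rx = np.convolve(tx, h)[1:1 + N] + rng.normal(0.0, 0.2, N)

def pam4_slice(y):
    return levels[np.argmin(np.abs(levels - y))]

# LMS-adapted 7-tap FFE plus a 1-tap DFE on the previous decision.
w = np.zeros(7); w[3] = 1.0        # w[3] multiplies the cursor sample rx[n]
b = 0.0                            # DFE coefficient
mu = 2e-4
train = N // 2
dec_prev = 0.0
raw_err = eq_err = held = 0
for n in range(3, N - 3):
    x = rx[n - 3:n + 4]            # rx[n-3] .. rx[n+3]
    fb = tx[n - 1] if n < train else dec_prev
    y = w @ x - b * fb
    d = pam4_slice(y)
    if n < train:                  # data-aided adaptation on known symbols
        e = y - tx[n]
        w -= mu * e * x
        b += mu * e * fb
    else:                          # count errors on held-out symbols
        held += 1
        eq_err += d != tx[n]
        raw_err += pam4_slice(rx[n]) != tx[n]
    dec_prev = d

print(f"unequalized symbol error rate: {raw_err / held:.3g}")
print(f"FFE+DFE symbol error rate:     {eq_err / held:.3g}")
```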
And one of those enablers is link training. Nathan already talked about CMIS earlier; it came under the stewardship of the OIF in 2022. A mechanism to define link training using CMIS started in the OIF as the 112 gig standards were wrapping up, and the group is working towards adoption of that baseline for link training at 112 gig. But it also appears that the use of link training will be prevalent at 224 gig or 200 gig. The path of in-band versus out-of-band, or how they might live together in harmony, will be a topic of debate for a while. The concept of link training isn't new. It offers a method to tune complex SerDes dealing with unique channel topologies, which should simplify bring-up and characterization while addressing variations occurring in the field. Previously, it's just been an end-to-end link optimization where the whole system was time-invariant. But now each segment might need to undergo link training separately. In the short term, there's a lot of work to be done, but ultimately this should simplify the path towards multi-vendor interoperability, which I think is everyone here's goal.
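As a purely hypothetical sketch of the idea, not CMIS or any defined protocol, here is a greedy tap search where a transmitter adjusts its FIR based on a quality metric reported by the far-end receiver; the helpers apply_tx_taps and measure_far_end_snr are stand-ins for whatever the real management interface provides.

```python
def train_tx_fir(apply_tx_taps, measure_far_end_snr,
                 taps=(0.0, 1.0, 0.0), step=0.02, rounds=10):
    """Greedy coordinate search over (pre, main, post) TX cursor weights."""
    taps = list(taps)
    apply_tx_taps(taps)
    best = measure_far_end_snr()
    for _ in range(rounds):
        improved = False
        for i in (0, 2):                  # tune pre- and post-cursor taps
            for delta in (+step, -step):
                trial = taps.copy()
                trial[i] += delta
                apply_tx_taps(trial)
                snr = measure_far_end_snr()
                if snr > best:
                    taps, best, improved = trial, snr, True
        if not improved:                  # converged: no move helped
            break
    apply_tx_taps(taps)
    return taps, best

# Toy stand-in for the far end: pretend SNR peaks at pre=-0.10, post=-0.20.
ideal = [-0.10, 1.0, -0.20]
state = {"taps": [0.0, 1.0, 0.0]}
def apply_tx_taps(t):
    state["taps"] = list(t)
def measure_far_end_snr():
    return -sum((a - b) ** 2 for a, b in zip(state["taps"], ideal))

taps, snr = train_tx_fir(apply_tx_taps, measure_far_end_snr)
print("converged taps:", [round(t, 2) for t in taps])
```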
And the other enabler is MLSD. Progress in DSP transceivers is critical in this evolution to 200 gig, including new features like link training and MLSD. Designers want to leverage these features to squeeze every ounce of margin available when making the jump from the lab into the field. MLSD is particularly useful in scenarios where high noise levels or impairments in the communication channel make it challenging to accurately detect the transmitted symbols, and the gains can be orders of magnitude in BER. As I noticed walking around the OCP show floor yesterday, a lot of people are enabling MLSD.
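Here is a minimal sketch of what MLSD buys you, assuming a toy two-tap channel where the detector state is just the previous symbol; a four-state Viterbi search is then exact maximum-likelihood sequence detection. The channel and noise values are illustrative only, and a real receiver would run this on the residual response after the equalizer.

```python
import numpy as np

rng = np.random.default_rng(1)
levels = np.array([-3.0, -1.0, 1.0, 3.0])
N = 5000
tx = rng.choice(levels, size=N)

# Toy 2-tap channel: main cursor plus one strong post-cursor, plus noise.
a = 0.5
rx = tx + a * np.concatenate(([0.0], tx[:-1])) + rng.normal(0.0, 0.4, N)

# Baseline: memoryless slicing, which ignores the ISI entirely.
hard = levels[np.argmin(np.abs(rx[:, None] - levels), axis=1)]

# Viterbi over 4 states (the previous symbol) = exact MLSD for this channel.
cost = np.zeros(4)
back = np.empty((N, 4), dtype=int)
for n in range(N):
    # branch metric for (previous state i, new symbol j)
    bm = (rx[n] - (levels[None, :] + a * levels[:, None])) ** 2
    total = cost[:, None] + bm
    back[n] = np.argmin(total, axis=0)
    cost = total[back[n], np.arange(4)]

mlsd = np.empty(N, dtype=int)          # trace back the survivor path
state = int(np.argmin(cost))
for n in range(N - 1, -1, -1):
    mlsd[n] = state
    state = back[n, state]

print("slicer SER:", np.mean(hard != tx))
print("MLSD   SER:", np.mean(levels[mlsd] != tx))
```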
And so, some potential generations of optics at 1.6T. There are challenges and trade-offs for optics in high-speed data transmission. Pluggable optics are a known quantity, and there will certainly be a space for them. They are easily integrated into a system, and with standardized compliance, they can interoperate with other pluggables as well. So they have a strong ecosystem already in place. And the DSP within these optics might be good enough to overcome the electrical challenges associated with the link, extending the value of pluggables in the industry. We might be able to get there with advanced interconnect strategies, like flyover cables or retimers. But as speeds increase, it becomes more difficult to equalize the electrical link between the ASIC and the pluggable module.
So to address this, there is a desire to move the optics closer to the ASIC to simplify the electrical link, since increasing the optical length by millimeters is negligible. That middle ground is NPO, which is a compromise between the advantages of co-packaged and pluggable optics. But it has its own challenges and a narrow needle to thread, with the inertia of pluggables in the ecosystem and the momentum of chiplets and co-packaged optics. And we've actually already seen near-package interoperability at OFC 2023, where we had examples of TE and Amphenol 3.2T connectors driven by an AlphaWave SerDes over two meters of copper cable with solid BER. You can just extend that to optics in the future. And obviously, the holy grail is co-packaged optics, where the optoelectronic conversion happens inside the ASIC package.
But on large ASICs, there will be challenges with heat dissipation and reliability. So to address this, we need improved packaging techniques to improve yields and make CPO mainstream. Enter chiplets. Chiplets can enable common and new connectivity use cases, and UCIe is one attempt to standardize the footprint and democratize that concept. There's tremendous aggregate I/O bandwidth required in current and future ASICs, and there's a limit to how many wires we can stick around the perimeter of a chip, so transmissions are getting faster. Package losses and crosstalk are becoming severe challenges at 200 gig, and integrating miniature optical engines inside the package avoids the need for high-speed electrical I/O off the package. These days, we're seeing monolithic dies use over 50% of their area just for I/O. So chiplets will let the ASIC do what it does best, whether that's dense compute or memory, and move that I/O onto a separate die to move data on and off the chip, which could certainly be optical 1.6T. And AlphaWave Semi has already taped out a 3-nanometer test chip enabling UCIe, which could be used for this optical application. This should improve yields with smaller die sizes, save on power with improved efficiency, and improve time to market by basically giving someone a canvas of known-good chiplet functionality to put their layer of innovation on top of.
OK, sorry, I'll blow through this. So we get interoperability through plugfests and interop demonstrations, like the OIF and the Ethernet Alliance do at OFC and ECOC.
Sorry, I have to skip this to get to the big point. At the OIF, we just demonstrated the world's first multi-vendor 224 gig interoperability: two SerDes IP vendors' platforms communicating over a VSR channel. So this would represent something like a switch ASIC to a front-panel pluggable. And it's the first inkling of an enabling ecosystem for 1.6T optics in the data center. So look for more to come as more companies boot up their capabilities.
And here's my wrap-up. You can check it out later.
And if you want to hunt me down for any more questions or get involved in the OIF or the Ethernet Alliance or IEEE or have questions on 200 gig SerDes, just reach out to me. Thank you very much.
All right. Thank you, Michael. Just one quick question from me. As you develop 224 gig SerDes, do you anticipate that they will work for LPO? Are you making any adjustments to increase the chances for LPO to work at 200 gig?
I haven't looked at that yet. But I know leveraging our LR SerDes at 100 gig could enable LPO. So I would assume just continuing down that path of making a very robust, long-reach SerDes for 200 gig. And I think it really depends on the optics to enable linear operation at 200 gig.