All right, good morning everyone. We are officially live now, and I was told that we have to stay strictly on schedule, so we're going to get started. I see many familiar faces, but in case you don't know me, I'm Vlad Kozlov, founder and chief analyst at LightCounting, a market research company. LightCounting is celebrating 20 years in business today.
We built our reputation on providing data on what the market is today and forecasts for the next five years, with the objective of bringing more transparency to the market. I think our second objective is to help new technologies get into the market, and that is why we're hosting this session with many startups presenting their work, addressing some of the bottlenecks in the industry as we see them.
So this is one of many charts that we publish, updated almost every three months now. We track sales of Ethernet optical transceivers, and this chart separates demand between AI clusters and non-AI markets. Non-AI markets include legacy segments such as enterprise networks and telecom networks. If you look back 10 to 15 years, Ethernet was mostly used in enterprise networks, and telecom was the second largest market. Then in 2010 Google started; well, actually they started in 2007, but 2010 was when the volumes really picked up for deployments of Ethernet transceivers in cloud data centers. This market has grown quite rapidly since 2010. If you look carefully, in 2023 the market for non-AI applications, including compute nodes in cloud data centers, actually declined. So this AI boom came just in time to save the market from a decline last year, and this year we expect 40% growth. Our forecasts tend to be on the conservative side, so we are hesitant to project strong growth for more than a year or two, mostly because we've seen over the last 20 years that market segments tend to grow fast for two years, maybe three years at most, and then there's usually some kind of slowdown, which is very hard to predict. We are hesitant to publish exponential growth because we just know from experience it rarely happens.
The next chart looks specifically at demand for AI clusters, so we zoom in on the red category shown here, and we also look at the number of transceiver units instead of sales. You can see that last year about 5 million Ethernet transceivers were deployed in AI clusters, and this year we expect about 11 million units, the majority of them 800 gig devices, as you can tell from the chart. Nvidia almost overnight became the largest consumer of optics. They're not necessarily the end user of the optics, but a lot of optics is going through their hands, and their largest customers, like Microsoft, prefer to deploy fully equipped systems from Nvidia including the optics, not to mention the software as well. Nvidia manufactures optics at the contract manufacturer Fabrinet, but when it comes to really high volumes they engage suppliers, so they work with InnoLight and Coherent, and they are qualifying other suppliers for the next-generation 1.6 terabit transceivers. But that's not the only option for the end users. Companies like Meta, for example, prefer to buy optics themselves. Drew, our next presenter, is from Meta, and Drew may tell us more about it. Our understanding is that Meta buys a lot of GPUs from Nvidia, but they build their own switches and buy optics directly, and we think that in the future more customers will follow that direction. So the contribution of Nvidia in terms of selling optics to the end users will actually decline; they'll probably remain dominant in sales of GPUs. The other story is that the growth is very high and, as I said, this forecast may be conservative. I'll explain why on the next slide.
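As a quick check on the unit numbers quoted here, the following Python sketch computes the year-on-year growth implied by going from about 5 million to about 11 million transceivers. The function name is a hypothetical helper, and the inputs are the rounded figures from the talk, not the underlying forecast data.

```python
def growth_percent(previous_units: float, current_units: float) -> float:
    """Year-on-year growth in percent (hypothetical helper for illustration)."""
    return (current_units - previous_units) * 100 / previous_units


# Rounded figures quoted in the talk, in millions of units.
last_year_units = 5
this_year_units = 11

growth = growth_percent(last_year_units, this_year_units)
multiple = this_year_units / last_year_units

print(f"Unit growth: {growth:.0f}% ({multiple:.1f}x)")
```

On these rounded inputs, unit shipments into AI clusters more than double in a single year.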
Well, first of all, maybe stepping back: why is the demand for optics so significant? What Nvidia did two years ago is they announced an additional network, which they call a GPU-to-GPU fabric, and their plans to deploy optics for this network. The blue network at the top of the racks was always there; they call it the data center network, and it relies on InfiniBand. Until the last generation, most of the optics there were active optical cables. The most recent generations transitioned to optical transceivers, which gave a boost to the transceiver market. But more importantly, they started using a lot of optics in this yellow network in the middle of the rack. This is the GPU-to-GPU fabric. At this point they're still using InfiniBand there. InfiniBand goes through NICs, and every GPU has, I think, one 400 gig port. They use 800 gig transceivers because they can combine two 400 gig ports in one 800 gig transceiver. But the most interesting part of Nvidia's announcements from two years ago, at least for me, was that they're planning to transition from InfiniBand to NVLink for optical connectivity. Once that happens, the demand for bandwidth could increase quite sharply, meaning by a factor of 10. If you look at this chart, and this is not even the latest generation of GPUs, it's from two years ago, each GPU had 3.6 terabit of bandwidth. Once NVLink becomes the protocol for this fabric, they will have to use all of that bandwidth to connect GPUs with NVLink. So we're moving from 400 gig per port with InfiniBand to 3.6 terabit per port with NVLink. For this generation of GPUs, Nvidia built a cluster with NVLink over fiber internally, but they haven't made it available to the end users. From my latest conversations with Nvidia, they are still resolving issues with how to enable NVLink over a fiber connection. It was never designed to work over fiber.
So they're still working on it. It's possible that as early as next year we may see the first deployments of NVLink over fiber. I'm not sure if it's going to be 800 gig transceivers; most likely it's going to be 1.6 terabit transceivers. These types of disruptions are very hard to predict, so our forecast is probably not accounting for this additional bandwidth. But it is coming. What we don't know is when, and how quickly the market is going to migrate to it.
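The bandwidth step described above can be sketched with simple arithmetic. This Python snippet uses only the figures quoted in the talk (one 400 gig InfiniBand port per GPU, 3.6 terabit of NVLink bandwidth per GPU, 800 gig and 1.6 terabit transceivers). The helper name is hypothetical, and the calculation assumes the full per-GPU bandwidth is carried over optics at one end of the link, ignoring protocol overhead and the far-end transceiver.

```python
# Figures quoted in the talk, in gigabits per second.
INFINIBAND_GBPS_PER_GPU = 400    # one 400 gig port per GPU
NVLINK_GBPS_PER_GPU = 3600       # 3.6 terabit per GPU


def transceivers_per_gpu(gpu_gbps: float, transceiver_gbps: float) -> float:
    """Average number of transceivers per GPU at one end of the link.

    Hypothetical helper: a fractional result means several GPUs
    share a transceiver, or one GPU's last transceiver is part full.
    """
    return gpu_gbps / transceiver_gbps


bandwidth_ratio = NVLINK_GBPS_PER_GPU / INFINIBAND_GBPS_PER_GPU

infiniband_800g = transceivers_per_gpu(INFINIBAND_GBPS_PER_GPU, 800)
nvlink_800g = transceivers_per_gpu(NVLINK_GBPS_PER_GPU, 800)
nvlink_1600g = transceivers_per_gpu(NVLINK_GBPS_PER_GPU, 1600)

print(f"NVLink vs InfiniBand bandwidth per GPU: {bandwidth_ratio:.0f}x")
print(f"InfiniBand, 800 gig modules: {infiniband_800g} per GPU")
print(f"NVLink, 800 gig modules:     {nvlink_800g} per GPU")
print(f"NVLink, 1.6 terabit modules: {nvlink_1600g} per GPU")
```

With these inputs the ratio comes out at 9, which is the rough factor of 10 mentioned in the talk, and the 0.5 modules per GPU for InfiniBand matches the point about two 400 gig ports sharing one 800 gig transceiver.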
But even with the current forecast, demand for Ethernet transceivers is growing very rapidly, and that's shown on the right side of the chart. The left side shows copper cables, and copper is not going away. In fact, Nvidia is using even more copper now in their latest systems, because the server that used to be a box is now a full rack. So whatever copper connections we had on PCB boards migrated into copper cables. Again, this chart, which shows very rapid growth, does not account for the type of cables that Nvidia is using in the racks of their latest generation, because the numbers there are quite staggering. We still have to figure out how to start accounting for the latest designs of Nvidia servers.
So if you step back and look at applications of optical connectivity and copper connectivity in different parts of the network: for InfiniBand and Ethernet, optics already accounts for about 75% of ports, because these connections are typically between the racks or even between the rows. If you look at NVLink connections, at the moment all of it is copper. The only reason we have a little bit of optics in the next row of circles is because Google is using optics for direct connectivity between TPUs. They call it inter-core interconnect, or ICI, which is similar to NVLink in terms of functionality, but it's a different protocol. In Google's case, they actually designed it to work over optics from day one, because they also use optical switches to enable very large TPU clusters. Our expectation is that two to three years from now, maybe even in 2025, NVLink is going to start running over optics, and then we're going to see more optics in this category. These are mostly shorter-reach connections. But anything between the racks is very likely to be optics; connectivity within the rack, I think, is going to stay copper. And then the last set of circles shows connections to off-package memory. There is a bottleneck in terms of how much memory GPUs need, so designers are trying to put as much HBM, or high bandwidth memory, as possible next to GPUs, but it's still not enough. So companies are developing optical connectivity that could add more HBM to GPUs, or other types of memory, including CXL-type connections that are being developed. One common barrier for deployments of optics in connections to memory, and it's actually an issue across the board, is reducing power consumption. Current designs of pluggable optical transceivers are at about 10 to 15 picojoules. With linear drive optics, whether pluggable or co-packaged, we can get to about half of that.
But what designers really would like to have is something on the order of one picojoule or less, because that's where electrical interconnects are today. We don't yet see technologies that could deliver this type of power efficiency. So as part of a call for action, we are asking the community to come up with new ideas, to see if it's even possible to get to this level of power efficiency. Another very important issue for the end users is reliability of optics. It's always been an issue, but for AI clusters it's even more important, because AI workloads really don't like any kind of disruption. When a transceiver fails, it could really slow down the model and create a lot of headaches for the operators. In Google's case, that's one of the reasons they use optical switches: it helps them reconfigure the system. Whether it's a TPU that fails or optics that fail, they can still reconfigure the system and keep it running, bypassing the problematic node. But ultimately better reliability and better lasers are needed. We see a lot of progress in quantum dot lasers. There are only three suppliers at the moment, and all of them are small or mid-sized companies. But quantum dot lasers have significantly better reliability and much better performance at high temperatures. So we do see a direction for improved reliability in the future. It's not going to happen overnight. I think the first time we saw transceivers with quantum dot lasers was at this year's OFC. Typically it takes about two years from OFC demos to real deployments, and then it's a question of building up enough capacity for quantum dot fabrication.
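To put the power-efficiency figures from this part of the talk in perspective, here is a minimal Python sketch converting energy per bit into module power. It assumes the picojoule figures are per bit, which is the usual convention but is not stated explicitly in the talk, and it uses an 800 gig module purely as an example. The function name and scenario labels are hypothetical.

```python
def module_power_watts(picojoules_per_bit: float, gigabits_per_second: float) -> float:
    """Power in watts = energy per bit x bit rate.

    1 pJ/bit x 1 Gb/s = 1e-12 J x 1e9 1/s = 1e-3 W,
    hence the division by 1000.
    """
    return picojoules_per_bit * gigabits_per_second / 1000


RATE_GBPS = 800  # example module speed, chosen for illustration

# Energy-per-bit figures quoted in the talk (assumed to be pJ per bit).
scenarios = {
    "pluggable today, low end": 10.0,
    "pluggable today, high end": 15.0,
    "linear drive, about half, low end": 5.0,
    "linear drive, about half, high end": 7.5,
    "target, on the order of 1 pJ": 1.0,
}

power_by_scenario = {
    label: module_power_watts(pj, RATE_GBPS) for label, pj in scenarios.items()
}

for label, watts in power_by_scenario.items():
    print(f"{label}: {watts:.1f} W")
```

Under these assumptions, the gap between today's pluggables and the one picojoule target is roughly an order of magnitude in power for the same bandwidth.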
Here is another way to look at the transceiver market. In this case we look at all optical transceivers, not just Ethernet: DWDM transceivers used in long-haul and metro networks, fiber-to-the-home transceivers, and transceivers used in wireless fronthaul. So this is the total transceiver market, as we call it, and it's split by technology, specifically by optical modulator technology. Silicon photonics was a big story over the last decade, and we expect it to continue to make progress. In fact, better reliability of silicon photonics modulators was a key to the success of this technology. It's certainly more compatible with co-packaged optics, and even linear drive optics seems to work better with silicon photonics modulators. But we also see other technologies such as thin-film lithium niobate, shown in purple on top. Bulk lithium niobate has been the workhorse of the industry for the last three if not four decades. But now companies have learned how to make thin-film lithium niobate on a silicon photonics platform, and these modulators are faster and more transparent. So we certainly expect to see them come to the market.
So, as a conclusion, this is just a kickoff of the session. I think the problems that we're facing will have to be solved by many small steps. I don't see a big breakthrough that's going to come and completely change the industry. It's small steps: new materials, new structures like quantum dots, new device designs. I think we have to look at a whole variety of solutions to be able to get to the metrics that the end users are asking for.
Thank you, and I'll take a question or two if you have any. Well, if you don't... you do have questions. All right, go ahead.
Don't hold your breath on it. It takes time. I'll point to this slide. I think it was 2012 when Cisco acquired silicon photonics startups, and some of the analysts predicted that three years from now everything is going to be silicon photonics and Finisar is going to be out of business. That was 2012. In 2022, 10 years later, yes, silicon photonics is part of the market, but still only about 20% of the total. So it takes a decade. If you think it's going to take two or three years, it's most likely going to take a decade. So we have to be very patient. I think what also matters is decisions by the largest customers. It was Google in the past who led the industry towards PAM4 modulation, for example. I think Nvidia now is going to be the one who makes decisions on which optics they're going to deploy, and they're in a unique position to route the innovation in the industry, probably over this decade if not into the next one.
Go ahead.
So it's not in our numbers yet; it's in this category on the right. It is a lot of bandwidth. The challenge with PCIe is that it's a little bit behind in terms of generations. But it will happen. Probably by the end of the decade we will see optics for CXL and PCIe as well. So you just have to be patient with it. I'm going to stop here because I'm out of time.
It's not an easy question, so let's keep it for the panel discussion, because we're out of time. I'm going to introduce our next speaker, Drew from Meta. Welcome.