Not necessarily everyone has been here for the whole session, so if I can ask each of you very briefly to introduce yourselves. And can I also ask roughly how many years you have been thinking about AI, just to get an idea?
Vlad Kozlov, LightCounting Market Research. Thinking about AI since November 2022 for sure.
Hi, Ron Swartzentruber, Director of Engineering at Lightelligence. Probably thinking about AI for the last decade.
Andrew Alduino, Meta, Platform Systems. Thinking about AI for, well, two and a half years or so, since joining Meta.
William Koss, CEO of Drut Technologies. It's been about three years, probably in the same category, so ten years is awesome, by the way.
Okay, so first of all, Andrew, thank you very much this morning for looking ahead a little bit. But let me ask, and I'm interested in all your thoughts, because the sense is this keeps scaling, and as you pointed out earlier, it's scaling at a rate that is surprising even you. Can you give any insight into what you're seeing? Is it just a question of models continuing to grow? Are you seeing anything beyond LLMs? What insight do you have in terms of what's coming down the pipeline? And it may be limited.
Well, I think we've been quite public, and to be honest, I think Google as well as OpenAI with ChatGPT have been too, that we're really at the start of the journey. We're talking about Llama 3 now, internal to Meta, and that's really only just starting to bring in images. And when we look ahead to Llama 4, now we're going to start to bring in video. The models will get bigger. They're not anywhere near where I think we want them or need them to be, and we're just going to have to invest in that. And that really means not just the hardware, building larger and larger systems with more and more memory and more and more flops, but also more innovation on the software side as well. I mean, you've been predicting, or afraid to predict, exponential growth forever, as we are, right? But we don't see a point at which investing in the hardware is detrimental at this point. And I hope that there will be innovation in software that will help avoid that. So maybe it'll be a mixture of experts or other innovation in that area. But more across the board is really what's coming, I think.
And can I just ask, when you do start going to a mixture of experts, is that just bolting on LLMs, or does it -- in other words, that's another form of scaling, but maybe there are efficiencies, I don't know.
I think the answer is that nobody knows, right? I mean, there's going to be a great deal of innovation over the next three to five years as we figure this out. It's not entirely clear to anybody. I mean, even the scaling, right, is not entirely clear. And there's a lot of work innovating across the spectrum, right? I mean, I think that gentleman from Google talked about overtraining models with fewer parameters, which is not widely accepted but is a strong vector as well. I guess I would say my crystal ball -- our crystal ball isn't very clear on this. There's a lot of work yet to be done.
And, Bill, can I ask you about that? Because maybe you're trying to just scale GPUs, because a lot of people want to get their hands on scaling and composable infrastructure, but how do you look at this?
I think we'll use the GPU as a proxy for everything right now, but quite frankly, I think there are a few companies operating beyond most others' scale. Obviously Meta and Google have done things, ChatGPT. Not everybody has that size or scope. I have found this week that people in the video space, in the media production space, are extremely interested in building much higher performance fabric architectures around GPUs for video processing. And you think of all the things we consume on video, so it's not even just LLMs. But I would agree that I think we're at the front end of a very long cycle. I probably came to that conclusion about 18 months ago when I started thinking about how we're really building together large systems. And I thought, oh my God, that kind of hypervisor era of cloud is somewhat over, and you're going to have to rebuild lots of data centers. And the question is how much is in public cloud and how much moves on-prem, and certainly there's not one end user I've spoken to in the last year that hasn't mentioned some migration to on-prem. But how do I do that efficiently, and how do I build that? They haven't even touched an LLM. Maybe it's an HPC shop or maybe it's just some higher-end enterprise customer. But my view is that we're very much at the beginning. What we'll be talking about in five years is very different, and there's an amazing amount of technology coming, which is one thing I'm fairly excited about in the optical space. You see a lot of stuff being shown in labs, demonstrations, tests, and it's just now being commercialized. And the one statement I would leave you with: I did come to the conclusion, and we'll see if I'm correct on this, that it reminded me very much of the 1990s. If you were in the optical space in the 1990s, you were building early long-haul networks and then regional networks and then metro WDM. The problems of doing large-scale photonics in the data center look very much like building a CLEC, ILEC, long-haul network in the '90s. And we're going to adopt the same technologies around wavelength reuse and fiber reuse and reducing switch ports, all that type of stuff. It's a very interesting problem.
And Ron, I know your company's been looking at this for a long time, and you've got all sorts of technologies, yet what you outlined today was very pragmatic in addressing the marketplace. But I'm interested in your thoughts on this as well.
Yeah, so as I said, we're working on developing photonics technologies to solve large-scale computing problems. That exists with optical CXL, whereby we're interconnecting the components in the disaggregated data center, helping to enable those composable architectures, software-defined data centers. But we're also in the optical computing space, where we're looking at, okay, how can we reduce the carbon footprint in the data center and enable photonic coprocessors, basically photonics-enabled AI accelerators? That's a very new technology, and it's got a long way to go. We're also clearly developing photonic interposers. We announced at Hot Chips the world's first photonics-enabled AI coprocessor accelerator that ran commercial workloads. That was the first of its kind, and it's only going to increase. It was a smaller model, but those types of things. So what we're looking at is, okay, where can photonics, where can optics really help solve the problems? And certainly it's inter-rack connectivity. That's the obvious one for optical CXL, and it's open source. We're not tied to NVLink. We're not tied to proprietary fabrics. So that is a cost advantage.
Okay. Another question I have, just touching on the GTC summit and NVIDIA's announcement. It seemed to take people by surprise how much they crammed into a cabinet, and also the fact that it was all copper. And yet everything that's being discussed, and you touched on this, Drew, is that however good a job you do scaling up, there's still this scaling out, and the numbers being quoted. So there seems to be this strange period where the focus is back on copper because it's the most economical, while at the same time everyone is discussing how the processing required is growing so phenomenally that by definition it has to be scaled out. There's all this work going on in optics, and yet still that announcement was surprising. So again, I'm just interested in the panelists' thoughts as to what this means. Is it a temporary pragmatic solution? Is there more scope for what they can do inside a cabinet? How should we view this? Because the sense is, and has been for years, that optics is just a necessity, but we're not quite seeing that yet. Any thoughts?
Well, I think optics is a necessity outside of a rack. But within a rack, I think copper does a decent job. And if they have switches in the middle of the rack and servers on both sides, a one-meter cable is long enough to interconnect it. And there's still room to cram more GPUs into a rack. So NVIDIA did 72 GPUs, and their power consumption was 120 kilowatts. The maximum for a liquid-cooled rack is 400 kilowatts, at least by current estimates, which could go up. So they can certainly double it, maybe triple it, in terms of the number of GPUs per rack. Are they going to try to interconnect two racks as one server with copper? Possible. I mean, you could use some kind of equalized copper cables with very minimal power consumption. So it's possible that copper will continue to take a larger role. As Drew said, if optics is not available at the desired cost and power consumption, the system engineers will design around it.
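To put those figures together, here is a rough back-of-the-envelope sketch in Python using only the numbers quoted above; note that the 400-kilowatt liquid-cooled ceiling is the panelist's current estimate, not a specification.

# Rack-power headroom implied by the figures quoted in the discussion.
gpus_per_rack = 72
rack_power_kw = 120.0
liquid_cooled_limit_kw = 400.0   # estimated ceiling, "which could go up"

kw_per_gpu = rack_power_kw / gpus_per_rack          # ~1.67 kW per GPU, including its share of the rack
headroom = liquid_cooled_limit_kw / rack_power_kw   # ~3.3x power headroom

print(f"~{kw_per_gpu:.2f} kW per GPU, ~{headroom:.1f}x headroom")
print(f"roughly {int(gpus_per_rack * headroom)} GPUs per rack before hitting the estimated limit")

This is simply the "double it, maybe triple it" observation made explicit: 120 kW drawn against a roughly 400 kW ceiling leaves about 3x headroom per rack.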
Okay, let me just clarify if I may, because the whole discussion this morning has been on things beyond pluggables, even though it's still a pluggable world today. So when I said, or implied, where's this optics opportunity, it's everything that's being spoken of, whether it's more power-efficient pluggables, co-packaged optics, and so on, even optical interposers, new architectures. So again, Meta discussed this yesterday, where again it talked about putting more inside the rack and copper; it did touch on this yesterday in the keynote speech. I'm just interested because, Bill, you talked about already seeing a need for co-packaged optics, and there are not many examples of that. So this is what interests me. The sense is that you need to go to more efficient architectures because of the amount of processing required, and optics and new optical technologies, all this has been discussed this morning, and yet we're not quite seeing it. It's still very much a pluggable world, and efficiencies with pluggables. So I'm just interested in your thoughts on this, the trigger points.
I can probably do without that. I have to use it, okay. I think, well, in the case of NVIDIA, why would you transition your architecture to something that you don't have a really strong advantage in, if you don't have to? That's how I would make the decision: I wait until somebody shows up with something that's better than what I can do. So I understand the whole copper thing. The inertia of a large company that develops technology is typically not to obsolete itself. It's very rare that that happens, so I don't ever see them doing that. And certainly for someone who's a consumer of technology at a very large scale, copper has all the advantages today in supply chain and deployment, pluggables and all that. Just from our perspective, when we look at co-packaged optics, it's really the density question: how can we get a higher density of ports? If the data center transitions to a fully disaggregated, all-photonic architecture, you're just going to need a large number of ports, a large amount of fiber. So anything that helps reduce those things is really where the magic's going to be. You can start to do it today. It may not be the most efficient design, but so many of the things you heard in the prior presentations, more wavelengths, higher density, co-packaged optics, all those things in time will weigh on the supply chain. That's definitely my bet, and I'll go out and sell that message. That's what I'm going to do. But today, if you were to ask me to build 20,000 GPUs, and these guys just announced a couple of designs at those scales, it'd be very hard to do it all optically.
Go ahead.
So when we first really started talking about co-packaged optics as an industry, it was all focused on switch bandwidth. And so the comparison for co-packaged optics wasn't copper cabling in a backplane, it was pluggable optics. And I think you can still do the math now and see that pluggable optics is a lot more expensive than co-packaged optics. But when we talk about AI/ML, for much of the bandwidth being discussed the comparison isn't even a DAC cable, which is itself costly compared to a copper backplane. If you look at the cost of bandwidth between an ASIC and an HBM, you're talking about on-package copper traces, which are fractions of a penny. If you're talking about the bandwidth between a GPU and its nearest-neighbor CPU or GPU on a board, or cables in the backplane, it's still fractions of a penny. Where is co-packaged optics compared to that? Or power, right? I mean, we're layering optics power on top of electrical power at that point. So what is the advantage? And that's where I think the question is. And someone told me this many, many years ago: if all you do is plumbing, then all you get paid is plumber's wages. When those things become prevalent and required, that's when risk-taking and innovation will happen, I think.
Yeah, I would just echo that. We see an end to pluggables in the next decade or so. You're already seeing linear drive take over from transceivers, or start to take over, in certain applications. I think co-packaged optics and near-package optics certainly have a future, but what the OCP has recognized is that the solutions are all over the map. You know, if you were attending the Future Technologies Symposium in the fall, there were many companies, many vendors offering multiple different solutions. So what the OCP has done is establish a work stream on short-reach optical interconnect. And just a plug for that work stream, I happen to be chairing it, so please come with your ideas. What we plan to do is define how optics can provide that advantage. Drew mentioned it, you know, it's shoreline bandwidth, bandwidth density, power. We're maybe not going to define form factors, but we are going to define, you know, what the OCP's recommendation is for vendors: if you want to come and play in this new world where optics gets closer and closer to the silicon, how can you be effective? There's no reason to drive an electrical trace across a PCB if you can go to optics immediately. And furthermore, the bandwidth you can provide with multiple wavelengths in a single fiber is an order of magnitude better than what you can do over a copper interconnect. So that's where we see the future. We don't have the answers yet, that's what the work stream's about, but we're trying to pave the way and predict where the solutions need to be developed for the future.
Okay, two quick questions, please, and then I want to open it up to the audience. Do you have any advice for all these different photonics players coming at this from different directions, whether it's comb lasers, co-packaged optics, LPO? There's a whole variety of technologies, optical interposers. Is there anything you see that's missing, besides, say, lower power or better cost-effectiveness? What guidance would you give them? Or is there anything that you're looking for that you're not seeing?
Well, I suppose I would answer, as I said in the talk, that the development cycle for integrated optics is long. Decisions have to be made earlier in the design cycle than we might otherwise be happy with. And so, you know, first demonstrations in the lab are necessary but not sufficient conditions for us to make product decisions on. We need maturity of the optical end-to-end link. There can't be a flow chart with a little cloud in the middle that says, "Miracle happens here." Right? I mean, if we need to turn on a 24,000-unit cluster, or some multiple of that, and we need all of that to hit in three months, it can't be, "Trust me, I'll figure out how to partner with somebody and make it happen." That doesn't work. And so maturity earlier in the process, shooting way ahead of the duck, is an important part, I think, of scaling to the volumes needed for these very large clusters.
Yeah, so as a company, we're really trying to sell to end users. So we're out doing designs. I was doing designs a few days ago for, like, five sites with 64 GPUs each. It was all kind of Gen 5 stuff, so it's a very small scale, but in a tight configuration. I will tell you, what we need is better pricing. Like, shockingly better. But part of that is just the maturity of the supply chain. People getting larger orders through, that will happen in time. But I can't order -- in this case, we do have a co-packaged optics product coming, and I was doing a design and a quote. I can't be designing and quoting at the kind of initial sample-build price. It's got to be on par with or better than pluggables, because that's the model. So I'm going to need to get to pluggable-level prices. And I can tell you, I can buy a 100-gig pluggable for $170, and I'm sure he can buy it at a way better price than that. So that's where the market's just going to have to go to do practical designs in the field. But that will come over time; every year it'll be a little bit better. And by the way, the copper guys, and I'm sure most people have heard Andy talk about this, the copper guys will continue to innovate and try to get copper cheaper, and that will play. But at some point, the reconfigurability, the ability to stitch and manipulate bandwidth, and better use of fiber will all come into play and will be an advantage. That's the bet that we've made.
Okay, last question to all of you, very briefly. You pointed out earlier the danger of even looking a year or two ahead, but if I can push you, what do you expect to see in five years' time?
Wow, okay. I'm still shocked by what happens in a year. I don't know if I can answer that question, but I know what I think we'll need if our approach to the architecture design plays out. And that's a much larger switch technology. So that means I'll have to eat the message I just gave, which is that we'll have to build an all-optical switch that has a lot of ports and comes in at a very low price point. And that's just going to take a few years to get there.
I mean, linear curves don't seem to work in this realm, but if you talk about a linear curve over five years, right, we were at 16,000 nodes last year and 24,000 nodes this year, so that's 50% year over year. Where are we at then? How many nodes? How much distance is that? How much power is that? And to be honest with you, I know the linear curve isn't accurate. So we need more. And it has to be more bandwidth at the lowest possible price and power points to deliver what it is that I think we want and need. I mean, developed responsibly, right, and ethically, and not Skynet-like. I mean, clearly there's a lot of innovation there. It's more.
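As a purely illustrative Python sketch of the growth math just mentioned, carrying the speaker's own caveat that simply extrapolating the curve almost certainly isn't accurate:

# 16,000 -> 24,000 nodes is 50% year over year; naively compound five more years.
nodes_last_year = 16_000
nodes_this_year = 24_000
growth = nodes_this_year / nodes_last_year - 1      # 0.5, i.e. 50% year over year

nodes_in_five_years = nodes_this_year * (1 + growth) ** 5
print(f"{growth:.0%} YoY -> ~{nodes_in_five_years:,.0f} nodes in five years")
# ~182,250 nodes if the trend simply held, which is why distance, power,
# and bandwidth at that scale are the worry.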
I think, so in five years' time, what you're going to see is an all-optical fabric within the data center. I think that's a fairly good prediction. There'll still be legacy implementations, but by and large, your data center will have an all-optical fabric. There'll still be copper largely within the rack, but I think that profile is going to move towards optics: optics closer to the processor, the GPU, the memory. Put it that way, so optical memory interfaces, optical GPU and CPU interfaces. There will still be pluggables, but they'll just be direct-attached. You know, the optics will originate right next to the processor.
So I think, you know, on the subject of the cost of optics, it's only a question of volumes. If we double the volumes, the price can go down quite quickly. And we've seen examples of optical technology scaling in cost to surprising numbers, like when Apple started putting VCSEL arrays in their face-recognition systems. You know, the cost of a laser array with 300 lasers was $1.50. I can't even count it per laser; it's like a fraction of a penny. So suppliers can scale the cost down given very high volumes. And you know, we've seen technologies like what Richard was showing. It's an amazing technology. There's so much engineering going into those hard drives, but once you make them in the hundreds of millions, they don't cost as much as you would think. So I think we will be surprised five years from now by how much the cost has come down. And I agree that there are a lot of benefits to having an optical fabric, which adds additional value to the optical connectivity. You cannot do an optical switch with copper cables. So I think those trends will probably converge and work together for the benefit of our industry.
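For reference, a small Python sketch of the per-laser arithmetic implied by that anecdote; the $1.50 array price and 300-laser count are the figures quoted above and should be treated as anecdotal.

# Per-laser cost implied by the quoted VCSEL array price.
array_price_usd = 1.50
lasers_per_array = 300

cost_per_laser = array_price_usd / lasers_per_array
print(f"${cost_per_laser:.4f} per laser, or {cost_per_laser * 100:.1f} cents")
# ~0.5 cents per laser -- "a fraction of a penny", as stated.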
Thank you, okay. Let's just have five minutes of questions.
No, no, it's not for you. I have a previous one first. I have two questions. The first is: a lot of companies do come with gorgeous ideas, you know, in terms of how to solve this density, or reduce cost, or reduce power consumption. And sure, we come from a long history with III-V technology, which is a mature technology; I would say everybody knows what can happen, reliability and everything. But when we speak about PIC technology, the major companies developing it are fabless, okay, totally dependent on an outside foundry, and each foundry has its own recipe for how to make it. So my question comes back to what Andrew was saying: where are we with the maturity, the reliability, of these products today? Because when you go from one foundry to another foundry, you don't get the same product, you don't get the same maturity. And as we know, in the world we are deploying into today, it's very important to guarantee the service. So the question here is: how can you guarantee that what you're pushing today with PIC technology will have the right maturity to be deployed next year or in two years?
Well, as somebody who wants to consume all these technologies, I would agree with you. Almost every vendor who's come and pitched us has tried to get us to design in some sort of optical technology. About 18 months ago, they said it would be available in six months, and we still don't have some of it. So, you know, for literally all the vendors who've come and said, hey, design our part into your solution, it's been about a 12-month push-out. It's just a slow process: getting maturity, getting yields, fixing things, running another round. Yes, I think that will solve itself over time, but it's clearly an issue today.
So Vlad showed a very interesting chart early on, silicon photonics volumes, right? And I was involved in the early days of silicon photonics, pushing that. We all thought it would be faster as well. It is just a long, slow, steady slog to collect the data, to build it up, to build the manufacturing know-how, and to ensure that you can consistently deliver the right product at the right price point with the right yield over time. Which is part of the chicken-and-egg problem, I think, that we have with all new technologies. There is that sort of innovator's dilemma in some sense, right? You have to be really, really creative, but it has to be mature, and how do you get there, right? I mean, it's unfortunately a reality, I think.
Any other questions?
When you started this morning with the pitch, I had a good question for you, but you said to save it for the panel. OFC was one month ago, okay? And it was clear from the major companies, Meta or Google or Microsoft or Amazon, that two years ago it was "no C": no China. And now it's "no C, no T": no China, no Taiwan. And my question is now, how do you handle this situation in terms of the supply chain? Because today the majority of the supply chain comes from Asia, and mainly from China. And I know a few companies are bringing production back to the United States, but how do you see this transition? First in terms of capacity, and second in terms of the cost of the optics, because I assume optics manufactured in the US would be more expensive than a solution manufactured in China.
Well, we certainly see a disconnect between the politics and the trade. You know, if you look at the statistics, trade between the US and China keeps on setting new records. So businesses find ways to work around the barriers if there is an incentive for it. No matter how hard the politicians work, businesses still find ways of doing it. Having said that, all the leading Chinese suppliers of transceivers that we work with now have factories in Thailand, Vietnam, Malaysia, or somewhere else. So they're not totally dependent on being in China. I haven't seen anybody transferring transceiver assembly and testing to the US yet. I know some companies have factories in Mexico, but... Say it again?
The suppliers that we've talked to are all moving to Vietnam.
Vietnam, correct. So it's just going to move to other countries in Asia, I think, to mitigate some of those concerns. But I'm optimistic that businesses will continue to find ways around the barriers that politicians are building.
Okay. I'd like to thank the panelists for sharing what they did. Thank you. Now, let's just kind of finish up.
Oh. Well, this is the first regional OCP summit that I'm attending, and it's the first time a regional summit has a session on optics. We've been doing it at the global summit for probably two, three years now. And of course, that's in San Jose, which is a Mecca for optical startups. But it's great to see European startups coming to this event and sharing their ideas. And speaking of ideas, I was listening to Joseph's presentation. I think he has enough ideas for, like, a dozen startups. Which is often a challenge. I mean, part of it is a criticism too, because running a startup, you do have to focus. You have one chance to succeed, so you put all your efforts into the most likely success story. And as a professor, I understand that you try to show that you're a magician, you can do everything, which is fantastic for me as a former scientist to see. But bringing technologies to the market is not a sprint, it's a marathon, and you need to know the direction you're running. You don't want to keep turning around, or you're going to run out of energy pretty quickly. So my recommendation to companies, whether it's New Photonics or Enlightra or the other presenters, is to have more focus, put your bets on what's most promising and where the largest opportunities are, and take risks, because you're startups; it's okay to fail. You'll start over again. A very good friend of mine is one of the co-founders of InnoLight. And InnoLight, as of last year, is officially the number one transceiver supplier. So for the first time, they're shipping more than Finisar, which is now part of Coherent. But this friend of mine was telling me the story that InnoLight was his third startup. And when he was raising money, he talked to the technical team, saying, are we going to screw it up again, guys? And they said, don't worry, by now we've made all the mistakes we could have possibly made, so this time it's going to work. And yes, it was the third time around, and it did work. So don't be afraid to take a risk. I mean, that's what it takes. If you play it safe, you're not going to succeed. If you try to solve all the problems, as an intellectual achievement, that's fantastic. But as a business story, you've got to focus on some things that you can deliver soon and make money. Go ahead, yes.
May I answer?
Yes.
Okay, yes. At the end of Q4, we will have engineering samples ready, with monolithic integration of lasers, modulators, receivers, equalizers. So we are focused on the products. Today, my goal was to present the major-- but we are very focused on the production, and we will very soon provide the engineering samples.
Right, right. And Yosef, I didn't mean to criticize you that much.
I definitely agree with you.
Right. And just one last thought. I really like the message that Drew pointed out: don't shoot behind the duck. Even if you're developing products for, say, 200 gig per lane. So Yanik, you were talking about eight times 200 gig per lane. You know, companies are being qualified now; they started the qualification process probably six months ago. So if you're a startup, yes, you're developing 800-gig, eight-times-200-gig-per-lane products, but that's your learning exercise. What you're really going to be selling is the next generation, whatever it's going to be: 16 times 200 gig, or eight by 400 gig. So you kind of have to shoot well ahead of the duck, because it's a five-year span that you're trying to cover. It's a complex world, but it's incredibly interesting. So I wish all of you success. We're here to help as LightCounting. We are bringing clarity to the market, and we're helping all of you to succeed. So we are on your side, including yours.