Greetings, everybody. We'll talk about UCIe, which is basically an open chiplet standard for building chiplets and enabling innovation on the package.
So I'll do a quick overview of UCIe, the consortium, and then talk about usage models, then conclude.
So we started UCIe in March of 2022, which was last year. We incorporated in June. We started with 10 promoter member companies, and right now we are north of 130 members. These are the 12 board members. A lot of exciting things are happening in terms of getting the industry to effectively rally around UCIe as the common standard on which everybody develops chiplets.
So the motivation for UCIe is that we are all up against the reticle limit. If you look at pretty much any volume CPU, GPU, what have you, they're all built out of chiplets, whether it is server or client; it doesn't matter. They're all built out of chiplets, and the reticle limit is the primary concern. The other thing is that as we move to more advanced process nodes, yield is a problem. Smaller dies yield better. So it's much better to take a bunch of chiplets and package them together so they behave like one monolithic entity. There are some other motivations too. One is the time-to-market advantage: if you have chiplets, you can take the function that is coming in late, add that in, and do a quick tape-out of your package, or whatever you want to call it, and it's available. So it gets you quick time to market, and you're only changing the functionality that really needs to change, as opposed to going through the whole tape-out process for a giant monolithic die. It also lowers your IP porting costs. Those are exploding; there was a study done by HIR that showed how, even if you changed nothing, just the cost of porting an IP from one process node to another is exploding with the more advanced process nodes. So putting the functionality that is not changing in a different chiplet and reusing it means we don't have to pay for those things. It also offers the choice of deploying the optimal process node for each function. Compute, for example, I can put in the most advanced process node. Things like the memory I/O controller I can put in a more mature process node. Analog wants to be in a different process node; optical wants to be in yet another process node; and of course memory needs to be on its own process technology. Chiplets enable us to do that. And of course, it offers us the ability to build bespoke solutions, customized for different customers. You can give a different mix of compute, acceleration, memory, what have you, depending on the need.

So think of the picture on the right-hand side as similar to what you have at the board level today. You can build your board using different processors, different sets of memory or DIMMs from different vendors. You can have your choice of GPUs, NICs, whatever you want, in the PCI Express slots. It just works. That's basically what we want to do at the package level, and that's what UCIe is all about. So you're going to have a sea of cores, these heterogeneous cores; you're going to have memory connected to them; you're going to have a bunch of customized IPs, different types of modems, all of those kinds of things. Put them together on a package, and it works similar to what happens today with PCI Express or CXL.
So with that, let's talk about UCIe. We have done a second spec, which is 1.1, and I'll talk about the delta in a bit. UCIe 1.0 is the specification Intel donated when we formed the consortium in March of last year. It's a fully defined specification, a layered approach with industry-leading key performance indicators; you want the key performance indicators to be industry-leading for it to get deployed. The physical layer is nothing but the die-to-die I/O, and it has all the things like link training very well specified. You have lane repair, lane reversal, scrambling, descrambling. Of course, the analog is defined, the channel characteristics are defined, the bump map is defined; all of those things are very well defined. We also define what is known as a raw die-to-die interface (RDI), which is basically the IP interface between the PHY and the upper layer, the equivalent of a link layer. For those of you that are familiar, we call that layer the die-to-die adapter, and it is responsible for the reliable delivery of bits. So things like CRC and replay are built into the die-to-die adapter. That in turn talks to the upper layer, the protocol layer, using a flit-aware die-to-die interface, FDI for short, which is also very well defined in the spec. From the protocol layer perspective, we have done the mapping for PCI Express and CXL, and we have also done the mapping for streaming protocols; we will see that. We also have something called raw mode: let's say you are putting down just a SerDes PHY and you say, hey, I have my own link layer stack, I don't need your die-to-die adapter. Fine. You can just bypass the die-to-die adapter, go straight to the RDI interface, and just use the PHY.

With PCI Express and CXL, fundamentally, we're reusing those protocols, so your SoC construction issues get addressed. People know how to make things work: clocking, power, and the rest of it. The software, the discovery, error reporting, all of those things are built into those protocol stacks, and a lot of us have worked on PCI Express and CXL for a very long time. From a usage model point of view, all the I/O attach use cases are well-established, decades-old technology; it comes like clockwork. We are on the seventh generation of PCI Express; with CXL, we are on to the fourth generation now. So a lot of those I/O attach cases get addressed: memory through CXL, acceleration through I/O, memory, and caching semantics. And now there are streaming protocols, and we'll see more of that. Streaming is basically what is used for scale-up. For example, even if you look at our Xeon, we split Xeon into multiple dies and stitch them together, talking an internal protocol. That's scale-up. We want to make sure those also go through this interface, similar to how QPI today, for example, in our case runs on a PCI Express PHY. And then, of course, things like CHI and AXI are very popular; those are the types of streaming protocols you want to send over UCIe. Then we have a well-defined configuration space for interoperability. Every one of these layers has a well-defined config structure modeled after PCI Express so that your software knows how to handle them. If you look at it carefully, it's done in a way where you can bring in your existing software and it will just run seamlessly. Same with form factor management, compliance, interoperability, and then we talked about plug-and-play IPs there.
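To make the layering described above concrete, here is a minimal Python sketch of the stack: a protocol layer talking to the die-to-die adapter over FDI, the adapter talking to the PHY over RDI, and a raw mode that bypasses the adapter. All class and function names are illustrative assumptions, not taken from the specification, and the CRC/replay logic is a toy stand-in.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Protocol(Enum):
    PCIE = auto()       # PCI Express mapping
    CXL = auto()        # CXL mapping
    STREAMING = auto()  # streaming protocols such as CHI or AXI


@dataclass
class UciePhy:
    """Die-to-die I/O: link training, lane repair/reversal, (de)scrambling."""

    def transmit(self, bits: bytes) -> bytes:
        # Real hardware drives bumps across the package; here we just loop back.
        return bits


@dataclass
class DieToDieAdapter:
    """Responsible for reliable delivery of bits: CRC, replay, and so on."""

    phy: UciePhy

    def send(self, flit: bytes) -> bytes:
        crc = sum(flit) & 0xFF                        # toy CRC stand-in
        received = self.phy.transmit(flit + bytes([crc]))
        payload, rx_crc = received[:-1], received[-1]
        if (sum(payload) & 0xFF) != rx_crc:
            raise RuntimeError("CRC mismatch: a real adapter would replay")
        return payload


@dataclass
class ProtocolLayer:
    """PCIe/CXL/streaming traffic handed down over FDI, or straight to the
    PHY over RDI when running in raw mode with the user's own link layer."""

    protocol: Protocol
    adapter: Optional[DieToDieAdapter]  # None models raw mode
    phy: UciePhy

    def send(self, flit: bytes) -> bytes:
        if self.adapter is None:             # raw mode: bypass the adapter
            return self.phy.transmit(flit)
        return self.adapter.send(flit)        # FDI -> adapter -> RDI -> PHY


if __name__ == "__main__":
    phy = UciePhy()
    full_stack = ProtocolLayer(Protocol.CXL, DieToDieAdapter(phy), phy)
    raw_mode = ProtocolLayer(Protocol.STREAMING, None, phy)
    print(full_stack.send(b"hello"), raw_mode.send(b"hello"))
```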
We support two types of packages: standard and advanced packaging. Standard is known as 2D, which is the cost-effective, longer-distance option, up to about 25 millimeters. Advanced packaging is 2.5D. The picture on the right-hand side shows three examples, and that's what they are, examples; there are many more. The topmost one is Intel's EMIB, the middle one is TSMC's CoWoS, and the bottom one is ASE's FoCoS. The vision, as you will see later, is that your dies can be manufactured anywhere and assembled by anybody, and it should work. Just like today you can get your cards from anybody, design your board anywhere, and put them on PCI Express or whatever, and it just works. We do the same with UCIe.
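As a rough, purely illustrative sketch of the trade-off between the two package types: the names, the 2 mm advanced-package reach, and the selection rule below are my assumptions, not figures from this talk, aside from the roughly 25 mm standard-package reach.

```python
from dataclasses import dataclass


@dataclass
class PackageOption:
    name: str
    style: str            # "2D" (standard) or "2.5D" (advanced)
    max_reach_mm: float   # approximate channel reach
    relative_cost: str


# ~25 mm standard reach is quoted in the talk; the 2 mm advanced reach is an
# assumption used only to make the example concrete.
STANDARD = PackageOption("standard", "2D", 25.0, "lower")
ADVANCED = PackageOption("advanced", "2.5D", 2.0, "higher")


def pick_package(reach_mm: float, need_high_density: bool) -> PackageOption:
    """Toy selection rule: short reach plus high bandwidth-density needs favor
    advanced packaging (EMIB, CoWoS, FoCoS); otherwise standard is cheaper."""
    if need_high_density and reach_mm <= ADVANCED.max_reach_mm:
        return ADVANCED
    if reach_mm <= STANDARD.max_reach_mm:
        return STANDARD
    raise ValueError("beyond on-package reach; consider a UCIe retimer")


print(pick_package(10.0, need_high_density=False).name)   # standard
print(pick_package(1.0, need_high_density=True).name)     # advanced
```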
Now, we went ahead and made a recent announcement with UCIe 1.1, which is fully backward compatible with UCIe 1.0. What we did was add some enhancements for the automotive segment. Errors happen, so we defined configuration registers where those errors are reported, and support for predictive analysis, those kinds of things. We defined some counters and things of that nature. It's primarily for the automotive segment, but enterprise can use it too. There are also some new usages around streaming protocols that want to use the full stack, so we introduced that. Earlier, streaming protocols were only in raw mode, but people said, hey, we really want to take advantage of that adapter, and we also want to be able to multiplex with PCI Express; we want to use PCI Express for discovery, error reporting, all of those things, and at the same time use a streaming protocol for doing whatever we are doing. We also did cost optimization for advanced packaging, and we did some enhancements for compliance testing.
So those are some of the enhancements we did with UCIe 1.1 while staying fully backward compatible. As for usage models for UCIe: of course, SoC construction at the package level, which we talked about. We want to make sure the innovations that exist at the board level can now happen at the package level with UCIe. So you can mix and match different processing chiplets with acceleration, with memory and I/O, with things like optical, all of those things, using UCIe. That's basically what we are enabling.
Continuing with the example, you can construct large-radix switches from smaller-radix switches by connecting them on the package. Say you have a radix of 128 on a single die. If you want 256, connect two of them using UCIe and you've got 256. What protocol? Whatever internal protocol the switch uses to connect; you can just send that over UCIe. Put four of them together and you can construct 512. So now it's basically mix and match.
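The radix arithmetic in this example is simple enough to write down; here is a tiny sketch with a hypothetical helper that follows the talk's simple multiplication and assumes the die-to-die links use dedicated UCIe ports rather than front-panel switch ports.

```python
def composed_radix(single_die_radix: int, num_dies: int) -> int:
    """Radix of a switch stitched together from identical dies over UCIe,
    following the talk's arithmetic: each die contributes its full radix."""
    return single_die_radix * num_dies


assert composed_radix(128, 1) == 128   # single die
assert composed_radix(128, 2) == 256   # two dies connected over UCIe
assert composed_radix(128, 4) == 512   # four dies connected over UCIe
```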
This usage model comes from our friends at ARM and NVIDIA, and it shows some of the usage models they are thinking about. Compute dies, for example, can talk the CHI protocol over UCIe. On the right-hand side, you have an accelerator die, which can talk CHI and/or CXL, and that becomes like an open slot. So the left side is a scale-up solution, and the right side is an open-slot kind of solution. Again, multiple things are possible with this; these are just examples.
The other interesting thing we did with UCIe 1.0 is this notion of a UCIe retimer, as we call it. Fundamentally, the usage model is co-packaged optics. We have defined how the electrical side should work. There are multiple optical technologies out there, so we have defined all the pause points where, whatever the choice of optical technology, it can say, hey, I'm not quite ready. We go through all of that definition in the UCIe spec. With that, you can now build a package that goes through co-packaged optics and connects to other components, like switches, at the rack level or at a pod level. Now you can have a piece of memory that is sitting in a different rack, maybe, and if you need more memory capacity, you can go ahead and ask for that capacity, because you are running CXL through UCIe.
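As a loose illustration of the "I'm not quite ready" backpressure idea described above, here is a toy model; the class, signal names, and buffer scheme are all my assumptions, and the actual stall mechanism is the one defined in the UCIe spec.

```python
from collections import deque


class OpticalRetimerModel:
    """Toy model of a UCIe retimer front-ending co-packaged optics: when the
    optical side cannot keep up, it deasserts 'ready' and the sender stalls."""

    def __init__(self, depth: int = 4):
        self.buffer: deque = deque()
        self.depth = depth

    @property
    def ready(self) -> bool:
        # "I'm not quite ready" whenever the staging buffer is full.
        return len(self.buffer) < self.depth

    def push(self, flit: bytes) -> bool:
        if not self.ready:
            return False              # sender must hold the flit and retry
        self.buffer.append(flit)
        return True

    def drain_to_optics(self) -> None:
        # Hand everything off to the optical engine (not modeled here).
        self.buffer.clear()


retimer = OpticalRetimerModel(depth=2)
print(retimer.push(b"a"), retimer.push(b"b"), retimer.push(b"c"))  # True True False
retimer.drain_to_optics()
print(retimer.ready)  # True again
```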
So that's the vision, right? That's the vision with which we are going. This is the money slide; I'll go through it quickly. We support all these data rates, and you will see that if you support a data rate, you must support all the lower data rates. We leave nothing to chance for interoperability; we define things really well. Today you can bring a PCI Express card and plug it in, and it just works. You want the same thing with UCIe, nothing less than that. We define all these things like the cluster width and bump pitches. Bandwidth density, if you look at the second table, is defined both linear and areal, in gigabytes per second per millimeter or per square millimeter. So take a look at the advanced package row. It is 188 through 1350: if you run at 4 gig, 64 lanes gives you 188 gigabytes per second per square millimeter; if you run at 8 gig, you get 376; if you run at 32 gig, you get 1350 gigabytes per second per square millimeter. That's a lot of bandwidth. If you look at PCI Express Gen 5, for example, the bandwidth density is less than 20. Same thing for anything else, networking included. So you're getting an order of magnitude or two better bandwidth density, which is what you would expect, because these are internal, on-package interconnects. The channels are well defined, they are not as bad, and you don't have discontinuities, all of those things. Power-efficiency-wise, 0.5 and 0.25 picojoules per bit; external interconnects are an order of magnitude more, 5 to 10 picojoules per bit. Low-power entry and exit have very low numbers, less than a nanosecond, which means you can turn these things on and off quickly. So all of these things, and then, of course, latency is an order of magnitude lower.
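To make the arithmetic on this slide easy to check, here is a small sketch that uses only the numbers quoted in the talk; the comparison baseline and the pairing of the power numbers with package types are noted as assumptions in the comments.

```python
# Areal bandwidth density quoted for UCIe advanced packaging (GT/s -> GB/s/mm^2).
QUOTED_ADVANCED_DENSITY = {4: 188, 8: 376, 32: 1350}
PCIE_GEN5_DENSITY = 20          # "less than 20" GB/s/mm^2, per the talk

for rate, density in QUOTED_ADVANCED_DENSITY.items():
    advantage = density / PCIE_GEN5_DENSITY
    print(f"{rate:>2} GT/s: {density:>4} GB/s/mm^2  (~{advantage:.0f}x PCIe Gen 5)")

# Power efficiency quoted in the talk, in picojoules per bit. Pairing 0.25 with
# advanced packaging and 0.5 with standard packaging is an assumption here;
# the talk simply quotes both figures.
UCIE_PJ_PER_BIT = {"advanced": 0.25, "standard": 0.5}
EXTERNAL_PJ_PER_BIT = (5.0, 10.0)   # external interconnects, per the talk
print(f"UCIe: {UCIE_PJ_PER_BIT} pJ/b vs external: {EXTERNAL_PJ_PER_BIT} pJ/b")
```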
Now for the ingredients, these are the four sets of things. And you can take a look at it. We have all of those. Here's the interesting thing.
We announced, or rather demonstrated, interoperability using UCIe last month at Intel Innovation. This was an Intel chiplet with a Synopsys chiplet designed in a TSMC process technology node, and we packaged them using Intel's EMIB technology. Think about it: last year, in March, the spec came out; companies around the world have been designing; and this year you have chiplets demonstrating interoperability running at full speed. That's the power of an open ecosystem. And that journey continues.
So with that, I'll take questions if we have any time. We don't. OK, I'll be outside if you want to ask questions.