So I'm Jim Handy. Tom Coughlin was supposed to present this with me, but Tom's having an interesting year. He's been elected to be president of the IEEE in 2024, and as a result, this year he's president-elect, which means that he ends up spending more time on an airplane than he does at home. So it's really hard to keep track of him. He and I committed to do this and were going to speak to you together, and then about a month ago, maybe two months ago, Tom was told that his schedule wouldn't allow it. So here you are, you've got me. Hopefully I'll do a good job for you. The name of the presentation is Riding the Long Tail of Optane's Comet. I'm going to talk about Optane, a little bit about its history, what came out of Optane, and the new things that it's allowing computing to do, just because they put so much focus and effort into it.
So here's the agenda. I'll go over Optane's history, talk a little bit about today's alternatives to Optane. I'll talk extensively about the legacy that Optane left behind, and then flow from that into CXL, which is kind of a natural offshoot of this, and then from CXL to UCIe, the Chiplet Interface Standard. And then I'll talk a little bit about the future.
So I'm not going to go step by step through this, but this is Optane's seven-year life, or at least the visible seven years of it. Actually, the first phase change memory that Intel introduced was in 1969. There was a paper by Gordon Moore and a guy who actually puts blog posts on one of my blogs, The Memory Guy, a fellow by the name of Ron Neale, who was the lead author on that paper, talking about how they had made the world's first phase change memory. Intel played with that, and even introduced a NOR-flash-compatible product back about, I don't know, 2006, I think it was. And then they finally landed on this 3D XPoint memory that they introduced in 2015 with Micron. This slide shows everything Intel did at the top and everything Micron did in the lower part. And so you see that in 2015 there was the announcement of 3D XPoint. And then Optane got announced, I didn't put that in there, and QuantX, which was Micron's version, got announced. It went on through Optane Memory, which was not really memory, it was a fast SSD to go along with a hard drive. And then the Optane DIMMs came in about 2019. And that was when things really should have taken off, but they didn't. So that ended up causing some financial problems for Intel. Micron sat back and watched all of these financial problems, and they said, "We're not participating until it becomes profitable." And then they finally said, "This is never going to become profitable," and they jumped out of the marketplace. Intel came to that realization last year in July. A very interesting little tidbit is that they announced what they called the wind-down of Optane exactly seven years to the day after they had announced that 3D XPoint existed. So in that seven-year time span, we had Optane come through all of those changes.
And they drove a lot of change in the industry, but I'll talk about that in a minute. First, I'm gonna talk about what the alternatives are for people who were really hoping for this to happen.
And the first alternative in this table is Optane. Intel and Micron built a whole ton of Optane, and they've got wafers in some wafer vaults in, I think it's Rio Rancho, New Mexico, just waiting, dying to be packaged up so that people can use them in DIMMs. And so anybody who wants Optane DIMMs, or the persistent memory module as they call it, can still get those. They'll be able to get them for a while. I see somebody taking pictures of the slides. These slides will be available online shortly, as will a video of this entire presentation if I'm speaking too quickly for you, which I hope I'm not, but I do that sometimes. Anyway, the columns in this are: is it persistent? What is its speed relative to DRAM? And so I put 30% down for Optane 'cause it's about a third the speed of DRAM. What does it cost compared to DRAM? And the 3D XPoint strategy was to price it at half as much as DRAM. Now, because it came in these very large modules, 256 gigabyte and 512 gigabyte modules, they priced it like Samsung's 256 and 512 gigabyte DRAM modules, half of that. And Samsung's charging exorbitant amounts for those. So don't look for Optane's price to be 50% that of a 16 gigabyte DRAM module. It's nowhere near that, it's significantly higher. But still, if you're looking at the same size DRAM module, then it costs about 50% of that. And the issue with it is that it's winding down. You've got NVDIMM-N, which has been around for a long time. I'll talk about that more in a minute. But basically it's a DRAM that's got some backup NAND flash in case there's a power failure. It is persistent. It has the same speed as DRAM, so I put 100% in the speed column. And I say it costs 200% of what DRAM does, but I've heard actually that it's more like five times as much. And the big problem with it is that it requires a battery, which is not the most reliable component, in order to do that power-down backup thing. Everspin is the company that is the lead in the MRAM business, and MRAM, in case you don't know, is magnetic RAM. It's a memory technology that isn't in the mainstream, but it uses magnetic bits that are accessed the same way as a DRAM. And it's persistent. It's the same speed as DRAM, but it costs about 10 times as much as DRAM. So that's really expensive. It's $1,000 for an eight gigabit chip, or no, wait a minute, for a one gigabit chip. So, you know, it might even be more than a thousand percent. The big problem with it is that it's not 100% compatible with DRAM, and so changes have to be made to the host system to accommodate the fact that it doesn't need to be refreshed. Sounds like a big benefit, but it ends up getting in the way. Fast SSDs, you know, that's something that some people are proposing as an alternative to Optane, and Kioxia and Samsung, mostly Kioxia, talk about some of their SSDs as being storage class memory, which, if you take the most broad definition of storage class memory, then yes, it is. But what it's come to mean lately is something that's like DRAM speeds, and it's certainly not that. And so I say that it's 0.1% of the speed of DRAM. It's a thousandth as fast. It costs 20% as much, so it's really cheap, but it's slow. And then the last, you know, most obvious answer is additional DRAM. It's not persistent, so that's a problem. Same speed as DRAM, same cost as DRAM. And the big problem that it runs into is bus loading.
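For reference, here's that slide's comparison collected in one place, a minimal summary using the approximate figures quoted above, all relative to DRAM:

```python
# Approximate figures as stated in the talk; speeds and costs are relative to DRAM.
optane_alternatives = [
    {"option": "Optane PMem", "persistent": True,  "speed": "30%",  "cost": "50%",    "issue": "winding down"},
    {"option": "NVDIMM-N",    "persistent": True,  "speed": "100%", "cost": "200%",   "issue": "needs backup battery/capacitors"},
    {"option": "MRAM DIMM",   "persistent": True,  "speed": "100%", "cost": "~1000%", "issue": "DDR3 only, host must skip refresh"},
    {"option": "Fast SSD",    "persistent": True,  "speed": "0.1%", "cost": "20%",    "issue": "slow"},
    {"option": "More DRAM",   "persistent": False, "speed": "100%", "cost": "100%",   "issue": "bus loading"},
]
```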
I'm gonna talk about each one of these one by one. So this is Optane. It's still around. The current inventory is fulfilling needs, and it probably will for a number of years. There is ongoing low-level demand, and that's gonna keep Intel in that business. They're just not talking about it anymore. And there's an awful lot of support already in place thanks to a number of SNIA members, because SNIA put together the Persistent Memory Programming Standard, and that ended up being what caused Optane to work in the applications in which it works.
You have the NVDIMM-N, and I talked about this before. It's a bunch of DRAM, and the larger chips are NAND flash chips. It costs about twice as much as DRAM does, because you've got extra stuff on there. There's also a microcontroller: when the power goes out, the microcontroller moves all of the DRAM's data into the NAND flash, and when the power comes back on again, the microcontroller moves all of the NAND flash data back into the DRAM, so you can start over with a warm start. But it does require a backup power source, which a lot of people don't like. You have to find a place in your system to stick this particular thing in a PCIe slot with huge capacitors on it. This isn't shown to scale; that thing is actually much, much larger than it looks here. Or you need to find a place to bolt a battery onto the side of the chassis for the server.
So a lot of people dislike that for reliability concerns, et cetera. You've got the MRAM-DIMM, and unfortunately I couldn't slant this picture the same way as all the other pictures and have the logos right side up, so Everspin's logo is actually upside down on all of these. But production for this started in 2017. They haven't seen a big enough market to warrant making a DDR4 or a DDR5 version, so they only have a DDR3 version. Like I said before, it requires changes to processors because it doesn't refresh, and it costs more than 100 times as much as DRAM. So that's problematic in its own right, but it does do the trick, and it's nice and fast.
And then I just wanted to point out that MRAM is slowly being adopted in the enterprise. IBM is using it in their FlashCore modules; they use it instead of DRAM. And the reason why is that the DRAM was required to store information that would need to be reloaded every time the SSD rebooted. And they thought that they could probably do better than that, and they could also protect data in flight a whole lot better: the translation tables, and the buffers for write coalescing, which is the data in transition, things that are being written to the SSD but haven't yet made it into the NAND flash because NAND flash is so slow. It's an easy way to protect data in flight, and it's a very easy way to get persistence if you want to have that in your SSD. But we're also seeing increasing consumer adoption of MRAM in things like medical applications and vehicles, health monitors, that kind of stuff. So what that's going to do is cause the number of wafers that have MRAM on them to grow at a pretty substantial rate. That's going to drive down the costs, because scale is huge in the semiconductor market: the more wafers you make, the cheaper they are to make. And that was one of the things that stood very much in the way of Optane's ever being able to become profitable. So we believe that the economies of scale will reduce prices as a result of that huge consumer demand.
And then finally, SSDs. And I put SSD, really? Well, yes. And Kioxia and Samsung are both advocating this. They have special NAND chip architectures for that. You can only warrant doing that, you can only pay for it, if you've got a huge NAND volume to support it. And typically they use SLC, single-level cell NAND, which is a whole lot faster than multi-level cell NAND, the two bits per cell or three bits per cell stuff. But the way that the market goes, the volume is very low for SLC NAND. The economies of scale play a part here too, and so SLC NAND is about six times as expensive as MLC NAND, which is more expensive than TLC. And then the question is, which performs better, a fast and small DRAM or a great huge NAND flash? Well, that's an interesting thing. And I made a presentation, I don't know how many years ago, at a conference called MemCon about this, where I showed this.
First of all, I wrote a book on cache memory design, and this kind of feeds into that: whether a cache memory does a good job or a bad job depends on how much locality there is in what's going on with your memory. And this is also true of virtual memory: a virtual memory will do fewer page swaps if you've got high locality in your code. High locality is represented by that white line. This is kind of an abstract concept, but it's how many accesses happen in a small address range versus having your accesses smeared across a wide address range. So the red shows what happens when you've got your accesses smeared across a very wide range; the white is when you've got them very tightly clustered in a single range. Now, this might be your DRAM in the system. And you see that it does an okay job with the red, but everything that's outside of where the DRAM is, is going to end up causing a page swap or something like that. And the white does better because it's got higher locality. Let's say you double the amount of DRAM in your system: all of a sudden almost everything in the white is being taken care of by the DRAM. And so that ends up being a really good solution, but it's still kind of a so-so solution for the thing that doesn't have very high locality. And that would be an awful lot of databases and AI-type programs.
And so an alternative that people who really think through this problem think about is: what if you took us back to the original amount of DRAM and you put in a great big, slow NAND flash memory as an SSD? And this is why SSDs have become so popular, because you can do that. And you can see that the red is very well taken care of there. And the lower height of this means that you've got slower access. Like I say, it's kind of an abstract chart. And with the white, you do have that part that's not covered that is not gonna do as well. But overall, if you have a huge slow memory, it's going to do a really good job when it's matched with a small amount of fast memory, depending, once again, on how localized your references are. So that's the argument that's being used in favor of using SSDs as an alternative to Optane memory: if you have a big SSD, it might be able to get you the same amount of performance that you get with Optane.
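To make the locality argument concrete, here's a minimal simulation sketch. The address range, tier sizes, and access distributions are invented for illustration; they're not from the slide:

```python
import random

def hit_rate(accesses, tier_size):
    """Fraction of accesses that land inside the first `tier_size` addresses,
    a crude stand-in for 'covered by the fast (or large) tier'."""
    return sum(a < tier_size for a in accesses) / len(accesses)

random.seed(0)
n = 100_000
# Tightly clustered (high locality) vs. smeared across the whole range (low locality).
high_locality = [min(int(abs(random.gauss(0, 4_000))), 999_999) for _ in range(n)]
low_locality = [random.randrange(1_000_000) for _ in range(n)]

for size, label in [(16_000, "small fast tier"), (32_000, "2x the fast tier"),
                    (800_000, "big slow tier")]:
    print(f"{label:>16}: high locality {hit_rate(high_locality, size):.0%}, "
          f"low locality {hit_rate(low_locality, size):.0%}")
```

The small fast tier covers nearly everything when accesses are clustered and almost nothing when they're smeared; the big slow tier covers most of the smeared case too, which is the SSD argument in a nutshell.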
Okay, the final thing on that table that I showed you was to put more DRAM into the system. And that's been problematic for a number of years. Large DRAMs end up adding capacitance. They load down the memory channel if you put multiple banks on there, or even if you put multiple chips into a single DIMM. And part of the reason why those large Samsung DIMMs are so expensive is that they take care of that by mounting the chips on top of each other through an arcane approach called through-silicon vias. Adding memory channels increases the power and the pin count on the processor: instead of putting a whole lot of DIMMs on a single channel, you put out multiple channels that have a single DIMM on each one. Well, then the processor has to drive all of those pins, so it consumes a lot of power and adds a lot of pins, and that limits the processor's ability to use the power for more productive uses. IBM has been trying to find solutions for this for years, and in their Power architecture they use something called OMI, the Open Memory Interface. It's a non-DDR interface. They'll take DDR memories, stick them on something like a DIMM, but larger, with a controller, and then that controller is what talks to the processor through a high-speed serial port on the processor. So that's the OMI interface. OMI has now been folded into CXL; the CXL Consortium acquired the rights to it. But CXL, the original CXL, is about adding slower memory to the system through the CXL channel. And so one of the nice things about that is it allows you to have much larger memories, but more important to the hyperscale data centers is that it allows you to have disaggregated memory, so you can now treat memory the same way you treat storage or servers. You virtualize the memory and you can assign different servers different amounts of memory, depending on what they need. It requires memory tiering, and because of that, it can accept different speeds of memories. So back in that table, I showed you that MRAM, or I'm sorry, Optane was about a third the speed of DRAM. This would take care of that without having to have a special interface for it. We'll talk about CXL in a little bit.
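A quick back-of-the-envelope way to see why a slower far tier can still pay off is the usual average-access-time arithmetic; the latencies below are illustrative guesses, not measured numbers:

```python
def avg_access_ns(near_hit_rate, near_ns, far_ns):
    """Average access time for a two-tier memory: hits are served by near
    (DDR-attached) DRAM, misses go to the far tier."""
    return near_hit_rate * near_ns + (1 - near_hit_rate) * far_ns

ddr_ns = 80        # direct-attached DRAM (illustrative guess)
cxl_ns = 250       # CXL-attached far memory (illustrative guess)
ssd_ns = 80_000    # paging out to an NVMe SSD (illustrative guess)

# If 5% of accesses fall outside local DRAM, a slower CXL tier still beats paging by a mile.
print("miss goes to SSD page:", avg_access_ns(0.95, ddr_ns, ssd_ns), "ns average")
print("miss goes to CXL tier:", avg_access_ns(0.95, ddr_ns, cxl_ns), "ns average")
```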
Just bringing us back, this is the exact same table that I showed you before. And it's just, you know, so we've got Optane winding down, but you know, by golly, it's still there. You've got NVDIMM-N if you wanna pay for it, MRAM-DIMM once again, if you wanna pay for it, and also if you can work with the DDR3 interface and your processor can handle no refreshes. Fast SSDs, you know, there are pluses and minuses to that, and then added DRAM.
So let's talk a little bit about what happened with Optane to support all of this.
And probably the most important thing, which I mentioned before, is the SNIA persistent memory programming model. And that was just the start of things. It allows you to have hierarchical tiers. You can have different speeds of memory in there, but there are other tiers that are starting to appear in the memory area. So for example, GPUs, which are widely used for artificial intelligence, use high bandwidth memory. This is really tightly coupled to the processor: it has to be within two millimeters of the processor chip, so it's always packaged inside the GPU's package. And it stacks, so once again it uses this expensive technology that Samsung uses for the large DRAM DIMMs. DDR, of course, is still going to be used for a number of years, and then there's CXL. We're seeing memory disaggregation happening, where servers don't have to have more memory just in case a large program comes along. If a program that requires a large memory space comes along, then CXL allows them to borrow memory space from a shared pool. And then finally, it allows memories to move into the chiplet. And so you'll see this model being used for persistent memory caches, which emerging memories support. I'll talk about emerging memories in a little bit.
So Optane's legacy gave you a fresh look at memory, and I just have this as the old way and the new way. I was thinking of doing a build for this, but the old way was that all DRAM ran at one speed. Now you can have mixed memories running at mixed speeds over the CXL channel. The second one is that persistence was something storage does and not what memory does, and it's slowed down because of context switches: each transaction with storage requires an interrupt, and that interrupt slows down the overall access. With the new way of looking at things, it's okay to have persistence in memory, and it's okay not to use context switches to get to it. I'll talk about that in a moment. Then I say memory is only put on the memory channel, and in the one below it, only memory is put on the memory channel. I love English. So, memory is only put on the memory channel: you don't put memory into the storage area 'cause it slows it down. But now you've got four channels that you can do that with: HBM, DDR, CXL, and UCIe, the chiplet interface, which is coming. And then the bottom one, only memory is put on the memory channel: now you can put memory-semantic SSDs, or maybe even other things, onto the CXL channel and communicate with them as if they're memory, and CXL just hides all of that.
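Here's a minimal sketch of the load/store style of access that the persistent memory programming model enables, using ordinary memory mapping. The path is hypothetical, and a real persistent memory deployment would use a DAX filesystem and a library like PMDK so the flush becomes cache-line flushes rather than a storage I/O:

```python
import mmap, os

# Hypothetical path: a file on a DAX-mounted persistent memory filesystem.
PATH = "/mnt/pmem0/example.dat"
SIZE = 4096

fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, SIZE)

# Map the persistent region into the address space: updates are plain memory stores,
# with no read()/write() system call and no interrupt per access.
buf = mmap.mmap(fd, SIZE)
buf[0:13] = b"persistent ok"

# Make the update durable with an explicit flush of the mapped range.
buf.flush(0, SIZE)

buf.close()
os.close(fd)
```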
So this is what I was talking about with context switches. This is a SNIA slide from years ago, back when they were working on the persistent memory programming model. And you've got orders of magnitude of speed. The columns are hard drive, SATA SSD, NVMe SSD, and then persistent memory, which was Optane before it was announced. And the green area at the bottom is where you'd wanna use polling, actually having the processor just go back and check and say, "You ready yet? You ready yet? You ready yet?" in a loop, because that's more efficient than doing a context switch. Up at the top, the pink part, you'd naturally use a context switch, because that's the fastest and most efficient way of communicating with a hard drive, a SATA SSD, or an NVMe SSD. In between, there's this kind of funny-colored band, and that's where you can't really decide which one to use. And CXL is really designed mostly for that lower green band where you don't wanna use context switches. NVMe is really good for SSDs, and SATA is fine for hard drives, so you don't need a fast interface for those. But CXL is a good place to put persistent memory.
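As a rough sketch of the decision on that slide, assuming a context switch costs on the order of a few microseconds (a placeholder figure, as are the device latencies):

```python
def better_to_poll(device_latency_us, context_switch_us=5.0):
    """Rule of thumb: if the device answers faster than the round trip of sleeping
    and being woken by an interrupt, just spin and keep asking 'you ready yet?'."""
    return device_latency_us < 2 * context_switch_us

for name, latency_us in [("hard drive", 10_000), ("SATA SSD", 100),
                         ("NVMe SSD", 80), ("persistent memory", 1)]:
    mode = "poll" if better_to_poll(latency_us) else "interrupt / context switch"
    print(f"{name:>17}: {mode}")
```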
So let's talk about CXL.
First, though, I'm gonna talk about how Intel forced the DDR bus to accept Optane. You've got Optane that's running at a third the speed of DRAM on the same bus as DRAM, and nobody's gonna wanna slow down their DRAM bus to a third of its speed so that everything runs at the same speed, which is what the DDR bus was designed for. So they said, okay, we'll put some extra hooks into this, we'll call it DDR-T, and it will handle both fast and slow memory. It uses a transactional protocol for the slow writes, 'cause an Optane write takes about twice the time that a read does. So it will dispatch a write and then get back a response from the Optane module saying that the write has been completed. It's based on a standard DDR4 interface, and so it had some modified control signals. I didn't bring a laser pointer here, I guess I can point with this thing, but the modified control signals are one of these. I think it's this one here; I didn't make it big enough that you could read it. The red line and the blue line, the big arrows, are pretty much the same signals. There are other arrows that pass through; those are pretty much the same signals for all of these things. And there were just a handful of signals that were different, and those went on unassigned pins in the JEDEC DDR4 standard. And so the timing, the protocols, all of that were the same for DDR-T as they were for DDR4. And so that allowed you to put DRAM and Optane into the same sockets. If you had two sockets per channel for your memory channels, then you could put DRAM in one socket and Optane in the other, which is what they recommended their users do. But the trouble is that every time JEDEC would come up with a new DDR interface, Intel would have to follow it with a redesign of DDR-T to support it. So that was a big headache for them.
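Here's a purely illustrative sketch of the transactional-write idea, not the actual DDR-T protocol, whose details Intel never published in full: the host dispatches the write, keeps going, and collects a completion from the module later.

```python
import queue, threading, time

completions = queue.Queue()

def optane_module(write_id):
    """Stand-in for the slow media: the write finishes later and the module
    posts a completion back to the host."""
    time.sleep(0.002)            # a write takes roughly twice as long as a read
    completions.put(write_id)

def dispatch_write(write_id):
    """Host side: issue the write and return right away instead of stalling the bus."""
    threading.Thread(target=optane_module, args=(write_id,)).start()

dispatch_write(1)
# ... the memory controller keeps servicing other, faster DRAM traffic here ...
print("write", completions.get(), "acknowledged by the module")
```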
CXL solved that problem. It removed not only the requirement to design a new bus to match each DDR generation, but it also allowed you to use different kinds of memory with one processor. Right now, I know a guy who looks at a processor and says, 'Oh, is that one of Intel's DDR4 processors or is that one of Intel's DDR5 processors?' He categorizes Intel's processors by the DRAM they're able to use. With CXL, you can use both DRAM interfaces. CXL allows far memory, which is the memory that's on the other side of the CXL channel, to use any interface. And OMI is a faster version of CXL that is used for near memory and allows the same kind of thing. I'm not gonna talk much about OMI here, but if you'd like to talk with me about it later, I'll be over in the persistent memory lab in salon eight and I'll take any questions there. CXL also supports memory disaggregation. I have a nice animated slide I didn't bring with me for that, but basically, if you've got one application that needs a huge memory, you don't have to load up all of your servers with a huge memory. You can have that huge memory be in a pool somewhere else and just have it assigned to whichever server needs it. So memory pools can be dynamically allocated. Datasets can be moved from processor to processor; that's shared memory, which is something even more elaborate that's only available in the third generation of CXL. And it also paves the way for UCIe, the chiplet interface.
So let's just talk about that: any memory with any server. You have a DDR4 server and you're gonna have some DDR4 DRAM, and so you put in a channel between the two of them. That makes sense, that's the way things are always done. With DDR5, you do the same thing: you have the DDR5 server talk to the DDR5 DRAM. These typically would be on different server motherboards, but CXL gives you the ability, through CXL channels instead of DDR channels, for the DDR5 server to talk to the DDR4 DRAM, and that DRAM doesn't have to be connected to the DDR4 server. And it allows the DDR4 server to use DDR5 DRAM. So that's a nice thing by itself, but these each have to be separate CXL channels. You can also put different kinds of memory on these. I'm listing some emerging memory technologies, which I'll talk about in a minute: MRAM, resistive RAM, ferroelectric memory, and Optane is one of them.
I have a question up here.
So when the DDR4 server is connected to the DDR5 DRAM, is there some ...
Yeah, it would be through a CXL channel.
Yeah, so CXL is basically the voltage levels and the signaling of PCIe, but with a different protocol layered on top, because the PCIe protocol covers an awful lot of bases and that makes it a little bit slow. CXL has narrowed that down, and without getting into it too much, it will handle either one by doing some handshaking at the front and asking, 'Are you a PCIe device or are you a CXL device?' and then it will do that. No, there's no extra silicon on the server side. There's extra silicon on the DRAM side, because you need to have something over there that speaks PCIe, which is a CXL controller. And, you know, Marvell, Microchip, I think Samsung makes their own, and a whole lot of companies are gonna come out with those. I would expect pretty much any SSD controller company to come out with a CXL controller at some point, because it's gonna use the same PCIe that they already use on their NVMe SSD controllers.
So yeah, these other technologies are gonna require controllers of their own, but you could talk to any of those memories that way, or you could even put flash memory in there and talk to it over CXL. Now, if somebody were to do this, and this is with a CXL 1.0 kind of interface, then you would need to have a separate CXL channel for every one of these arrows on here.
And the people putting together CXL said, 'Okay, that's not really the optimal solution. Let's get rid of those and stick a switch in the middle, and then just have everybody talk to the switch.' So this is CXL 2. Now, the switch does add yet another delay, but it still allows fast access to all of these. These delays are, I think, sub-10-nanosecond delays.
And then if you wanna get more complex and do a fabric, then CXL 3 gives you this, which allows the switches to connect different hosts to each other and to different memory arrays. It also allows memory to be shared between two processors and to be coherently shared, so that the cache in one processor isn't gonna have a stale copy while the cache in the other processor has a fresh copy of something that's supposed to be current in main memory. So it takes care of that. Now, I mentioned the near memory at the CPU. When you build a server, you are always going to have DRAM right next to the CPU, and it's going to be attached by a DDR interface. It's just your extended memory, your slower memory, which is called far memory, that's gonna be communicated with via CXL. So near memory at the CPU, far memory on CXL, and CXL can support all kinds of memory applications: large memories, disaggregation of memory, memory pools, memory sharing for trading messages, and memory fabrics. I happen to think the memory sharing thing is pretty cool, because I watch my son play video games, and there are times when he changes a scene in a video game and it takes a long time to load. That's gonna go away, because he's not gonna be moving data from the processor memory to the GPU memory over an NVMe channel; it's going to be moved by CXL. So that'll speed up an awful lot. And then I say there are no memory interface dependencies. There really is one: the CXL controller on the memory side is going to have to understand the kind of memory it's talking to, but it's not a big sacrifice.
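To illustrate the pooling idea, here's a toy allocator that hands slices of a shared far-memory pool to whichever host needs them; the host names and sizes are made up, and nothing in the code is CXL-specific:

```python
class MemoryPool:
    """Toy model of a disaggregated far-memory pool sitting behind a CXL switch."""
    def __init__(self, total_gb):
        self.free_gb = total_gb
        self.grants = {}          # host name -> GB currently assigned

    def allocate(self, host, gb):
        if gb > self.free_gb:
            raise MemoryError(f"pool has only {self.free_gb} GB free")
        self.free_gb -= gb
        self.grants[host] = self.grants.get(host, 0) + gb

    def release(self, host):
        self.free_gb += self.grants.pop(host, 0)

pool = MemoryPool(total_gb=1024)
pool.allocate("server-17", 512)   # a job with a huge working set borrows from the pool
pool.allocate("server-03", 128)
pool.release("server-17")         # when the job ends, the capacity goes back
print(pool.free_gb, "GB free, grants:", pool.grants)
```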
So that leads to UCIe and the UCIe people said, well, let's just take chiplets and put UCIe on them or put CXL on them.
And I put this here, you know, for people old enough to remember it. Chiclets gum was, like, phased out in 2006, but it was something that I grew up with, and it's a gum that's in a candy coating.
And I said, oh, that name's too close to chiplets.
So I'll just doctor that picture a little bit.
But what it's for is things like this. This is Intel's Ponte Vecchio server processor. And you can see those little gold squares. Okay, first of all, around the gold squares is this heavy white silver line. That is where the lid of the package on the processor gets glued on. So you wouldn't see this if you were to buy a Ponte Vecchio CPU module. You would see instead just this big, almost square thing of metal on top that said, you know, whatever the processor number is and the Intel logo and all that. But if you peeled that off, then you would see all these little gold squares. The gold squares are separate chips. Some of them are memories, some of them are logic chips. I believe that the one in the upper right-hand corner and the one in the lower left-hand corner are IO drivers. The two largest chips, I believe, are the processing chips for the thing. And then the square ones are probably HBM DRAM modules. So, and Intel says that they're going to be doing that. They're going to be introducing their first client processor using a chiplet approach sometime early next year, I believe they said. I can't remember. I think it's called Stony Brook or something. So one of the nice things is that you can have multiple sources for these chips. That right now, HBM is largely supplied by SK Hynix, which is one of three leading DRAM manufacturers. But the other two, Samsung and Micron, are trying very hard to get into that market and take some away from SK Hynix.
UCIe is really cool for memories because it allows the processors to use, or I'm sorry, it allows the processor designers to use a logic process to build logic out of, and to use a memory process to build a memory out of. Right now, with the older process technologies, you'll have SOCs, microcontrollers, ASICs, and that kind of stuff built in a logic process. And that limits designers to only using SRAM, which can be built out of logic transistors, and NOR flash, which is the only other memory that does well in a logic process. Okay, for multiple reasons, NOR flash is going away, and for a reason I'll tell you about shortly, SRAM is also threatened with going away. And so what are they going to use in the future? Well, they could use DRAM, MRAM, resistive RAM, FRAM, phase change memory. They're all something that could be a whole lot cheaper than SRAM and could migrate through processes a whole lot better than NOR flash. And if they did that, then they'd get significant die area and cost reductions, but it would drive them to using a chiplet approach. But one of the nice things is that it would commoditize chiplets. Chiplets right now are not widely used, and so they're sole sourced, and having the memory built on the chip itself is also a sole-source kind of thing. If you have chiplets, then all of a sudden this memory behaves like DRAM or that kind of memory, where all of a sudden it's a commodity and everybody who's building it is competing on price to try to get the business, and so the price goes way down. And that can only really happen if you have the same chiplet used by multiple memory companies and sourced by multiple sources. And so if Intel, AMD, NVIDIA, anybody else who builds processors is using the same chiplet, then the market gets big and there'll be multiple sources for it. That'll get up the volume, that'll get the cost down. And then Micron, SK Hynix, Samsung, Kioxia, Western Digital might all jump into this market and say, we want a piece of that, we're gonna compete on price, and bam, the prices go way down.
Now, I said I was gonna talk about SRAM. This is something that makes a whole lot more sense to a chip designer than it's probably going to make to any of you. But it's a graph of the area of an SRAM cell in F squared, which is proportional to the size of the transistors on the chip. And so if, let's say, that bottom line is 500 times the size of a transistor on a chip, then a typical SRAM cell at whatever, 14 nanometers or 10 nanometers, is going to be about 450 times the size of a transistor on there. When you get up towards the three nanometer area, all of a sudden, with the Samsung process, that one bit of SRAM is going to be as big as a thousand transistors are in the logic. Now, as that goes off into the future, that's gonna be a really bad problem. What this is driving is, first of all, for a very large area of a processor chip to be SRAM, which is not using it to its best cause, but it's also driving the cost up for these chips more than it needs to be. And so it's going to cause an emerging memory technology, probably MRAM if things stay the way they are today, to become the cache memory in standard processor chips. Maybe not all of the cache memory, maybe it will be the L2 cache. But like that diagram I showed you with the two curves, the red curve and the white curve, you're going to see that the size of the caches, the L2 caches, is gonna just grow exponentially on these processors once chiplets start being used and once something like MRAM starts being used for that, because a very large cache can do such a good job, even though it's really slow. So we'll see large-capacity future caches using emerging memories in order to drive out the cost.
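You can see the "big but slow cache still wins" effect with the standard average-access-time formula; the hit rates and latencies below are invented purely for illustration:

```python
def amat_ns(l2_hit_rate, l2_ns, miss_penalty_ns):
    """Average access time seen by the core for accesses that reach the L2 cache."""
    return l2_ns + (1 - l2_hit_rate) * miss_penalty_ns

# A small, fast SRAM L2 vs. a much larger but slower (say, MRAM-based) L2.
sram_l2 = amat_ns(l2_hit_rate=0.80, l2_ns=4, miss_penalty_ns=100)
mram_l2 = amat_ns(l2_hit_rate=0.97, l2_ns=10, miss_penalty_ns=100)
print(f"small fast L2: {sram_l2:.1f} ns average   big slow L2: {mram_l2:.1f} ns average")
```

With these made-up numbers the bigger, slower cache wins because the extra hits save far more time than the slower hit latency costs.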
So chiplet memory can be persistent. I think I've already said that a number of times. And what that means is that you can have a persistent code cache and a persistent data cache, which is a new thing. And then software will need to be written that really takes advantage of that, so that's gonna require some re-architecture. The NVM programming model, I think, is a good basis for this. There will be security concerns. I was just talking to John Geldman, who's running the security session right now; otherwise he'd be in here. What if persistent memories with persistent caches fall into the wrong hands? How do you handle the cache lines? Do you erase cache lines when they need to be invalidated? Should memory communications and NVM data at rest be encrypted? These are all big questions that are gonna have to be answered. And John says, oh yeah, we're on top of that.
Yes, the question in the back.
Okay, so you were more announcing what's going on in the storage security areas. Is, yeah, you haven't figured this problem out yet, but you're working on it, right? Okay, okay, and so you're waiting for a product. Yeah, it's always nice if you can put together the standard before the product comes out, but the product and the market are really the things that drive it.
Okay, so I guess I've taken care of all that. And so off in the future.
You know, because of things outside of mainstream computing, we're seeing emerging memory falling into place. And that's mostly because in microcontrollers and ASICs and things that use NOR flash, NOR flash can't be built on processes that are smaller than 28 nanometers. And as I said with that other chart, SRAM is growing very unattractive. There's already some use of emerging memories; MRAM, as I told you, is being used in the enterprise at IBM and some other places. And there's really strong growth in consumer applications for MRAM, which is going to drive the economies of scale. And so we're expecting that the increased consumption will cause the prices to go down because of the economies of scale. And then the technical benefits will fall into the hands of SNIA members: fast, very low power, less messy than flash, and they're all persistent.
And we've got a report that we wrote on this. It covers the four types of memory that are on there: MRAM, the magnetic one; phase change memory, which is Optane; resistive RAM, whose benefit is that it can go into a cross point just like phase change memory, so it can be really cheap; and then ferroelectric memory, which is something that can be built on current processes.
And all of these new memories are persistent, and they have a small single-element cell. So I got to my last bullet first: they're all persistent, so they can be used as persistent memory, but they also use a single-element cell. As opposed to SRAM, which was so problematic because it uses six transistors, these all use a single transistor or even a diode-type select mechanism. And so they can be made very small and they can be stacked into 3D. And the promise, the reason why people have been researching these things since the 1960s, is that they can be built much smaller than DRAM or NAND flash. And as long as they can be built smaller, then in theory they should be able to be built fast, I'm sorry, cheaper. And cheaper is good; it drives the market. They also allow write in place. You're not going to have all of this nastiness in flash of block erase, page write, garbage collection, or any of that kind of stuff, because you can just write over existing data. And they also offer much more symmetrical read/write speeds. Usually the writes are less than 10 times as slow as the reads, and very often they'll take only two or three times as long. So that means they're very fast memories in comparison to NAND flash, and they're much easier to use.
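Here's a minimal sketch of that write-in-place contrast; real flash management is far more involved, but the shape of the difference is this:

```python
# Emerging memories: just overwrite the cell.
def write_in_place(media, addr, value):
    media[addr] = value                      # no erase, no relocation, no cleanup

# NAND flash: a programmed page can't be overwritten, so write to a fresh page,
# remap the logical address, and leave the old copy for garbage collection.
def flash_write(pages, mapping, addr, value):
    new_page = pages.index(None)             # find an erased page (raises if none free)
    pages[new_page] = value
    old_page = mapping.get(addr)
    mapping[addr] = new_page
    if old_page is not None:
        pages[old_page] = "STALE"            # reclaimed later by block erase + GC

media = [0] * 4
write_in_place(media, 2, "data")

pages, mapping = [None] * 4, {}
flash_write(pages, mapping, addr=2, value="data")
flash_write(pages, mapping, addr=2, value="data v2")   # the old copy becomes stale
```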
You know, this is our view of what the revenue is going to be like out through 2030. It's kind of a small font on there, I'm sorry about that. It's actually 2032 that we go out to with our forecast in the report. And you can see DRAM and NAND flash, they're growing, but this is a log chart, so it doesn't really show too much; but you've got really fast growth going on in MRAM. And we're expecting to see that happen over time. It might not be MRAM.
It might be one of these other technologies, but there is going to be that growth. And this is just a plug for our report on that. Each one of those subway lines on there is a different type of technology. You've got MRAM, phase change, and so on shown off on the right-hand side, and then all of these different options of them that are being explored right now. And eventually one of these is going to win out, and we cover all of them so that we'll always have a winner. Oh, and the report's now available. There's a URL on the slide, which will be available to you.
So we're almost to the end of our 50 minutes here. I'll just say that Optane, in its short life, created a great legacy. It created a programming model and new architectures. And there are many Optane alternatives that you can use with these; all of them have their disadvantages, but that was in the table. CXL has opened the door to new memory architectures, so processors no longer need to be tied down to a single DDR4 or DDR5 interface or a single memory type. UCIe takes CXL's strengths and makes them available to chiplets, and chiplets are the way that future processors are going to be made. So we think that emerging memories are going to really solve a lot of problems tomorrow through these changes.
And with that, I'm going to open it up to more questions. I'll try to remember to repeat the questions, which I haven't been doing so far. So please keep them relatively short. All the way in the back row.
Okay, so the question was, instead of using CXL, why didn't we use NVMe over Fabric? And the short answer is that CXL has been designed to be significantly faster than NVMe. NVMe still does a context switching protocol. It still uses interrupts. And that's very fast for NAND flash. It wasn't really fast enough to make Optane look very good. And it's much too slow for anybody to want to put DRAM on. And the main reason why the hyperscale data centers want CXL is because they would like to put DRAM in shared pools. Any other questions? Well, you've all been very easy. As I said before, I'm going to be going over to the, I want to say it, the hardware lab where they've got CXL over there in salon eight. So, and this is awful. You have to walk past salon A to get to salon eight. So don't go to A, go to eight. But anyway, I'll be over there. If you have any other questions you'd like to talk to me about them one-on-one, I'll just go over there and talk with you about them. So thank you very much.