YouTube: https://www.youtube.com/watch?v=hxxP5xG807g
Text:
Hi, everyone. My name is Arvind Jagannath, and I'm in product management at VMware. I manage some of the memory-related products for core vSphere. I have been at VMware for almost four years, and for the last two-odd years I've been focusing mainly on memory and memory optimizations, and on how we can actually solve some of the customer pain points with respect to resource inefficiencies. So I'm happy to be participating in the Memory Fabric Forum. Today, let's walk through memory tiering, which is a VMware solution, and I'll present some of the latest updates, where we are going, and some performance details.
This is the standard disclaimer.
All right. So what are the pain points that our customers are observing today? The problem statement is all about underutilized CPUs. In their virtual infrastructure, customers typically size things so that they allocate enough memory for all their VMs, and at the same time enough memory to be sufficient for the number of cores on the system. They look at the typical memory-to-core ratios they need for their VMs and then size appropriately, so that all their cores and VMs are used well, leaving just enough headroom as insurance against failures or vMotion-like scenarios, right? What we are really seeing, in terms of pain points, is that customers tend to be memory-bottlenecked: most enterprise applications are such that all the available VM memory appears as consumed. But at the same time, customers don't typically utilize their CPUs fully, and they're left with a lot of idle CPU. So really, the problem our customers are facing is memory-starved CPUs. The effect is that they're not utilizing their infrastructure well, and they end up with locked-up software licenses, which they pay a lot for, plus a lot of wasted power, space, and cooling that they would ideally want to use. All of this results in a higher TCO for the customers.
The immediate reaction is possibly to add more DRAM. But adding DRAM is not simple, because it increases customer TCO. Higher-density DDR, for example, is much more expensive than lower-density DDR, so if customers have to replace DIMMs with higher-density ones, they pay disproportionately more. They also need additional power, space, and cooling for these additional components. There is also the issue that when customers do their sizing and purchase their servers, they typically try to maximize the bandwidth on their DDR channels. What that means is that they fully populate the channels, and they run into a memory-wall-like issue where they cannot really add more DIMMs into their DIMM slots. That problem is compounded by newer CPUs: as core counts increase with newer Xeon or AMD CPUs, this problem becomes more intense. So scaling by adding DRAM is not really a good solution.
The other reaction customers have is to add more servers. This one is more obvious: now they have to allocate more space, power, and cooling, and allocate more licenses, and this results in more maintenance and OPEX costs. So this is typically not ideal either. What customers really want is to improve the utilization of their existing hosts, and to see how they can do more with the licenses and infrastructure they already have.
In terms of VMware's unique approach: VMware does page classification. Through years of experience with products like vMotion, where VMware has been able to look at pages in a very fine-grained fashion and manage those pages between hosts or between different memory types, VMware has come up with a unique capability to classify pages of memory. So when we look at a typical system today that is capable of doing memory tiering for ESX, you could think of its memory as being classified into unused pages; cold pages, which are used infrequently; warm pages, which are used somewhat more often; and active pages, which are heavily used.
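To make that concrete, here is a minimal, purely illustrative sketch of recency-based page classification. The thresholds, names, and structure are hypothetical; ESX's real classifier lives inside the kernel and is far more sophisticated.

```python
import time

# Hypothetical thresholds (seconds since last access). ESX's real
# classifier runs inside the kernel and is far more sophisticated.
ACTIVE_WINDOW = 10
WARM_WINDOW = 60

def classify_page(last_access_ts, now=None):
    """Classify a page as unused/cold/warm/active by access recency."""
    if last_access_ts is None:          # never touched
        return "unused"
    age = (now if now is not None else time.time()) - last_access_ts
    if age <= ACTIVE_WINDOW:
        return "active"                 # heavily used: keep in DRAM
    if age <= WARM_WINDOW:
        return "warm"                   # used somewhat more often than cold
    return "cold"                       # used infrequently: tiering candidate
```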
Now, if you add a cheaper memory tier, VMware can do a lot with it. VMware can use memory tiering to do intelligent memory placement. Basically, what that means is that it can look at these pages and proactively promote and demote pages between the hot and cold tiers. In this case, the hot tier is the host DRAM, and the colder tier is the cheaper memory tier.
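As a rough illustration of the promote/demote idea (again a toy model, not the actual ESX placement logic), classified pages might move between tiers like this:

```python
def place_pages(pages, dram, nvme_tier):
    """Toy promote/demote pass: active pages move to DRAM (the hot tier),
    cold/unused pages move to the cheaper NVMe tier.
    `pages` maps page id -> classification string."""
    for page_id, state in pages.items():
        if state == "active" and page_id in nvme_tier:
            nvme_tier.remove(page_id)   # promote to host DRAM
            dram.add(page_id)
        elif state in ("cold", "unused") and page_id in dram:
            dram.remove(page_id)        # demote to the cheaper tier
            nvme_tier.add(page_id)

dram, nvme = {1, 2}, {3}
place_pages({1: "active", 2: "cold", 3: "active"}, dram, nvme)
print(dram, nvme)   # -> {1, 3} {2}
```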
After performing tiering for a while, what ends up happening is that a lot of space becomes available. The intent is not to reduce or expand capacity; the intent is to make sure that all the critical workloads have space to run their most active parts in host DRAM. So the implication is really on performance, and performance is our number one goal.
So we make sure that there is enough host DRAM so that a lot more of the CPU can be utilized, and thus we can reach a point where more of the host's resources are used better.
So basically, there are two solutions available. One is NVMe-based tiering, which shipped with vSphere 8.0 U3. vSphere 8.0 U3 is already released, and with it we announced a tech preview. What that means is that you can go ahead and deploy a solution that uses an NVMe SSD, which has to be directly attached to the host, and enable NVMe-based memory tiering. This software is fully integrated into the ESX kernel, which means customers don't really have to do anything special; all they need is a config. GA is targeted at 9.0. Again, this is subject to change; these things are still in the works. This is something customers can deploy today, test, run their workloads in their test environment, and see the benefits for themselves: some of their DRAM capacity can be freed so they can run more workloads and potentially use their CPU better. The other solution is Project Peaberry, which is CXL-based. It involves a device that runs some logic, and we are collaborating closely with some vendors on it. We are at a stage where we can do demos and proofs of concept for Project Peaberry as well. The targeted GA is 9.0 U1, which is also subject to change. Both of these solutions provide a unique value proposition, and both have their pros and cons, which I'm happy to discuss further.
When you look at NVMe memory tiering results, let's look at some performance. We typically run a lot of benchmarks, and one that is commonly used at VMware is the VDI benchmark with Login VSI. This graph shows that, without significant loss in performance, we are able to double the capacity just by having a system with NVMe tiering. Similarly, we ran another benchmark, VMmark, a very common VMware-specific benchmark that a lot of our partners use. What we have been able to demonstrate is scaling of VMmark tiles: each tile is a bundle of workloads that are unified together and can be deployed as an instance. As you deploy NVMe tiering with a certain tier size, you will see that the number of tiles, the number of VMmark bundles, can also be scaled up. This is of a lot of value to customers, because now they can deploy cheaper memory tiers to increase the number of workloads they can run. And this goes back to my earlier point about utilizing CPU better, because now more of the CPU is being utilized by the additional workloads.
Now let's look at some configuration details. The goal of configuration at VMware is to make it as simple and seamless for our customers as possible, and there really are very few settings to enable memory tiering. It does require a reboot. The other important point is that when we look at the host memory capacity, we see an aggregated capacity. For example, if the customer has configured a 3-to-1 or a 4-to-1 ratio of DRAM to the secondary NVMe tier, they will see a single total memory capacity, and this capacity can be used without impacting any of the workloads that are running.
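As a worked example of that aggregated capacity (the 4-to-1 DRAM-to-NVMe ratio is one of the ratios mentioned above; the sizes are made up):

```python
# Worked example of the aggregated capacity the host reports.
# The 4-to-1 DRAM-to-NVMe ratio comes from the talk; the sizes are made up.
dram_gb = 1024                        # installed host DRAM
ratio_dram_to_nvme = 4                # DRAM : NVMe tier = 4 : 1

nvme_tier_gb = dram_gb / ratio_dram_to_nvme
total_gb = dram_gb + nvme_tier_gb
print(f"Host reports {total_gb:.0f} GB of memory")   # -> 1280 GB
```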
We also have certain other conditions, such as the type of NVMe SSD that customers can use. These criteria are well documented in a KB article that we released along with the tech preview, the memory tiering KB article from VMware. It specifies what the endurance of the NVMe should be, what the performance class should be, and so on. There are certain other restrictions with the tech preview, which we are hoping to remove with the GA release.
Now let's talk about the CXL-based Project Peaberry. Some of the goals here, and the reason we are doing this, is that it's a hardware-software co-design. We are working with a few vendors, and this co-design provides better performance: it uses the properties of CXL to improve response times and performance, so we can move up to supporting even latency-sensitive or in-memory workloads. Things like zero page faults and the ability to track host memory from the device are essential features that come with the CXL protocol itself, so in essence we use the CXL protocol to our advantage. We implement logic in the device that makes use of the CXL protocol and makes memory tiering better. Again, we follow the same philosophy as with any memory tiering solution: there are no additional software licenses, no additional bundles, and no separate workflows or lifecycle steps to manage. This is fully integrated into core vSphere.
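To illustrate the idea of device-side tracking, here is a toy model: a hypothetical device that counts accesses to the pages it serves, so the host can query it for promotion candidates. This is purely conceptual; it is not the CXL protocol, the Peaberry design, or real hardware behavior.

```python
from collections import Counter

class ToyCxlDevice:
    """Toy stand-in for a memory device that tracks accesses to the host
    pages it serves, so the host can ask it for promotion candidates.
    Conceptual only: not the CXL protocol, Peaberry, or real hardware."""

    def __init__(self):
        self.access_counts = Counter()

    def on_access(self, page_id):
        self.access_counts[page_id] += 1        # device-side tracking

    def hottest(self, n):
        """Host-side query for the n most-accessed pages."""
        return [page for page, _ in self.access_counts.most_common(n)]

device = ToyCxlDevice()
for page in [1, 2, 1, 3, 1, 2]:
    device.on_access(page)
print(device.hottest(2))   # -> [1, 2]: candidates to promote to DRAM
```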
Let's look at the performance. We have run multiple performance benchmarks, and one of them is HammerDB with Oracle. What we observe is that with the Project Peaberry device, we get up to 90 to 95% of pure-DRAM performance. You can see how close the performance is by comparing the two red charts. The one on the left is pure-DRAM performance, showing the transactions per minute (TPM) for a typical Oracle client load. The clients connect to a server that is running Peaberry, and each workload runs as its own VM on the server side. That VM is configured to use either DRAM only, or a 50-50, one-to-one split of DRAM and Peaberry capacity. When we run these two VMs side by side and calculate the ratio of their performance, we see that the performance is within 10%. And that's really beneficial to our customers because, like I said before, performance is a very critical goal for VMware.
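To show how that ratio is read, here is the calculation with hypothetical TPM numbers (the talk only gives the resulting 90-95% range, not the raw values):

```python
# Hypothetical TPM numbers, purely to show the ratio calculation;
# the talk reports the tiered VM landing at roughly 90-95% of DRAM-only.
tpm_dram_only = 100_000    # VM backed entirely by DRAM
tpm_tiered = 92_000        # VM on a one-to-one DRAM + Peaberry split

ratio = tpm_tiered / tpm_dram_only
print(f"Tiered VM runs at {ratio:.0%} of DRAM-only")   # -> 92%
```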
Now let's look at the customer value proposition and how these solutions improve the TCO story for our customers. In the middle, you see how NVMe reduces the total costs. On the left-hand side, the cost of the server is 72K, and this includes both the DRAM and the other costs associated with the infrastructure. When you move to NVMe, the cost decreases to 58K (see the quick savings calculation at the end of this transcript), and now the entire memory cost is a fraction of the total DRAM cost incurred for a pure-DRAM system. Finally, when we look at Peaberry and allocate, for example, one terabyte from DRAM and one terabyte of additional capacity from Peaberry, you can see that the cost reduces even further. So the value proposition of Peaberry helps reduce costs even more. One key point I want to mention here is that not only do we reduce the hardware TCO, we also ensure that customers get other value: how they better manage and utilize their CPU, and how they better manage and utilize their software licenses. This whole thing ties back to the consolidation of their server resources. We believe this consolidation will in fact improve the value for server vendors and hardware vendors, because customers now understand that each of those hardware components, and the server including the software, is better utilized and has better value. Hence, instead of hurting sales for our partners, we believe it will in fact improve the core value proposition and improve sales as well. And of course, it definitely improves the cost-based value proposition for our customers. With that, I will end my session. Please feel free to contact me for any customer POCs, demos, or any sort of customer interactions, and I'll be more than happy to support them. Thanks a lot.
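As a quick footnote on the cost numbers above (the 72K and 58K figures are from the talk; the per-component breakdown is not given):

```python
# Round numbers from the talk; the per-component breakdown is not given.
cost_dram_only = 72_000    # server cost with a pure-DRAM memory config
cost_nvme_tiered = 58_000  # same server with NVMe memory tiering

savings = cost_dram_only - cost_nvme_tiered
print(f"Savings: ${savings:,} ({savings / cost_dram_only:.0%})")
# -> Savings: $14,000 (19%)
```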