Thank you, everyone, for joining us today. My name is Samer El-Haj Mahmoud. I'm a distinguished engineer at Arm. And my colleagues here are --
Hi, James Bodner. I am a firmware manager at NVIDIA.
And my name is Tim Lewis. I'm the CTO of Insyde Software.
And today we are going to talk about Arm's approach to designing standards-compliant servers across an open ecosystem, and how we collaborated with NVIDIA on using Arm standards for NVIDIA's Grace and the firmware ecosystem around it.
So I'll start with some quick background on what Arm SystemReady is. Arm SystemReady is a compliance certification program based on a set of standards: requirements specifications for the hardware, the silicon, the firmware, and the system software interfaces to the operating systems, together with compliance test suites and tools built around those specifications. And finally, there is a certification program to verify the compliance of a particular system and increase confidence in it. The goal of Arm SystemReady is for software to just work on Arm servers that are designed based on these standards.
We're proud to have grown the program into a large ecosystem of many players across the industry: ISVs, CSPs, OEMs, silicon providers, IHVs, independent firmware vendors, and communities. And I'm proud to be standing here with two of our partners, NVIDIA and Insyde, that are part of the success of this program.
Last year at the OCP Global Summit, Arm also launched the first OCP Experience Center in North America. It is hosted by Arm, which allows us to be a solution provider for OCP, and it provides services for companies that are trying to get their hardware either ready to be OCP Accepted or SystemReady certified. So we provide both the SystemReady and the OCP Accepted services.
And since then, we have also worked with the OCP to make SystemReady a prerequisite, a requirement for servers that are being submitted as OCP Accepted. So for Arm-based servers that are going to go through the OCP Accepted criteria and checklist, one of the items in the checklist is to ensure that they first go through the SystemReady program. That makes sure they are designed based on these standards for interoperability, security, and basic functionality, while still allowing partners to innovate at all layers, at the silicon, at the firmware, at the software, using the basic building blocks that ensure compatibility. If you check the latest OCP Accepted checklist, that program is already integrated there, and we're already starting to see some Arm-based servers showing up in the OCP product catalog.
Now within SystemReady, we're trying to address a wide ecosystem of Arm devices, from data centers and hyperscalers, to edge and embedded, all the way to rich IoT devices. The same principles apply in general, but some vertical segments require some tailoring of the requirements. So for servers, we have SystemReady SR as the main band, or category, of certification. It focuses on wide compatibility with operating systems, hypervisors, and the type of software stacks that people expect to run on industry-compliant servers. We also have SystemReady LS as a variation that allows LinuxBoot and open system firmware (OSF) as the basis for the firmware interfaces to boot Linux, especially for the hyperscaler market. We have extensions of the program that cover virtual environments, for VMs running in the cloud or on other virtual platforms, as well as an optional security interface extension covering the foundations of the security interfaces to the software: things like TPM, secure boot, and secure firmware update.
The program, as I said earlier, is based on requirements specifications published by Arm. I will go through a few of them very quickly here. First is the Base System Architecture, the BSA, and the server-specific version of BSA, which is the SBSA. Those specify the silicon and hardware requirements for silicon providers that are integrating Arm IP or the Arm ISA architecture in building their server SoCs. Then we have the BBR specification, which covers a range of recipes for firmware requirements for different segments. The ones most applicable to servers are the SBBR, using standard UEFI and ACPI interfaces, and the LBBR, using LinuxBoot. Then there are the Base Boot Security Requirements, which span requirements based on TCG and other standards for TPM, secure boot, and a secure system design checklist. And finally, for manageability, for the control plane of the servers, we have the SBMR. It standardizes how Arm server chips connect to the BMC over their sideband or in-band interfaces, which allows wider interoperability: computer manufacturers, OEMs, and cloud providers can choose their silicon suppliers while still fitting into the standard management patterns of connecting to a BMC.
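As an editorial aside, the relationship between the server bands and the specifications just listed can be written down as a small lookup table. This is only a sketch of the talk's summary, not an official Arm data model: the `Band` class, the `specs_for` helper, and the table layout are invented for illustration, and the pairing of SystemReady LS with SBSA is an assumption the talk does not state. The acronym BBSR for the Base Boot Security Requirements is also not used in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """One SystemReady server band, as summarized in the talk (illustrative only)."""
    name: str
    hardware_spec: str      # silicon and hardware requirements
    firmware_recipe: str    # BBR recipe used for the firmware interfaces
    boot_interfaces: tuple  # what the OS sees at boot


# Simplified summary; the published Arm specifications are the authority.
BANDS = {
    "SR": Band("SystemReady SR", "SBSA", "SBBR", ("UEFI", "ACPI")),
    # Assumption: LS is paired with SBSA here, which the talk does not state.
    "LS": Band("SystemReady LS", "SBSA", "LBBR", ("LinuxBoot",)),
}

# Additional specifications mentioned in the talk, keyed by topic.
ADDITIONAL_SPECS = {
    "security": "BBSR",       # Base Boot Security Requirements
    "manageability": "SBMR",  # server-to-BMC manageability requirements
}


def specs_for(band_key, topics=()):
    """Return the list of specifications relevant to a band plus optional topics."""
    band = BANDS[band_key]
    specs = [band.hardware_spec, band.firmware_recipe]
    specs.extend(ADDITIONAL_SPECS[topic] for topic in topics)
    return specs


if __name__ == "__main__":
    print(specs_for("SR", ["security", "manageability"]))
    print(specs_for("LS"))
```

Keeping the bands and the add-on specifications in separate tables mirrors how the talk presents them: a band picks the hardware and firmware baseline, and security or manageability requirements layer on top.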
Now it's important to note that none of this is done in a vacuum. Arm develops all of these standards as part of a collaboration model in what we call the System Architecture Advisory Committee. Think of this as a kind of industry consortium of all the Arm partners, all the players in the ecosystem, where their voices are heard and consensus is reached. That can mean driving Arm specifications like the ones I just mentioned, consuming what's happening in industry standards and figuring out how it applies to the Arm ecosystem, or contributing back what we think are gaps or changes needed in the industry standards. And we have this working relationship with many of our partners that participate in this group.
I mentioned testing earlier. Our compliance test suite is open source and available on the Arm GitHub, and it was contributed last year to OCP. Platforms that would like to go through OCP Accepted, and are therefore required to get SystemReady, need to use this test suite for their compliance testing. It's modular, uses both UEFI and Linux, and covers all the different specifications I mentioned.
What's new this year, which we just announced a few weeks ago, is that we launched manageability compliance testing, the SBMR-ACS. It is currently hosted on the Arm GitHub. It's public. We are in the process of contributing it to the OCP. It is based on the OpenBMC test automation and the Robot Framework. It helps automate the compliance testing of Redfish, IPMI, and other interfaces, and there is potential to add other things as well. It is currently based on the SBMR specification from Arm, but a lot of these requirements could span servers outside the Arm architecture, so some of this could be leveraged there. The nice thing is that it also automates the OCP hardware management compliance testing for the required OCP manageability profiles, the basic profile and the server profile.
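To give a rough feel for what an automated Redfish compliance check looks like, here is a minimal Python sketch. It is not taken from SBMR-ACS, which the talk says is built on the OpenBMC test automation and Robot Framework. The `check_service_root` helper, the required-link list, and the sample document are hypothetical and chosen for illustration; the real requirements are defined by the SBMR specification and the OCP profiles. The property names themselves (`RedfishVersion`, `Systems`, `Chassis`, `Managers`, `UpdateService`, `@odata.id`) are standard Redfish service root fields.

```python
import json

# Hypothetical requirement list for illustration; not the SBMR or OCP profile.
REQUIRED_SERVICE_ROOT_LINKS = ("Systems", "Chassis", "Managers", "UpdateService")


def check_service_root(service_root):
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    if "RedfishVersion" not in service_root:
        failures.append("missing RedfishVersion")
    for name in REQUIRED_SERVICE_ROOT_LINKS:
        link = service_root.get(name)
        # Each collection is expected to be a reference object with an @odata.id.
        if not isinstance(link, dict) or "@odata.id" not in link:
            failures.append(f"missing link: {name}")
    return failures


# Offline sample standing in for the JSON a BMC would return at /redfish/v1/.
SAMPLE_SERVICE_ROOT = json.loads("""
{
  "@odata.id": "/redfish/v1/",
  "RedfishVersion": "1.9.0",
  "Systems": {"@odata.id": "/redfish/v1/Systems"},
  "Chassis": {"@odata.id": "/redfish/v1/Chassis"},
  "Managers": {"@odata.id": "/redfish/v1/Managers"},
  "UpdateService": {"@odata.id": "/redfish/v1/UpdateService"}
}
""")

if __name__ == "__main__":
    problems = check_service_root(SAMPLE_SERVICE_ROOT)
    print("PASS" if not problems else "FAIL: " + "; ".join(problems))
```

A real test would fetch the document from the BMC over HTTPS instead of using an embedded sample; the check is kept offline here so the sketch runs anywhere.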
And the last thing I want to add here is that the journey to success for Arm servers, all these servers that you see on the expo floor displayed by our partners, now across numerous manufacturers and cloud providers, really starts with pre-silicon design and IP selection, during the product definition and architecture phase of building new silicon. For that, Arm has established what we call SystemReady pre-silicon compliance testing: a set of tools, some from Arm and some from our EDA partners, that can be used to ensure that the silicon being built is compliant with these standards before tape-out, before the silicon is finalized. This journey then continues after the silicon is taped out, through bring-up and firmware development, until there is a reference board from a silicon provider, such as NVIDIA with their Grace, which is what Jim is going to talk about next. Jim?
All right. So NVIDIA decided we're going to go build a data center CPU. Everybody knows us as the GPU company, but now we're going to make a CPU to innovate and take AI even further. I actually have one. If you haven't seen it in the booth, this is what it looks like. This is actually two Grace processors. And like Samer said, you build a processor, but the processor doesn't do anything unless you have an ecosystem or a system that you can put it into. For us to sell this, we need to get it to market fast, and we want to build that upon standards. This is where our partnership with Arm comes in, to use those standards. Yesterday, Samer was in a session with Ampere, where you'll have seen a similar message: we all work together in these sessions to build these standards. Because really, for us, if you want to use our GPU with a Grace, you can do it the same way as you can use our GPU in an Ampere box, all based on Arm.
So we start with the same standards Samer talked about, BSA, BBR, SBMR, all the things you need to build a system and get the firmware and the ecosystem ready. What was also important for us is to standardize some other pieces in the industry, such as firmware update, attestation, and telemetry gathering. If you were part of the GPU working group sessions yesterday, you'll have noticed NVIDIA standardizing our GPU management through the same APIs. So now, when you want to get telemetry, do security, or do firmware update, the GPU, the CPU, and even our DPUs and NICs will all work the same way.
So you've got the standards, and now you've got to build a solution, and the firmware is the key to getting that working. We started with the open source community to do that: OpenBMC for our management stack, EDK2 for UEFI booting, and even Trusted Firmware, with Hafnium and TF-A, which run in TrustZone on the Arm processor. We use open source for all of it, and then we donate our work back as it matures and gets ready and accepted in the industry. So you build all this and you've got a system, and then how do you know that what you built works?
And this is where the certifications come in. We're happy to announce that NVIDIA Grace is now certified for Arm SystemReady. You'll notice we have four different reference designs listed up here. These cover both our GH200, or CG1, the Grace Hopper, and the Grace Superchip, which I have here and which is our dual-socket Grace design. We did different form factors, including our new MGX form factor that Jensen introduced at Computex earlier in the year. Also, I don't want to forget this: we were actually the first Arm partner to achieve the server certification, which matters because building a system foundation with security is critically important for us.
Another fun note: we also build a DPU that uses an Arm CPU, BlueField. That was certified with Arm SystemReady as well earlier this year. So you run your operating system, your foundation, on Grace, and that same base technology and certification can now be applied to the DPU, which can also run an operating system. So you can see the correlation and the benefit of these standards.
So that's how we did it, and so we have reference systems. If you look at this slide, we're in the bottom green swim lane, which shows how NVIDIA does it. Partners can take our open reference code and go do it themselves, but doing it yourself can be difficult. OpenBMC is a journey for a lot of companies to take, so we also partnered with Insyde, who has an open firmware solution, to take what we did and help us get it to market quickly. Tim will walk us through that.
Thanks, Jim. So more than 30 years ago, 12 blocks from here, I stood on a field and received a piece of paper, and that piece of paper was for graduating from San Jose State. It said that I had met the minimum requirements set by the university to graduate from that school, and that I had gone through a certain number of courses to meet those requirements. SystemReady is much like that, in the sense that we at Insyde have gone through the work, along with our partners NVIDIA and Arm, to achieve the SystemReady piece of paper. You get one of these nice little statues. But frankly, that's not the end of the journey. It was not the end of my journey when I was a young man getting my piece of paper on that field at San Jose State, and it's not the end of your journey. You need to customize it. You need new features that make your platform ready to go beyond just the certification and actually go to production. Insyde is the partner who can do that, whether it's your BIOS firmware or your BMC, with our Supervyse OPF product.
So we're proud to announce that in the last month we have achieved SystemReady status with NVIDIA on both SR and SIE, on the 4351, the Grace GH200, and on the 4352, the Grace Superchip. That allows us to say that we are ready. We have not only met the minimum requirements, but we're also ready and enabled to take the next step with you in your journey towards production for your platform.
So we're certified. We're certified with NVIDIA and Grace Hopper. We're certified with Arm on this thing, and we're ready to help you get certified as well. So at this point I would say that means this ecosystem is ready. It's not just an Arm thing. It's not just an NVIDIA thing. It's now a whole ecosystem thing, because we can help you get to that next step.
That's the whole thing. SR makes sure that you have a certain level of trust. SIE makes sure you have a certain level of trust, and the SBMR is going in the same direction. We're now compliant with SBMR as well with our BMC product. You have a certain level of trust.
So join SystemReady. Is this your part now? Are you doing the last slide, or have I finished this slide? All right. We're out of time. All right. Come join SystemReady. Join us in the OCP Experience Center with Samer and Jim, and also visit the Arm booth, B14. Thank you very much.