Good morning. Welcome everybody. It's a beautiful day and they have found the best spot in town for us for this exciting server and storage track. We have a packed session for you today. The team has put together a very nice program. I am Siamak Tavallai. I am the incubation committee lead at OCP for the server project. Within the server project, we have a number of active sub-projects: modular hardware system, ODSA, networking basically, MES 3, NIC 3. We have a memory-centric solution around composable memory systems. We have a work stream around external and extended cabling and we have an HPC subgroup. These subgroups are very active. Individuals from different companies get together, come up with solutions and eventually contribute them to the OCP community. Today, we have a number of sessions from those sub-projects and from external contributors. Christian is the first presenter. Please Chris, identify yourself.
Okay. Thanks Siamak for the introduction. Good morning everyone. I'm Christian Pinto from IBM Research Europe in Ireland and I'm kind of the outsider here, because I'm representing the Open Fabrics Alliance today. So my talk today is mostly, you know, to stimulate some cooperation, chats and further work in the future. My main area of interest is around disaggregated composable systems.
So this is where I do most of my research and because of this, I'm also co-chairing the Open Fabrics Management Framework workgroup in the Open Fabrics Alliance. So today I'm talking to you about how to disentangle your data center with this Open Fabrics Management Framework that we are developing in the Open Fabrics Alliance. So let's start.
So basically a bit of background. What we are seeing in modern distributed systems is that these systems are getting bigger, more complex and heterogeneous, and we already have multiple fabrics, right? So you have your Ethernet fabric, you have your InfiniBand fabric, RDMA fabric for storage, and soon we're also going to see fabrics that target composable disaggregated systems. For instance, with the advent of CXL 3.0, you're going to start seeing CXL fabrics. Hopefully, this is what we like to see. And the thing that all the participants in this workgroup have seen is that there is no common framework for managing all these multiple fabrics and there is no common model for the applications to interact with these fabric managers. And oftentimes, you see different optimizations being applied for different fabrics. So it is a bit of a pain for administrators to manage these systems, but it's also a pain for software runtimes, for instance, that want to leverage all these nice fabrics. Now, you have to integrate support for each of them one by one. And again, disaggregated systems are going to make this even harder.
So in the Open Fabrics Alliance, we are, you know, taking our stab at this problem and we propose this Open Fabrics Management Framework. The idea is to leverage DMTF's Redfish. I'll have a slide on Redfish later on, but it's basically a specification for describing systems, including compute, storage, networking. And the idea is that we give clients a single API for managing the entire system. A client can be a human being, so it can be a data center administrator, but most importantly, it's going to be software runtimes. So let's say you have a software runtime doing distributed shared memory across machines and you want to configure your memory pools dynamically. This is the kind of runtime that we are targeting and that we want to build a common API for. And the idea is that this framework will be able to manage multiple fabrics seamlessly and will enable dynamic reconfiguration and composition of components. And actually, disaggregated composable systems are kind of the main driver behind this whole concept. So we have three layers: hardware, management, and the clients.
So we start from the hardware layer. What we do is that we are defining what we call fabric agents. The fabric agent is a component that translates whatever custom API the fabric manager or, let's say, the fabric provider has defined into a Redfish API. So this is the first step that brings a fabric within this OFMF concept, meaning that once you have an agent, your fabric speaks Redfish and that is a common language for all the fabrics. These agents also take care of actually acting on requests for reconfiguration of the fabric and composition of components, and they also generate events in case something changes in the fabric: there is a flapping link, a new device got added, a device disappeared, and so on and so forth. And what we are also working on, and this is open for discussion, is the level of detail that this agent should expose to the OFMF so that we can use these details for doing dynamic reconfiguration of the system using intelligent policies and things like this.
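As a rough illustration of the agent idea described here, the following Python sketch translates a made-up vendor port record into a simplified Redfish-style `Port` resource and emits an event when a link changes state. `VendorPort`, `FabricAgent`, and the event fields are hypothetical names chosen for this example and are not taken from the OFMF code base; the Redfish property names follow the public `Port` schema but the resource is heavily trimmed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VendorPort:
    """Hypothetical record as returned by a vendor-specific fabric manager."""
    port_no: int
    direction: str   # "up" or "down" in this made-up vendor format
    link_ok: bool


@dataclass
class FabricAgent:
    """Hypothetical agent: vendor records in, Redfish-style resources out."""
    fabric_id: str
    switch_id: str
    on_event: Callable[[dict], None]
    _last_state: dict = field(default_factory=dict)

    def to_redfish_port(self, port: VendorPort) -> dict:
        base = f"/redfish/v1/Fabrics/{self.fabric_id}/Switches/{self.switch_id}/Ports"
        return {
            "@odata.id": f"{base}/{port.port_no}",
            "Id": str(port.port_no),
            "PortType": "UpstreamPort" if port.direction == "up" else "DownstreamPort",
            "LinkStatus": "LinkUp" if port.link_ok else "LinkDown",
        }

    def refresh(self, ports: list) -> list:
        """Translate all ports and emit one event per link state change."""
        resources = []
        for port in ports:
            resource = self.to_redfish_port(port)
            previous = self._last_state.get(port.port_no)
            if previous is not None and previous != port.link_ok:
                self.on_event({
                    "OriginOfCondition": resource["@odata.id"],
                    "Message": f"Link changed to {resource['LinkStatus']}",
                })
            self._last_state[port.port_no] = port.link_ok
            resources.append(resource)
        return resources


events = []
agent = FabricAgent("CXL", "Switch1", on_event=events.append)
agent.refresh([VendorPort(1, "up", True), VendorPort(2, "down", True)])
redfish_ports = agent.refresh([VendorPort(1, "up", True), VendorPort(2, "down", False)])
```

After the second `refresh`, `events` holds a single entry for port 2, which is the kind of notification an agent would forward upward when a link drops.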
The core of the framework is what we call the OFMF itself. This component creates a single view of your system by means of a Redfish tree. I'll show a Redfish tree later on, but you have your own system described as an entire Redfish tree, no matter how many fabrics you have. And basically clients interact with this OFMF, which will provide access to events and logs, reconfiguration of the components, and also, of course, authentication and access control, right? Because this is kind of an admin-level framework, so you don't want any client to just log in and have fun with the fabrics and reconfigure links and stuff, right? So authentication is important.
And basically, then the clients will have access to this OFMF via this Redfish API. So the API is the same for any fabric or hardware that you have in your system. And we have a layer that sits in between the management and the clients, which we call the composability layer. In this layer, we envision resource managers and policies to live. So for instance, if I am a client of a system and I just need, okay, give me this much memory with these characteristics, for instance, with these latency characteristics, it is this layer that uses policies for identifying the best hardware that fulfills the requirement. So there are going to be resource provisioners for memory, like fabric-attached memory, for compute, for storage, and so on and so forth.
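To make the policy idea concrete, here is a minimal sketch of one possible memory provisioning policy, assuming a best-fit rule: among the free blocks that satisfy the capacity and latency request, pick the smallest one. `MemoryBlock`, `pick_memory`, and the numbers are hypothetical and only illustrate the kind of decision the composability layer would make; the talk does not specify any particular policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryBlock:
    """Hypothetical summary of a fabric-attached memory resource."""
    block_id: str
    capacity_gib: int
    latency_ns: int
    in_use: bool = False


def pick_memory(blocks, min_capacity_gib: int, max_latency_ns: int) -> Optional[MemoryBlock]:
    """Best-fit policy: smallest free block that meets both requirements."""
    candidates = [
        b for b in blocks
        if not b.in_use
        and b.capacity_gib >= min_capacity_gib
        and b.latency_ns <= max_latency_ns
    ]
    # Ties on capacity are broken by lower latency.
    return min(candidates, key=lambda b: (b.capacity_gib, b.latency_ns), default=None)


pool = [
    MemoryBlock("RB-mem-1", capacity_gib=64, latency_ns=400),
    MemoryBlock("RB-mem-2", capacity_gib=128, latency_ns=250),
    MemoryBlock("RB-mem-3", capacity_gib=256, latency_ns=200, in_use=True),
    MemoryBlock("RB-mem-4", capacity_gib=512, latency_ns=220),
]

choice = pick_memory(pool, min_capacity_gib=100, max_latency_ns=300)
no_match = pick_memory(pool, min_capacity_gib=1024, max_latency_ns=300)
```

Other policies (lowest latency first, locality to the requesting node, power cost) would only change the filter and the sort key, which is why it makes sense to keep them in a layer separate from the fabric details.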
Just briefly, what are Redfish and Swordfish? So Redfish is a standard. They define a protocol, which is RESTful, for management of networks, hardware, compute, converged infrastructure, and so on and so forth. And we also leverage the Swordfish specification, which is an extension to Redfish that specifically targets storage. But let's say that the API specification is based on Redfish. So how am I going to use this thing?
So let's say I have this very complex system in my infrastructure, which is one node that is connected to a CXL switch. And attached to this CXL switch, we have a Type 3 multi-logical device. CXL, for those who don't know it, is the very hot interconnect standard whose name stands for Compute Express Link. And these MLD devices are specific memory devices where there is one physical memory device that can present itself as multiple logical devices. And these logical devices can be attached to virtually any system that is linked to the upstream ports in the CXL switch. I'm not here to discuss CXL, so be gentle with me on this today.
And so let's say what I want to achieve is to connect CPU 1 to logical device 1. So what would you need to do today? Let's say there is a CXL switch on the market and you buy it. You will have to interact with the fabric manager and, okay, identify the upstream port that your system is connected to, identify the downstream port that the device is connected to, identify the actual logical device ID that you want to connect to the system, and send a specific message to the fabric manager saying, okay, make the connection between this upstream port and this downstream port. Now, whatever happens in the switch is out of the scope of this discussion. There are virtual circuits and stuff to be created. It is the fabric manager that does all of this. And most probably, the fabric manager API is going to be custom or specific to the system. We have seen this already, not with CXL systems. There are off-the-shelf composable infrastructures based on PCIe Gen 4 from different vendors, and they have different APIs.
So this is something that we have seen already. Basically, the client needs to know the details of the specific fabric and also to speak the language of the fabric manager.
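The manual procedure just described can be sketched as follows. `VendorFabricManager` and all of its methods are invented for this example; they stand in for whatever custom API a given vendor ships and do not correspond to any real product or to the CXL fabric manager command set.

```python
class VendorFabricManager:
    """Made-up stand-in for one vendor's fabric manager API."""

    def __init__(self):
        self._upstream = {"node1": 0}               # host -> upstream port
        self._downstream = {"mld0": 4}              # device -> downstream port
        self._logical_devices = {"mld0": [0, 1, 2, 3]}
        self.bindings = []

    def upstream_port_of(self, host: str) -> int:
        return self._upstream[host]

    def downstream_port_of(self, device: str) -> int:
        return self._downstream[device]

    def logical_devices_of(self, device: str) -> list:
        return self._logical_devices[device]

    def bind(self, usp: int, dsp: int, ld_id: int) -> None:
        self.bindings.append((usp, dsp, ld_id))


def attach_manually(fm: VendorFabricManager, host: str, device: str, ld_id: int):
    """The client has to walk through every fabric-specific step itself."""
    usp = fm.upstream_port_of(host)          # 1. find the upstream port
    dsp = fm.downstream_port_of(device)      # 2. find the downstream port
    if ld_id not in fm.logical_devices_of(device):   # 3. check the LD ID
        raise ValueError(f"logical device {ld_id} not present on {device}")
    fm.bind(usp, dsp, ld_id)                 # 4. ask for the connection
    return usp, dsp, ld_id


fm = VendorFabricManager()
result = attach_manually(fm, "node1", "mld0", 1)
```

Every name the client touches here is specific to one fabric and one vendor, so supporting a second fabric would mean writing a second version of `attach_manually`.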
What we are proposing here is that with the OFMF, each device or set of devices that can be used for composing together with other devices is represented as a resource block. The resource block is a standard construct in Redfish. We have selected Redfish also for this reason, because they are already envisioning support for composable and disaggregated systems. So if I am a client of the OFMF and I want to connect CPU 1 to LD 1, the only thing that I need to know is that I want to connect resource block 1 to resource block 2. So how would that work?
I was showing this composability layer at the beginning. So the clients would basically query the composability layer and ask, "Okay, what are the resource blocks that you have in this system?" This is what you get out of this request. Then clients can go inside each of the resource blocks and basically you get all the information, right? In resource block 1, you will get that there is a CPU with this much memory, the model of the CPU, the amount of memory, the model of the memory, the serial of the memory, all the details that you want, right? And basically, the client selects the resource blocks that it wants, "Okay, I want this CPU with this specific type of memory."
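A client-side sketch of this discovery step might look like the following. It assumes the resource blocks are published under the standard Redfish composition service collection and uses an in-memory dictionary in place of real HTTP GET requests; the block IDs, the trimmed-down payloads, and the helper names are assumptions made for the example.

```python
BLOCKS = "/redfish/v1/CompositionService/ResourceBlocks"

# Stand-in for the service: path -> JSON document (heavily trimmed).
FAKE_SERVICE = {
    BLOCKS: {
        "Members": [
            {"@odata.id": f"{BLOCKS}/RB1"},
            {"@odata.id": f"{BLOCKS}/RB2"},
            {"@odata.id": f"{BLOCKS}/RB3"},
        ]
    },
    f"{BLOCKS}/RB1": {
        "Id": "RB1",
        "ResourceBlockType": ["Compute"],
        "CompositionStatus": {"CompositionState": "Unused"},
    },
    f"{BLOCKS}/RB2": {
        "Id": "RB2",
        "ResourceBlockType": ["Memory"],
        "CompositionStatus": {"CompositionState": "Unused"},
    },
    f"{BLOCKS}/RB3": {
        "Id": "RB3",
        "ResourceBlockType": ["Memory"],
        "CompositionStatus": {"CompositionState": "Composed"},
    },
}


def get(path: str) -> dict:
    """Placeholder for an authenticated HTTP GET against the service."""
    return FAKE_SERVICE[path]


def unused_blocks(block_type: str) -> list:
    """Walk the collection and keep free blocks of the requested type."""
    collection = get(BLOCKS)
    blocks = [get(member["@odata.id"]) for member in collection["Members"]]
    return [
        b["Id"] for b in blocks
        if block_type in b["ResourceBlockType"]
        and b["CompositionStatus"]["CompositionState"] == "Unused"
    ]


free_compute = unused_blocks("Compute")
free_memory = unused_blocks("Memory")
```

In a real deployment each block document would also link to the processors, memory, and other resources it contains, which is where the model and serial number details mentioned above would be read from.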
It means, "Okay, I need resource block 1 and resource block 2." So if you want the connection to happen between CPU 1, so resource block 1, and LD 1, resource block 2, the only thing to be done is send this request to the OFMF.
Say, "Compose a system for me that is made of resource block 1 and resource block 2." So all the details on the fabric are gone. They are not really gone. They are just hidden. They are hidden from the client, so you don't necessarily need to know what the upstream port is, what the downstream port is, and what the actual logical device ID is that you have to connect to your node.
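For reference, a composition request of this kind could be built as in the sketch below. The body shape follows the "specific composition" pattern from the public Redfish composability model, where a new system is created by listing resource blocks under `Links.ResourceBlocks`; whether the OFMF expects exactly this endpoint and body is an assumption, as are the block IDs and the system name.

```python
import json

BLOCKS = "/redfish/v1/CompositionService/ResourceBlocks"
COMPOSE_TARGET = "/redfish/v1/Systems"   # assumed target of the POST


def build_compose_request(name: str, block_ids: list) -> dict:
    """Build the JSON body that asks for a system made of the given blocks."""
    return {
        "Name": name,
        "Links": {
            "ResourceBlocks": [
                {"@odata.id": f"{BLOCKS}/{block_id}"} for block_id in block_ids
            ]
        },
    }


body = build_compose_request("ComposedSystem1", ["RB1", "RB2"])
payload = json.dumps(body, indent=2)
print(f"POST {COMPOSE_TARGET}")
print(payload)
```

Note that the request contains no upstream port, downstream port, or logical device ID; the client only names the two blocks.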
So to show you the level of detail, this is how the OFMF would represent this system. It's a Redfish tree. In the Redfish tree, you get all the detail down to the actual ports where the components are connected in the switch. And here, with these shades, I'm showing what the resource blocks would represent within the Redfish tree. So all this complexity is completely hidden from the user.
So whenever we send that request to the OFMF, what happens is that it creates a set of what we call fabric connections. And by identifying the fabric, it also contacts the specific agent that is in charge of managing the CXL fabric, for instance. So the user, the client, doesn't really interact with the agent. It doesn't even know which agent is managing the fabric. You just interact with this system, and all the rest is taken care of by the OFMF.
And so just to summarize, basically, the only thing that the clients do is they post resource blocks to be composed together. The rest happens automatically. So the composition server figures out, "Okay, this resource block means we need to create a specific connection in the fabric between these ports." And this request is sent to the OFMF, saying, "Create the fabric connections." When the OFMF receives the fabric connection creation request, it says, "Okay, this fabric is managed by this specific agent." So the agent is triggered. It does the actual reconfiguration of the hardware. And of course, then there are messages that go back to the client saying, "Successful," or "Something went wrong." And that's it. Your request is taken care of, and the Redfish tree is updated in the system. So after the composition has happened, if you want, you can actually query the Redfish tree to figure out which server is connected to which device in the fabric, and so on and so forth. Here I made the example of CXL 3.0. This would work with any fabric, right? It doesn't necessarily have to be CXL 3.0. It can be other types of fabrics. You can create, for instance, virtual networks between systems, and so on and so forth. The thing is, we are actually trying to abstract the specific details of the fabric from the client.
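The flow summarized here can be sketched end to end with three small pieces: a composition step that maps resource blocks to fabric endpoints, an OFMF object that routes the connection request to the agent registered for that fabric, and an agent that performs the change. All class, method, and field names are hypothetical, and the status strings simply mirror the "Successful" and "Something went wrong" outcomes mentioned in the talk.

```python
class RecordingAgent:
    """Hypothetical agent that just records the connections it is asked for."""

    def __init__(self):
        self.applied = []

    def connect(self, initiator: str, target: str) -> bool:
        self.applied.append((initiator, target))
        return True


class Ofmf:
    """Hypothetical core: routes connection requests to the right agent."""

    def __init__(self):
        self.agents = {}
        self.connections = []   # stands in for the updated Redfish tree

    def register_agent(self, fabric_id: str, agent) -> None:
        self.agents[fabric_id] = agent

    def create_connection(self, fabric_id: str, initiator: str, target: str) -> dict:
        agent = self.agents.get(fabric_id)
        if agent is None or not agent.connect(initiator, target):
            return {"status": "Failed"}
        self.connections.append(
            {"fabric": fabric_id, "initiator": initiator, "target": target}
        )
        return {"status": "Successful"}


def compose(ofmf: Ofmf, endpoints: dict, block_a: str, block_b: str) -> dict:
    """Composition step: resolve blocks to endpoints, then ask the OFMF."""
    fabric_a, endpoint_a = endpoints[block_a]
    fabric_b, endpoint_b = endpoints[block_b]
    if fabric_a != fabric_b:
        return {"status": "Failed"}
    return ofmf.create_connection(fabric_a, endpoint_a, endpoint_b)


# Resource block -> (fabric, endpoint); values are made up for the example.
endpoints = {
    "RB1": ("CXL", "Switch1/Ports/1"),
    "RB2": ("CXL", "Switch1/Ports/4/LD1"),
    "RB9": ("Ethernet", "Leaf3/Ports/12"),
}

cxl_agent = RecordingAgent()
ofmf = Ofmf()
ofmf.register_agent("CXL", cxl_agent)

ok = compose(ofmf, endpoints, "RB1", "RB2")
mismatch = compose(ofmf, endpoints, "RB1", "RB9")
```

The client only ever calls `compose` with block names; swapping the CXL agent for one that manages a different fabric would not change the client code.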
And so I'm here today because we have seen that there are some synergies with the projects going on in the OCP. Specifically, the Composable Memory System project is one that is very close to what we are targeting here. And since we are working on hardware-software co-design for building these composable memory systems, it would be nice to interact so that we make sure that enough information is exposed by these systems up to the higher management layer, so that we can use this information for doing intelligent configuration and composition of resources. Also, the Hardware Management Project is another point of synergy because they build hardware and software for managing systems. So, for instance, you could even foresee these OFMF agents becoming part of the management modules that are being developed in the project.
So there is a call to action. On top you see a link to the website of the workgroup that I co-chair in the Open Fabrics Alliance. I suggest you have a look and reach out. Of course, the calls are open to anybody. Anybody can contribute. You can also see who the partners involved in this project are. And then the rest are the links to the OCP projects that have an affinity to our work. And this is it.
So please, if you have any questions, comments, suggestions, you want to have a discussion together, that's why I'm here today.
Thank you, Chris.