All right, let's get started. So, hello everyone, I'm Jason Molgaard. I'm an architect at Solidigm, and also co-chair of the Computational Storage TWG at SNIA. And I'm here with Shyam Iyer, who's a distinguished engineer at Dell and the chair of the SDXI TWG. It's interesting that in the previous talk, for those of you who were here, Steven mentioned SDXI and computational storage, and that's exactly what we're gonna talk about: the combination of those two technologies. We may need to do some QEMU emulation, Steven, so we can keep that in mind for the future, 'cause we have not done that. So this is a look at some early work that we've been doing, a collaboration between the two TWGs on computational storage and SDXI.
So briefly on the agenda, Shyam's gonna give an overview of SDXI for those who aren't familiar with it. And I will give a quick overview of computational storage. Hopefully it's not too repetitive from some of the material yesterday, but just wanted to make sure that everybody's on the same page, and then we'll jump into the combination of SDXI and computational storage.
So with that, I'm gonna have Shyam come up here and educate us about SDXI.
Okay. Thanks Jason. All right.
So this is the short version of the SDXI overview. I'm gonna be talking more about it tomorrow in the long-form edition. So I will try my best to cover the relevant things so we can get down to how this combines with computational storage. So what is SDXI, and why did we actually try to work on a standard for memory-to-memory data movement? Well, it turns out our current data movement standard is a software-based memcpy. Why? Because it has a stable instruction set architecture and it's been there for a while. But we have problems, because it takes away from application performance, and it incurs software overhead while providing context isolation. Offload DMA engines are not a new concept; they've been around forever, but we have all experienced vendor-specific offload DMAs that we have to work with to get the right kind of programming going, and there is no standardization at the user-level software. And that's one of the reasons we started looking at a memory-to-memory data mover standard in SNIA that is extensible, forward compatible, and independent of the I/O interconnect technology. So we started in June 2020, and almost 23 member companies contributed to it, close to 89 individual members. And yay, we got 1.0 released in November 2022. So that's the link there; you can go and download it. It's got some of the cool things that we worked on as part of this multi-vendor standards group.
What does it do? Among the design tenets we ended up designing with: assuming you have an application that wants to do memory-to-memory data movement, today you have to go through lots of software layers just to do simple data movement. We're trying to cut down a lot of these context isolation layers while preserving, in hardware, some of the tenets like architectural stability and security that these layers of software provide for us, and also allowing you to directly program this accelerator interface from user mode. While we're doing this, we're making sure that we are not just doing this for, say, a DRAM memory target. It could be persistent memory, or memory behind I/O devices, or maybe fabric-extended memory like CXL. Also, this interface doesn't have to be tied to one CPU architecture family. So whether you're an x86 processor vendor, an ARM vendor, or a RISC-V vendor, you can have this accelerator interface as part of your package; there isn't anything architecture-specific there. It's also form factor independent, which means you can have the accelerator interface as an add-in card or a discrete component like a GPU, an FPGA, or an I/O device like a drive. We did all of this while also keeping in mind that you can abstract and virtualize this interface such that you can do live migration of applications from one host to another. One of the important things for accelerators is to be able to quiesce, stop, and resume the same work on a different host. This is incredibly hard to do, and it's one of the things that we tried to build into the spec early on. And while we're at it, we're trying to make sure that we can leverage offloads using the same framework and define new kinds of operations as you're moving data. That's all part of the standard framework here.
This is a much simplified view. Again, for the long form, please come talk to me tomorrow. What I wanted to explain here is that all of the state that we describe in the standard is in memory or globally defined, such that you don't have any hidden state in the device. So you don't need to worry about what the accelerator was doing when it was stopped and quiesced at a particular point in time. There is one standard descriptor format, with some scope for future expansion, which allows us to talk the same kind of language and not have to worry about different kinds of programming interfaces. It's also easy to virtualize because the structures are architected and all state is in memory. And there is an architected setup and control mechanism for the function. The function has various states it can be in, so you can know exactly what state the function is in while you're moving the context. And we make sure that the spec is interconnect independent: the DMA bus can be PCIe, CXL, or anything else that you want to bring to the table. But just as an example, we did provide a PCIe-based binding in the spec, so there is an SDXI class code registered with PCI-SIG for PCIe-based implementations.
So this shaded region that I'm calling out here contains the structures that a user space application needs to know about in order to produce work. In this structured model, the function has some MMIO space, context tables, and state structures that govern how the context is defined. And within this shaded region are all the data structures required for an application to produce work. A descriptor ring is provided here. Whenever the application needs to enqueue work, it increments the write index. When the function has read the descriptor, it increments the read index. When you have to notify the function that there is new work, you have a doorbell mechanism. And the descriptor itself points at a completion status structure, which tells you when the operation has finished.
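To make that producer flow concrete, here is a minimal C sketch of what enqueueing one copy descriptor could look like. The structure layouts, field names, ring size, and opcode value are illustrative assumptions for this sketch only, not the normative SDXI 1.0 formats.

```c
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES 256u                      /* assumed power-of-two ring size */

struct sdxi_desc {                             /* simplified 64-byte descriptor; not the spec layout */
    uint16_t opcode;
    uint16_t flags;
    uint32_t length;                           /* bytes to move */
    uint64_t src_addr;                         /* source buffer */
    uint64_t dst_addr;                         /* destination buffer */
    uint64_t completion_addr;                  /* where the function writes completion status */
    uint8_t  reserved[32];
};

struct sdxi_context {                          /* producer-visible state, all in memory */
    struct sdxi_desc  *ring;                   /* descriptor ring */
    volatile uint64_t *read_index;             /* advanced by the function as it consumes descriptors */
    uint64_t           write_index;            /* advanced by the application as it produces work */
    volatile uint64_t *doorbell;               /* MMIO doorbell register */
};

/* Enqueue one copy descriptor and ring the doorbell. */
static int sdxi_enqueue_copy(struct sdxi_context *ctx,
                             uint64_t src, uint64_t dst, uint32_t len,
                             uint64_t completion)
{
    if (ctx->write_index - *ctx->read_index >= RING_ENTRIES)
        return -1;                             /* ring is full */

    struct sdxi_desc *d = &ctx->ring[ctx->write_index % RING_ENTRIES];
    memset(d, 0, sizeof(*d));
    d->opcode          = 0x1;                  /* hypothetical "copy" opcode */
    d->length          = len;
    d->src_addr        = src;
    d->dst_addr        = dst;
    d->completion_addr = completion;

    __atomic_thread_fence(__ATOMIC_RELEASE);   /* descriptor contents visible before index/doorbell update */
    ctx->write_index++;
    *ctx->doorbell = ctx->write_index;         /* notify the function that new work is available */
    return 0;
}
```

The point of the sketch is simply that everything the producer touches, the ring, the indices, and the completion status, lives in ordinary memory, with the doorbell as the only device-specific touch point.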
This is the outer shaded region that privileged software can control. So again, this is a layered model that allows you to have separation between user-level software and privileged-level software.
This is, again, a complicated example of how SDXI helps you do address space to address space data movement. I'm not going to go into the details of it in this talk today, but think of the middle address space, B, as something like a hypervisor, and it's trying to do data movement from address space A to address space C, which could be VMs, VM A and VM C. You produce the work for a data movement operation here, and once the descriptor is read, you get from the descriptor the allowed address spaces that address space B is able to move data between, and those are A and C in this example. If that checks out, the function on address space A goes and checks an identifier, which tells us that address space A is allowing a function in B to reach into its address space and read data from it or write data to it, and the same thing happens on address space C. This kind of check-and-balance mechanism is there for security, to make sure that the address space A to address space C data movement is allowed to happen. This is one of the examples we provide of how address space to address space data movement can happen, and it's a complicated one. You can do the same thing in a single address space system, or a two address space VM-to-VM kind of data movement where both VMs are trying to move data between each other. There are various extensions of this; some want to do the same kind of thing for host-to-host address space data movement, and we are working on that in the next versions of the spec.
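As a rough illustration of that flow at the descriptor level, here is a hypothetical layout; the field names (including src_akey and dst_akey) are invented for this sketch, since the real spec references architected key tables rather than carrying raw address-space identifiers in the descriptor.

```c
#include <stdint.h>

/* Hypothetical layout for illustration only. */
struct xaddr_copy_desc {
    uint16_t opcode;          /* copy operation */
    uint16_t flags;
    uint32_t length;          /* bytes to move */
    uint32_t src_akey;        /* key naming the source address space (A) */
    uint32_t dst_akey;        /* key naming the destination address space (C) */
    uint64_t src_addr;        /* address within address space A */
    uint64_t dst_addr;        /* address within address space C */
    uint64_t completion_addr; /* completion status visible to the producer in B */
};

/* Conceptually, the function serving address space B would:
 *   1. read the descriptor and the allowed address spaces (A and C),
 *   2. verify that A and C have each granted B permission to access them,
 *   3. only then read from A and write to C. */
```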
I won't go into the details here, but this is an example of an application pattern when an application needs to perform a memcpy. This type of accelerator interface helps you do that by offloading that to a DMA operation where you're just copying the data from a source buffer to a destination buffer.
This is another example of a storage data movement application pattern. As you can see, an application is trying to move data to storage: there's one memcpy here, then the kernel mode driver does another memcpy from a kernel buffer to a DMA-able buffer, and then the storage device picks the data up from the DMA buffer. The reverse happens when you're trying to read data from persistent storage, and as you can see, there are multiple memory buffer copies along the way. Now, you can address this with the help of a persistent memory buffer, where you basically move data into persistent memory using memcpys, but then we come back to the same problem: you're taking away from application performance. If you did that with the help of an accelerator, you could move data into persistent memory and retrieve it back just by using memory pointers.
This is another example of the virtualized data movement that I was talking about earlier. This type of accelerator pattern allows you to do VM to VM data movement without involving the hypervisor during the data movement itself. Of course, the hypervisor is still in control whenever it needs to stop it, quiesce it, or resume it, but we can do live migrations of these VMs in a very architected way using this pattern.
This is an emerging use case that we are also thinking about. Like I said, an SDXI device need not be a PCIe device; it could be a CXL device as well. An SDXI device that is a PCIe device can also use memory expansion. CXL is working on memory expansion right now, so all of that memory becomes part of the system physical address space, and SDXI works with any memory that is part of the system physical address space. So just as you can move data from CPU-attached memory to another region of CPU-attached memory, you can do the same thing between CPU-attached memory and memory expanded using CXL. The picture on the right shows where the SDXI device is potentially an actual CXL device, and the CXL device itself may have memory, so an SDXI construct can be used for moving data between CPU-attached memory and device-attached memory. All of this is part of the emerging use cases where data movement can be augmented with an SDXI-enabled architecture.
These are some of the other investigations that we are doing. Again, a plug for my talk tomorrow: I have a lot prettier pictures than the bullet points here. Key thing to note: SDXI 1.0 is out, and we are still working on 1.1. We have more operations that we are working on. We're looking at host-to-host. We're looking at how we can improve QoS. We're looking at newer operations that involve memory-to-memory data manipulation along the way as data is being moved, and this is all part of our investigations right now for 1.1.
It is not just the spec work that we are working on. There is a new software group within the technical working group right now, and it is working on software-enabled activities related to the standard. We currently have a project for libsdxi. This is a library implementation, or rather an OS-agnostic user space library, that shows you examples of how to use the programming interfaces. As of last month, we just had initial code submitted by AMD and Red Hat, and a big shout-out to them for being the first contributors to the project. We are having regular meetings now to expand the project. There is upstream driver work underway to have an upstream SDXI driver for the Linux kernel. This work is not happening inside of SNIA, but a lot of the members that participate in those activities are also participating in the software group, and we are helping them with support outside of the group as well. Something that Steven, not this one, the other Steven, would be very, very happy about is that we are also looking at an emulation project. We have been looking very closely at vfio-user as well. Part of the reason is the licensing makes a lot of sense for SNIA, being BSD 3-clause, and we are looking to see if we can enable that for future enablement activities. The last one is we are trying to look at compliance tools that we can build as part of the software work group, and one of the things I say is we are vendor-neutral, but if you're not part of the group, you can't hold us responsible for the bias. So if you want us to be really, truly vendor-neutral, and you're not part of the group, come and join us. The other thing that I would say is we are working very closely with computational storage, and that's why we created a new subgroup, the CS+SDXI subgroup. The prerequisite for joining this subgroup is that you need to be a member of both the computational storage TWG and the SDXI TWG. With that, I would like to hand it over to Jason to give you an overview of computational storage, and we will tie those two things together and field some questions.
All right, thanks, Shyam. These next few slides are somewhat repetitious from three different presentations that already happened yesterday, so I'm going to go very, very fast for those who maybe already saw those presentations, but for those who haven't, you get a quick overview, and then you can certainly go reference the slides.
So in SNIA, we have defined three architectures in our architecture and programming model: the computational storage processor, the computational storage drive, and the computational storage array. The computational storage processor connects up to a fabric but doesn't actually have any data storage. The computational storage drive is what you would be familiar with in terms of a traditional drive, but also has some compute capability, and then same thing for an array. It's an array that you would be familiar with, including all the control software and everything, but has additional compute capabilities.
This is detail on all of the sub-blocks of that teal-colored block that was shown in the other slide; each of them has these computational storage resources. I'm not going to walk through them. They're here for your reference if you want to grab these slides and you're not already familiar with these terms.
In another presentation, given by Oscar Pinto, we talked about the computational storage API. So just as a quick refresher, we've developed this API that is supposed to be agnostic to the device. It's OS agnostic as well. That way we can use that library, develop a plugin that would be vendor-specific, and of course a device driver for your computational storage device, and then everybody can use that same interface, the SNIA API for computational storage. This is relevant; we're going to touch on it a little bit more as we tie it in with the software work that's happening with SDXI moving forward.
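As a rough sketch of how an application might sit on top of such a vendor-neutral library, here is a hypothetical C flow. The type and function names below are invented for illustration and are not the actual SNIA computational storage API signatures; they only show the shape of the call pattern the API is built around: open a device through a plugin, stage data, invoke a compute function, and retrieve the result.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a vendor-neutral computational
 * storage library. These are NOT the actual SNIA API names or signatures. */
typedef void *cs_handle_t;
typedef void *cs_mem_t;
int  cs_open_device(const char *path, cs_handle_t *dev);
int  cs_alloc_device_mem(cs_handle_t dev, size_t len, cs_mem_t *mem);
int  cs_copy_to_device(cs_handle_t dev, const void *src, cs_mem_t dst, size_t len);
int  cs_execute_function(cs_handle_t dev, const char *func, cs_mem_t in, cs_mem_t out);
int  cs_copy_from_device(cs_handle_t dev, cs_mem_t src, void *dst, size_t len);
void cs_free_device_mem(cs_handle_t dev, cs_mem_t mem);
void cs_close_device(cs_handle_t dev);

/* Offload a "filter" computation: stage the input in device memory,
 * run the function on-device, and pull the result back to host memory. */
int run_offloaded_filter(const char *csx_path,
                         const void *input, size_t in_len,
                         void *output, size_t out_len)
{
    cs_handle_t dev;
    cs_mem_t dev_in, dev_out;

    if (cs_open_device(csx_path, &dev))          /* vendor plugin resolves the device */
        return -1;

    if (cs_alloc_device_mem(dev, in_len, &dev_in) ||
        cs_alloc_device_mem(dev, out_len, &dev_out)) {
        cs_close_device(dev);
        return -1;
    }

    cs_copy_to_device(dev, input, dev_in, in_len);         /* stage the data */
    cs_execute_function(dev, "filter", dev_in, dev_out);   /* run the compute on-device */
    cs_copy_from_device(dev, dev_out, output, out_len);    /* retrieve the result */

    cs_free_device_mem(dev, dev_in);
    cs_free_device_mem(dev, dev_out);
    cs_close_device(dev);
    return 0;
}
```

The application only ever sees this library surface; the vendor plugin and device driver underneath decide how the staging and execution actually happen, which is exactly where an SDXI data mover could slot in later.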
All right, a third presentation that was yesterday on computational storage, very briefly, talked about what was going on in NVMe related to computational storage. Briefly, there's two new command sets that have been defined, the computational programs command set for performing computations down inside of an NVM subsystem, and then of course there's the subsystem local memory command set, which provides the memory needed in order for a program to execute down in an NVM subsystem. And this picture shows the computational or the compute namespaces over on the left, the subsystem local memory in the middle, and then of course the NVM namespaces that we're already familiar with and exist in traditional SSDs on the right-hand side. And so these are new command sets. They're making their way through NVMe today and will be coming soon in order to support computational storage.
In one of the presentations, there was a comparison of terms, and again, I won't necessarily walk through all of these, but I think it's important to understand that the SNIA terms and the NVMe terms maybe are slightly different, even though they mean the same thing. It's helpful to have this kind of decoder ring if you're having a conversation and you're not familiar with the two sides. So available for your reference. All right, that was a very whirlwind tour of computational storage. Feel free to grab me if you want to go over it in more detail. I just didn't want to repeat three hours of presentations again.
So let's move into the combination of SDXI and computational storage, because that's, after all, what we're very interested in.
So I'll get us started on this slide, and then I'm going to turn it over to Shyam. So I think that one of the very first questions that people are probably sitting there asking themselves is, 'Computational storage is supposed to reduce data movement. Why do we need a data mover?' Well, that's a great question, and I think that it really comes down to this. We want to do the compute where the appropriate compute engine is. Right? Computational storage is not going to solve every problem in the world. You can't run every workload on it. Some workloads have to be done on the host. In the same way, some computational storage drives may have a compute engine that is better suited for a particular workload, and others do not. So in that case, we want to be able to do a peer-to-peer data movement from one computational storage drive to a different computational storage drive in order to perform an operation, because that drive happens to be better suited for that workload, but it maybe doesn't have the data. We've still offloaded the host in doing that process and allowed the host to go off and do other things. SDXI, as Shyam will get into in great detail in his presentation tomorrow, it reduces the amount of data movement within the host software stack. So it's very efficient. So the fact that we're doing a data movement with SDXI, we're doing it in the most efficient way possible and still offloading the host at the same time. All right, I'm going to turn it back over to Shyam. He's going to walk us through the rest of this slide.
Some of it. So the second question that we get asked is, 'Okay, SDXI is a memory-to-memory data mover. Computational storage computes on a storage device. What are you guys talking about?' So the reality is memory is everywhere. There is memory within a computational storage device. As Jason was talking about with the memory range set terms, there are different regions of memory within a computational storage device. You can be talking about host memory, PCI memory, or device memory. SDXI is a memory-to-memory data mover, and you need to compute on the data wherever it is. So data-in-use memory is where computation needs to occur, and SDXI bridges the two worlds. And while it's doing that, it can also allow data to be transformed as it is moving the data. So it can provide the computation inside a computational storage device in the form of data transformation. Take compression as an example. That's a computation, but that compression can be done as you're moving the data from a source memory on the host to a device memory in the device. And computation, like I was mentioning, is something that you can enable within the device and have done privately, or make available to the host to be done by the host itself. So this is one of the reasons we created the subgroup. It's a collaboration between the SNIA technical work groups here, and what we're trying to do is at least develop a unified block diagram that imagines a computational storage architecture with SDXI devices in it. And while we're doing that, we are developing the use cases for SDXI-based CS devices. Some of these are the member companies that are participating; some of them are regular attendees. But this is one of the talks where we would like to invite all of you to come and join this group and contribute by way of use cases and new proposals as we go down this path. Again, this is an early-efforts preview. And we enjoy -- I mean, we are a fun group. We don't bite. And both the Computational Storage TWG and the SDXI TWG are winners of the most innovative memory technology award, so the least we can do is shame you into winning an award. You might as well join an award-winning group.
So something else that gets brought up is, okay, what about NVMe? SDXI is a data mover engine. It's also fabric agnostic, which means it could be moving the data with PCIe as the DMA bus, or CXL, or something else. Now, NVMe also has DMA as an integral part of it; if you're not moving the data, you're not doing NVMe right. So how does SDXI work with this? An SDXI instance that is external to the NVM subsystem can work with host addressable memory. If the NVMe device has private memory and it's not mapped to the system address space, then an SDXI instance external to the NVM subsystem can't work with it. However, an SDXI instance that is internal to the NVM subsystem can move data between the device private memory and host addressable memory, and within the device memory space, it can move data from one memory region to another. Now, if the device memory was mapped to the host by way of, say, a CMB or a PMR region -- CMB is controller memory buffer, and it's a construct in NVMe where some of the device memory can be mapped into the host physical address space -- then even an SDXI instance external to the NVM subsystem is fair game. You can move data between that CMB location and a host memory location, and you could be performing data transformations. So this is a bit like Casablanca's last line, 'This is the beginning of a beautiful friendship': this is the beginning of a new friendship between computational storage, SDXI, and NVMe. There are multiple ways that these technologies can complement each other.
So pictorially, this is an example of an SDXI instance in your system topology where it's external to an NVM subsystem, and it can be employed to do the data movement between host-attached memory and device memory, provided that memory is mapped to the system physical address space. Now, Jason talked about the computational storage API in his section. A computational storage API can, southbound of it, have a storage driver with a computational storage extension interacting with the storage, and that's something that Oscar talked about in his talk yesterday with an NVMe example. You could also have, southbound of the computational storage API, an SDXI driver and library that can be added to the pipeline of computational storage API work. So you could be using SDXI to move the data from host memory to the device memory and do the transformations as part of a computational storage API data flow. If the SDXI instance were internal to the NVM subsystem, then SDXI is performing the data movement between host memory and device memory, but it needs a producer inside the NVM subsystem. The producer in this case is the computational storage API, southbound of the storage driver on the host, that is driving SDXI within the NVM subsystem. You can also see that SDXI can compute on the data that is local to the NVM subsystem, and that's where you can do a few of these transformations.
Something that the group is working on is the various data flows through which you can perform these kinds of computations. Again, I would invite all of you to come and join the discussions we are having in the subgroup. And for the next set of slides, I want to invite Jason up, and he'll go into some of the configurations we will be talking about.
All right, so we've looked at SDXI external to an NVM subsystem and SDXI internal to an NVM subsystem.
So let's move into a couple of device types that the group has envisioned so far. We've deliberately tried to come up with names that did not invoke thoughts of CXL, so we've got type A and type B. With type A, shown over here on the left-hand side, it's essentially a device with an NVMe interface. It's got an NVMe function, and it's got SDXI down on the inside, so we've got SDXI inside of our NVM subsystem in this case. And so the thought is, well, what kind of compute could you do with said device? Certainly the first option we've listed here is an NVMe plus SDXI command. That does not exist today; we are dreaming at this point. What if? So that is one scenario that could be invented. Another scenario that is much further along and is nearing completion today is an actual NVMe computation command. I very briefly skimmed over the new command sets that are being added in. They're progressing through membership vote, ratification, and whatnot, so those are on their way; they're much closer to existence. And through such a command, the device could potentially invoke SDXI and some of the data manipulations it can perform as part of the computation that we want to do. And then, of course, number three is related to just that. What if we wanted to do a transformation through a data movement operation, and that SDXI operation would be transparent to the NVMe? So an NVMe command comes in, and as part of that NVMe command, we say, oh, we need to move this data from one SLM region, perhaps, to a different SLM region, or from one device to another device, and apply that transformation in the process of moving it. It's behind the scenes; NVMe doesn't even realize that that's what's happening, but the SDXI is invoked and causes it to happen. All right, switching over to our type B device on the right-hand side, the difference you'll notice is that not only is there an NVMe interface, but there's also an SDXI interface, or an SDXI function. And so now the host is aware of that. It knows there is SDXI down inside of this device, and so it can issue SDXI operations directly to that device, in addition to the NVMe commands, and through those SDXI operations do some of the same things. And we think it opens up one other method of performing compute down in the drive: because we have that capability, you can issue your NVMe commands directly to the NVMe interface, and you can issue SDXI operations directly without necessarily having them be transparent. It can be very explicit in that regard, doing some of the exact same things.
All right, taking those two device types and expanding further into different configurations of systems that we've thought of at this point. So we've got X, Y, and Z, not to be confused with A and B, of course. In configuration X, we've got two of the blocks that we shared previously. In this case, device one has an NVMe interface, or an NVMe function, and device two has only an SDXI function; of course, there's SDXI internal to that device. So the host is able to issue NVMe commands to device one, and it can issue SDXI operations to device two, and through that we can move data back and forth between those two, or whatever the case may be in terms of the compute or data movement that we need to do. For configuration Y, both device one and device two have NVMe interfaces. So this would be like the type A device shown, where the SDXI would be transparent: you'd issue an NVMe command, and through that command, the hardware would interpret it and say, oh, we need to perform an SDXI-type operation in order to satisfy whatever NVMe command was requested. And then in configuration Z, we've put it all together into one large configuration, where we've got device one, which is just a traditional NVMe drive or device that we'd be familiar with; device two, which is an NVMe interface device with SDXI; and device three, which has both an NVMe interface and an SDXI interface. And of course, because the host would know all that, it would be able to issue those commands or operations as appropriate to the appropriate interface and the appropriate device for data movement or compute.
So this is a diagram that we shared yesterday, just as a teaser leading up to this presentation today. This is a block diagram that we have spent quite a bit of time developing in the group, where we've taken the picture of the computational storage drive shared earlier from the computational storage architecture and programming model and put two of them together. We've expanded on that, added a couple of hosts up at the top, and added some shared memory over on the right-hand side. And we've drawn in red and green arrows to represent where SDXI could play a role in data movement, who would invoke that SDXI operation, and what could be done with that operation as well.
Question?
Yeah, my question is, for the scenario of three options you showed before, can any of those handle the case of, like, sharded data in a distributed environment, where you have some drives that are not capable of it and don't have SDXI, and some that do? Does SDXI plus computational storage scale in a hyperscale environment?
So to repeat the question for the recording, the question is, does an SDXI-based computational storage system scale in a hyperscale type environment? And I think that's exactly-- if I can go back one slide-- I think that's part of the goal here, for sure. So not every device has SDXI, as shown here. But I think the key is, as long as the address is visible to the SDXI that's being used for the operation, then it can go get that data from that memory and bring it over to its own device, or send something over there. And that is the key, that SDXI requires-- it has to have the visibility of all the devices. So if you have SDXI external to the NVM subsystem, like Shyam explained, in that case, your host has to have visibility into all those memories down in the device if you're going to have that SDXI perform the data movement from one to the other. So that works today if your memory happens to be a CMB-type memory. It would not work today if your memory is an SLM memory, because that is not host addressable as it stands today. Could that be changed? Yes, it could. And maybe that will. Who knows? But if you've got the SDXI down inside of the device, and it's able to address those other memories, then absolutely, it could perform that data movement and move that data over. Does that answer the question?
Yeah, possibly. But what if the drive does not have SDXI?
You could issue an NVMe command or an SDXI command to a drive that does have it and tell it to go copy the data from that other drive that does not have the SDXI. Does that make--
Yeah.
OK.
So I guess the question I have is, if the drive moves the data, how does that get passed back up to the host? The host has to know that the data for drive A is actually not on drive A anymore, or is it just... I guess that's the part that's a little bit unclear. If the drive is truly acting as an initiator, acting as a target, doing computations on the data, and putting this chunk of data over here and that chunk of data over there, how does that get communicated back up to the host?
So the question is, just again to repeat for the camera, if we're doing this data movement, how does the host find out that we've moved something? And I think that's a great question. Clearly, you'd have to have some mechanism for the host to either have requested that operation to happen, so it's aware of the fact that it made this request somewhere that in the system it said, please go do this operation. You go figure out where the data is. And the drives go figure it out and move the data to the appropriate location, perform the operation, and put it back into some location that the host would specify. I think that's the most logical thing. But certainly, there's a lot of layers of software here that we haven't solved yet.
Yeah, if I could just add -- at the higher level, the host has triggered an action to be performed at some point in time. Now, in between, the data can move without the host knowing about it, and that's fine, because you're optimizing away the back and forth to the host as part of this peer-to-peer. But the trigger could initially be because the host wanted to perform an action, or it could be something configured as a policy, a pre-programmed action.
Yeah.
Yeah, no, that's a good use case. It's distributed computational storage and computational power without the host. It's just making sure you know where your data actually moved.
Yeah.
Yeah.
I'll ask another question.
But in a perfect world, if you had some of these SDXI-enabled devices, you get the benefit of performing the computation as you are trying to store or retrieve data, or while some computation needs to be performed on the data without having to store or retrieve it. But this configuration is only showing you some of the hybrid ways in which the data pipeline can be optimized further, because not every device is going to have a computation element inside it.
Go ahead. Jim, question.
So as part of all the NVMe spec work that was done, basically being able to transfer data to and from the host and the subsystem local memory, things like NVMe read/write commands were added to support byte-addressable access to those regions. There's also NVMe copy, where you can copy data between those. And so I guess I'm just curious how SDXI improves on that versus some of those more native NVMe commands. I mean, certainly one gap is that NVMe did not define doing a transformation on the data in transit, so you have to put it in subsystem local memory and then do the compute, and I know there was talk about adding that at some point in the future. But I just wondered if you could talk about that a little bit, what SDXI brings on top of what NVMe already does.
All right, so the question is, what does SDXI bring over using the NVMe command sets for performing compute? And I think that the number one thing is that, as written today, the NVMe commands require host involvement for everything. If you want to move the data, the host has to issue an NVMe command, and then the host issues another command to do the compute operation. I think what SDXI brings is the ability for a single command to be sent that allows a device to say, I've got to go get this data, I'm going to copy it over to myself, I'm going to do the compute on it that the host had requested me to do, and then copy it back, all in one operation. Now, could NVMe be extended to do that? Yes, I think it could. I think the other thing that SDXI brings is that it's a standard DMA at the end of the day. And as Shyam had mentioned earlier, if you're not using DMAs in some form or another in your NVMe, then you're not doing it right. And so here's a way to -- this could be a future thing that doesn't exist yet -- but what if SDXI became the standard data mover in NVMe for all operations that everybody uses, and you configured that data mover instead of whatever DMA operation people are using today, which is probably more proprietary? Those are a couple of thoughts. Shyam, did you have any other thoughts to add to that?
Yeah, a little bit of a philosophical tangent. Different languages have different words that can describe something beautifully. These two standards have definitely developed separately, but there are nice things about each of them. SDXI has been written as a memory-to-memory data mover, so it brings in a lot of the memory-semantic data movement concepts: things like memory barriers, how to fence things, how to sync an operation. If you're building a pipeline of chained operations to do a compress, encrypt, hash, dedupe, how do you actually do that pipeline in a memory-based model? That's what SDXI builds into the descriptors that we are standardizing. When we combine these two technologies in these kinds of ways, you can enable those things with the help of SDXI. Now, again, SDXI is not just for computational storage and NVMe. It is also being defined for other cases where you cannot actually have an NVMe function. So that's one of the ways that we are trying to leverage this.
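To illustrate the chained-pipeline idea, here is a small hypothetical C sketch of two descriptors expressing compress-then-encrypt as data moves. The opcodes, the chaining flag, and the layout are invented for this example; the actual chaining, fencing, and compound-descriptor mechanisms are defined by the SDXI descriptor formats.

```c
#include <stdint.h>

enum sketch_op { OP_COMPRESS = 1, OP_ENCRYPT = 2 };   /* hypothetical opcodes */

#define F_CHAINED 0x1u                                 /* hypothetical "more work follows" flag */

struct sketch_desc {
    uint16_t opcode;
    uint16_t flags;
    uint32_t length;
    uint64_t src_addr;
    uint64_t dst_addr;
    uint64_t completion_addr;   /* only the final stage reports completion */
};

/* Build a two-stage pipeline: compress src into a bounce buffer, then encrypt
 * the bounce buffer into dst. Stage 2 must be fenced so it only consumes data
 * stage 1 has finished producing. (A real pipeline would also carry the
 * compressed length forward instead of reusing the original length.) */
static void build_pipeline(struct sketch_desc d[2],
                           uint64_t src, uint64_t bounce, uint64_t dst,
                           uint32_t len, uint64_t completion)
{
    d[0] = (struct sketch_desc){ .opcode = OP_COMPRESS, .flags = F_CHAINED,
                                 .length = len, .src_addr = src,
                                 .dst_addr = bounce };
    d[1] = (struct sketch_desc){ .opcode = OP_ENCRYPT,
                                 .length = len, .src_addr = bounce,
                                 .dst_addr = dst, .completion_addr = completion };
}
```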
So this gentleman here in the gray, yeah.
So again, the details have to be worked out in a potential NVMe plus SDXI proposal. But if I look at the SDXI descriptor itself, it can be a compound descriptor, or it could be a chain of SDXI descriptors. Now, how do we make that part of the NVMe plus SDXI proposal? That needs to be worked out. Whether this is one command that triggers a set of SDXI descriptors, or one command that triggers one SDXI compound operation, is something, again, that needs to be figured out as part of the proposal, and that's what the group will end up trying to define.
Now, last question.
Yeah, so the question is, does this work with inline accelerators? And I think that one of the things that Shyam pointed out is what's built into SDXI is that it can do data transformations as it moves the data. That's inherent as part of the SDXI spec. So if part of our operation is that we want to do a particular transformation anyway, as our compute in our computational storage device, that transformation can happen by the SDXI as we're moving it. Now we've got it in our drive. We've done half of our compute, let's say. Now we need to do a couple of other things that we've been told to do. And we've now completed the compute operations that we were wanting to complete anyway. And SDXI kind of helped, because it provided that capability, or that compute, that we would have otherwise had to execute separately in our computational storage drive.
Right, okay, let me replay that question, and you can correct me if I've got it right or not. I think you're asking about host memory that is not allocated to the device for it to be able to do the DMA. Yes, so you're talking about the PRI spec from PCI -- okay, so if there's a page fault, what will happen? Is that what you're asking? Okay, so this is one of the things SDXI did: when we say you can have a user virtual address specified as part of a memory descriptor, that means it is not an actual physical address. And for that reason, when you're doing user-virtual-address to user-virtual-address data movement on host memory, we require an IOMMU with ATS and PRI implemented, so the device can ask for the user virtual address and have it effectively translated to a real physical address. Now, if that translation did not happen and there was a page fault, we just follow the ATS/PRI spec: the operation will basically halt and it will cause an error. The privileged software will receive the error and can go and correct it. The user software will get the error log, and at some point the SDXI context will stop. Once the SDXI context is restarted again, you can retry it, and at that point, if the page is available, the operation will succeed. But how page faults are resolved is pretty well documented in the spec. Great question.
So great questions and great discussion, everyone. We are out of time, but we really appreciate the conversation. Shyam and I will be here tonight at seven; there's a computational storage BoF. Happy to continue the conversation at that time. You can also grab us in the hallway to ask questions and to join. You just have to join both the SDXI TWG and the computational storage TWG, and then you can be part of the work group, or the collaboration, and come and give us your ideas on where we should go with this. That's it. Thank you.