-
Notifications
You must be signed in to change notification settings - Fork 752
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SYCL][Doc] Extension spec for composite devices (#11846)
Add an extension specification for new APIs that allow an application to access card-level devices on PVC. --------- Co-authored-by: Marcos Maronas <[email protected]> Co-authored-by: John Pennycook <[email protected]>
- Loading branch information
1 parent
803fa48
commit 9a1b908
Showing
1 changed file
with
290 additions
and
0 deletions.
There are no files selected for viewing
290 changes: 290 additions & 0 deletions
290
sycl/doc/extensions/proposed/sycl_ext_oneapi_composite_device.asciidoc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,290 @@ | ||
= sycl_ext_oneapi_composite_device | ||
|
||
:source-highlighter: coderay | ||
:coderay-linenums-mode: table | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
:dpcpp: pass:[DPC++] | ||
|
||
// Set the default source code type in this document to C++, | ||
// for syntax highlighting purposes. This is needed because | ||
// docbook uses c++ and html5 uses cpp. | ||
:language: {basebackend@docbook:c++:cpp} | ||
|
||
|
||
== Notice | ||
|
||
[%hardbreaks] | ||
Copyright (C) 2023-2023 Intel Corporation. All rights reserved. | ||
|
||
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks | ||
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
permission by Khronos. | ||
|
||
|
||
== Contact | ||
|
||
To report problems with this extension, please open a new issue at: | ||
|
||
https://github.com/intel/llvm/issues | ||
|
||
|
||
== Dependencies | ||
|
||
This extension is written against the SYCL 2020 revision 8 specification. All | ||
references below to the "core SYCL specification" or to section numbers in the | ||
SYCL specification refer to that revision. | ||
|
||
|
||
== Status | ||
|
||
This is a proposed extension specification, intended to gather community | ||
feedback. Interfaces defined in this specification may not be implemented yet | ||
or may be in a preliminary state. The specification itself may also change in | ||
incompatible ways before it is finalized. *Shipping software products should | ||
not rely on APIs defined in this specification.* | ||
|
||
|
||
== Backend support status | ||
|
||
The APIs defined in this extension are only useful when using the Level Zero | ||
backend, and they are only useful when the Level Zero environment variable | ||
`ZE_FLAT_DEVICE_HIERARCHY=COMBINED` is set. The APIs may be called even when | ||
using other backends, but they will return an empty list of composite devices. | ||
|
||
|
||
== Overview | ||
|
||
Some Intel GPU architectures are structured with multiple tiles on a single | ||
card. Currently, this applies only to the Intel(R) Data Center GPU Max Series | ||
(aka PVC). By default, SYCL exposes each of these tiles as a separate root | ||
device, so each tile corresponds to a separate `device` object that is returned | ||
from `device::get_devices`. Applications generally perform better when each | ||
tile is treated as a separate device, but some advanced use cases can benefit | ||
by treating all tiles on a card as a single "composite" device. This extension | ||
provides APIs that enable these advanced use cases. | ||
|
||
A composite device has the same semantics as any other SYCL device, though the | ||
performance characteristics might be different. The application may submit a | ||
kernel to a composite device, and the implementation automatically schedules | ||
work-items to each of the underlying tiles. Memory allocated on the composite | ||
device is accessible to any of these work-items, regardless of which underlying | ||
tile it runs on. | ||
|
||
Applications that use this extension can access the same hardware through | ||
different `device` objects. For example, it is possible to get one `device` | ||
object that represents a card and another `device` object that represents a | ||
tile from that card. As noted previously, a device representing a card is | ||
known as a "composite" device. A device representing a tile is known as a | ||
"component" of that composite device. Since the tile-level devices are | ||
returned from `device::get_devices`, they are also considered "root" devices, | ||
which is a term defined in the core SYCL specification. | ||
|
||
|
||
== Specification | ||
|
||
=== Feature test macro | ||
|
||
This extension provides a feature-test macro as described in the core SYCL | ||
specification. An implementation supporting this extension must predefine the | ||
macro `SYCL_EXT_ONEAPI_COMPOSITE_DEVICE` to one of the values defined in the | ||
table below. Applications can test for the existence of this macro to | ||
determine if the implementation supports this feature, or applications can test | ||
the macro's value to determine which of the extension's features the | ||
implementation supports. | ||
|
||
[%header,cols="1,5"] | ||
|=== | ||
|Value | ||
|Description | ||
|
||
|1 | ||
|The APIs of this experimental extension are not versioned, so the | ||
feature-test macro always has this value. | ||
|=== | ||
|
||
=== Enumerating composite devices | ||
|
||
This extension adds two new functions for enumerating the available composite | ||
devices. One is a free function and the other is a new member function of the | ||
`platform` class: | ||
|
||
``` | ||
namespace sycl { | ||
namespace ext::oneapi::experimental { | ||
|
||
std::vector<device> get_composite_devices(); | ||
|
||
} // namespace ext::oneapi::experimental | ||
|
||
class platform { | ||
std::vector<device> ext_oneapi_get_composite_devices(); | ||
}; | ||
|
||
} // namespace sycl | ||
``` | ||
|
||
The free function `get_composite_devices` returns all of the composite devices | ||
across all platforms. The member function | ||
`platform::ext_oneapi_get_composite_devices` returns the composite devices | ||
within the given platform. | ||
|
||
The execution environment for a SYCL application has a fixed number of | ||
composite devices which does not vary as the application executes. As a | ||
result, each call to these functions returns the same set of `device` objects, | ||
and the order of those objects does not vary between calls. | ||
|
||
=== New device information descriptors | ||
|
||
This extension adds two new device information descriptors: | ||
|
||
``` | ||
namespace sycl::ext::oneapi::experimental::info::device { | ||
|
||
struct component_devices; | ||
struct composite_device; | ||
|
||
} // namespace sycl::ext::oneapi::experimental::info::device | ||
``` | ||
|
||
[width="100%",%header,cols="37%,19%,44%"] | ||
|=== | ||
|Device descriptor | ||
|Return type | ||
|Description | ||
|
||
|`component_devices` | ||
|`std::vector<device>` | ||
|Returns the set of component devices that are contained by a composite device. | ||
If this device is not a composite device, an empty vector is returned. | ||
|
||
|`composite_device` | ||
|`device` | ||
|This query may only be applied to a device that has | ||
`aspect::ext_oneapi_is_component`. Returns the composite device which | ||
contains this component device. Since the set of composite devices is fixed, | ||
the `device` object returned from this query is a copy of one of the `device` | ||
objects returned by `get_composite_devices`. | ||
|
||
If this device does not have `aspect::ext_oneapi_is_component`, the | ||
`device::get_info` function throws a synchronous `exception` with the | ||
`errc::invalid` error code. | ||
|=== | ||
|
||
=== New device aspects | ||
|
||
This extension adds two new device aspects: | ||
|
||
``` | ||
namespace sycl { | ||
|
||
enum class aspect : /*unspecified*/ { | ||
ext_oneapi_is_composite, | ||
ext_oneapi_is_component | ||
}; | ||
|
||
} // namespace sycl | ||
``` | ||
|
||
[width="100%",%header,cols="50%,50%"] | ||
|=== | ||
|Aspect | ||
|Description | ||
|
||
|`ext_oneapi_is_composite` | ||
|Indicates that the device is a composite device. Any device with this aspect | ||
will have at least two constituent component devices. | ||
|
||
|`ext_oneapi_is_component` | ||
|Indicates that the device is a component device of some other composite | ||
device. This applies only to a root device that is a direct component of | ||
some composite device. A sub-device will not have this aspect even if its | ||
parent is a component device. | ||
|=== | ||
|
||
=== Impact on "descendent device" | ||
|
||
This extension augments the definition of the term "descendent device" from the | ||
core SYCL specification as follows. Given some composite device _C_ and some | ||
component of that composite device _R_, device _R_ is a descendent device of | ||
_C_ and all descendent devices of _R_ are also descendent devices of _C_. | ||
|
||
This definition means that a SYCL context that contains a composite device is | ||
compatible with any of its component devices, even if those component devices | ||
are not contained by the context. See the core SYCL specification for details. | ||
|
||
|
||
== Impact to the ONEAPI_DEVICE_SELECTOR | ||
|
||
The `ONEAPI_DEVICE_SELECTOR` is an environment variable that is specific to the | ||
{dpcpp} implementation. Therefore, this section that describes the interaction | ||
between this extension and that environment variable is non-normative and does | ||
not apply to other SYCL implementations that may support this extension. | ||
|
||
The `ONEAPI_DEVICE_SELECTOR` environment variable determines the list of root | ||
devices that are returned from `device::get_devices`: _R0_, _R1_, _R2_, etc. | ||
The devices returned from `get_composite_devices` are computed from this list | ||
by iterating over the elements _Ri_ in order: | ||
|
||
* Start with an empty list of composite devices. | ||
* If device _Ri_ is a component of some composite device _C_ and if all other | ||
components of _C_ are also in the list of root devices returned from | ||
`device::get_devices`, then _C_ is appended to the list of composite devices | ||
unless _C_ is already in that list. | ||
|
||
This algorithm ensures that a composite device is made visible to the | ||
application only if all of its components are also visible. This is important | ||
for two reasons. The first reason is philosophical. We do not want to expose | ||
a composite device unless it has at least two component devices. This is | ||
similar to our decision in the core SYCL spec to disallow partitioning into | ||
sub-devices which results in only a single sub-device. We don't want to expose | ||
partitioning when the parent and child represent exactly the same hardware. | ||
The second reason is practical. The {dpcpp} implementation will associate | ||
each composite device with a single Level Zero native device handle, which | ||
represents the entire card. There is no way to get a Level Zero handle that | ||
represents a subset of the tiles. Therefore, we do not expose a composite | ||
device unless it represents all of the tiles on a card. | ||
|
||
The algorithm also imposes a sensible order on the composite devices, which | ||
corresponds to the order of the root devices. Thus, users who set the order of | ||
root devices via `ONEAPI_DEVICE_SELECTOR` will also get a corresponding order | ||
for the composite devices. | ||
|
||
|
||
== Issues | ||
|
||
* What is the impact of this extension on | ||
link:../supported/sycl_ext_oneapi_default_context.asciidoc[ | ||
sycl_ext_oneapi_default_context]? I think there are two reasonable options. | ||
One option is to say that the default context is not affected by this | ||
extension. As a result, applications that use composite devices will need to | ||
create their own context which contains the set of composite devices that | ||
they care about. The other option is to change the default context to | ||
include all of the composite devices. This would be easier to use because | ||
the default context continues to reflect all devices in the system. As a | ||
result, the default context can be used in all scenarios, and an application | ||
only needs to create a custom one in order to gain some optimization. It's | ||
not clear at this point if redefining the default context like this would | ||
result in a performance penalty for applications that don't use composite | ||
devices. | ||
|
||
* What guarantees do we make (if any) about the accessibility of USM allocated | ||
on a composite device? Do we guarantee that this memory is also accessible | ||
on the component devices? Is the answer different for "device" USM vs. | ||
"shared" USM? One option is to say that the application needs to | ||
specifically enable P2P access between a composite device and its component | ||
devices in order to guarantee accessibility. There is a related question in | ||
the core SYCL spec about accessibility of USM w.r.t. sub-devices, which we | ||
have not yet resolved. | ||
|
||
* Should the `ONEAPI_DEVICE_SELECTOR` be extended to allow selection of | ||
composite devices? For example, syntax like | ||
`ONEAPI_DEVICE_SELECTOR=level_zero:C1` could mean "select the second Level | ||
Zero card device and expose it as a root device from | ||
``device::get_devices``". Our current thinking is that we should not allow | ||
syntax like this. |