[SYCL][Doc] Extension spec for composite devices #11846

Merged 10 commits on Dec 21, 2023

gmlueck (Contributor) commented Nov 9, 2023

Add an extension specification for new APIs that allow an application
to access card-level devices on PVC.

@gmlueck gmlueck requested a review from a team as a code owner November 9, 2023 20:12
Fix typos.

Co-authored-by: Marcos Maronas <[email protected]>
gmlueck (Contributor Author) commented Nov 27, 2023

Thanks for finding those, @maarquitos14.


Some Intel GPU architectures are structured with multiple tiles on a single
card. Currently, this applies only to the Data Center GPU Max series (aka
PVC). By default, SYCL exposes each of these tiles as a separate root device,

Contributor:

I would suggest saying that it is not SYCL, and not even the UR, that does this; it is the Intel(R) L0 driver implementation. SYCL and the UR just expose what the L0 driver presents.

Contributor Author:

You need to read this specification from the point of view of a SYCL application developer. From that point of view the UR and Level Zero are just implementation details. The only thing that matters to this person is how the SYCL APIs expose the hardware.

namespace sycl {
namespace ext::oneapi::experimental {

std::vector<device> get_composite_devices();

jandres742 (Contributor) commented Dec 1, 2023

@gmlueck: from the definition above:

A composite device has the same semantics as any other SYCL device, though the
performance characteristics might be different. The application may submit a
kernel to a composite device, and the implementation automatically schedules
work-items to each of the underlying tiles.

Then get_composite_devices() would return the equivalent of the devices returned by zeDeviceGet when ZE_FLAT_DEVICE_HIERARCHY is set to COMPOSITE. When it is set to FLAT, get_composite_devices() would return no devices, because with FLAT the root devices returned are tiles, so the statement above ("the implementation automatically schedules work-items to each of the underlying tiles") doesn't hold.

Is that correct?

Contributor Author:

The get_composite_devices function will return an empty list in both FLAT and COMPOSITE modes. It's only in COMBINED mode where it returns something interesting. In this mode, I believe the statement about distributing work-items to the underlying tiles is true, right?
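
The mode-dependent behavior described in this reply can be sketched as a small standalone model. This is not SYCL or Level Zero code: `HierarchyMode` and `model_composite_devices` are hypothetical names invented for illustration, and the sketch assumes that every card passed in is a multi-tile card that would be exposed as one composite device in COMBINED mode.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the three ZE_FLAT_DEVICE_HIERARCHY settings
// discussed in this thread. Not part of any real API.
enum class HierarchyMode { Flat, Composite, Combined };

// Toy model of what a get_composite_devices()-style query reports.
// Assumption: each entry in `cards` names one multi-tile card.
std::vector<std::string> model_composite_devices(
    HierarchyMode mode, const std::vector<std::string>& cards) {
    std::vector<std::string> result;
    // Per the reply above, FLAT and COMPOSITE both yield an empty list.
    if (mode != HierarchyMode::Combined) {
        return result;
    }
    // In COMBINED mode, the model reports one composite device per card.
    for (const std::string& card : cards) {
        result.push_back(card);
    }
    return result;
}
```

In a real application the query would be the `sycl::ext::oneapi::experimental::get_composite_devices()` function quoted from the specification; the model only captures which modes give a non-empty result.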

== Impact to the ONEAPI_DEVICE_SELECTOR

The `ONEAPI_DEVICE_SELECTOR` is an environment variable that is specific to the
{dpcpp} implementation. Therefore, this section that describes the interaction

Contributor:

I don't know if this needs to be reworded. The ONEAPI_DEVICE_SELECTOR is being implemented in the UR (oneapi-src/unified-runtime#220), which means it will be used by all customers of the UR, not only the dpcpp implementation.

Contributor Author:

Currently, DPC++ is the only SYCL implementation that uses the UR. I think we should keep the wording like this for now. We can revisit it if other SYCL implementations start using the UR.

In any event, I think the logic described in this section will probably be located in the DPC++ runtime, not in the UR.

gmlueck and others added 2 commits December 20, 2023 16:58
I got feedback that this convention is more confusing than helpful.
gmlueck (Contributor Author) commented Dec 21, 2023

@intel/llvm-gatekeepers I think this is ready to merge

@againull againull merged commit 9a1b908 into intel:sycl Dec 21, 2023
2 checks passed
@gmlueck gmlueck deleted the gmlueck/composite-device-spec branch December 21, 2023 20:17
steffenlarsen pushed a commit that referenced this pull request Feb 12, 2024
Initial implementation to support `sycl_ext_oneapi_composite_device`
specified in #11846.

Depends on oneapi-src/unified-runtime#1192.

---------

Signed-off-by: Maronas, Marcos <[email protected]>
Signed-off-by: Marcos Maronas <[email protected]>
5 participants