First steps to enable SYCL backend in Python Interface #155

sommerlukas · 2024-11-14T14:55:46Z

First implementation steps towards supporting the SYCL backend in the CUTLASS Python Interface.

The main additions from this PR are:

- Generating a suitable GEMM template and arguments for the CUTLASS 3.x API, targeting Intel PVC.
- Calling DPC++ instead of nvcc to compile device and host code.
- Using the DPCTL library to transfer data and launch the kernel via SYCL.

Support so far focuses on a simple GEMM; epilogues (e.g., with a visitor) are not yet supported.

Compilation is currently only possible with development versions of DPC++. The -fsycl-rtc-mode flag, which was added as part of this work to support CUTLASS nested parameter classes in free-function kernels, is not yet available in releases.

Activating the SYCL backend via an environment variable is a temporary solution; a follow-up will look into a cleaner approach.
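As a rough illustration of an environment-variable switch between the two compiler backends, here is a minimal sketch. The variable name `CUTLASS_USE_SYCL` and the helper `select_backend` are assumptions made for this example and are not taken from the PR text; only the backend names `"nvrtc"` and `"dpcpp"` appear in the reviewed code.

```python
import os

# Assumed variable name, for illustration only; the PR description does not name it.
SYCL_ENV_VAR = "CUTLASS_USE_SYCL"


def select_backend(environ=None):
    """Return "dpcpp" when the SYCL switch is set, otherwise "nvrtc".

    `environ` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without touching os.environ.
    """
    env = os.environ if environ is None else environ
    value = env.get(SYCL_ENV_VAR, "").strip().lower()
    # Treat a few common truthy spellings as "enabled"; anything else is off.
    return "dpcpp" if value in ("1", "true", "on", "yes") else "nvrtc"
```

Reading the variable in a single place keeps the rest of the code free of environment lookups, which should make it easier to swap in the cleaner mechanism planned for the follow-up.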

aacostadiaz · 2024-12-05T10:12:31Z

python/cutlass_library/generator.py

+
+    math_instructions = [
+      MathInstruction(
+          [16, 8, 16],


Should it be 8, 16, 16 to match the 8x16x16 (M,N,K) MMA operation for bfloat?

Yes, that probably makes sense. This value was probably carried over from the values used for CUDA devices, so it makes sense to adapt it for PVC.

I changed it in the latest commit.
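To see why the order of the instruction shape matters, the sketch below counts how many MMA instructions of a given (M, N, K) shape are needed to cover a tile, and rejects tiles that the instruction shape does not divide evenly. The helper `mma_count` and the tile shapes are hypothetical and chosen only for illustration; only the 8x16x16 shape comes from the review comment.

```python
def mma_count(tile_mnk, instruction_mnk):
    """Number of MMA instructions needed to cover one tile.

    Both arguments are (M, N, K) triples. Raises ValueError when the
    instruction shape does not evenly divide the tile in some dimension.
    """
    count = 1
    for name, tile_dim, inst_dim in zip("MNK", tile_mnk, instruction_mnk):
        if tile_dim % inst_dim != 0:
            raise ValueError(
                "tile {}={} is not a multiple of instruction {}={}".format(
                    name, tile_dim, name, inst_dim
                )
            )
        count *= tile_dim // inst_dim
    return count


# Shape named in the review comment: 8x16x16 (M, N, K) for bfloat.
PVC_BF16_MMA = (8, 16, 16)
```

With these definitions, `mma_count((256, 256, 32), PVC_BF16_MMA)` evaluates to 1024, whereas a tile with only 8 rows, such as `(8, 16, 16)`, is rejected when paired with the original `(16, 8, 16)` shape.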

aacostadiaz · 2024-12-05T10:13:41Z

python/cutlass_library/generator.py

@@ -7026,6 +7026,47 @@ def GenerateSM90(manifest, cuda_version):

 ###################################################################################################

+def GeneratePVC_TensorOp_16b_gemm(manifest, cuda_version):


What is cuda version here?

It's the CUDA version, e.g., 12.4.0, defined here.

Right now, we don't use that parameter. If we come to a point where we need to make distinctions based on SYCL version or similar, we can change this to reflect a version that we need.

For now, we only have this parameter to be compatible with the expected interface here (via generate_function_name and generate_function).
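The idea of keeping an unused parameter purely for interface compatibility can be sketched as follows. Everything here (the registry `GENERATORS`, the lower-case function names, the manifest as a plain list) is a hypothetical simplification; the actual lookup in generator.py goes through generate_function_name and generate_function and is not shown in this conversation.

```python
def generate_sm90(manifest, cuda_version):
    # A CUDA generator may gate kernels on the toolkit version.
    if cuda_version >= (12, 0, 0):
        manifest.append("sm90_gemm")


def generate_pvc(manifest, cuda_version):
    # cuda_version is accepted but deliberately ignored: it exists only so
    # that this function has the same signature as the CUDA generators.
    manifest.append("pvc_16b_gemm")


# Hypothetical registry standing in for the name-based lookup.
GENERATORS = {"sm90": generate_sm90, "pvc": generate_pvc}


def run_generator(arch, manifest, cuda_version):
    """Call the generator for `arch` through the shared two-argument interface."""
    GENERATORS[arch](manifest, cuda_version)
```

Because the caller always passes both arguments, the PVC generator can later start interpreting the second one (for example as a SYCL version) without any change to the dispatch code.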

aacostadiaz · 2024-12-05T10:18:04Z

python/cutlass_library/generator.py

+def GeneratePVC_TensorOp_16b_gemm(manifest, cuda_version):
+    # TODO: Add remaining supported configurations
+    layouts = [
+      [[LayoutType.RowMajor, 8], [LayoutType.RowMajor, 8], [LayoutType.RowMajor, 8]]


Is 8 the alignment?

Yes, I think so.
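Assuming the 8 is indeed an alignment expressed in elements, as the tentative reply suggests, the corresponding byte alignment for a 16-bit element type can be worked out with a small helper. The function name is hypothetical and not part of the CUTLASS library.

```python
def alignment_in_bytes(alignment_elements, element_bits):
    """Convert an alignment given in elements to an alignment in bytes."""
    total_bits = alignment_elements * element_bits
    if total_bits % 8 != 0:
        raise ValueError("alignment does not fall on a byte boundary")
    return total_bits // 8


# 8 elements of a 16-bit type (e.g. bfloat16) span 128 bits, i.e. 16 bytes.
BF16_ALIGNMENT_BYTES = alignment_in_bytes(8, 16)
```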

aacostadiaz

Looks good, thanks!

I left some questions but I think they will be more relevant for follow up PRs.

FMarno

Looks fine to me. It is hard to spot individual issues, but I also don't really have knowledge of the whole system.

FMarno · 2025-01-23T09:55:11Z

python/cutlass/backend/compiler.py

+            if self._is_sycl():
+                q = dpctl.SyclQueue(cutlass.sycl_device())
+                module = dpctl.program.create_program_from_spirv(q, cubin_image)
+            else:
+                err, module = cuda.cuModuleLoadData(cubin_image)
+                if err != cuda.CUresult.CUDA_SUCCESS:
+                    raise RuntimeError("Cuda Error: {}".format(err))
+
+            if self._is_sycl():
+                kernel = module.get_sycl_kernel(operation_name)
+            else:
+                err, kernel = cuda.cuModuleGetFunction(
+                    module, bytes(str.encode(operation_name)))


Suggested change

-            if self._is_sycl():
-                q = dpctl.SyclQueue(cutlass.sycl_device())
-                module = dpctl.program.create_program_from_spirv(q, cubin_image)
-            else:
-                err, module = cuda.cuModuleLoadData(cubin_image)
-                if err != cuda.CUresult.CUDA_SUCCESS:
-                    raise RuntimeError("Cuda Error: {}".format(err))
-
-            if self._is_sycl():
-                kernel = module.get_sycl_kernel(operation_name)
-            else:
-                err, kernel = cuda.cuModuleGetFunction(
-                    module, bytes(str.encode(operation_name)))
+            if self._is_sycl():
+                q = dpctl.SyclQueue(cutlass.sycl_device())
+                module = dpctl.program.create_program_from_spirv(q, cubin_image)
+                kernel = module.get_sycl_kernel(operation_name)
+            else:
+                err, module = cuda.cuModuleLoadData(cubin_image)
+                if err != cuda.CUresult.CUDA_SUCCESS:
+                    raise RuntimeError("Cuda Error: {}".format(err))
+                err, kernel = cuda.cuModuleGetFunction(
+                    module, bytes(str.encode(operation_name)))
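The merged branching from this suggestion can be exercised without a GPU by substituting stand-in objects for the two runtimes. The sketch below is only an illustration of the control flow: `FakeSycl`, `FakeSyclProgram`, `FakeCuda`, and `load_kernel` are hypothetical stubs, and their methods merely mimic the names of the dpctl and CUDA driver calls seen in the diff (the real `create_program_from_spirv` also takes a queue).

```python
class FakeSyclProgram:
    def __init__(self, log):
        self.log = log

    def get_sycl_kernel(self, name):
        self.log.append("get_sycl_kernel")
        return "sycl:" + name


class FakeSycl:
    """Stand-in for the SYCL side; records which calls were made."""

    def __init__(self, log):
        self.log = log

    def create_program_from_spirv(self, image):
        self.log.append("create_program_from_spirv")
        return FakeSyclProgram(self.log)


class FakeCuda:
    """Stand-in for the CUDA driver API; returns (error_code, value) pairs."""

    SUCCESS = 0

    def __init__(self, log):
        self.log = log

    def cuModuleLoadData(self, image):
        self.log.append("cuModuleLoadData")
        return self.SUCCESS, "module"

    def cuModuleGetFunction(self, module, name):
        self.log.append("cuModuleGetFunction")
        return self.SUCCESS, "cuda:" + name.decode()


def load_kernel(is_sycl, image, operation_name, sycl, cuda):
    """Load a module and look up its kernel inside a single backend branch."""
    if is_sycl:
        module = sycl.create_program_from_spirv(image)
        kernel = module.get_sycl_kernel(operation_name)
    else:
        err, module = cuda.cuModuleLoadData(image)
        if err != cuda.SUCCESS:
            raise RuntimeError("Cuda Error: {}".format(err))
        err, kernel = cuda.cuModuleGetFunction(
            module, bytes(str.encode(operation_name)))
    return module, kernel
```

Keeping both steps of each backend inside one branch means the backend check runs once, and the module and kernel used together always come from the same runtime.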

FMarno · 2025-01-23T09:57:50Z

python/cutlass/backend/compiler.py

-        if self.backend == "nvrtc":
-            # 3. compile
+        # 3. compile
+        if self.backend == "nvrtc":  # with nvrtc backend


Suggested change

-        if self.backend == "nvrtc":  # with nvrtc backend
+        if self.backend == "nvrtc":

FMarno · 2025-01-23T09:58:32Z

python/cutlass/backend/compiler.py

@@ -303,6 +335,50 @@ def emit_compile_(self, operation_list, compilation_options, host_compilation_op
            if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
                raise RuntimeError("NVRTC Error: {}".format(err))

+        elif self.backend == "dpcpp":  # with DPC++ backend


Suggested change

-        elif self.backend == "dpcpp":  # with DPC++ backend
+        elif self.backend == "dpcpp":

sommerlukas requested review from mehdi-goli and aacostadiaz November 14, 2024 15:37

sommerlukas self-assigned this Nov 27, 2024

aacostadiaz reviewed Dec 5, 2024

aacostadiaz approved these changes Dec 5, 2024

sommerlukas added 5 commits December 13, 2024 13:16

Enable DPC++ as alternative host & device compiler

ba7327e

Signed-off-by: Lukas Sommer <[email protected]>

Adapt GEMM 3.x template for PVC

5c0eaf2

Signed-off-by: Lukas Sommer <[email protected]>

Add Intel PVC to generator

28b4648

Signed-off-by: Lukas Sommer <[email protected]>

Cleanup

341999d

Signed-off-by: Lukas Sommer <[email protected]>

Use correct TileScheduleType

19524ee

Signed-off-by: Lukas Sommer <[email protected]>

sommerlukas force-pushed the python-interface-enable-sycl branch from f4b9079 to 19524ee Compare December 13, 2024 13:17

sommerlukas added 14 commits December 13, 2024 13:26

WIP: SYCL memory handling

1556093

Signed-off-by: Lukas Sommer <[email protected]>

Deactivate persistent cache for SYCL

3f3392e

Signed-off-by: Lukas Sommer <[email protected]>

Use correct name to look for kernel

e114f9b

Signed-off-by: Lukas Sommer <[email protected]>

Use work_group_memory for local memory

45fae59

Signed-off-by: Lukas Sommer <[email protected]>

WIP: Launch kernel with SYCL

38cd9ee

Signed-off-by: Lukas Sommer <[email protected]>

Fix argument structure

e12ccf5

Signed-off-by: Lukas Sommer <[email protected]>

Support for XPU Torch Tensor

1dc40fd

Signed-off-by: Lukas Sommer <[email protected]>

Merge branch 'sycl-develop' into python-interface-enable-sycl

1e63399

Specify subgroup size

58d6cb3

Signed-off-by: Lukas Sommer <[email protected]>

Remove print

e49cc52

Signed-off-by: Lukas Sommer <[email protected]>

Disable range rounding

457d0e8

Signed-off-by: Lukas Sommer <[email protected]>

Distinguish Nvidia and SYCL

25c892a

Signed-off-by: Lukas Sommer <[email protected]>

Check Torch XPU availability

862e57a

Signed-off-by: Lukas Sommer <[email protected]>

Use correct math op shape

2709a21

Signed-off-by: Lukas Sommer <[email protected]>

sommerlukas requested review from aacostadiaz, FMarno and joeatodd January 22, 2025 12:51

Formatting and small fixes

8d7db92

Signed-off-by: Lukas Sommer <[email protected]>

FMarno approved these changes Jan 23, 2025

sommerlukas added 2 commits January 23, 2025 16:56

Merge branch 'sycl-develop' into python-interface-enable-sycl

9689fa7

Address PR feedback

9ec8195

Signed-off-by: Lukas Sommer <[email protected]>
