[SYCL][SYCLLowerWGLocalMemoryPass] Remove implicit dependency on AlwaysInlinerPass and move to PipelineStart #16356

wenju-he · 2024-12-13T01:08:02Z

Currently, SYCLLowerWGLocalMemoryPass must run after AlwaysInlinerPass because, in the SYCL headers, the __sycl_allocateLocalMemory call is wrapped in the group_local_memory/group_local_memory_for_overwrite functions. Each call to __sycl_allocateLocalMemory represents a unique local memory allocation, so group_local_memory/group_local_memory_for_overwrite must be inlined.

The dependency is implicit and prevents SYCLLowerWGLocalMemoryPass from being moved around in the pass pipeline.

Since the pass transforms each __sycl_allocateLocalMemory call into an access to the global variable @WGLocalMem, moving the pass to the beginning of the pipeline could enable more optimization than the function call does.

We can't assume that the backend compiler lowers the global variable after AlwaysInlinerPass.
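
To make the inlining requirement concrete, here is a minimal, self-contained C++ sketch. It is a toy model and not the pass's code: `Program`, `countCallSites`, `inlineWrapper`, and `exampleInlining` are hypothetical names, and a function body is reduced to a list of callee names. It only assumes what the description states, namely that each call site of `__sycl_allocateLocalMemory` stands for one allocation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model (hypothetical): a function body is just its list of callees,
// one entry per call site.
using Body = std::vector<std::string>;
using Program = std::map<std::string, Body>;

// Counts the call sites of `Callee` across the whole program.
int countCallSites(const Program &P, const std::string &Callee) {
  int Count = 0;
  for (const auto &Entry : P)
    for (const std::string &Call : Entry.second)
      if (Call == Callee)
        ++Count;
  return Count;
}

// Replaces every call to `Wrapper` with a copy of the wrapper's body and
// drops the wrapper itself, which is what inlining amounts to in this model.
Program inlineWrapper(Program P, const std::string &Wrapper) {
  const Body WrapperBody = P.at(Wrapper);
  P.erase(Wrapper);
  for (auto &Entry : P) {
    Body NewBody;
    for (const std::string &Call : Entry.second) {
      if (Call == Wrapper)
        NewBody.insert(NewBody.end(), WrapperBody.begin(), WrapperBody.end());
      else
        NewBody.push_back(Call);
    }
    Entry.second = std::move(NewBody);
  }
  return P;
}

// A kernel that requests two allocations through the wrapper.
void exampleInlining() {
  Program P;
  P["kernel"] = {"group_local_memory", "group_local_memory"};
  P["group_local_memory"] = {"__sycl_allocateLocalMemory"};

  // Before inlining there is one call site, so lowering would see one allocation.
  assert(countCallSites(P, "__sycl_allocateLocalMemory") == 1);

  // After inlining the wrapper, each request has its own call site.
  Program Inlined = inlineWrapper(P, "group_local_memory");
  assert(countCallSites(Inlined, "__sycl_allocateLocalMemory") == 2);
}
```

In this model, lowering before inlining would hand both requests in the kernel the same allocation, which is the failure mode the ordering constraint guards against.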

…ysInlinerPass and move to PipelineStart

Currently, SYCLLowerWGLocalMemoryPass must run after AlwaysInlinerPass because, in the SYCL headers, the __sycl_allocateLocalMemory call is wrapped in the group_local_memory/group_local_memory_for_overwrite functions. Each call to __sycl_allocateLocalMemory represents a unique local memory allocation, so group_local_memory/group_local_memory_for_overwrite must be inlined.

The dependency is implicit and prevents SYCLLowerWGLocalMemoryPass from being moved around in the pass pipeline.

Since the pass transforms each __sycl_allocateLocalMemory call into an access to the global variable @WGLocalMem, moving the pass to the beginning of the pipeline could enable more optimization than the function call does. In addition, the Intel GPU compiler has a pass that transforms global variables in addrspace(3) into allocas, and that pass runs after the pipeline's basic simplification. Therefore, we should run SYCLLowerWGLocalMemoryPass earlier.

clang/lib/CodeGen/BackendUtil.cpp

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

jsji · 2024-12-13T01:19:57Z

Looks like you need to rebase to pick up the new changes in this pass first.

wenju-he · 2024-12-13T02:17:39Z

Looks like you need to rebase to pick up the new changes in this pass first.

done

jsji

LGTM. Thanks!

bader · 2024-12-13T21:37:53Z

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

+      continue;
+    }
+    std::string FName = llvm::demangle(Caller->getName());
+    if (FName.find("sycl::_V1::ext::oneapi::group_local_memory") ==


Hardcoding the current function name from the DPC++ library is unfortunate. The code in the DPC++ header files can change at any time.

To make it more robust, I thought we could walk up the call stack to the kernel function, ignoring all functions in the sycl:: namespace. This will require the SYCL kernel to be inlined into the kernel function wrapper.

@Naghasan, do you have any thoughts on that?

I agree it is unfortunate, especially w.r.t. upstreaming. I don't know what the plans are for this one, but if it is seen as important, we might want to improve this.

This will require the SYCL kernel to be inlined into the kernel function wrapper.

I don't think this is an issue TBH; I don't see any benefit in not inlining the SYCL kernel into the wrapper, even in SPIR-V.

I think relying on an attribute is probably the most flexible option: it makes the compiler agnostic to header refactoring and API changes. It is also cheap to add.

I also just realized that syclcompat::local_mem uses it. That isn't technically a valid use of it w.r.t. the extension, but it is something the attribute would allow us to handle correctly.

cc @elizabethandrews @joeatodd

I think relying on an attribute is probably the most flexible option: it makes the compiler agnostic to header refactoring and API changes. It is also cheap to add.

A new attribute "sycl_forceinline" was added in a4fe915.
Please review.
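
The trade-off discussed in this thread can be sketched in plain C++. This is a hedged toy model, not the code in the PR: `NamedFunction`, `isWrapperByName`, `isWrapperByAttr`, and `exampleDetection` are hypothetical, the attribute string simply reuses the name mentioned above, and the demangled names are illustrative.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in (hypothetical) for an IR function: a demangled name plus a
// set of string attributes.
struct NamedFunction {
  std::string DemangledName;
  std::set<std::string> Attrs;
};

// Name-based detection, similar in spirit to the snippet under review.
// It stops matching as soon as a header renames or re-wraps the function.
bool isWrapperByName(const NamedFunction &F) {
  return F.DemangledName.find("sycl::_V1::ext::oneapi::group_local_memory") !=
         std::string::npos;
}

// Attribute-based detection: the header marks the wrapper, and the compiler
// does not need to know its name.
bool isWrapperByAttr(const NamedFunction &F) {
  return F.Attrs.count("sycl_forceinline") != 0;
}

void exampleDetection() {
  NamedFunction Ext{"sycl::_V1::ext::oneapi::group_local_memory<int>",
                    {"sycl_forceinline"}};
  NamedFunction Compat{"syclcompat::local_mem<int>", {"sycl_forceinline"}};

  // Both checks recognize the extension function.
  assert(isWrapperByName(Ext) && isWrapperByAttr(Ext));
  // Only the attribute check recognizes a wrapper living in another namespace.
  assert(!isWrapperByName(Compat) && isWrapperByAttr(Compat));
}
```

The attribute check keeps working when a wrapper is renamed or lives in a different library, at the cost of requiring the header to carry the marker.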

Naghasan · 2024-12-17T10:14:37Z

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

+    return false;
+
+  bool Changed = false;
+  for (auto *U : ALMFunc->users()) {


We need to use a worklist here rather than a simple loop.

This function https://github.com/intel/llvm/blob/sycl/sycl/include/syclcompat/memory.hpp#L71 needs to be updated as well, and this function won't be able to handle the nesting. The CI is currently green because there is no test requesting 2 distinct local memory objects using this function in the same kernel.

done, thank you for the suggestion. Now I understand what you mean by syclcompat::local_mem.
Also added a new test sycl/test/check_device_code/syclcompat_local_mem.cpp that has two calls to syclcompat::local_mem in a kernel.
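
The difference between a simple loop over users and a worklist can be sketched as follows. This is a self-contained toy model, not the pass implementation: `ToyFunction`, `ToyModule`, `directWrappers`, `allWrappers`, and `exampleWorklist` are hypothetical, and the call chain kernel -> local_mem -> group_local_memory -> __sycl_allocateLocalMemory is assumed only for illustration of the nesting mentioned above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in (hypothetical) for a function in a module.
struct ToyFunction {
  bool ForceInline;                 // carries the marker attribute
  std::vector<std::string> Callers; // names of functions that call this one
};
using ToyModule = std::map<std::string, ToyFunction>;

// Simple loop: only inspects the direct callers of `Builtin`.
std::vector<std::string> directWrappers(const ToyModule &M,
                                        const std::string &Builtin) {
  std::vector<std::string> Result;
  auto It = M.find(Builtin);
  if (It == M.end())
    return Result;
  for (const std::string &Caller : It->second.Callers) {
    auto CallerIt = M.find(Caller);
    if (CallerIt != M.end() && CallerIt->second.ForceInline)
      Result.push_back(Caller);
  }
  return Result;
}

// Worklist: keeps following callers of marked wrappers, so nested wrappers
// are found too. Unmarked callers (e.g. kernels) end the walk.
std::vector<std::string> allWrappers(const ToyModule &M,
                                     const std::string &Builtin) {
  std::vector<std::string> Order; // wrappers in discovery order
  std::set<std::string> Seen;
  std::vector<std::string> WorkList{Builtin};
  while (!WorkList.empty()) {
    std::string Callee = WorkList.back();
    WorkList.pop_back();
    auto It = M.find(Callee);
    if (It == M.end())
      continue;
    for (const std::string &Caller : It->second.Callers) {
      auto CallerIt = M.find(Caller);
      if (CallerIt == M.end() || !CallerIt->second.ForceInline)
        continue;
      if (Seen.insert(Caller).second) {
        Order.push_back(Caller);
        WorkList.push_back(Caller);
      }
    }
  }
  return Order;
}

void exampleWorklist() {
  ToyModule M;
  M["__sycl_allocateLocalMemory"] = ToyFunction{false, {"group_local_memory"}};
  M["group_local_memory"] = ToyFunction{true, {"local_mem"}};
  M["local_mem"] = ToyFunction{true, {"kernel"}};
  M["kernel"] = ToyFunction{false, {}};

  // The simple loop misses the outer wrapper.
  assert(directWrappers(M, "__sycl_allocateLocalMemory").size() == 1);

  // The worklist finds both levels of nesting.
  std::vector<std::string> All = allWrappers(M, "__sycl_allocateLocalMemory");
  assert(All.size() == 2);
  assert(All[0] == "group_local_memory" && All[1] == "local_mem");
}
```

With only the direct callers handled, the outer wrapper in this model would stay un-inlined, so two requests made through it from one kernel would still share a single call site.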

premanandrao

FE changes look okay to me.

uditagarwal97

SYCL Changes LGTM.

wenju-he · 2025-01-06T07:57:37Z

kindly ping @Naghasan @intel/dpcpp-tools-reviewers @intel/syclcompat-lib-reviewers for review

Naghasan

LGTM, thanks for the new test

GeorgeWeb

The syclcompat changes look good to me, though I'd appreciate it if @joeatodd could also have a quick look to confirm, if possible.

wenju-he · 2025-01-13T03:54:07Z

kindly ping @intel/dpcpp-tools-reviewers for review

asudarsa · 2025-01-13T14:16:42Z

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

@@ -84,6 +86,42 @@ ModulePass *llvm::createSYCLLowerWGLocalMemoryLegacyPass() {
  return new SYCLLowerWGLocalMemoryLegacy();
 }

+// In sycl header __sycl_allocateLocalMemory builtin call is wrapped in


Why can we not rewrite the SYCL headers to 'inline' these calls? Is there a specific reason? Thanks

Why can we not rewrite the SYCL headers to 'inline' these calls? Is there a specific reason? Thanks

We can't ask users to call the internal __sycl_allocateLocalMemory intrinsic when the documented interface is sycl::ext::something::something::group_local_memory<T>

asudarsa

Changes look OK to me. It would help to get an answer to my high-level question before submission. Maybe I am missing something here.

Thanks

AlexeySachkov

Strictly speaking, the extension spec explicitly says that those functions can only be used within kernel functor scope, but the spec doesn't prevent us from lifting that restriction if we want to and can.

I assume that this PR exists because syclcompat has a bug where it uses the extension in a way which is not guaranteed to work (and it doesn't).

I understand that we are trying to fix that issue in syclcompat to make it work, but by doing so we put ourselves into a weird situation.

The thing is that we are introducing several conceptually incorrect things to our project:

syclcompat is documented as a header-only library. This PR breaks that by introducing compiler support which is necessary for the correctness of certain syclcompat features. Tagging @intel/syclcompat-lib-reviewers for awareness.
We are setting an example of incorrect usage of the extension in our implementation, and looking at the code, my guess is that it won't work the same way anywhere outside of the syclcompat library, which may confuse those who dig into implementation details.

The proper fix I think is to either completely and officially remove the limitation of work_group_memory being only useable at SYCL kernel functor scope, or to rewrite syclcompat to use some other mechanism for obtaining local memory (we do have several of them; not all of them are implemented yet and not all of them will fit, but they are still worth exploring).

AlexeySachkov · 2025-01-13T14:47:56Z

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

@@ -84,6 +86,42 @@ ModulePass *llvm::createSYCLLowerWGLocalMemoryLegacyPass() {
  return new SYCLLowerWGLocalMemoryLegacy();
 }

+// In sycl header __sycl_allocateLocalMemory builtin call is wrapped in


Why can we not rewrite the SYCL headers to 'inline' these calls? Is there a specific reason? Thanks

We can't ask users to call the internal __sycl_allocateLocalMemory intrinsic when the documented interface is sycl::ext::something::something::group_local_memory<T>

AlexeySachkov · 2025-01-13T14:59:56Z

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

+// distinct global variable. Inlining them here so that this pass doesn't have
+// implicit dependency on AlwaysInlinerPass.


So, instead of having two building blocks (i.e. passes) where each does a specific thing, we now have one of them doing both things.

This doesn't sound good from a high-level design point of view. I understand that having an implicit dependency is probably not a good thing, but are there any reasons to remove the dependency completely?

I ask because many passes depend on each other, and there are mechanisms to explicitly tell the pass manager about them: the AnalysisUsage::addRequired<> and AnalysisUsage::addRequiredTransitive<> methods. They are mostly used for requesting the results of a certain analysis, but they can also be used to request that specific transformations be performed before a certain pass is run. You can find examples of that with LoopSimplify.

This doesn't sound good from a high-level design point of view. I understand that having an implicit dependency is probably not a good thing, but are there any reasons to remove the dependency completely?

I agree, but I think it is good to make this pass self-contained.

I ask because many passes depend on each other, and there are mechanisms to explicitly tell the pass manager about them: the AnalysisUsage::addRequired<> and AnalysisUsage::addRequiredTransitive<> methods. They are mostly used for requesting the results of a certain analysis, but they can also be used to request that specific transformations be performed before a certain pass is run. You can find examples of that with LoopSimplify.

This would inline other functions that are not related to what we handle in this pass.

Naghasan · 2025-01-13T15:48:47Z

I assume that this PR exists because syclcompat has a bug where it uses the extension in a way which is not guaranteed to work (and it doesn't).

@AlexeySachkov The change you reviewed exists because the pass was moved in the pipeline and now runs before inlining; the change is required because of that and has nothing to do with syclcompat.

syclcompat came into the picture because I pointed out that the existing tests couldn't expose an issue the patch was introducing.

AlexeySachkov · 2025-01-13T15:55:01Z

I assume that this PR exists because syclcompat has a bug where it uses the extension in a way which is not guaranteed to work (and it doesn't).

@AlexeySachkov The change you reviewed exists because the pass was moved in the pipeline and now runs before inlining; the change is required because of that and has nothing to do with syclcompat.

syclcompat came into the picture because I pointed out that the existing tests couldn't expose an issue the patch was introducing.

OK, even if syclcompat wasn't how this issue was originally found, the use of the extension from syclcompat still violates the extension spec, and we are still introducing a compiler dependency into syclcompat by adding that attribute to the header.

wenju-he · 2025-01-14T04:11:49Z

The proper fix I think is to either completely and officially remove the limitation of work_group_memory being only useable at SYCL kernel functor scope

The restriction that work_group_memory is only useable at kernel function scope might be a language behavior rather than an implementation limitation due to inlining. The behavior aligns with OpenCL.

I have filed a feature request #16617 to address the issue of dependency on inlining.
The feature request also addresses:

we are still introducing compiler dependency into syclcompat by adding that attribute into the header

wenju-he · 2025-01-20T01:34:30Z

I have filed a feature request #16617 to address the issue of dependency on inlining.

The request may take time to discuss.
@AlexeySachkov @intel/dpcpp-tools-reviewers is it OK to merge this PR as is to unblock our work?

jsji · 2025-01-23T03:05:42Z

@intel/dpcpp-tools-reviewers Please approve or comment. Thanks.

wenju-he requested review from a team as code owners December 13, 2024 01:08

wenju-he requested review from bader and jsji December 13, 2024 01:10

wenju-he mentioned this pull request Dec 13, 2024

[SYCL] Move SYCLLowerWGLocalMemoryPass to OptimizerEarlyEPCallback #16347

Closed

bader requested a review from Naghasan December 13, 2024 01:12

jsji reviewed Dec 13, 2024

wenju-he added 2 commits December 13, 2024 09:48

Merge branch 'sycl' into SYCLLowerWGLocalMemoryPass-inline-PipelineStartEPCallback

51b1ec9

fix inlineGroupLocalMemoryFunc

a4f8382

wenju-he requested a review from jsji December 13, 2024 02:24

inlineGroupLocalMemoryFunc: return false -> continue

b42fd22

jsji approved these changes Dec 13, 2024

bader reviewed Dec 13, 2024

check device code

44db66a

wenju-he requested a review from a team as a code owner December 16, 2024 03:28

wenju-he requested a review from uditagarwal97 December 16, 2024 03:28

clang-format

97add86

add ir attribute sycl_forceinline to group_local_memory

a4fe915

Naghasan requested changes Dec 17, 2024

wenju-he requested review from Naghasan and bader December 18, 2024 01:45

change 4 to 2 in check_device_code/syclcompat_local_mem.cpp

0de9070

deterministic order of erasing functions

db2ad4a

add back __SYCL_ALWAYS_INLINE

442fe98

remove #include llvm/Demangle/Demangle.h

275d60d

premanandrao approved these changes Dec 27, 2024

uditagarwal97 approved these changes Dec 27, 2024

Naghasan approved these changes Jan 6, 2025

GeorgeWeb reviewed Jan 6, 2025

asudarsa reviewed Jan 13, 2025

AlexeySachkov reviewed Jan 13, 2025

wenju-he mentioned this pull request Jan 14, 2025

Remove restriction that group_local_memory/group_local_memory_for_overwrite function requires to be inlined #16617

Open

wenju-he requested a review from AlexeySachkov January 14, 2025 04:14
