Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][SYCLLowerWGLocalMemoryPass] Remove implicit dependency on AlwaysInlinerPass and move to PipelineStart #16356

Open
wants to merge 13 commits into
base: sycl
Choose a base branch
from
Open
6 changes: 2 additions & 4 deletions clang/lib/CodeGen/BackendUtil.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1042,6 +1042,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
/*FP64ConvEmu=*/CodeGenOpts.FP64ConvEmu,
/*ExcludeAspects=*/{"fp64"}));
MPM.addPass(SYCLPropagateJointMatrixUsagePass());
// Lowers static/dynamic local memory builtin calls.
MPM.addPass(SYCLLowerWGLocalMemoryPass());
});
else if (LangOpts.SYCLIsHost && !LangOpts.SYCLESIMDBuildHostCode)
PB.registerPipelineStartEPCallback(
Expand Down Expand Up @@ -1191,10 +1193,6 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
MPM.addPass(SPIRITTAnnotationsPass());
}

// Allocate static local memory in SYCL kernel scope for each allocation
// call.
MPM.addPass(SYCLLowerWGLocalMemoryPass());

// Process properties and annotations
MPM.addPass(CompileTimePropertiesPass());

Expand Down
29 changes: 0 additions & 29 deletions clang/test/CodeGenSYCL/group-local-memory.cpp

This file was deleted.

31 changes: 20 additions & 11 deletions clang/test/CodeGenSYCL/kernel-early-optimization-pipeline.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,24 @@
// SYCL device target, and can be disabled with -fno-sycl-early-optimizations.
// New pass manager doesn't print all passes tree, only module level.
//
// RUN: %clang_cc1 -O2 -fsycl-is-device -triple spir64-unknown-unknown %s -mdebug-pass Structure -emit-llvm -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-NEWPM-EARLYOPT
// CHECK-NEWPM-EARLYOPT: ConstantMergePass
// CHECK-NEWPM-EARLYOPT: SYCLMutatePrintfAddrspacePass
// RUN: %clang_cc1 -O2 -fsycl-is-device -triple spir64-unknown-unknown %s -mdebug-pass Structure -emit-llvm -o /dev/null 2>&1 | FileCheck %s
// CHECK: SYCLVirtualFunctionsAnalysisPass
// CHECK: ESIMDVerifierPass
// CHECK: SYCLConditionalCallOnDevicePass
// CHECK: SYCLPropagateAspectsUsagePass
// CHECK: SYCLPropagateJointMatrixUsagePass
// CHECK: SYCLLowerWGLocalMemoryPass
// CHECK: InferFunctionAttrsPass
// CHECK: AlwaysInlinerPass
// CHECK: ModuleInlinerWrapperPass
// CHECK: ConstantMergePass
// CHECK: SYCLMutatePrintfAddrspacePass
// CHECK: SYCLPropagateAspectsUsagePass
// CHECK: SYCLAddOptLevelAttributePass
// CHECK: CompileTimePropertiesPass
// CHECK: RecordSYCLAspectNamesPass
// CHECK: CleanupSYCLMetadataPass
//
// RUN: %clang_cc1 -O2 -fsycl-is-device -triple spir64-unknown-unknown %s -mdebug-pass Structure -emit-llvm -fno-sycl-early-optimizations -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-NEWPM-NOEARLYOPT
// CHECK-NEWPM-NOEARLYOPT-NOT: ConstantMergePass
// CHECK-NEWPM-NOEARLYOPT: SYCLMutatePrintfAddrspacePass

// Checks that the compile time properties pass is added into the compilation pipeline
//
// RUN: %clang_cc1 -O2 -fsycl-is-device -triple spir64-unknown-unknown %s -mdebug-pass Structure -emit-llvm -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-COMPTIMEPROPS
// CHECK-COMPTIMEPROPS: Running pass: CompileTimePropertiesPass on [module]
// RUN: %clang_cc1 -O2 -fsycl-is-device -triple spir64-unknown-unknown %s -mdebug-pass Structure -emit-llvm -fno-sycl-early-optimizations -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-NOEARLYOPT
// CHECK-NOEARLYOPT-NOT: ConstantMergePass1
// CHECK-NOEARLYOPT: SYCLMutatePrintfAddrspacePass
49 changes: 43 additions & 6 deletions llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,15 +9,17 @@
//===----------------------------------------------------------------------===//

#include "llvm/SYCLLowerIR/LowerWGLocalMemory.h"
#include "llvm/Demangle/Demangle.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/Pass.h"
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/Utils/Cloning.h"

using namespace llvm;

#define DEBUG_TYPE "LowerWGLocalMemory"
#define DEBUG_TYPE "sycllowerwglocalmemory"

static constexpr char SYCL_ALLOCLOCALMEM_CALL[] = "__sycl_allocateLocalMemory";
static constexpr char SYCL_DYNAMIC_LOCALMEM_CALL[] =
Expand Down Expand Up @@ -84,6 +86,42 @@ ModulePass *llvm::createSYCLLowerWGLocalMemoryLegacyPass() {
return new SYCLLowerWGLocalMemoryLegacy();
}

// In sycl header __sycl_allocateLocalMemory builtin call is wrapped in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we not rewrite the SYCL headers to 'inline' these calls? Is there a specific reason? Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we not rewrite the SYCL headers to 'inline' these calls? Is there a specific reason? Thanks

We can't ask users to call __sycl_allocateLocalMemory internal intrsinsic when documented interface is sycl::ext::something::something::group_local_memory<T>

// group_local_memory/group_local_memory_for_overwrite functions, which must be
// inlined first before each __sycl_allocateLocalMemory call can be lowered to a
// unique global variable. Inlining them here so that this pass doesn't have
// implicit dependency on AlwaysInlinerPass.
static bool inlineGroupLocalMemoryFunc(Module &M) {
Function *ALMFunc = M.getFunction(SYCL_ALLOCLOCALMEM_CALL);
if (!ALMFunc || ALMFunc->use_empty())
return false;

bool Changed = false;
for (auto *U : ALMFunc->users()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to use a work list here rather than the simple loop.

This function https://github.com/intel/llvm/blob/sycl/sycl/include/syclcompat/memory.hpp#L71 needs to be updated as well, and this function won't be able to handle the nesting. The CI is currently green because there is no test requesting 2 distinct local memory objects using this function in the same kernel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thank you for the suggestion. Now I understand what you mean by syclcompat::local_mem.
Also added a new test sycl/test/check_device_code/syclcompat_local_mem.cpp that has two calls to syclcompat::local_mem in a kernel.

auto *Caller = cast<CallInst>(U)->getFunction();
if (!Caller->hasFnAttribute(Attribute::AlwaysInline)) {
// Already inlined.
continue;
}
std::string FName = llvm::demangle(Caller->getName());
if (FName.find("sycl::_V1::ext::oneapi::group_local_memory") ==
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding current function name from DPC++ library is unfortunate. The code in the DPC++ header files can be changed at any time.

To make it more robust, I thought we could go up in the call stack up-to the kernel function ignoring all functions in sycl:: namespace. This will require SYCL kernel to be inlined into kernel function wrapper.

@Naghasan, do you have any thoughts on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is unfortunate, especially w.r.t. upstreaming. I don't know what the plans are for this one but if it is seen as important, we might want to improve this.

This will require SYCL kernel to be inlined into kernel function wrapper.

I don't think this is an issue TBH, I don't see any benefit in not inline the SYCL kernel in the wrapper, even in SPIR-V.

I think relying on an attribute is probably the most flexible: this makes the compiler agnostic to header refactor and changes in API. It is also cheap to add.

I also just realized syclcompat::local_mem uses it, it isn't technically a valid usage of it w.r.t. the extension but something the attribute would allow to correctly handle.

cc @elizabethandrews @joeatodd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think relying on an attribute is probably the most flexible: this makes the compiler agnostic to header refactor and changes in API. It is also cheap to add.

A new attribute "sycl_forceinline" is added in a4fe915
Please review.

std::string::npos) {
// Already inlined.
continue;
}
for (auto *U2 : make_early_inc_range(Caller->users())) {
auto *CI = cast<CallInst>(U2);
InlineFunctionInfo IFI;
[[maybe_unused]] auto Result = InlineFunction(*CI, IFI);
assert(Result.isSuccess() && "inlining failed");
}
Caller->eraseFromParent();
Changed = true;
}

return Changed;
}

// TODO: It should be checked that __sycl_allocateLocalMemory (or its source
// form - group_local_memory) does not occur:
// - in a function (other than user lambda/functor)
Expand Down Expand Up @@ -317,9 +355,8 @@ static bool dynamicWGLocalMemory(Module &M) {

PreservedAnalyses SYCLLowerWGLocalMemoryPass::run(Module &M,
ModuleAnalysisManager &) {
bool MadeChanges = allocaWGLocalMemory(M);
MadeChanges = dynamicWGLocalMemory(M) || MadeChanges;
if (MadeChanges)
return PreservedAnalyses::none();
return PreservedAnalyses::all();
bool Changed = inlineGroupLocalMemoryFunc(M);
Changed |= allocaWGLocalMemory(M);
Changed |= dynamicWGLocalMemory(M);
return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
jsji marked this conversation as resolved.
Show resolved Hide resolved
}
Loading