CUDA LLVM Debug Info Segfault #576

Closed
jgreener64 opened this issue Jan 12, 2023 · 5 comments · Fixed by #582
@jgreener64
Contributor

I am on Enzyme 0.10.15, CUDA 3.12.1 and Julia 1.8.2. sync_threads in a GPU kernel causes a segfault:

using CUDA, Enzyme

CUDA.limit!(CUDA.CU_LIMIT_MALLOC_HEAP_SIZE, 1*1024^3)

function kernel!(xs)
    sync_threads() # Works with this line commented out
    return
end

function grad_kernel!()
    xs   = CuStaticSharedArray(Float32, 10)
    d_xs = CuStaticSharedArray(Float32, 10)
    sync_threads()

    Enzyme.autodiff_deferred(
        kernel!,
        Duplicated(xs, d_xs),
    )
    return
end

CUDA.@sync @cuda grad_kernel!()
signal (11): Segmentation fault
in expression starting at REPL[1]:1
_ZN4llvm10DwarfDebug23emitInitialLocDirectiveERKNS_15MachineFunctionEj at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter31emitInitialRawDwarfLocDirectiveERKNS_15MachineFunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm15NVPTXAsmPrinter22emitFunctionEntryLabelEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter18emitFunctionHeaderEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter16emitFunctionBodyEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm15NVPTXAsmPrinter20runOnMachineFunctionERNS_15MachineFunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/jgreener/.julia/packages/LLVM/9gCXO/lib/13/libLLVM_h.jl:947 [inlined]
emit at /home/jgreener/.julia/packages/LLVM/9gCXO/src/targetmachine.jl:45
mcgen at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/mcgen.jl:73
unknown function (ip: 0x7f6d660d81cf)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
macro expansion at /home/jgreener/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
macro expansion at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:430 [inlined]
macro expansion at /home/jgreener/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
macro expansion at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:427 [inlined]
#emit_asm#120 at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/utils.jl:68
emit_asm##kw at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/utils.jl:62 [inlined]
cufunction_compile at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:354
#224 at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:347 [inlined]
JuliaContext at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:76
unknown function (ip: 0x7f6d66104c2a)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
cufunction_compile at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:346
cached_compilation at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/cache.jl:90
#cufunction#221 at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:299
cufunction at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:292
unknown function (ip: 0x7f6d6610458f)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
do_call at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:215
eval_body at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:467
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:750
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:906
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:850
ijl_toplevel_eval_in at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_64841.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
#967 at ./client.jl:419
jfptr_YY.967_30403.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:522
jfptr__start_56736.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
true_main at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/jlapi.c:575
jl_repl_entrypoint at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/jlapi.c:719
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 60089744 (Pool: 60049914; Big: 39830); GC: 55
Segmentation fault (core dumped)

Error with Enzyme.API.printall!(true):

after simplification :
; Function Attrs: mustprogress willreturn
define void @preprocess_julia_kernel__4340_inner1({ i8 addrspace(3)*, i64, [1 x i64], i64 } %0) local_unnamed_addr #3 !dbg !13 {
entry:
  %1 = call {}*** @julia.get_pgcstack() #4
  call void @llvm.nvvm.barrier0() #4, !dbg !14
  ret void, !dbg !17
}

; Function Attrs: mustprogress willreturn
define internal void @diffejulia_kernel__4340_inner1({ i8 addrspace(3)*, i64, [1 x i64], i64 } %0, { i8 addrspace(3)*, i64, [1 x i64], i64 } %"'") local_unnamed_addr #3 !dbg !18 {
entry:
  %1 = call {}*** @julia.get_pgcstack() #4
  call void @llvm.nvvm.barrier0() #4, !dbg !19
  br label %invertentry, !dbg !22

invertentry:                                      ; preds = %entry
  call void @llvm.nvvm.barrier0(), !dbg !19
  ret void
}


signal (11): Segmentation fault
in expression starting at REPL[1]:1
_ZN4llvm10DwarfDebug23emitInitialLocDirectiveERKNS_15MachineFunctionEj at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter31emitInitialRawDwarfLocDirectiveERKNS_15MachineFunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm15NVPTXAsmPrinter22emitFunctionEntryLabelEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter18emitFunctionHeaderEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter16emitFunctionBodyEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm15NVPTXAsmPrinter20runOnMachineFunctionERNS_15MachineFunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/jgreener/.julia/packages/LLVM/9gCXO/lib/13/libLLVM_h.jl:947 [inlined]
emit at /home/jgreener/.julia/packages/LLVM/9gCXO/src/targetmachine.jl:45
mcgen at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/mcgen.jl:73
unknown function (ip: 0x7f9e1796740f)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
macro expansion at /home/jgreener/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
macro expansion at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:430 [inlined]
macro expansion at /home/jgreener/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
macro expansion at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:427 [inlined]
#emit_asm#120 at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/utils.jl:68
emit_asm##kw at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/utils.jl:62 [inlined]
cufunction_compile at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:354
#224 at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:347 [inlined]
JuliaContext at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:76
unknown function (ip: 0x7f9e17993e6a)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
cufunction_compile at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:346
cached_compilation at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/cache.jl:90
#cufunction#221 at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:299
cufunction at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:292
unknown function (ip: 0x7f9e179937cf)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
do_call at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:215
eval_body at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:467
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:750
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:906
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:850
ijl_toplevel_eval_in at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_64841.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
#967 at ./client.jl:419
jfptr_YY.967_30403.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:522
jfptr__start_56736.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
true_main at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/jlapi.c:575
jl_repl_entrypoint at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/jlapi.c:719
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 51509610 (Pool: 51472048; Big: 37562); GC: 46
Segmentation fault (core dumped)

CUDA version info:

CUDA toolkit 11.7, artifact installation
NVIDIA driver 470.161.3, for CUDA 11.4
CUDA driver 11.7

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+470.161.3
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.2
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

2 devices:
  0: NVIDIA RTX A6000 (sm_86, 47.531 GiB / 47.544 GiB available)
  1: NVIDIA RTX A6000 (sm_86, 45.943 GiB / 47.541 GiB available)
@vchuravy
Member

James and I just ran into this while playing around with your reproducer from #511 (comment):

using CUDA, Enzyme, StaticArrays, LinearAlgebra, Atomix, UnsafeAtomicsLLVM

CUDA.limit!(CUDA.CU_LIMIT_MALLOC_HEAP_SIZE, 1*1024^3)

struct Atom
    σ::Float32
    ϵ::Float32
end

function find_neighbors(coords)
    n_atoms = length(coords)
    neighbors = Tuple{Int, Int}[]
    for i in 1:n_atoms
        for j in (i + 1):n_atoms
            if norm(coords[i] - coords[j]) <= 1.0
                push!(neighbors, (i, j))
            end
        end
    end
    return neighbors
end

n_atoms = 1024
coords = rand(SVector{3, Float32}, n_atoms) .* 2.7f0
atoms = [Atom(0.02f0, 0.02f0) for _ in 1:n_atoms]
cu_coords = CuArray(coords)
cu_atoms = CuArray(atoms)
neighbors = find_neighbors(coords)
cu_neighbors = CuArray(neighbors)

function force(c1, c2, a1, a2)
    dr = c2 - c1
    invr2 = inv(sum(abs2, dr))
    σ = (a1.σ + a2.σ) / 2
    ϵ = sqrt(a1.ϵ * a2.ϵ)
    six_term = (σ^2 * invr2) ^ 3
    f = (24 * ϵ * invr2) * (2 * six_term ^ 2 - six_term)
    return f * dr
end

function kernel!(forces::CuDeviceMatrix{T}, coords_var, atoms_var, neighbors_var,
                 ::Val{M}, shared_fs) where {T, M}
    coords = CUDA.Const(coords_var)
    atoms = CUDA.Const(atoms_var)
    neighbors = CUDA.Const(neighbors_var)

    tidx = threadIdx().x
    inter_ig = (blockIdx().x - 1) * blockDim().x + tidx
    stride = gridDim().x * blockDim().x
    shared_is = CuStaticSharedArray(Int32, M)
    shared_js = CuStaticSharedArray(Int32, M)

    if tidx == 1
        for si in 1:M
            shared_is[si] = zero(Int32)
        end
    end
    sync_threads()

    for (thread_i, inter_i) in enumerate(inter_ig:stride:length(neighbors))
        si = (thread_i - 1) * blockDim().x + tidx
        i, j = neighbors[inter_i]
        f = force(coords[i], coords[j], atoms[i], atoms[j])
        shared_fs[1, si] = f[1]
        shared_fs[2, si] = f[2]
        shared_fs[3, si] = f[3]
        shared_is[si] = i
        shared_js[si] = j
    end
    sync_threads()

    if tidx == 1
        for si in 1:M
            i = shared_is[si]
            if iszero(i)
                break
            end
            j = shared_js[si]
            dx, dy, dz = shared_fs[1, si], shared_fs[2, si], shared_fs[3, si]
            Atomix.@atomic :monotonic forces[1, i] += -dx
            Atomix.@atomic :monotonic forces[2, i] += -dy
            Atomix.@atomic :monotonic forces[3, i] += -dz
            Atomix.@atomic :monotonic forces[1, j] += dx
            Atomix.@atomic :monotonic forces[2, j] += dy
            Atomix.@atomic :monotonic forces[3, j] += dz
        end
    end
    return
end

function grad_kernel!(forces::CuDeviceMatrix{T}, d_forces, coords, d_coords, atoms, d_atoms,
                      neighbors, shared_mem_size::Val{M}) where {T, M}
    shared_fs = CuStaticSharedArray(T, (3, M))
    d_shared_fs = CuStaticSharedArray(T, (3, M))
    sync_threads()

    Enzyme.autodiff_deferred(
        kernel!,
        Duplicated(forces, d_forces),
        Duplicated(coords, d_coords),
        Duplicated(atoms, d_atoms),
        Const(neighbors),
        Const(shared_mem_size),
        Duplicated(shared_fs, d_shared_fs),
    )
    return
end

cu_forces_mat = CuArray(zeros(Float32, 3, n_atoms))
d_cu_forces_mat = CuArray(rand(Float32, 3, n_atoms))
d_cu_coords = zero(cu_coords)
d_cu_atoms = CuArray([Atom(0.0f0, 0.0f0) for _ in 1:n_atoms])
n_threads = 256
n_blocks = 800
shared_mem_size = 512

CUDA.@sync @cuda threads=n_threads blocks=n_blocks grad_kernel!(cu_forces_mat, d_cu_forces_mat,
        cu_coords, d_cu_coords, cu_atoms, d_cu_atoms, cu_neighbors, Val(shared_mem_size))

Running the above with julia -g0 (debug info generation disabled) works; at any other debug level it crashes with the same error.
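
To confirm which case a given session falls into, the debug-info level can be read back from inside Julia. This is only a small sketch: it assumes the level reported by Base.JLOptions() is the one that matters for GPU code generation here, and the printed messages merely restate what was observed in this thread.

```julia
# Debug-info level of the running session: 0 for `julia -g0`, 1 by default,
# 2 for `julia -g2` (or a bare `-g`).
debug_level = Int(Base.JLOptions().debug_level)

if debug_level == 0
    println("Debug info is off (-g0): the reproducer is reported to run.")
else
    println("Debug info level $debug_level: the reproducer is reported to segfault.")
end
```

Starting the session as julia -g0 before loading CUDA and Enzyme is the workaround described above; it sidesteps the crash rather than fixing it.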

@wsmoses changed the title from "sync_threads gives segfault in GPU kernel" to "CUDA LLVM Debug Info Segfault" on Jan 30, 2023
@wsmoses
Member

wsmoses commented Jan 30, 2023

This also happens for simple CUDA code, and is presumably related to the recent debug info changes in GPUCompiler.jl/Enzyme.

using CUDA
using Enzyme

function mul_kernel(A)
    i = threadIdx().x
    if i <= length(A)
        A[i] *= A[i]
    end
    return nothing
end

function grad_mul_kernel(A, dA)
    Enzyme.autodiff_deferred(mul_kernel, Const, Duplicated(A, dA))
    return nothing
end

A = CUDA.ones(64,)
dA = similar(A)
dA .= 1
@cuda threads=length(A) grad_mul_kernel(A, dA)

@vchuravy
Member

; Function Attrs: alwaysinline
define void @diffejulia_kernel__3872_inner20wrap([1 x i64] %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, { i8 addrspace(1)*, i64, [1 x i64], i64 } %1) local_unnamed_addr #6 {
entry:
  %".fca.0.extract'ipev.i" = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %1, 0, !dbg !217
  %.fca.0.extract.i = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0, !dbg !217
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #8, !dbg !219, !range !132
  %.not.i = icmp eq i32 %2, 0, !dbg !226
  br i1 %.not.i, label %L14.i.i, label %diffejulia_kernel__3872_inner20.exit, !dbg !228

L14.i.i:                                          ; preds = %entry
  %.fca.2.0.extract.i = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 2, 0, !dbg !217
  %3 = icmp slt i64 %.fca.2.0.extract.i, 1, !dbg !229
  br i1 %3, label %L27.i.i, label %L25.i.i, !dbg !243

L25.i.i:                                          ; preds = %L14.i.i
  %4 = bitcast i8 addrspace(1)* %.fca.0.extract.i to float addrspace(1)*, !dbg !228
  %5 = bitcast i8 addrspace(1)* %".fca.0.extract'ipev.i" to float addrspace(1)*, !dbg !228
  store float 1.000000e+00, float addrspace(1)* %4, align 4, !dbg !244, !tbaa !180, !alias.scope !250, !noalias !253
  store float 0.000000e+00, float addrspace(1)* %5, align 4, !dbg !244, !tbaa !180, !alias.scope !253, !noalias !250
  br label %diffejulia_kernel__3872_inner20.exit

L27.i.i:                                          ; preds = %L14.i.i
  call fastcc void @julia__throw_boundserror_3879([1 x i64] %state), !dbg !243
  unreachable

diffejulia_kernel__3872_inner20.exit:             ; preds = %L25.i.i, %entry
  ret void
}

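This function has no debug info attached: its define line carries no !dbg subprogram, even though the instructions inside it have !dbg locations. The NVPTX backend then fails on MF.getFunction().getSubprogram().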
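
A quick way to spot functions in this state is to scan the textual IR for define lines that lack a !dbg attachment while the body still contains !dbg locations. The helper below is a hypothetical, text-based heuristic written for illustration only (the name defines_missing_dbg and the sample IR are made up); it is not how Enzyme or GPUCompiler detect or fix the problem, and it does not handle quoted function names.

```julia
# Heuristic text scan: list functions whose `define` line carries no `!dbg`
# attachment even though instructions in the body do.
function defines_missing_dbg(ir::AbstractString)
    missing_dbg = String[]
    current = nothing          # name of the function being scanned, if any
    define_has_dbg = false
    body_has_dbg = false
    for line in eachline(IOBuffer(ir))
        m = match(r"^define\b.*?@([^\s(]+)\(", line)
        if m !== nothing
            # Start of a new function definition.
            current = String(m.captures[1])
            define_has_dbg = occursin("!dbg", line)
            body_has_dbg = false
        elseif current !== nothing
            if startswith(line, "}")
                # End of the function: record it if only the body had !dbg.
                if body_has_dbg && !define_has_dbg
                    push!(missing_dbg, current)
                end
                current = nothing
            elseif occursin("!dbg", line)
                body_has_dbg = true
            end
        end
    end
    return missing_dbg
end

# Made-up IR: `wrapper` mimics the problematic shape, `kernel` is well-formed.
sample = """
define void @wrapper(i64 %0) #6 {
entry:
  %1 = add i64 %0, 1, !dbg !217
  ret void
}

define void @kernel(i64 %0) #2 !dbg !83 {
entry:
  ret void, !dbg !189
}
"""

println(defines_missing_dbg(sample))
```

Applied to a dump like the one in this comment, such a scan would flag the inner20wrap function, since its define line ends in #6 with no !dbg while its instructions reference !dbg !217 and others.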

The function itself no longer has any callers; it was inlined into the kernel function, as the module dump below shows:

(gdb) p F.getParent()->dump()
; ModuleID = 'text'
source_filename = "text"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

%printf_args.0 = type { i64 }
%printf_args.2.1 = type { i32, i64, i64, i32 }

@0 = private unnamed_addr addrspace(1) constant [36 x i8] c"ERROR: Out-of-bounds array access.\0A\00", align 1
@exception = private unnamed_addr addrspace(1) constant [10 x i8] c"exception\00", align 1
@di_func = private unnamed_addr addrspace(1) constant [19 x i8] c"#throw_boundserror\00", align 1
@di_file = private unnamed_addr addrspace(1) constant [63 x i8] c"/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/quirks.jl\00", align 1
@di_func1 = private unnamed_addr addrspace(1) constant [12 x i8] c"checkbounds\00", align 1
@di_file2 = private unnamed_addr addrspace(1) constant [19 x i8] c"./abstractarray.jl\00", align 1
@di_func3 = private unnamed_addr addrspace(1) constant [10 x i8] c"#arrayset\00", align 1
@di_func5 = private unnamed_addr addrspace(1) constant [10 x i8] c"setindex!\00", align 1
@di_file6 = private unnamed_addr addrspace(1) constant [62 x i8] c"/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/array.jl\00", align 1
@di_func7 = private unnamed_addr addrspace(1) constant [8 x i8] c"kernel!\00", align 1
@di_file8 = private unnamed_addr addrspace(1) constant [34 x i8] c"/home/vchuravy/src/Enzyme/cuda.jl\00", align 1
@1 = private unnamed_addr addrspace(1) constant [61 x i8] c"ERROR: a %s was thrown during kernel execution.\0AStacktrace:\0A\00", align 1
@2 = private unnamed_addr addrspace(1) constant [110 x i8] c"WARNING: could not signal exception status to the host, execution will continue.\0A         Please file a bug.\0A\00", align 1
@3 = private unnamed_addr addrspace(1) constant [19 x i8] c" [%i] %s at %s:%i\0A\00", align 1

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata %0, metadata %1, metadata %2) #0

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1

declare i32 @vprintf(i8* %0, i8* %1) local_unnamed_addr

define private fastcc void @gpu_report_exception_name() unnamed_addr #2 !dbg !43 {
top:
  %0 = alloca %printf_args.0, align 8
  %1 = addrspacecast %printf_args.0* %0 to %printf_args.0 addrspace(5)*
  %2 = bitcast %printf_args.0* %0 to i8*, !dbg !52
  call void @llvm.lifetime.start.p0i8(i64 noundef 8, i8* noundef nonnull %2), !dbg !52
  %3 = bitcast %printf_args.0 addrspace(5)* %1 to i64 addrspace(5)*
  store i64 ptrtoint ([10 x i8]* addrspacecast ([10 x i8] addrspace(1)* @exception to [10 x i8]*) to i64), i64 addrspace(5)* %3, align 8, !dbg !52
  %4 = call i32 @vprintf(i8* noundef getelementptr ([61 x i8], [61 x i8]* addrspacecast ([61 x i8] addrspace(1)* @1 to [61 x i8]*), i64 0, i64 0), i8* noundef nonnull %2), !dbg !52
  call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %2), !dbg !52
  ret void, !dbg !62
}

define private fastcc void @gpu_report_exception_frame(i32 noundef signext %0, i64 noundef zeroext %1, i64 noundef zeroext %2, i32 noundef signext %3) unnamed_addr #2 !dbg !63 {
top:
  %4 = alloca %printf_args.2.1, align 8
  %5 = addrspacecast %printf_args.2.1* %4 to %printf_args.2.1 addrspace(5)*
  call void @llvm.dbg.value(metadata i32 %0, metadata !70, metadata !DIExpression()), !dbg !74
  call void @llvm.dbg.value(metadata i64 %1, metadata !71, metadata !DIExpression()), !dbg !74
  call void @llvm.dbg.value(metadata i64 %2, metadata !72, metadata !DIExpression()), !dbg !74
  call void @llvm.dbg.value(metadata i32 %3, metadata !73, metadata !DIExpression()), !dbg !74
  %6 = bitcast %printf_args.2.1* %4 to i8*, !dbg !75
  call void @llvm.lifetime.start.p0i8(i64 noundef 32, i8* noundef nonnull %6), !dbg !75
  %7 = bitcast %printf_args.2.1 addrspace(5)* %5 to i32 addrspace(5)*
  store i32 %0, i32 addrspace(5)* %7, align 8, !dbg !75
  %8 = getelementptr inbounds %printf_args.2.1, %printf_args.2.1 addrspace(5)* %5, i64 0, i32 1
  store i64 %1, i64 addrspace(5)* %8, align 8, !dbg !75
  %9 = getelementptr inbounds %printf_args.2.1, %printf_args.2.1 addrspace(5)* %5, i64 0, i32 2
  store i64 %2, i64 addrspace(5)* %9, align 8, !dbg !75
  %10 = getelementptr inbounds %printf_args.2.1, %printf_args.2.1 addrspace(5)* %5, i64 0, i32 3
  store i32 %3, i32 addrspace(5)* %10, align 8, !dbg !75
  %11 = call i32 @vprintf(i8* noundef getelementptr ([19 x i8], [19 x i8]* addrspacecast ([19 x i8] addrspace(1)* @3 to [19 x i8]*), i64 0, i64 0), i8* noundef nonnull %6), !dbg !75
  call void @llvm.lifetime.end.p0i8(i64 noundef 32, i8* noundef nonnull %6), !dbg !75
  ret void, !dbg !82
}

; Function Attrs: nounwind
declare void @llvm.nvvm.membar.sys() #3

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata %0, metadata %1, metadata %2) #0

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg %0, i8* nocapture %1) #4

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg %0, i8* nocapture %1) #4

define ptx_kernel void @_Z23julia_grad_kernel__313713CuDeviceArrayI7Float32Li1ELi1EES_IS0_Li1ELi1EE([1 x i64] %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, { i8 addrspace(1)*, i64, [1 x i64], i64 } %1) local_unnamed_addr #2 !dbg !83 {
conversion:
  %.fca.0.extract5 = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0
  %.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %1, 0
  call void @llvm.dbg.declare(metadata { i8 addrspace(1)*, i64, [1 x i64], i64 }* undef, metadata !96, metadata !DIExpression(DW_OP_deref)), !dbg !98
  call void @llvm.dbg.declare(metadata { i8 addrspace(1)*, i64, [1 x i64], i64 }* undef, metadata !97, metadata !DIExpression(DW_OP_deref)), !dbg !98
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #8, !dbg !99, !range !132
  %.not.i.i = icmp eq i32 %2, 0, !dbg !133
  br i1 %.not.i.i, label %L14.i.i.i, label %diffejulia_kernel__3872_inner20wrap.exit, !dbg !137

L14.i.i.i:                                        ; preds = %conversion
  %.fca.2.0.extract7 = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 2, 0
  %3 = icmp slt i64 %.fca.2.0.extract7, 1, !dbg !138
  br i1 %3, label %L27.i.i.i, label %L25.i.i.i, !dbg !168

L25.i.i.i:                                        ; preds = %L14.i.i.i
  %4 = bitcast i8 addrspace(1)* %.fca.0.extract5 to float addrspace(1)*, !dbg !137
  %5 = bitcast i8 addrspace(1)* %.fca.0.extract to float addrspace(1)*, !dbg !137
  store float 1.000000e+00, float addrspace(1)* %4, align 4, !dbg !169, !tbaa !180, !alias.scope !183, !noalias !186
  store float 0.000000e+00, float addrspace(1)* %5, align 4, !dbg !169, !tbaa !180, !alias.scope !186, !noalias !183
  br label %diffejulia_kernel__3872_inner20wrap.exit, !dbg !188

L27.i.i.i:                                        ; preds = %L14.i.i.i
  call fastcc void @julia__throw_boundserror_3879([1 x i64] %state), !dbg !168
  unreachable, !dbg !188

diffejulia_kernel__3872_inner20wrap.exit:         ; preds = %L25.i.i.i, %conversion
  ret void, !dbg !189
}

; Function Attrs: noinline noreturn
define private fastcc void @julia__throw_boundserror_3879([1 x i64] %state) unnamed_addr #5 !dbg !190 {
top:
  %0 = call i32 @vprintf(i8* noundef getelementptr ([36 x i8], [36 x i8]* addrspacecast ([36 x i8] addrspace(1)* @0 to [36 x i8]*), i64 0, i64 0), i8* noundef null), !dbg !202
  call fastcc void @gpu_report_exception_name() #9, !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 1, i64 noundef ptrtoint ([19 x i8]* addrspacecast ([19 x i8] addrspace(1)* @di_func to [19 x i8]*) to i64), i64 noundef ptrtoint ([63 x i8]* addrspacecast ([63 x i8] addrspace(1)* @di_file to [63 x i8]*) to i64), i32 noundef 4), !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 2, i64 noundef ptrtoint ([12 x i8]* addrspacecast ([12 x i8] addrspace(1)* @di_func1 to [12 x i8]*) to i64), i64 noundef ptrtoint ([19 x i8]* addrspacecast ([19 x i8] addrspace(1)* @di_file2 to [19 x i8]*) to i64), i32 noundef 668), !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 3, i64 noundef ptrtoint ([10 x i8]* addrspacecast ([10 x i8] addrspace(1)* @di_func3 to [10 x i8]*) to i64), i64 noundef ptrtoint ([62 x i8]* addrspacecast ([62 x i8] addrspace(1)* @di_file6 to [62 x i8]*) to i64), i32 noundef 151), !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 4, i64 noundef ptrtoint ([10 x i8]* addrspacecast ([10 x i8] addrspace(1)* @di_func5 to [10 x i8]*) to i64), i64 noundef ptrtoint ([62 x i8]* addrspacecast ([62 x i8] addrspace(1)* @di_file6 to [62 x i8]*) to i64), i32 noundef 194), !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 5, i64 noundef ptrtoint ([8 x i8]* addrspacecast ([8 x i8] addrspace(1)* @di_func7 to [8 x i8]*) to i64), i64 noundef ptrtoint ([34 x i8]* addrspacecast ([34 x i8] addrspace(1)* @di_file8 to [34 x i8]*) to i64), i32 noundef 5), !dbg !216
  call fastcc void @gpu_signal_exception([1 x i64] %state), !dbg !216
  call void asm sideeffect "exit;", ""() #10, !dbg !216
  unreachable, !dbg !216
}

; Function Attrs: alwaysinline
define void @diffejulia_kernel__3872_inner20wrap([1 x i64] %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, { i8 addrspace(1)*, i64, [1 x i64], i64 } %1) local_unnamed_addr #6 {
entry:
  %".fca.0.extract'ipev.i" = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %1, 0, !dbg !217
  %.fca.0.extract.i = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0, !dbg !217
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #8, !dbg !219, !range !132
  %.not.i = icmp eq i32 %2, 0, !dbg !226
  br i1 %.not.i, label %L14.i.i, label %diffejulia_kernel__3872_inner20.exit, !dbg !228

L14.i.i:                                          ; preds = %entry
  %.fca.2.0.extract.i = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 2, 0, !dbg !217
  %3 = icmp slt i64 %.fca.2.0.extract.i, 1, !dbg !229
  br i1 %3, label %L27.i.i, label %L25.i.i, !dbg !243

L25.i.i:                                          ; preds = %L14.i.i
  %4 = bitcast i8 addrspace(1)* %.fca.0.extract.i to float addrspace(1)*, !dbg !228
  %5 = bitcast i8 addrspace(1)* %".fca.0.extract'ipev.i" to float addrspace(1)*, !dbg !228
  store float 1.000000e+00, float addrspace(1)* %4, align 4, !dbg !244, !tbaa !180, !alias.scope !250, !noalias !253
  store float 0.000000e+00, float addrspace(1)* %5, align 4, !dbg !244, !tbaa !180, !alias.scope !253, !noalias !250
  br label %diffejulia_kernel__3872_inner20.exit

L27.i.i:                                          ; preds = %L14.i.i
  call fastcc void @julia__throw_boundserror_3879([1 x i64] %state), !dbg !243
  unreachable

diffejulia_kernel__3872_inner20.exit:             ; preds = %L25.i.i, %entry
  ret void
}

; Function Attrs: nofree
define private fastcc void @gpu_signal_exception([1 x i64] %state) unnamed_addr #7 !dbg !255 {
top:
  %state.i.fca.0.extract = extractvalue [1 x i64] %state, 0, !dbg !261
  %.not = icmp eq i64 %state.i.fca.0.extract, 0, !dbg !270
  br i1 %.not, label %L12, label %L8, !dbg !270

L8:                                               ; preds = %top
  %0 = inttoptr i64 %state.i.fca.0.extract to i64*, !dbg !271
  store i64 1, i64* %0, align 1, !dbg !271, !tbaa !276
  call void @llvm.nvvm.membar.sys(), !dbg !280
  br label %L15, !dbg !283

L12:                                              ; preds = %top
  %1 = call i32 @vprintf(i8* noundef getelementptr ([110 x i8], [110 x i8]* addrspacecast ([110 x i8] addrspace(1)* @2 to [110 x i8]*), i64 0, i64 0), i8* noundef null), !dbg !284
  br label %L15, !dbg !284

L15:                                              ; preds = %L12, %L8
  ret void, !dbg !290
}

attributes #0 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #1 = { nounwind readnone }
attributes #2 = { "probe-stack"="inline-asm" }
attributes #3 = { nounwind }
attributes #4 = { argmemonly nofree nosync nounwind willreturn }
attributes #5 = { noinline noreturn "probe-stack"="inline-asm" }
attributes #6 = { alwaysinline "probe-stack"="inline-asm" }
attributes #7 = { nofree "enzyme_inactive" "probe-stack"="inline-asm" }
attributes #8 = { mustprogress willreturn }
attributes #9 = { inaccessiblememonly writeonly }
attributes #10 = { inaccessiblememonly noreturn nounwind writeonly }

!llvm.module.flags = !{!0, !1, !2}
!llvm.dbg.cu = !{!3, !6, !7, !9, !11, !12, !13, !15, !16, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26, !27, !28, !29, !30, !31, !32, !33, !34, !35, !36, !37, !38, !40}
!julia.kernel = !{!41}
!nvvm.annotations = !{!42}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"stack-protector-guard", !"global"}
!3 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !4, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!4 = !DIFile(filename: "/home/vchuravy/src/Enzyme/cuda.jl", directory: ".")
!5 = !{}
!6 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !4, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!7 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !8, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!8 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/quirks.jl", directory: ".")
!9 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!10 = !DIFile(filename: "/home/vchuravy/.julia/packages/GPUCompiler/qdoh1/src/runtime.jl", directory: ".")
!11 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!12 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!13 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!14 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/runtime.jl", directory: ".")
!15 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!16 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!17 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!18 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!19 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!20 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!21 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!22 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!23 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!24 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!25 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!26 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!27 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!28 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!29 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!30 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!31 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!32 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!33 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!34 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!35 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!36 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!37 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!38 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !39, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!39 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/intrinsics/memory_dynamic.jl", directory: ".")
!40 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!41 = !{void ([1 x i64], { i8 addrspace(1)*, i64, [1 x i64], i64 }, { i8 addrspace(1)*, i64, [1 x i64], i64 })* @_Z23julia_grad_kernel__313713CuDeviceArrayI7Float32Li1ELi1EES_IS0_Li1ELi1EE}
!42 = !{void ([1 x i64], { i8 addrspace(1)*, i64, [1 x i64], i64 }, { i8 addrspace(1)*, i64, [1 x i64], i64 })* @_Z23julia_grad_kernel__313713CuDeviceArrayI7Float32Li1ELi1EES_IS0_Li1ELi1EE, !"kernel", i32 1}
!43 = distinct !DISubprogram(name: "report_exception_name", linkageName: "julia_report_exception_name_3745", scope: null, file: !14, line: 62, type: !44, scopeLine: 62, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !17, retainedNodes: !49)
!44 = !DISubroutineType(types: !45)
!45 = !{!46, !47, !48}
!46 = !DICompositeType(tag: DW_TAG_structure_type, name: "Nothing", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "139823314235296")
!47 = !DICompositeType(tag: DW_TAG_structure_type, name: "#report_exception_name", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "139820601748592")
!48 = !DIBasicType(name: "Ptr", size: 64, encoding: DW_ATE_unsigned)
!49 = !{!50, !51}
!50 = !DILocalVariable(name: "#self#", arg: 1, scope: !43, file: !14, line: 62, type: !47)
!51 = !DILocalVariable(name: "ex", arg: 2, scope: !43, file: !14, line: 62, type: !48)
!52 = !DILocation(line: 40, scope: !53, inlinedAt: !56)
!53 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !17, retainedNodes: !5)
!54 = !DIFile(filename: "/home/vchuravy/.julia/packages/LLVM/qc3sa/src/interop/base.jl", directory: ".")
!55 = !DISubroutineType(types: !5)
!56 = !DILocation(line: 38, scope: !57, inlinedAt: !59)
!57 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !17, retainedNodes: !5)
!58 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/intrinsics/output.jl", directory: ".")
!59 = !DILocation(line: 38, scope: !60, inlinedAt: !61)
!60 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !17, retainedNodes: !5)
!61 = !DILocation(line: 63, scope: !43)
!62 = !DILocation(line: 67, scope: !43)
!63 = distinct !DISubprogram(name: "report_exception_frame", linkageName: "julia_report_exception_frame_4361", scope: null, file: !14, line: 70, type: !64, scopeLine: 70, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !40, retainedNodes: !68)
!64 = !DISubroutineType(types: !65)
!65 = !{!46, !66, !67, !48, !48, !67}
!66 = !DICompositeType(tag: DW_TAG_structure_type, name: "#report_exception_frame", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "139820596709744")
!67 = !DIBasicType(name: "Int32", size: 32, encoding: DW_ATE_unsigned)
!68 = !{!69, !70, !71, !72, !73}
!69 = !DILocalVariable(name: "#self#", arg: 1, scope: !63, file: !14, line: 70, type: !66)
!70 = !DILocalVariable(name: "idx", arg: 2, scope: !63, file: !14, line: 70, type: !67)
!71 = !DILocalVariable(name: "func", arg: 3, scope: !63, file: !14, line: 70, type: !48)
!72 = !DILocalVariable(name: "file", arg: 4, scope: !63, file: !14, line: 70, type: !48)
!73 = !DILocalVariable(name: "line", arg: 5, scope: !63, file: !14, line: 70, type: !67)
!74 = !DILocation(line: 0, scope: !63)
!75 = !DILocation(line: 40, scope: !76, inlinedAt: !77)
!76 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !40, retainedNodes: !5)
!77 = !DILocation(line: 38, scope: !78, inlinedAt: !79)
!78 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !40, retainedNodes: !5)
!79 = !DILocation(line: 38, scope: !80, inlinedAt: !81)
!80 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !40, retainedNodes: !5)
!81 = !DILocation(line: 71, scope: !63)
!82 = !DILocation(line: 72, scope: !63)
!83 = distinct !DISubprogram(name: "grad_kernel!", linkageName: "julia_grad_kernel!_3137", scope: null, file: !4, line: 10, type: !84, scopeLine: 10, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !94)
!84 = !DISubroutineType(types: !85)
!85 = !{!86, !87, !88, !88}
!86 = !DICompositeType(tag: DW_TAG_structure_type, name: "Nothing", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "140737009807264")
!87 = !DICompositeType(tag: DW_TAG_structure_type, name: "#grad_kernel!", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "140733935981904")
!88 = !DICompositeType(tag: DW_TAG_structure_type, name: "CuDeviceArray", size: 256, align: 64, elements: !89, runtimeLang: DW_LANG_Julia, identifier: "140734134399664")
!89 = !{!90, !91, !92, !91}
!90 = !DIBasicType(name: "LLVMPtr", size: 64, encoding: DW_ATE_unsigned)
!91 = !DIBasicType(name: "Int64", size: 64, encoding: DW_ATE_unsigned)
!92 = !DICompositeType(tag: DW_TAG_structure_type, name: "Tuple", size: 64, align: 64, elements: !93, runtimeLang: DW_LANG_Julia, identifier: "140737012287328")
!93 = !{!91}
!94 = !{!95, !96, !97}
!95 = !DILocalVariable(name: "#self#", arg: 1, scope: !83, file: !4, line: 10, type: !87)
!96 = !DILocalVariable(name: "a", arg: 2, scope: !83, file: !4, line: 10, type: !88)
!97 = !DILocalVariable(name: "da", arg: 3, scope: !83, file: !4, line: 10, type: !88)
!98 = !DILocation(line: 10, scope: !83)
!99 = !DILocation(line: 40, scope: !100, inlinedAt: !101)
!100 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!101 = distinct !DILocation(line: 6, scope: !102, inlinedAt: !104)
!102 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !103, file: !103, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!103 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/intrinsics/indexing.jl", directory: ".")
!104 = distinct !DILocation(line: 6, scope: !105, inlinedAt: !106)
!105 = distinct !DISubprogram(name: "_index;", linkageName: "_index", scope: !103, file: !103, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!106 = distinct !DILocation(line: 46, scope: !107, inlinedAt: !108)
!107 = distinct !DISubprogram(name: "threadIdx_x;", linkageName: "threadIdx_x", scope: !103, file: !103, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!108 = distinct !DILocation(line: 92, scope: !109, inlinedAt: !110)
!109 = distinct !DISubprogram(name: "#threadIdx;", linkageName: "#threadIdx", scope: !103, file: !103, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!110 = distinct !DILocation(line: 4, scope: !111, inlinedAt: !118)
!111 = distinct !DISubprogram(name: "kernel!", linkageName: "julia_kernel!_3872", scope: null, file: !4, line: 3, type: !112, scopeLine: 3, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !115)
!112 = !DISubroutineType(types: !113)
!113 = !{!86, !114, !88}
!114 = !DICompositeType(tag: DW_TAG_structure_type, name: "#kernel!", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "140733935979904")
!115 = !{!116, !117}
!116 = !DILocalVariable(name: "#self#", arg: 1, scope: !111, file: !4, line: 3, type: !114)
!117 = !DILocalVariable(name: "a", arg: 2, scope: !111, file: !4, line: 3, type: !88)
!118 = distinct !DILocation(line: 0, scope: !111, inlinedAt: !119)
!119 = distinct !DILocation(line: 0, scope: !111, inlinedAt: !120)
!120 = distinct !DILocation(line: 6678, scope: !121, inlinedAt: !123)
!121 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !122, file: !122, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !5)
!122 = !DIFile(filename: "/home/vchuravy/src/Enzyme/src/compiler.jl", directory: ".")
!123 = !DILocation(line: 6403, scope: !124, inlinedAt: !125)
!124 = distinct !DISubprogram(name: "enzyme_call;", linkageName: "enzyme_call", scope: !122, file: !122, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !5)
!125 = !DILocation(line: 6366, scope: !126, inlinedAt: !127)
!126 = distinct !DISubprogram(name: "CombinedAdjointThunk;", linkageName: "CombinedAdjointThunk", scope: !122, file: !122, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !5)
!127 = !DILocation(line: 396, scope: !128, inlinedAt: !130)
!128 = distinct !DISubprogram(name: "autodiff_deferred;", linkageName: "autodiff_deferred", scope: !129, file: !129, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !5)
!129 = !DIFile(filename: "/home/vchuravy/src/Enzyme/src/Enzyme.jl", directory: ".")
!130 = !DILocation(line: 410, scope: !128, inlinedAt: !131)
!131 = !DILocation(line: 11, scope: !83)
!132 = !{i32 0, i32 1023}
!133 = !DILocation(line: 477, scope: !134, inlinedAt: !136)
!134 = distinct !DISubprogram(name: "==;", linkageName: "==", scope: !135, file: !135, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!135 = !DIFile(filename: "promotion.jl", directory: ".")
!136 = distinct !DILocation(line: 427, scope: !134, inlinedAt: !110)
!137 = !DILocation(line: 4, scope: !111, inlinedAt: !118)
!138 = !DILocation(line: 489, scope: !139, inlinedAt: !141)
!139 = distinct !DISubprogram(name: "ifelse;", linkageName: "ifelse", scope: !140, file: !140, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!140 = !DIFile(filename: "essentials.jl", directory: ".")
!141 = distinct !DILocation(line: 488, scope: !142, inlinedAt: !143)
!142 = distinct !DISubprogram(name: "max;", linkageName: "max", scope: !135, file: !135, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!143 = distinct !DILocation(line: 440, scope: !144, inlinedAt: !146)
!144 = distinct !DISubprogram(name: "OneTo;", linkageName: "OneTo", scope: !145, file: !145, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!145 = !DIFile(filename: "range.jl", directory: ".")
!146 = distinct !DILocation(line: 453, scope: !144, inlinedAt: !147)
!147 = distinct !DILocation(line: 455, scope: !148, inlinedAt: !149)
!148 = distinct !DISubprogram(name: "oneto;", linkageName: "oneto", scope: !145, file: !145, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!149 = distinct !DILocation(line: 221, scope: !150, inlinedAt: !152)
!150 = distinct !DISubprogram(name: "map;", linkageName: "map", scope: !151, file: !151, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!151 = !DIFile(filename: "tuple.jl", directory: ".")
!152 = distinct !DILocation(line: 95, scope: !153, inlinedAt: !155)
!153 = distinct !DISubprogram(name: "axes;", linkageName: "axes", scope: !154, file: !154, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!154 = !DIFile(filename: "abstractarray.jl", directory: ".")
!155 = distinct !DILocation(line: 116, scope: !156, inlinedAt: !157)
!156 = distinct !DISubprogram(name: "axes1;", linkageName: "axes1", scope: !154, file: !154, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!157 = distinct !DILocation(line: 341, scope: !158, inlinedAt: !159)
!158 = distinct !DISubprogram(name: "eachindex;", linkageName: "eachindex", scope: !154, file: !154, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!159 = distinct !DILocation(line: 653, scope: !160, inlinedAt: !161)
!160 = distinct !DISubprogram(name: "checkbounds;", linkageName: "checkbounds", scope: !154, file: !154, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!161 = distinct !DILocation(line: 668, scope: !160, inlinedAt: !162)
!162 = distinct !DILocation(line: 151, scope: !163, inlinedAt: !165)
!163 = distinct !DISubprogram(name: "#arrayset;", linkageName: "#arrayset", scope: !164, file: !164, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!164 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/array.jl", directory: ".")
!165 = distinct !DILocation(line: 194, scope: !166, inlinedAt: !167)
!166 = distinct !DISubprogram(name: "setindex!;", linkageName: "setindex!", scope: !164, file: !164, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!167 = distinct !DILocation(line: 5, scope: !111, inlinedAt: !118)
!168 = !DILocation(line: 668, scope: !160, inlinedAt: !162)
!169 = !DILocation(line: 40, scope: !100, inlinedAt: !170)
!170 = distinct !DILocation(line: 44, scope: !171, inlinedAt: !173)
!171 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !172, file: !172, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!172 = !DIFile(filename: "/home/vchuravy/.julia/packages/LLVM/qc3sa/src/interop/pointer.jl", directory: ".")
!173 = distinct !DILocation(line: 44, scope: !174, inlinedAt: !175)
!174 = distinct !DISubprogram(name: "pointerset;", linkageName: "pointerset", scope: !172, file: !172, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!175 = distinct !DILocation(line: 84, scope: !176, inlinedAt: !177)
!176 = distinct !DISubprogram(name: "unsafe_store!;", linkageName: "unsafe_store!", scope: !172, file: !172, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!177 = distinct !DILocation(line: 162, scope: !178, inlinedAt: !179)
!178 = distinct !DISubprogram(name: "arrayset_bits;", linkageName: "arrayset_bits", scope: !164, file: !164, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!179 = distinct !DILocation(line: 153, scope: !163, inlinedAt: !165)
!180 = !{!181, !181, i64 0, i64 0}
!181 = !{!"custom_tbaa_addrspace(1)", !182, i64 0}
!182 = !{!"custom_tbaa"}
!183 = !{!184}
!184 = distinct !{!184, !185, !"primal"}
!185 = distinct !{!185, !" diff: %"}
!186 = !{!187}
!187 = distinct !{!187, !185, !"shadow_0"}
!188 = !DILocation(line: 6678, scope: !121, inlinedAt: !123)
!189 = !DILocation(line: 15, scope: !83)
!190 = distinct !DISubprogram(name: "#throw_boundserror", linkageName: "julia_#throw_boundserror_3879", scope: null, file: !8, line: 40, type: !191, scopeLine: 40, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !198)
!191 = !DISubroutineType(types: !192)
!192 = !{!193, !197, !88, !92}
!193 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !194, size: 64, align: 64)
!194 = !DICompositeType(tag: DW_TAG_structure_type, name: "jl_value_t", file: !195, line: 71, align: 64, elements: !196)
!195 = !DIFile(filename: "julia.h", directory: "")
!196 = !{!193}
!197 = !DICompositeType(tag: DW_TAG_structure_type, name: "#throw_boundserror", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "140737013229216")
!198 = !{!199, !200, !201}
!199 = !DILocalVariable(name: "#self#", arg: 1, scope: !190, file: !8, line: 40, type: !197)
!200 = !DILocalVariable(name: "A", arg: 2, scope: !190, file: !8, line: 40, type: !88)
!201 = !DILocalVariable(name: "I", arg: 3, scope: !190, file: !8, line: 40, type: !92)
!202 = !DILocation(line: 40, scope: !203, inlinedAt: !204)
!203 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!204 = !DILocation(line: 38, scope: !205, inlinedAt: !206)
!205 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!206 = !DILocation(line: 38, scope: !207, inlinedAt: !208)
!207 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!208 = !DILocation(line: 173, scope: !205, inlinedAt: !209)
!209 = !DILocation(line: 0, scope: !210, inlinedAt: !212)
!210 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !211, file: !211, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!211 = !DIFile(filename: "none", directory: ".")
!212 = !DILocation(line: 0, scope: !213, inlinedAt: !214)
!213 = distinct !DISubprogram(name: "_cuprint;", linkageName: "_cuprint", scope: !211, file: !211, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!214 = !DILocation(line: 222, scope: !205, inlinedAt: !215)
!215 = !DILocation(line: 3, scope: !190)
!216 = !DILocation(line: 4, scope: !190)
!217 = !DILocation(line: 0, scope: !111, inlinedAt: !218)
!218 = distinct !DILocation(line: 0, scope: !111)
!219 = !DILocation(line: 40, scope: !100, inlinedAt: !220)
!220 = distinct !DILocation(line: 6, scope: !102, inlinedAt: !221)
!221 = distinct !DILocation(line: 6, scope: !105, inlinedAt: !222)
!222 = distinct !DILocation(line: 46, scope: !107, inlinedAt: !223)
!223 = distinct !DILocation(line: 92, scope: !109, inlinedAt: !224)
!224 = distinct !DILocation(line: 4, scope: !111, inlinedAt: !225)
!225 = distinct !DILocation(line: 0, scope: !111, inlinedAt: !218)
!226 = !DILocation(line: 477, scope: !134, inlinedAt: !227)
!227 = distinct !DILocation(line: 427, scope: !134, inlinedAt: !224)
!228 = !DILocation(line: 4, scope: !111, inlinedAt: !225)
!229 = !DILocation(line: 489, scope: !139, inlinedAt: !230)
!230 = distinct !DILocation(line: 488, scope: !142, inlinedAt: !231)
!231 = distinct !DILocation(line: 440, scope: !144, inlinedAt: !232)
!232 = distinct !DILocation(line: 453, scope: !144, inlinedAt: !233)
!233 = distinct !DILocation(line: 455, scope: !148, inlinedAt: !234)
!234 = distinct !DILocation(line: 221, scope: !150, inlinedAt: !235)
!235 = distinct !DILocation(line: 95, scope: !153, inlinedAt: !236)
!236 = distinct !DILocation(line: 116, scope: !156, inlinedAt: !237)
!237 = distinct !DILocation(line: 341, scope: !158, inlinedAt: !238)
!238 = distinct !DILocation(line: 653, scope: !160, inlinedAt: !239)
!239 = distinct !DILocation(line: 668, scope: !160, inlinedAt: !240)
!240 = distinct !DILocation(line: 151, scope: !163, inlinedAt: !241)
!241 = distinct !DILocation(line: 194, scope: !166, inlinedAt: !242)
!242 = distinct !DILocation(line: 5, scope: !111, inlinedAt: !225)
!243 = !DILocation(line: 668, scope: !160, inlinedAt: !240)
!244 = !DILocation(line: 40, scope: !100, inlinedAt: !245)
!245 = distinct !DILocation(line: 44, scope: !171, inlinedAt: !246)
!246 = distinct !DILocation(line: 44, scope: !174, inlinedAt: !247)
!247 = distinct !DILocation(line: 84, scope: !176, inlinedAt: !248)
!248 = distinct !DILocation(line: 162, scope: !178, inlinedAt: !249)
!249 = distinct !DILocation(line: 153, scope: !163, inlinedAt: !241)
!250 = !{!251}
!251 = distinct !{!251, !252, !"primal"}
!252 = distinct !{!252, !" diff: %"}
!253 = !{!254}
!254 = distinct !{!254, !252, !"shadow_0"}
!255 = distinct !DISubprogram(name: "signal_exception", linkageName: "julia_signal_exception_3895", scope: null, file: !14, line: 35, type: !256, scopeLine: 35, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !259)
!256 = !DISubroutineType(types: !257)
!257 = !{!46, !258}
!258 = !DICompositeType(tag: DW_TAG_structure_type, name: "#signal_exception", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "139820601937760")
!259 = !{!260}
!260 = !DILocalVariable(name: "#self#", arg: 1, scope: !255, file: !14, line: 35, type: !258)
!261 = !DILocation(line: 40, scope: !262, inlinedAt: !263)
!262 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!263 = !DILocation(line: 0, scope: !264, inlinedAt: !265)
!264 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !211, file: !211, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!265 = !DILocation(line: 0, scope: !266, inlinedAt: !267)
!266 = distinct !DISubprogram(name: "kernel_state;", linkageName: "kernel_state", scope: !211, file: !211, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!267 = !DILocation(line: 33, scope: !268, inlinedAt: !269)
!268 = distinct !DISubprogram(name: "exception_flag;", linkageName: "exception_flag", scope: !14, file: !14, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!269 = !DILocation(line: 36, scope: !255)
!270 = !DILocation(line: 37, scope: !255)
!271 = !DILocation(line: 118, scope: !272, inlinedAt: !274)
!272 = distinct !DISubprogram(name: "unsafe_store!;", linkageName: "unsafe_store!", scope: !273, file: !273, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!273 = !DIFile(filename: "pointer.jl", directory: ".")
!274 = !DILocation(line: 118, scope: !272, inlinedAt: !275)
!275 = !DILocation(line: 38, scope: !255)
!276 = !{!277, !277, i64 0}
!277 = !{!"jtbaa_data", !278, i64 0}
!278 = !{!"jtbaa", !279, i64 0}
!279 = !{!"jtbaa"}
!280 = !DILocation(line: 121, scope: !281, inlinedAt: !283)
!281 = distinct !DISubprogram(name: "threadfence_system;", linkageName: "threadfence_system", scope: !282, file: !282, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!282 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/intrinsics/synchronization.jl", directory: ".")
!283 = !DILocation(line: 39, scope: !255)
!284 = !DILocation(line: 40, scope: !262, inlinedAt: !285)
!285 = !DILocation(line: 38, scope: !286, inlinedAt: !287)
!286 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!287 = !DILocation(line: 38, scope: !288, inlinedAt: !289)
!288 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!289 = !DILocation(line: 41, scope: !255)
!290 = !DILocation(line: 46, scope: !255)

@vchuravy
Member

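@wsmoses the `!dbg` attachments are already missing post-Enzyme. In the dump above, the definition of `@diffejulia_kernel__3872_inner20wrap` has no `!dbg`, and the trailing `br`, `unreachable`, and `ret void` in its blocks carry none either.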
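
A rough way to check a dump like the one above for missing `!dbg` is a plain-text scan. This is a minimal sketch, not a definitive tool: `missing_dbg` is a hypothetical helper, not part of Enzyme.jl, CUDA.jl, or LLVM.jl. It assumes the textual layout shown in the dump, with one instruction per line and `define ... {` and `}` each on their own line.

```julia
# Hypothetical helper (not an Enzyme/CUDA.jl/LLVM.jl API): a plain-text scan of
# an LLVM IR dump for `define` lines and instructions that carry no `!dbg`.

"""
    missing_dbg(ir::AbstractString)

Return `(function_name, line_number, text)` tuples, one for each `define` line
or instruction inside a function body that has no `!dbg` attachment.
"""
function missing_dbg(ir::AbstractString)
    hits = Tuple{String,Int,String}[]
    current = nothing                      # name of the function being scanned
    for (n, raw) in enumerate(split(ir, '\n'))
        line = strip(raw)
        if startswith(line, "define ")
            m = match(r"@([^\s(]+)\(", line)
            current = m === nothing ? "<unknown>" : String(m.captures[1])
            occursin("!dbg", line) || push!(hits, (current, n, String(line)))
        elseif line == "}"
            current = nothing
        elseif current !== nothing
            # Skip blank lines, comments, and block labels such as `entry:`.
            isempty(line) && continue
            startswith(line, ";") && continue
            occursin(r"^[^\s=]+:(\s|$)", line) && continue
            occursin("!dbg", line) || push!(hits, (current, n, String(line)))
        end
    end
    return hits
end

# Small sample modelled on the dump above (shortened, names simplified).
sample = """
define void @wrap([1 x i64] %state) #6 {
entry:
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #8, !dbg !219
  br label %exit

exit:                                             ; preds = %entry
  ret void
}

define private void @ok() !dbg !255 {
top:
  ret void, !dbg !290
}
"""

for (fname, n, text) in missing_dbg(sample)
    println(fname, " (line ", n, "): ", text)
end
```

On the sample, this should flag the `define` line of `@wrap` plus its `br` and `ret void`, and nothing in `@ok`. Because the scan is text-based, it will miss or misreport IR that is formatted differently. Walking the module through an LLVM API would be more robust.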
