CUDA LLVM Debug Info Segfault #576

jgreener64 · 2023-01-12T16:44:21Z

I am on Enzyme 0.10.15, CUDA 3.12.1 and Julia 1.8.2. Calling sync_threads in a GPU kernel differentiated with Enzyme causes a segfault during compilation:

using CUDA, Enzyme

CUDA.limit!(CUDA.CU_LIMIT_MALLOC_HEAP_SIZE, 1*1024^3)

function kernel!(xs)
    sync_threads() # Works with this line commented out
    return
end

function grad_kernel!()
    xs   = CuStaticSharedArray(Float32, 10)
    d_xs = CuStaticSharedArray(Float32, 10)
    sync_threads()

    Enzyme.autodiff_deferred(
        kernel!,
        Duplicated(xs, d_xs),
    )
    return
end

CUDA.@sync @cuda grad_kernel!()

signal (11): Segmentation fault
in expression starting at REPL[1]:1
_ZN4llvm10DwarfDebug23emitInitialLocDirectiveERKNS_15MachineFunctionEj at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter31emitInitialRawDwarfLocDirectiveERKNS_15MachineFunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm15NVPTXAsmPrinter22emitFunctionEntryLabelEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter18emitFunctionHeaderEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter16emitFunctionBodyEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm15NVPTXAsmPrinter20runOnMachineFunctionERNS_15MachineFunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/jgreener/.julia/packages/LLVM/9gCXO/lib/13/libLLVM_h.jl:947 [inlined]
emit at /home/jgreener/.julia/packages/LLVM/9gCXO/src/targetmachine.jl:45
mcgen at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/mcgen.jl:73
unknown function (ip: 0x7f6d660d81cf)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
macro expansion at /home/jgreener/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
macro expansion at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:430 [inlined]
macro expansion at /home/jgreener/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
macro expansion at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:427 [inlined]
#emit_asm#120 at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/utils.jl:68
emit_asm##kw at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/utils.jl:62 [inlined]
cufunction_compile at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:354
#224 at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:347 [inlined]
JuliaContext at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:76
unknown function (ip: 0x7f6d66104c2a)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
cufunction_compile at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:346
cached_compilation at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/cache.jl:90
#cufunction#221 at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:299
cufunction at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:292
unknown function (ip: 0x7f6d6610458f)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
do_call at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:215
eval_body at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:467
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:750
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:906
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:850
ijl_toplevel_eval_in at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_64841.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
#967 at ./client.jl:419
jfptr_YY.967_30403.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:522
jfptr__start_56736.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
true_main at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/jlapi.c:575
jl_repl_entrypoint at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/jlapi.c:719
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 60089744 (Pool: 60049914; Big: 39830); GC: 55
Segmentation fault (core dumped)

Error with Enzyme.API.printall!(true):

after simplification :
; Function Attrs: mustprogress willreturn
define void @preprocess_julia_kernel__4340_inner1({ i8 addrspace(3)*, i64, [1 x i64], i64 } %0) local_unnamed_addr #3 !dbg !13 {
entry:
  %1 = call {}*** @julia.get_pgcstack() #4
  call void @llvm.nvvm.barrier0() #4, !dbg !14
  ret void, !dbg !17
}

; Function Attrs: mustprogress willreturn
define internal void @diffejulia_kernel__4340_inner1({ i8 addrspace(3)*, i64, [1 x i64], i64 } %0, { i8 addrspace(3)*, i64, [1 x i64], i64 } %"'") local_unnamed_addr #3 !dbg !18 {
entry:
  %1 = call {}*** @julia.get_pgcstack() #4
  call void @llvm.nvvm.barrier0() #4, !dbg !19
  br label %invertentry, !dbg !22

invertentry:                                      ; preds = %entry
  call void @llvm.nvvm.barrier0(), !dbg !19
  ret void
}


signal (11): Segmentation fault
in expression starting at REPL[1]:1
_ZN4llvm10DwarfDebug23emitInitialLocDirectiveERKNS_15MachineFunctionEj at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter31emitInitialRawDwarfLocDirectiveERKNS_15MachineFunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm15NVPTXAsmPrinter22emitFunctionEntryLabelEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter18emitFunctionHeaderEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm10AsmPrinter16emitFunctionBodyEv at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm15NVPTXAsmPrinter20runOnMachineFunctionERNS_15MachineFunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
_ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/jgreener/soft/julia/julia-1.8.2/bin/../lib/julia/libLLVM-13jl.so (unknown line)
LLVMTargetMachineEmitToMemoryBuffer at /home/jgreener/.julia/packages/LLVM/9gCXO/lib/13/libLLVM_h.jl:947 [inlined]
emit at /home/jgreener/.julia/packages/LLVM/9gCXO/src/targetmachine.jl:45
mcgen at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/mcgen.jl:73
unknown function (ip: 0x7f9e1796740f)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
macro expansion at /home/jgreener/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
macro expansion at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:430 [inlined]
macro expansion at /home/jgreener/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
macro expansion at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:427 [inlined]
#emit_asm#120 at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/utils.jl:68
emit_asm##kw at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/utils.jl:62 [inlined]
cufunction_compile at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:354
#224 at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:347 [inlined]
JuliaContext at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/driver.jl:76
unknown function (ip: 0x7f9e17993e6a)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
cufunction_compile at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:346
cached_compilation at /home/jgreener/.julia/packages/GPUCompiler/hi5Wg/src/cache.jl:90
#cufunction#221 at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:299
cufunction at /home/jgreener/.julia/dev/CUDA/src/compiler/execution.jl:292
unknown function (ip: 0x7f9e179937cf)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
do_call at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:215
eval_body at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:467
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/interpreter.c:750
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:906
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:850
ijl_toplevel_eval_in at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_64841.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
#967 at ./client.jl:419
jfptr_YY.967_30403.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:522
jfptr__start_56736.clone_1 at /home/jgreener/soft/julia/julia-1.8.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
true_main at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/jlapi.c:575
jl_repl_entrypoint at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/jlapi.c:719
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 51509610 (Pool: 51472048; Big: 37562); GC: 46
Segmentation fault (core dumped)

CUDA version info:

CUDA toolkit 11.7, artifact installation
NVIDIA driver 470.161.3, for CUDA 11.4
CUDA driver 11.7

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+470.161.3
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.2
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

2 devices:
  0: NVIDIA RTX A6000 (sm_86, 47.531 GiB / 47.544 GiB available)
  1: NVIDIA RTX A6000 (sm_86, 45.943 GiB / 47.541 GiB available)


vchuravy · 2023-01-13T16:03:11Z

James and I just ran into this while playing around with your reproducer from #511 (comment):

using CUDA, Enzyme, StaticArrays, LinearAlgebra, Atomix, UnsafeAtomicsLLVM

CUDA.limit!(CUDA.CU_LIMIT_MALLOC_HEAP_SIZE, 1*1024^3)

struct Atom
    σ::Float32
    ϵ::Float32
end

function find_neighbors(coords)
    n_atoms = length(coords)
    neighbors = Tuple{Int, Int}[]
    for i in 1:n_atoms
        for j in (i + 1):n_atoms
            if norm(coords[i] - coords[j]) <= 1.0
                push!(neighbors, (i, j))
            end
        end
    end
    return neighbors
end

n_atoms = 1024
coords = rand(SVector{3, Float32}, n_atoms) .* 2.7f0
atoms = [Atom(0.02f0, 0.02f0) for _ in 1:n_atoms]
cu_coords = CuArray(coords)
cu_atoms = CuArray(atoms)
neighbors = find_neighbors(coords)
cu_neighbors = CuArray(neighbors)

function force(c1, c2, a1, a2)
    dr = c2 - c1
    invr2 = inv(sum(abs2, dr))
    σ = (a1.σ + a2.σ) / 2
    ϵ = sqrt(a1.ϵ * a2.ϵ)
    six_term = (σ^2 * invr2) ^ 3
    f = (24 * ϵ * invr2) * (2 * six_term ^ 2 - six_term)
    return f * dr
end

function kernel!(forces::CuDeviceMatrix{T}, coords_var, atoms_var, neighbors_var,
                 ::Val{M}, shared_fs) where {T, M}
    coords = CUDA.Const(coords_var)
    atoms = CUDA.Const(atoms_var)
    neighbors = CUDA.Const(neighbors_var)

    tidx = threadIdx().x
    inter_ig = (blockIdx().x - 1) * blockDim().x + tidx
    stride = gridDim().x * blockDim().x
    shared_is = CuStaticSharedArray(Int32, M)
    shared_js = CuStaticSharedArray(Int32, M)

    if tidx == 1
        for si in 1:M
            shared_is[si] = zero(Int32)
        end
    end
    sync_threads()

    for (thread_i, inter_i) in enumerate(inter_ig:stride:length(neighbors))
        si = (thread_i - 1) * blockDim().x + tidx
        i, j = neighbors[inter_i]
        f = force(coords[i], coords[j], atoms[i], atoms[j])
        shared_fs[1, si] = f[1]
        shared_fs[2, si] = f[2]
        shared_fs[3, si] = f[3]
        shared_is[si] = i
        shared_js[si] = j
    end
    sync_threads()

    if tidx == 1
        for si in 1:M
            i = shared_is[si]
            if iszero(i)
                break
            end
            j = shared_js[si]
            dx, dy, dz = shared_fs[1, si], shared_fs[2, si], shared_fs[3, si]
            Atomix.@atomic :monotonic forces[1, i] += -dx
            Atomix.@atomic :monotonic forces[2, i] += -dy
            Atomix.@atomic :monotonic forces[3, i] += -dz
            Atomix.@atomic :monotonic forces[1, j] += dx
            Atomix.@atomic :monotonic forces[2, j] += dy
            Atomix.@atomic :monotonic forces[3, j] += dz
        end
    end
    return
end

function grad_kernel!(forces::CuDeviceMatrix{T}, d_forces, coords, d_coords, atoms, d_atoms,
                      neighbors, shared_mem_size::Val{M}) where {T, M}
    shared_fs = CuStaticSharedArray(T, (3, M))
    d_shared_fs = CuStaticSharedArray(T, (3, M))
    sync_threads()

    Enzyme.autodiff_deferred(
        kernel!,
        Duplicated(forces, d_forces),
        Duplicated(coords, d_coords),
        Duplicated(atoms, d_atoms),
        Const(neighbors),
        Const(shared_mem_size),
        Duplicated(shared_fs, d_shared_fs),
    )
    return
end

cu_forces_mat = CuArray(zeros(Float32, 3, n_atoms))
d_cu_forces_mat = CuArray(rand(Float32, 3, n_atoms))
d_cu_coords = zero(cu_coords)
d_cu_atoms = CuArray([Atom(0.0f0, 0.0f0) for _ in 1:n_atoms])
n_threads = 256
n_blocks = 800
shared_mem_size = 512

CUDA.@sync @cuda threads=n_threads blocks=n_blocks grad_kernel!(cu_forces_mat, d_cu_forces_mat,
        cu_coords, d_cu_coords, cu_atoms, d_cu_atoms, cu_neighbors, Val(shared_mem_size))

Running the above with -g0 works but otherwise crashes with the same error.
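
To confirm which debug level a session is actually running with before retrying the reproducer, something like the following may help. It is a minimal sketch using only Base Julia; it assumes, as the -g0 observation suggests, that GPU compilation follows the session's -g level. The helper names debug_level and debug_info_enabled are hypothetical and not part of CUDA.jl or Enzyme.

```julia
# Report the debug info level this Julia session was started with.
# `julia -g0 script.jl` sets it to 0; without a flag the level is 1; `-g2` gives 2.
debug_level() = Int(Base.JLOptions().debug_level)

# Hypothetical helper: true when debug info is being generated,
# which is the configuration that crashes in this report.
debug_info_enabled() = debug_level() > 0

println("debug level: ", debug_level(), ", debug info enabled: ", debug_info_enabled())
```

If this prints a level above 0, restarting with julia -g0 is the workaround described above, at the cost of losing line information in device code.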

wsmoses · 2023-01-30T18:03:32Z

This also happens for simple CUDA code and is presumably related to the recent GPUCompiler.jl/Enzyme debug info changes:

using CUDA
using Enzyme

function mul_kernel(A)
    i = threadIdx().x
    if i <= length(A)
        A[i] *= A[i]
    end
    return nothing
end

function grad_mul_kernel(A, dA)
    Enzyme.autodiff_deferred(mul_kernel, Const, Duplicated(A, dA))
    return nothing
end

A = CUDA.ones(64,)
dA = similar(A)
dA .= 1
@cuda threads=length(A) grad_mul_kernel(A, dA)

vchuravy · 2023-01-31T02:23:46Z

; Function Attrs: alwaysinline
define void @diffejulia_kernel__3872_inner20wrap([1 x i64] %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, { i8 addrspace(1)*, i64, [1 x i64], i64 } %1) local_unnamed_addr #6 {
entry:
  %".fca.0.extract'ipev.i" = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %1, 0, !dbg !217
  %.fca.0.extract.i = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0, !dbg !217
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #8, !dbg !219, !range !132
  %.not.i = icmp eq i32 %2, 0, !dbg !226
  br i1 %.not.i, label %L14.i.i, label %diffejulia_kernel__3872_inner20.exit, !dbg !228

L14.i.i:                                          ; preds = %entry
  %.fca.2.0.extract.i = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 2, 0, !dbg !217
  %3 = icmp slt i64 %.fca.2.0.extract.i, 1, !dbg !229
  br i1 %3, label %L27.i.i, label %L25.i.i, !dbg !243

L25.i.i:                                          ; preds = %L14.i.i
  %4 = bitcast i8 addrspace(1)* %.fca.0.extract.i to float addrspace(1)*, !dbg !228
  %5 = bitcast i8 addrspace(1)* %".fca.0.extract'ipev.i" to float addrspace(1)*, !dbg !228
  store float 1.000000e+00, float addrspace(1)* %4, align 4, !dbg !244, !tbaa !180, !alias.scope !250, !noalias !253
  store float 0.000000e+00, float addrspace(1)* %5, align 4, !dbg !244, !tbaa !180, !alias.scope !253, !noalias !250
  br label %diffejulia_kernel__3872_inner20.exit

L27.i.i:                                          ; preds = %L14.i.i
  call fastcc void @julia__throw_boundserror_3879([1 x i64] %state), !dbg !243
  unreachable

diffejulia_kernel__3872_inner20.exit:             ; preds = %L25.i.i, %entry
  ret void
}

This function has no debug info attached (there is no !dbg on its define line), and the NVPTX backend fails on MF.getFunction().getSubprogram().

The function itself no longer has any callers; it was inlined into the kernel function:
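
The pattern described here (a define line without !dbg whose instructions still carry !dbg) can be spotted in a textual IR dump with a small scan. The following is only a rough sketch: it is a line-based text heuristic in plain Julia, not an LLVM or GPUCompiler API, and the function name functions_missing_subprogram and the sample IR are made up for illustration.

```julia
# Heuristic text scan (a sketch, not an LLVM API): list functions whose `define`
# line has no `!dbg` attachment although instructions in the body reference `!dbg`.
function functions_missing_subprogram(ir::AbstractString)
    missing_sp = String[]
    current = nothing      # name of the function currently being scanned, if any
    has_sp = false         # does the `define` line end in `!dbg !N {`?
    body_dbg = false       # does any line of the body mention `!dbg`?
    for line in split(ir, '\n')
        s = strip(line)
        if startswith(s, "define ")
            m = match(r"@([^\s(]+)\(", s)
            current = m === nothing ? "<unknown>" : String(m.captures[1])
            has_sp = occursin(r"!dbg\s+!\d+\s*\{$", s)
            body_dbg = false
        elseif current !== nothing && s == "}"
            (!has_sp && body_dbg) && push!(missing_sp, current)
            current = nothing
        elseif current !== nothing
            body_dbg |= occursin("!dbg", s)
        end
    end
    return missing_sp
end

# Made-up sample mirroring the shape of the dump: `wrap` lacks a subprogram, `kernel` has one.
sample = """
define void @wrap(i64 %x) local_unnamed_addr #6 {
entry:
  %y = add i64 %x, 1, !dbg !217
  ret void
}

define void @kernel(i64 %x) #2 !dbg !83 {
entry:
  ret void, !dbg !189
}
"""

println(functions_missing_subprogram(sample))
```

Run against the module dump in this comment, a scan like this would be expected to flag diffejulia_kernel__3872_inner20wrap, since its define line ends in #6 { while its body uses !dbg !217 and others. Quoted function names and multi-line define headers are not handled.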

(gdb) p F.getParent()->dump()
; ModuleID = 'text'
source_filename = "text"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

%printf_args.0 = type { i64 }
%printf_args.2.1 = type { i32, i64, i64, i32 }

@0 = private unnamed_addr addrspace(1) constant [36 x i8] c"ERROR: Out-of-bounds array access.\0A\00", align 1
@exception = private unnamed_addr addrspace(1) constant [10 x i8] c"exception\00", align 1
@di_func = private unnamed_addr addrspace(1) constant [19 x i8] c"#throw_boundserror\00", align 1
@di_file = private unnamed_addr addrspace(1) constant [63 x i8] c"/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/quirks.jl\00", align 1
@di_func1 = private unnamed_addr addrspace(1) constant [12 x i8] c"checkbounds\00", align 1
@di_file2 = private unnamed_addr addrspace(1) constant [19 x i8] c"./abstractarray.jl\00", align 1
@di_func3 = private unnamed_addr addrspace(1) constant [10 x i8] c"#arrayset\00", align 1
@di_func5 = private unnamed_addr addrspace(1) constant [10 x i8] c"setindex!\00", align 1
@di_file6 = private unnamed_addr addrspace(1) constant [62 x i8] c"/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/array.jl\00", align 1
@di_func7 = private unnamed_addr addrspace(1) constant [8 x i8] c"kernel!\00", align 1
@di_file8 = private unnamed_addr addrspace(1) constant [34 x i8] c"/home/vchuravy/src/Enzyme/cuda.jl\00", align 1
@1 = private unnamed_addr addrspace(1) constant [61 x i8] c"ERROR: a %s was thrown during kernel execution.\0AStacktrace:\0A\00", align 1
@2 = private unnamed_addr addrspace(1) constant [110 x i8] c"WARNING: could not signal exception status to the host, execution will continue.\0A         Please file a bug.\0A\00", align 1
@3 = private unnamed_addr addrspace(1) constant [19 x i8] c" [%i] %s at %s:%i\0A\00", align 1

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata %0, metadata %1, metadata %2) #0

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1

declare i32 @vprintf(i8* %0, i8* %1) local_unnamed_addr

define private fastcc void @gpu_report_exception_name() unnamed_addr #2 !dbg !43 {
top:
  %0 = alloca %printf_args.0, align 8
  %1 = addrspacecast %printf_args.0* %0 to %printf_args.0 addrspace(5)*
  %2 = bitcast %printf_args.0* %0 to i8*, !dbg !52
  call void @llvm.lifetime.start.p0i8(i64 noundef 8, i8* noundef nonnull %2), !dbg !52
  %3 = bitcast %printf_args.0 addrspace(5)* %1 to i64 addrspace(5)*
  store i64 ptrtoint ([10 x i8]* addrspacecast ([10 x i8] addrspace(1)* @exception to [10 x i8]*) to i64), i64 addrspace(5)* %3, align 8, !dbg !52
  %4 = call i32 @vprintf(i8* noundef getelementptr ([61 x i8], [61 x i8]* addrspacecast ([61 x i8] addrspace(1)* @1 to [61 x i8]*), i64 0, i64 0), i8* noundef nonnull %2), !dbg !52
  call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %2), !dbg !52
  ret void, !dbg !62
}

define private fastcc void @gpu_report_exception_frame(i32 noundef signext %0, i64 noundef zeroext %1, i64 noundef zeroext %2, i32 noundef signext %3) unnamed_addr #2 !dbg !63 {
top:
  %4 = alloca %printf_args.2.1, align 8
  %5 = addrspacecast %printf_args.2.1* %4 to %printf_args.2.1 addrspace(5)*
  call void @llvm.dbg.value(metadata i32 %0, metadata !70, metadata !DIExpression()), !dbg !74
  call void @llvm.dbg.value(metadata i64 %1, metadata !71, metadata !DIExpression()), !dbg !74
  call void @llvm.dbg.value(metadata i64 %2, metadata !72, metadata !DIExpression()), !dbg !74
  call void @llvm.dbg.value(metadata i32 %3, metadata !73, metadata !DIExpression()), !dbg !74
  %6 = bitcast %printf_args.2.1* %4 to i8*, !dbg !75
  call void @llvm.lifetime.start.p0i8(i64 noundef 32, i8* noundef nonnull %6), !dbg !75
  %7 = bitcast %printf_args.2.1 addrspace(5)* %5 to i32 addrspace(5)*
  store i32 %0, i32 addrspace(5)* %7, align 8, !dbg !75
  %8 = getelementptr inbounds %printf_args.2.1, %printf_args.2.1 addrspace(5)* %5, i64 0, i32 1
  store i64 %1, i64 addrspace(5)* %8, align 8, !dbg !75
  %9 = getelementptr inbounds %printf_args.2.1, %printf_args.2.1 addrspace(5)* %5, i64 0, i32 2
  store i64 %2, i64 addrspace(5)* %9, align 8, !dbg !75
  %10 = getelementptr inbounds %printf_args.2.1, %printf_args.2.1 addrspace(5)* %5, i64 0, i32 3
  store i32 %3, i32 addrspace(5)* %10, align 8, !dbg !75
  %11 = call i32 @vprintf(i8* noundef getelementptr ([19 x i8], [19 x i8]* addrspacecast ([19 x i8] addrspace(1)* @3 to [19 x i8]*), i64 0, i64 0), i8* noundef nonnull %6), !dbg !75
  call void @llvm.lifetime.end.p0i8(i64 noundef 32, i8* noundef nonnull %6), !dbg !75
  ret void, !dbg !82
}

; Function Attrs: nounwind
declare void @llvm.nvvm.membar.sys() #3

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata %0, metadata %1, metadata %2) #0

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg %0, i8* nocapture %1) #4

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg %0, i8* nocapture %1) #4

define ptx_kernel void @_Z23julia_grad_kernel__313713CuDeviceArrayI7Float32Li1ELi1EES_IS0_Li1ELi1EE([1 x i64] %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, { i8 addrspace(1)*, i64, [1 x i64], i64 } %1) local_unnamed_addr #2 !dbg !83 {
conversion:
  %.fca.0.extract5 = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0
  %.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %1, 0
  call void @llvm.dbg.declare(metadata { i8 addrspace(1)*, i64, [1 x i64], i64 }* undef, metadata !96, metadata !DIExpression(DW_OP_deref)), !dbg !98
  call void @llvm.dbg.declare(metadata { i8 addrspace(1)*, i64, [1 x i64], i64 }* undef, metadata !97, metadata !DIExpression(DW_OP_deref)), !dbg !98
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #8, !dbg !99, !range !132
  %.not.i.i = icmp eq i32 %2, 0, !dbg !133
  br i1 %.not.i.i, label %L14.i.i.i, label %diffejulia_kernel__3872_inner20wrap.exit, !dbg !137

L14.i.i.i:                                        ; preds = %conversion
  %.fca.2.0.extract7 = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 2, 0
  %3 = icmp slt i64 %.fca.2.0.extract7, 1, !dbg !138
  br i1 %3, label %L27.i.i.i, label %L25.i.i.i, !dbg !168

L25.i.i.i:                                        ; preds = %L14.i.i.i
  %4 = bitcast i8 addrspace(1)* %.fca.0.extract5 to float addrspace(1)*, !dbg !137
  %5 = bitcast i8 addrspace(1)* %.fca.0.extract to float addrspace(1)*, !dbg !137
  store float 1.000000e+00, float addrspace(1)* %4, align 4, !dbg !169, !tbaa !180, !alias.scope !183, !noalias !186
  store float 0.000000e+00, float addrspace(1)* %5, align 4, !dbg !169, !tbaa !180, !alias.scope !186, !noalias !183
  br label %diffejulia_kernel__3872_inner20wrap.exit, !dbg !188

L27.i.i.i:                                        ; preds = %L14.i.i.i
  call fastcc void @julia__throw_boundserror_3879([1 x i64] %state), !dbg !168
  unreachable, !dbg !188

diffejulia_kernel__3872_inner20wrap.exit:         ; preds = %L25.i.i.i, %conversion
  ret void, !dbg !189
}

; Function Attrs: noinline noreturn
define private fastcc void @julia__throw_boundserror_3879([1 x i64] %state) unnamed_addr #5 !dbg !190 {
top:
  %0 = call i32 @vprintf(i8* noundef getelementptr ([36 x i8], [36 x i8]* addrspacecast ([36 x i8] addrspace(1)* @0 to [36 x i8]*), i64 0, i64 0), i8* noundef null), !dbg !202
  call fastcc void @gpu_report_exception_name() #9, !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 1, i64 noundef ptrtoint ([19 x i8]* addrspacecast ([19 x i8] addrspace(1)* @di_func to [19 x i8]*) to i64), i64 noundef ptrtoint ([63 x i8]* addrspacecast ([63 x i8] addrspace(1)* @di_file to [63 x i8]*) to i64), i32 noundef 4), !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 2, i64 noundef ptrtoint ([12 x i8]* addrspacecast ([12 x i8] addrspace(1)* @di_func1 to [12 x i8]*) to i64), i64 noundef ptrtoint ([19 x i8]* addrspacecast ([19 x i8] addrspace(1)* @di_file2 to [19 x i8]*) to i64), i32 noundef 668), !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 3, i64 noundef ptrtoint ([10 x i8]* addrspacecast ([10 x i8] addrspace(1)* @di_func3 to [10 x i8]*) to i64), i64 noundef ptrtoint ([62 x i8]* addrspacecast ([62 x i8] addrspace(1)* @di_file6 to [62 x i8]*) to i64), i32 noundef 151), !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 4, i64 noundef ptrtoint ([10 x i8]* addrspacecast ([10 x i8] addrspace(1)* @di_func5 to [10 x i8]*) to i64), i64 noundef ptrtoint ([62 x i8]* addrspacecast ([62 x i8] addrspace(1)* @di_file6 to [62 x i8]*) to i64), i32 noundef 194), !dbg !216
  call fastcc void @gpu_report_exception_frame(i32 noundef 5, i64 noundef ptrtoint ([8 x i8]* addrspacecast ([8 x i8] addrspace(1)* @di_func7 to [8 x i8]*) to i64), i64 noundef ptrtoint ([34 x i8]* addrspacecast ([34 x i8] addrspace(1)* @di_file8 to [34 x i8]*) to i64), i32 noundef 5), !dbg !216
  call fastcc void @gpu_signal_exception([1 x i64] %state), !dbg !216
  call void asm sideeffect "exit;", ""() #10, !dbg !216
  unreachable, !dbg !216
}

; Function Attrs: alwaysinline
define void @diffejulia_kernel__3872_inner20wrap([1 x i64] %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, { i8 addrspace(1)*, i64, [1 x i64], i64 } %1) local_unnamed_addr #6 {
entry:
  %".fca.0.extract'ipev.i" = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %1, 0, !dbg !217
  %.fca.0.extract.i = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0, !dbg !217
  %2 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #8, !dbg !219, !range !132
  %.not.i = icmp eq i32 %2, 0, !dbg !226
  br i1 %.not.i, label %L14.i.i, label %diffejulia_kernel__3872_inner20.exit, !dbg !228

L14.i.i:                                          ; preds = %entry
  %.fca.2.0.extract.i = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 2, 0, !dbg !217
  %3 = icmp slt i64 %.fca.2.0.extract.i, 1, !dbg !229
  br i1 %3, label %L27.i.i, label %L25.i.i, !dbg !243

L25.i.i:                                          ; preds = %L14.i.i
  %4 = bitcast i8 addrspace(1)* %.fca.0.extract.i to float addrspace(1)*, !dbg !228
  %5 = bitcast i8 addrspace(1)* %".fca.0.extract'ipev.i" to float addrspace(1)*, !dbg !228
  store float 1.000000e+00, float addrspace(1)* %4, align 4, !dbg !244, !tbaa !180, !alias.scope !250, !noalias !253
  store float 0.000000e+00, float addrspace(1)* %5, align 4, !dbg !244, !tbaa !180, !alias.scope !253, !noalias !250
  br label %diffejulia_kernel__3872_inner20.exit

L27.i.i:                                          ; preds = %L14.i.i
  call fastcc void @julia__throw_boundserror_3879([1 x i64] %state), !dbg !243
  unreachable

diffejulia_kernel__3872_inner20.exit:             ; preds = %L25.i.i, %entry
  ret void
}

; Function Attrs: nofree
define private fastcc void @gpu_signal_exception([1 x i64] %state) unnamed_addr #7 !dbg !255 {
top:
  %state.i.fca.0.extract = extractvalue [1 x i64] %state, 0, !dbg !261
  %.not = icmp eq i64 %state.i.fca.0.extract, 0, !dbg !270
  br i1 %.not, label %L12, label %L8, !dbg !270

L8:                                               ; preds = %top
  %0 = inttoptr i64 %state.i.fca.0.extract to i64*, !dbg !271
  store i64 1, i64* %0, align 1, !dbg !271, !tbaa !276
  call void @llvm.nvvm.membar.sys(), !dbg !280
  br label %L15, !dbg !283

L12:                                              ; preds = %top
  %1 = call i32 @vprintf(i8* noundef getelementptr ([110 x i8], [110 x i8]* addrspacecast ([110 x i8] addrspace(1)* @2 to [110 x i8]*), i64 0, i64 0), i8* noundef null), !dbg !284
  br label %L15, !dbg !284

L15:                                              ; preds = %L12, %L8
  ret void, !dbg !290
}

attributes #0 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #1 = { nounwind readnone }
attributes #2 = { "probe-stack"="inline-asm" }
attributes #3 = { nounwind }
attributes #4 = { argmemonly nofree nosync nounwind willreturn }
attributes #5 = { noinline noreturn "probe-stack"="inline-asm" }
attributes #6 = { alwaysinline "probe-stack"="inline-asm" }
attributes #7 = { nofree "enzyme_inactive" "probe-stack"="inline-asm" }
attributes #8 = { mustprogress willreturn }
attributes #9 = { inaccessiblememonly writeonly }
attributes #10 = { inaccessiblememonly noreturn nounwind writeonly }

!llvm.module.flags = !{!0, !1, !2}
!llvm.dbg.cu = !{!3, !6, !7, !9, !11, !12, !13, !15, !16, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26, !27, !28, !29, !30, !31, !32, !33, !34, !35, !36, !37, !38, !40}
!julia.kernel = !{!41}
!nvvm.annotations = !{!42}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"stack-protector-guard", !"global"}
!3 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !4, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!4 = !DIFile(filename: "/home/vchuravy/src/Enzyme/cuda.jl", directory: ".")
!5 = !{}
!6 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !4, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!7 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !8, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!8 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/quirks.jl", directory: ".")
!9 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!10 = !DIFile(filename: "/home/vchuravy/.julia/packages/GPUCompiler/qdoh1/src/runtime.jl", directory: ".")
!11 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!12 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!13 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!14 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/runtime.jl", directory: ".")
!15 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!16 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!17 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!18 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!19 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!20 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!21 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!22 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!23 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!24 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!25 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!26 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!27 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!28 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!29 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!30 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!31 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!32 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!33 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!34 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!35 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!36 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!37 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !10, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!38 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !39, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!39 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/intrinsics/memory_dynamic.jl", directory: ".")
!40 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !14, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !5, nameTableKind: None)
!41 = !{void ([1 x i64], { i8 addrspace(1)*, i64, [1 x i64], i64 }, { i8 addrspace(1)*, i64, [1 x i64], i64 })* @_Z23julia_grad_kernel__313713CuDeviceArrayI7Float32Li1ELi1EES_IS0_Li1ELi1EE}
!42 = !{void ([1 x i64], { i8 addrspace(1)*, i64, [1 x i64], i64 }, { i8 addrspace(1)*, i64, [1 x i64], i64 })* @_Z23julia_grad_kernel__313713CuDeviceArrayI7Float32Li1ELi1EES_IS0_Li1ELi1EE, !"kernel", i32 1}
!43 = distinct !DISubprogram(name: "report_exception_name", linkageName: "julia_report_exception_name_3745", scope: null, file: !14, line: 62, type: !44, scopeLine: 62, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !17, retainedNodes: !49)
!44 = !DISubroutineType(types: !45)
!45 = !{!46, !47, !48}
!46 = !DICompositeType(tag: DW_TAG_structure_type, name: "Nothing", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "139823314235296")
!47 = !DICompositeType(tag: DW_TAG_structure_type, name: "#report_exception_name", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "139820601748592")
!48 = !DIBasicType(name: "Ptr", size: 64, encoding: DW_ATE_unsigned)
!49 = !{!50, !51}
!50 = !DILocalVariable(name: "#self#", arg: 1, scope: !43, file: !14, line: 62, type: !47)
!51 = !DILocalVariable(name: "ex", arg: 2, scope: !43, file: !14, line: 62, type: !48)
!52 = !DILocation(line: 40, scope: !53, inlinedAt: !56)
!53 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !17, retainedNodes: !5)
!54 = !DIFile(filename: "/home/vchuravy/.julia/packages/LLVM/qc3sa/src/interop/base.jl", directory: ".")
!55 = !DISubroutineType(types: !5)
!56 = !DILocation(line: 38, scope: !57, inlinedAt: !59)
!57 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !17, retainedNodes: !5)
!58 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/intrinsics/output.jl", directory: ".")
!59 = !DILocation(line: 38, scope: !60, inlinedAt: !61)
!60 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !17, retainedNodes: !5)
!61 = !DILocation(line: 63, scope: !43)
!62 = !DILocation(line: 67, scope: !43)
!63 = distinct !DISubprogram(name: "report_exception_frame", linkageName: "julia_report_exception_frame_4361", scope: null, file: !14, line: 70, type: !64, scopeLine: 70, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !40, retainedNodes: !68)
!64 = !DISubroutineType(types: !65)
!65 = !{!46, !66, !67, !48, !48, !67}
!66 = !DICompositeType(tag: DW_TAG_structure_type, name: "#report_exception_frame", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "139820596709744")
!67 = !DIBasicType(name: "Int32", size: 32, encoding: DW_ATE_unsigned)
!68 = !{!69, !70, !71, !72, !73}
!69 = !DILocalVariable(name: "#self#", arg: 1, scope: !63, file: !14, line: 70, type: !66)
!70 = !DILocalVariable(name: "idx", arg: 2, scope: !63, file: !14, line: 70, type: !67)
!71 = !DILocalVariable(name: "func", arg: 3, scope: !63, file: !14, line: 70, type: !48)
!72 = !DILocalVariable(name: "file", arg: 4, scope: !63, file: !14, line: 70, type: !48)
!73 = !DILocalVariable(name: "line", arg: 5, scope: !63, file: !14, line: 70, type: !67)
!74 = !DILocation(line: 0, scope: !63)
!75 = !DILocation(line: 40, scope: !76, inlinedAt: !77)
!76 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !40, retainedNodes: !5)
!77 = !DILocation(line: 38, scope: !78, inlinedAt: !79)
!78 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !40, retainedNodes: !5)
!79 = !DILocation(line: 38, scope: !80, inlinedAt: !81)
!80 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !40, retainedNodes: !5)
!81 = !DILocation(line: 71, scope: !63)
!82 = !DILocation(line: 72, scope: !63)
!83 = distinct !DISubprogram(name: "grad_kernel!", linkageName: "julia_grad_kernel!_3137", scope: null, file: !4, line: 10, type: !84, scopeLine: 10, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !94)
!84 = !DISubroutineType(types: !85)
!85 = !{!86, !87, !88, !88}
!86 = !DICompositeType(tag: DW_TAG_structure_type, name: "Nothing", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "140737009807264")
!87 = !DICompositeType(tag: DW_TAG_structure_type, name: "#grad_kernel!", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "140733935981904")
!88 = !DICompositeType(tag: DW_TAG_structure_type, name: "CuDeviceArray", size: 256, align: 64, elements: !89, runtimeLang: DW_LANG_Julia, identifier: "140734134399664")
!89 = !{!90, !91, !92, !91}
!90 = !DIBasicType(name: "LLVMPtr", size: 64, encoding: DW_ATE_unsigned)
!91 = !DIBasicType(name: "Int64", size: 64, encoding: DW_ATE_unsigned)
!92 = !DICompositeType(tag: DW_TAG_structure_type, name: "Tuple", size: 64, align: 64, elements: !93, runtimeLang: DW_LANG_Julia, identifier: "140737012287328")
!93 = !{!91}
!94 = !{!95, !96, !97}
!95 = !DILocalVariable(name: "#self#", arg: 1, scope: !83, file: !4, line: 10, type: !87)
!96 = !DILocalVariable(name: "a", arg: 2, scope: !83, file: !4, line: 10, type: !88)
!97 = !DILocalVariable(name: "da", arg: 3, scope: !83, file: !4, line: 10, type: !88)
!98 = !DILocation(line: 10, scope: !83)
!99 = !DILocation(line: 40, scope: !100, inlinedAt: !101)
!100 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!101 = distinct !DILocation(line: 6, scope: !102, inlinedAt: !104)
!102 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !103, file: !103, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!103 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/intrinsics/indexing.jl", directory: ".")
!104 = distinct !DILocation(line: 6, scope: !105, inlinedAt: !106)
!105 = distinct !DISubprogram(name: "_index;", linkageName: "_index", scope: !103, file: !103, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!106 = distinct !DILocation(line: 46, scope: !107, inlinedAt: !108)
!107 = distinct !DISubprogram(name: "threadIdx_x;", linkageName: "threadIdx_x", scope: !103, file: !103, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!108 = distinct !DILocation(line: 92, scope: !109, inlinedAt: !110)
!109 = distinct !DISubprogram(name: "#threadIdx;", linkageName: "#threadIdx", scope: !103, file: !103, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!110 = distinct !DILocation(line: 4, scope: !111, inlinedAt: !118)
!111 = distinct !DISubprogram(name: "kernel!", linkageName: "julia_kernel!_3872", scope: null, file: !4, line: 3, type: !112, scopeLine: 3, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !115)
!112 = !DISubroutineType(types: !113)
!113 = !{!86, !114, !88}
!114 = !DICompositeType(tag: DW_TAG_structure_type, name: "#kernel!", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "140733935979904")
!115 = !{!116, !117}
!116 = !DILocalVariable(name: "#self#", arg: 1, scope: !111, file: !4, line: 3, type: !114)
!117 = !DILocalVariable(name: "a", arg: 2, scope: !111, file: !4, line: 3, type: !88)
!118 = distinct !DILocation(line: 0, scope: !111, inlinedAt: !119)
!119 = distinct !DILocation(line: 0, scope: !111, inlinedAt: !120)
!120 = distinct !DILocation(line: 6678, scope: !121, inlinedAt: !123)
!121 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !122, file: !122, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !5)
!122 = !DIFile(filename: "/home/vchuravy/src/Enzyme/src/compiler.jl", directory: ".")
!123 = !DILocation(line: 6403, scope: !124, inlinedAt: !125)
!124 = distinct !DISubprogram(name: "enzyme_call;", linkageName: "enzyme_call", scope: !122, file: !122, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !5)
!125 = !DILocation(line: 6366, scope: !126, inlinedAt: !127)
!126 = distinct !DISubprogram(name: "CombinedAdjointThunk;", linkageName: "CombinedAdjointThunk", scope: !122, file: !122, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !5)
!127 = !DILocation(line: 396, scope: !128, inlinedAt: !130)
!128 = distinct !DISubprogram(name: "autodiff_deferred;", linkageName: "autodiff_deferred", scope: !129, file: !129, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !5)
!129 = !DIFile(filename: "/home/vchuravy/src/Enzyme/src/Enzyme.jl", directory: ".")
!130 = !DILocation(line: 410, scope: !128, inlinedAt: !131)
!131 = !DILocation(line: 11, scope: !83)
!132 = !{i32 0, i32 1023}
!133 = !DILocation(line: 477, scope: !134, inlinedAt: !136)
!134 = distinct !DISubprogram(name: "==;", linkageName: "==", scope: !135, file: !135, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!135 = !DIFile(filename: "promotion.jl", directory: ".")
!136 = distinct !DILocation(line: 427, scope: !134, inlinedAt: !110)
!137 = !DILocation(line: 4, scope: !111, inlinedAt: !118)
!138 = !DILocation(line: 489, scope: !139, inlinedAt: !141)
!139 = distinct !DISubprogram(name: "ifelse;", linkageName: "ifelse", scope: !140, file: !140, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!140 = !DIFile(filename: "essentials.jl", directory: ".")
!141 = distinct !DILocation(line: 488, scope: !142, inlinedAt: !143)
!142 = distinct !DISubprogram(name: "max;", linkageName: "max", scope: !135, file: !135, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!143 = distinct !DILocation(line: 440, scope: !144, inlinedAt: !146)
!144 = distinct !DISubprogram(name: "OneTo;", linkageName: "OneTo", scope: !145, file: !145, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!145 = !DIFile(filename: "range.jl", directory: ".")
!146 = distinct !DILocation(line: 453, scope: !144, inlinedAt: !147)
!147 = distinct !DILocation(line: 455, scope: !148, inlinedAt: !149)
!148 = distinct !DISubprogram(name: "oneto;", linkageName: "oneto", scope: !145, file: !145, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!149 = distinct !DILocation(line: 221, scope: !150, inlinedAt: !152)
!150 = distinct !DISubprogram(name: "map;", linkageName: "map", scope: !151, file: !151, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!151 = !DIFile(filename: "tuple.jl", directory: ".")
!152 = distinct !DILocation(line: 95, scope: !153, inlinedAt: !155)
!153 = distinct !DISubprogram(name: "axes;", linkageName: "axes", scope: !154, file: !154, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!154 = !DIFile(filename: "abstractarray.jl", directory: ".")
!155 = distinct !DILocation(line: 116, scope: !156, inlinedAt: !157)
!156 = distinct !DISubprogram(name: "axes1;", linkageName: "axes1", scope: !154, file: !154, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!157 = distinct !DILocation(line: 341, scope: !158, inlinedAt: !159)
!158 = distinct !DISubprogram(name: "eachindex;", linkageName: "eachindex", scope: !154, file: !154, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!159 = distinct !DILocation(line: 653, scope: !160, inlinedAt: !161)
!160 = distinct !DISubprogram(name: "checkbounds;", linkageName: "checkbounds", scope: !154, file: !154, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!161 = distinct !DILocation(line: 668, scope: !160, inlinedAt: !162)
!162 = distinct !DILocation(line: 151, scope: !163, inlinedAt: !165)
!163 = distinct !DISubprogram(name: "#arrayset;", linkageName: "#arrayset", scope: !164, file: !164, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!164 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/array.jl", directory: ".")
!165 = distinct !DILocation(line: 194, scope: !166, inlinedAt: !167)
!166 = distinct !DISubprogram(name: "setindex!;", linkageName: "setindex!", scope: !164, file: !164, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!167 = distinct !DILocation(line: 5, scope: !111, inlinedAt: !118)
!168 = !DILocation(line: 668, scope: !160, inlinedAt: !162)
!169 = !DILocation(line: 40, scope: !100, inlinedAt: !170)
!170 = distinct !DILocation(line: 44, scope: !171, inlinedAt: !173)
!171 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !172, file: !172, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!172 = !DIFile(filename: "/home/vchuravy/.julia/packages/LLVM/qc3sa/src/interop/pointer.jl", directory: ".")
!173 = distinct !DILocation(line: 44, scope: !174, inlinedAt: !175)
!174 = distinct !DISubprogram(name: "pointerset;", linkageName: "pointerset", scope: !172, file: !172, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!175 = distinct !DILocation(line: 84, scope: !176, inlinedAt: !177)
!176 = distinct !DISubprogram(name: "unsafe_store!;", linkageName: "unsafe_store!", scope: !172, file: !172, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!177 = distinct !DILocation(line: 162, scope: !178, inlinedAt: !179)
!178 = distinct !DISubprogram(name: "arrayset_bits;", linkageName: "arrayset_bits", scope: !164, file: !164, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !5)
!179 = distinct !DILocation(line: 153, scope: !163, inlinedAt: !165)
!180 = !{!181, !181, i64 0, i64 0}
!181 = !{!"custom_tbaa_addrspace(1)", !182, i64 0}
!182 = !{!"custom_tbaa"}
!183 = !{!184}
!184 = distinct !{!184, !185, !"primal"}
!185 = distinct !{!185, !" diff: %"}
!186 = !{!187}
!187 = distinct !{!187, !185, !"shadow_0"}
!188 = !DILocation(line: 6678, scope: !121, inlinedAt: !123)
!189 = !DILocation(line: 15, scope: !83)
!190 = distinct !DISubprogram(name: "#throw_boundserror", linkageName: "julia_#throw_boundserror_3879", scope: null, file: !8, line: 40, type: !191, scopeLine: 40, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !198)
!191 = !DISubroutineType(types: !192)
!192 = !{!193, !197, !88, !92}
!193 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !194, size: 64, align: 64)
!194 = !DICompositeType(tag: DW_TAG_structure_type, name: "jl_value_t", file: !195, line: 71, align: 64, elements: !196)
!195 = !DIFile(filename: "julia.h", directory: "")
!196 = !{!193}
!197 = !DICompositeType(tag: DW_TAG_structure_type, name: "#throw_boundserror", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "140737013229216")
!198 = !{!199, !200, !201}
!199 = !DILocalVariable(name: "#self#", arg: 1, scope: !190, file: !8, line: 40, type: !197)
!200 = !DILocalVariable(name: "A", arg: 2, scope: !190, file: !8, line: 40, type: !88)
!201 = !DILocalVariable(name: "I", arg: 3, scope: !190, file: !8, line: 40, type: !92)
!202 = !DILocation(line: 40, scope: !203, inlinedAt: !204)
!203 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!204 = !DILocation(line: 38, scope: !205, inlinedAt: !206)
!205 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!206 = !DILocation(line: 38, scope: !207, inlinedAt: !208)
!207 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!208 = !DILocation(line: 173, scope: !205, inlinedAt: !209)
!209 = !DILocation(line: 0, scope: !210, inlinedAt: !212)
!210 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !211, file: !211, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!211 = !DIFile(filename: "none", directory: ".")
!212 = !DILocation(line: 0, scope: !213, inlinedAt: !214)
!213 = distinct !DISubprogram(name: "_cuprint;", linkageName: "_cuprint", scope: !211, file: !211, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !5)
!214 = !DILocation(line: 222, scope: !205, inlinedAt: !215)
!215 = !DILocation(line: 3, scope: !190)
!216 = !DILocation(line: 4, scope: !190)
!217 = !DILocation(line: 0, scope: !111, inlinedAt: !218)
!218 = distinct !DILocation(line: 0, scope: !111)
!219 = !DILocation(line: 40, scope: !100, inlinedAt: !220)
!220 = distinct !DILocation(line: 6, scope: !102, inlinedAt: !221)
!221 = distinct !DILocation(line: 6, scope: !105, inlinedAt: !222)
!222 = distinct !DILocation(line: 46, scope: !107, inlinedAt: !223)
!223 = distinct !DILocation(line: 92, scope: !109, inlinedAt: !224)
!224 = distinct !DILocation(line: 4, scope: !111, inlinedAt: !225)
!225 = distinct !DILocation(line: 0, scope: !111, inlinedAt: !218)
!226 = !DILocation(line: 477, scope: !134, inlinedAt: !227)
!227 = distinct !DILocation(line: 427, scope: !134, inlinedAt: !224)
!228 = !DILocation(line: 4, scope: !111, inlinedAt: !225)
!229 = !DILocation(line: 489, scope: !139, inlinedAt: !230)
!230 = distinct !DILocation(line: 488, scope: !142, inlinedAt: !231)
!231 = distinct !DILocation(line: 440, scope: !144, inlinedAt: !232)
!232 = distinct !DILocation(line: 453, scope: !144, inlinedAt: !233)
!233 = distinct !DILocation(line: 455, scope: !148, inlinedAt: !234)
!234 = distinct !DILocation(line: 221, scope: !150, inlinedAt: !235)
!235 = distinct !DILocation(line: 95, scope: !153, inlinedAt: !236)
!236 = distinct !DILocation(line: 116, scope: !156, inlinedAt: !237)
!237 = distinct !DILocation(line: 341, scope: !158, inlinedAt: !238)
!238 = distinct !DILocation(line: 653, scope: !160, inlinedAt: !239)
!239 = distinct !DILocation(line: 668, scope: !160, inlinedAt: !240)
!240 = distinct !DILocation(line: 151, scope: !163, inlinedAt: !241)
!241 = distinct !DILocation(line: 194, scope: !166, inlinedAt: !242)
!242 = distinct !DILocation(line: 5, scope: !111, inlinedAt: !225)
!243 = !DILocation(line: 668, scope: !160, inlinedAt: !240)
!244 = !DILocation(line: 40, scope: !100, inlinedAt: !245)
!245 = distinct !DILocation(line: 44, scope: !171, inlinedAt: !246)
!246 = distinct !DILocation(line: 44, scope: !174, inlinedAt: !247)
!247 = distinct !DILocation(line: 84, scope: !176, inlinedAt: !248)
!248 = distinct !DILocation(line: 162, scope: !178, inlinedAt: !249)
!249 = distinct !DILocation(line: 153, scope: !163, inlinedAt: !241)
!250 = !{!251}
!251 = distinct !{!251, !252, !"primal"}
!252 = distinct !{!252, !" diff: %"}
!253 = !{!254}
!254 = distinct !{!254, !252, !"shadow_0"}
!255 = distinct !DISubprogram(name: "signal_exception", linkageName: "julia_signal_exception_3895", scope: null, file: !14, line: 35, type: !256, scopeLine: 35, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !259)
!256 = !DISubroutineType(types: !257)
!257 = !{!46, !258}
!258 = !DICompositeType(tag: DW_TAG_structure_type, name: "#signal_exception", align: 8, elements: !5, runtimeLang: DW_LANG_Julia, identifier: "139820601937760")
!259 = !{!260}
!260 = !DILocalVariable(name: "#self#", arg: 1, scope: !255, file: !14, line: 35, type: !258)
!261 = !DILocation(line: 40, scope: !262, inlinedAt: !263)
!262 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !54, file: !54, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!263 = !DILocation(line: 0, scope: !264, inlinedAt: !265)
!264 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !211, file: !211, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!265 = !DILocation(line: 0, scope: !266, inlinedAt: !267)
!266 = distinct !DISubprogram(name: "kernel_state;", linkageName: "kernel_state", scope: !211, file: !211, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!267 = !DILocation(line: 33, scope: !268, inlinedAt: !269)
!268 = distinct !DISubprogram(name: "exception_flag;", linkageName: "exception_flag", scope: !14, file: !14, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!269 = !DILocation(line: 36, scope: !255)
!270 = !DILocation(line: 37, scope: !255)
!271 = !DILocation(line: 118, scope: !272, inlinedAt: !274)
!272 = distinct !DISubprogram(name: "unsafe_store!;", linkageName: "unsafe_store!", scope: !273, file: !273, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!273 = !DIFile(filename: "pointer.jl", directory: ".")
!274 = !DILocation(line: 118, scope: !272, inlinedAt: !275)
!275 = !DILocation(line: 38, scope: !255)
!276 = !{!277, !277, i64 0}
!277 = !{!"jtbaa_data", !278, i64 0}
!278 = !{!"jtbaa", !279, i64 0}
!279 = !{!"jtbaa"}
!280 = !DILocation(line: 121, scope: !281, inlinedAt: !283)
!281 = distinct !DISubprogram(name: "threadfence_system;", linkageName: "threadfence_system", scope: !282, file: !282, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!282 = !DIFile(filename: "/home/vchuravy/.julia/packages/CUDA/BbliS/src/device/intrinsics/synchronization.jl", directory: ".")
!283 = !DILocation(line: 39, scope: !255)
!284 = !DILocation(line: 40, scope: !262, inlinedAt: !285)
!285 = !DILocation(line: 38, scope: !286, inlinedAt: !287)
!286 = distinct !DISubprogram(name: "macro expansion;", linkageName: "macro expansion", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!287 = !DILocation(line: 38, scope: !288, inlinedAt: !289)
!288 = distinct !DISubprogram(name: "_cuprintf;", linkageName: "_cuprintf", scope: !58, file: !58, type: !55, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !22, retainedNodes: !5)
!289 = !DILocation(line: 41, scope: !255)
!290 = !DILocation(line: 46, scope: !255)

vchuravy · 2023-01-31T02:28:32Z

@wsmoses the !dbg attachment is already missing post-Enzyme.
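
A quick way to look for this condition in a textual IR dump is sketched below. It assumes the module is available as a string and only checks whether each `define` line carries a `!dbg` attachment, which is where a function's `DISubprogram` is attached in textual LLVM IR. `functions_missing_dbg` is a hypothetical helper written for illustration, not part of Enzyme.jl, CUDA.jl, or LLVM.jl, and its line-based matching is a heuristic rather than a real IR parser.

```julia
# Hypothetical helper (not from Enzyme.jl, CUDA.jl or LLVM.jl):
# return the names of functions defined in textual LLVM IR whose
# `define` line has no `!dbg` attachment, i.e. no DISubprogram.
function functions_missing_dbg(ir::AbstractString)
    missing_dbg = String[]
    for line in split(ir, '\n')
        # Only function definitions are of interest, not declarations.
        startswith(line, "define ") || continue
        # A definition with debug info looks like `define ... !dbg !N {`.
        occursin(" !dbg !", line) && continue
        # Extract the function name, quoted or unquoted, after the `@`.
        m = match(r"@(\"[^\"]+\"|[\w.$-]+)\s*\(", line)
        m === nothing || push!(missing_dbg, m.captures[1])
    end
    return missing_dbg
end

# Assumed toy input: one function with a !dbg attachment, one without.
ir = """
define void @with_dbg() !dbg !4 {
  ret void
}
define void @without_dbg() {
  ret void
}
"""

println(functions_missing_dbg(ir))
```

Running this on the IR before and after the Enzyme pass would show which functions lost their attachment, under the assumption that both dumps can be obtained as text.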

vchuravy · 2023-01-31T02:35:19Z

https://github.com/EnzymeAD/Enzyme.jl/blame/0699223eb71f9afe2572bd4c4c59539d8409deb6/src/compiler.jl#L5262-L5264

vchuravy added the cesmix and priority labels Jan 13, 2023

jgreener64 mentioned this issue Jan 13, 2023: CUDA.@atomic error in GPU kernel #511 (Open)

wsmoses changed the title from "sync_threads gives segfault in GPU kernel" to "CUDA LLVM Debug Info Segfault" Jan 30, 2023

wsmoses mentioned this issue Jan 30, 2023: Segfault with CUDA kernel that sums two vectors #572 (Closed)

wsmoses assigned vchuravy Jan 30, 2023

vchuravy mentioned this issue Jan 31, 2023: Fix seqfault in emission of NVPTX due to missing DIsubprogram #582 (Merged)

vchuravy closed this as completed in #582 Jan 31, 2023
