Add p-norm functionality. #688
Conversation
```julia
    return LinearAlgebra.norm(x)
end
if p > 2
    return LinearAlgebra.tr(LinearAlgebra.Diagonal(abs.(x))^p)^(1/p)
```
The problem is that these operations are implemented using CUBLAS, so are themselves restricted to the types CUBLAS supports (plain, basic C types). Ideally we'd have something more generic. What about #84 (comment), is that not valid or slower?
Such a generic version could also go into GPUArrays.jl, while a specialized version for CUBLAS-supported types could live in CUDA.jl.
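For reference, here is a minimal sketch of what such a generic, mapreduce-based p-norm could look like. The function name is illustrative and this is not the actual GPUArrays.jl or CUDA.jl implementation; in particular it does none of the overflow/underflow scaling that LinearAlgebra.norm performs.

```julia
# Minimal sketch of a generic p-norm built only on mapreduce, so it can run on
# any AbstractArray backend (CPU arrays, CuArray, ...) without scalar indexing.
# Illustrative only; no overflow/underflow scaling as in LinearAlgebra.norm.
using LinearAlgebra

function generic_norm(x::AbstractArray, p::Real)
    isempty(x) && return float(zero(real(eltype(x))))
    if p == 2
        return sqrt(mapreduce(abs2, +, x))
    elseif p == 1
        return mapreduce(abs, +, x)
    elseif p == Inf
        return mapreduce(abs, max, x)
    else
        fp = float(p)                       # avoid integer exponents in the kernel
        return mapreduce(v -> abs(v)^fp, +, x)^(1 / fp)
    end
end
```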
@maleadt Could you elaborate on the types CUBLAS does not support?
`mapreduce` is supported by CUDA.jl; I don't understand how that would trigger scalar indexing?
If you look at the CUBLAS docs, you'll see it's a C library that only supports a limited set of element types. That's why we have type unions like CUBLASFloat in CUDA.jl.
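To illustrate the dispatch split being discussed: restrict the CUBLAS-backed method to the element types the C library understands, and fall back to a mapreduce-based version otherwise. The union members, function names, and the exact CUBLAS wrapper call below are assumptions, not CUDA.jl's actual definitions.

```julia
# Sketch only: names, union members, and the nrm2 wrapper signature are assumptions.
using CUDA

# Roughly the idea behind a union like CUBLASFloat: only the plain C types
# that the CUBLAS C library supports.
const BlasLikeFloat = Union{Float32, Float64, ComplexF32, ComplexF64}

# Specialized path: let CUBLAS compute the 2-norm for supported element types.
fast_norm2(x::CuVector{<:BlasLikeFloat}) = CUDA.CUBLAS.nrm2(length(x), x)

# Generic fallback for every other element type, built on mapreduce only.
fast_norm2(x::CuVector) = sqrt(mapreduce(abs2, +, x))
```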
@maleadt I have added the tests. Do let me know if there is anything that needs to be changed.
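For context, the kind of test that makes sense here compares the GPU result against the CPU reference from LinearAlgebra; the exact values of p and the layout in the PR's test file may differ from this sketch.

```julia
# Hypothetical shape of such a test: compare the GPU p-norm against the CPU reference.
using Test, CUDA, LinearAlgebra

@testset "p-norm" begin
    x  = rand(Float32, 256)
    dx = CuArray(x)
    for p in (1, 2, 3, 5)
        @test norm(dx, p) ≈ norm(x, p)
    end
end
```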
Codecov Report

```
@@            Coverage Diff             @@
##           master     #688      +/-   ##
==========================================
+ Coverage   77.86%   77.89%   +0.03%
==========================================
  Files         118      118
  Lines        7119     7126       +7
==========================================
+ Hits         5543     5551       +8
+ Misses       1576     1575       -1
```

Continue to review full report at Codecov.
@maleadt I tried using the mapreduce approach, and this is the output I have received:

julia> norm(a, 3)
ERROR: LLVM error: Cannot select: 0x66598b50: f32 = fpow 0x6658aba8, 0x6658a8d0, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
0x6658aba8: f32,ch = CopyFromReg 0x65e04818, Register:f32 %40, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
0x6dca6988: f32 = Register %40
0x6658a8d0: f32,ch = CopyFromReg 0x65e04818, Register:f32 %19, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
0x6658aee8: f32 = Register %19
In function: _Z33julia_partial_mapreduce_grid_44259_identity2__7Float3216CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE4_1_2IS5_ES3_IS7_IS1_Li1ELi1EEEE
Stacktrace:
[1] handle_error(::Cstring) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\core\context.jl:105
[2] macro expansion at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\util.jl:114 [inlined]
[3] LLVMTargetMachineEmitToMemoryBuffer(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType, ::Base.RefValue{Cstring}, ::Base.RefValue{Ptr{LLVM.API.LLVMOpaqueMemoryBuffer}}) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\lib\libLLVM_h.jl:3612
[4] emit(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\targetmachine.jl:44
[5] mcgen(::GPUCompiler.CompilerJob, ::LLVM.Module, ::LLVM.Function, ::LLVM.API.LLVMCodeGenFileType) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\mcgen.jl:74
[6] macro expansion at C:\Users\tambe\.julia\packages\TimerOutputs\ZmKD7\src\TimerOutput.jl:206 [inlined]
[7] macro expansion at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:252 [inlined]
[8] macro expansion at C:\Users\tambe\.julia\packages\TimerOutputs\ZmKD7\src\TimerOutput.jl:206 [inlined]
[9] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:248
[10] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:39
[11] compile at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:35 [inlined]
[12] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:302
[13] cufunction_compile(::GPUCompiler.FunctionSpec) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:297
[14] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{typeof(CUDA.partial_mapreduce_grid),Tuple{typeof(identity),typeof(+),Float32,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}},Val{true},CuDeviceArray{Float32,2,1},Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuDeviceArray{Float32,1,1}}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\cache.jl:40
[15] partial_mapreduce_grid at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:87 [inlined]
[16] cached_compilation at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\cache.jl:65 [inlined]
[17] cufunction(::typeof(CUDA.partial_mapreduce_grid), ::Type{Tuple{typeof(identity),typeof(+),Float32,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}},Val{true},CuDeviceArray{Float32,2,1},Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuDeviceArray{Float32,1,1}}}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:289
[18] cufunction at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:286 [inlined]
[19] macro expansion at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:100 [inlined]
[20] mapreducedim!(::typeof(identity), ::typeof(+), ::CuArray{Float32,1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuArray{Float32,1}}}; init::Float32) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:192
[21] _mapreduce(::var"#1#2"{Int64}, ::typeof(+), ::CuArray{Float32,1}; dims::Colon, init::Float32) at C:\Users\tambe\.julia\packages\GPUArrays\WV76E\src\host\mapreduce.jl:62
[22] #mapreduce#15 at C:\Users\tambe\.julia\packages\GPUArrays\WV76E\src\host\mapreduce.jl:28 [inlined]
[23] norm(::CuArray{Float32,1}, ::Int64) at .\REPL[9]:1
[24] top-level scope at REPL[10]:1

Seems like a compiler-level error, since LLVM is mentioned in the error messages. How do I tackle this? Will my existing implementation work?

PS: Sorry for such a late reply! I got occupied with some other work, and then I had my exams 😢
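Editorial note on the error above: the unselectable `fpow` node comes from Base's `^(::Float32, ::Float32)`, which lowers to the `llvm.pow` intrinsic that the NVPTX backend cannot select here. One possible workaround, sketched below and not verified against this exact CUDA.jl version, is to avoid Base's `^` inside the mapped function and go through CUDA.jl's libdevice wrapper instead (assuming `CUDA.pow` is available); the function name is illustrative.

```julia
# Workaround sketch (assumptions noted): route the per-element power through
# CUDA.pow, which wraps libdevice's __nv_powf, instead of Base's ^ operator.
using CUDA

function pnorm_via_mapreduce(x::CuArray{Float32}, p::Real)
    fp = Float32(p)
    s  = mapreduce(v -> CUDA.pow(abs(v), fp), +, x)  # sum of |v|^p on the GPU
    return s^(1 / fp)                                # final root taken on the host
end
```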
@maleadt Sounds right, I shall wait for it to get merged and will rebase the changes I have made then ;).
Maybe I should start work on this now that the PR has been merged.
What does this PR do?

It adds p-norm functionality for GPU arrays via norm(x::CuArray, p::Integer). I will add the upcoming modifications over here.
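For context, usage would look like the following; the array size and the value of p are arbitrary.

```julia
# Example of the call this PR is meant to support.
using CUDA, LinearAlgebra

x = CUDA.rand(Float32, 1024)  # random vector living on the GPU
norm(x, 3)                    # 3-norm, computed without scalar indexing
```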
Another aspect I should point out: it still does not work for norm(x::CuArray, Inf), which gives the same old scalar getindex error. I might have to think of something much better for this, because LinearAlgebra.opnorm() has also not been implemented for CUDA.jl.
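Editorial note: one generic way to get the vector Inf-norm without scalar indexing is a max-reduction, since the Inf-norm is just the maximum absolute value. This is a sketch, not what the PR or opnorm does; the function name is illustrative.

```julia
# Sketch: express the vector Inf-norm as a single mapreduce, avoiding scalar getindex.
using CUDA

inf_norm(x::CuArray) = mapreduce(abs, max, x; init = real(zero(eltype(x))))
```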