
Add p-norm functionality. #688

Closed
wants to merge 7 commits into from

Conversation

@SomTambe SomTambe commented Jan 29, 2021

What does this PR do?

I will add the upcoming modifications here.

One other thing I should point out: this still does not work for norm(x::CuArray, Inf), which gives the same old scalar getindex error. We might need a better approach for that case, because LinearAlgebra.opnorm() has also not been implemented for CUDA.jl.
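For reference, the Inf-norm is just a maximum of absolute values, so in principle it can be phrased as a single reduction rather than anything requiring scalar indexing. A minimal sketch (my own illustration, not this PR's code; `inf_norm` is a made-up helper name — on a CuArray this would dispatch to CUDA.jl's mapreduce):

```julia
# Sketch: the Inf-norm as one reduction, avoiding scalar getindex.
# Works on any array type that supports mapreduce.
inf_norm(v) = mapreduce(abs, max, v; init=zero(real(eltype(v))))
```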

    return LinearAlgebra.norm(x)
end
if p > 2
    return LinearAlgebra.tr(LinearAlgebra.Diagonal(abs.(x))^p)^(1/p)
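The trace trick above works because Diagonal(abs.(x))^p simply raises each diagonal entry to the p-th power, so the trace recovers sum(abs.(x).^p), i.e. the p-norm raised to p. A CPU-side check of that identity (illustrative only, not part of the PR):

```julia
using LinearAlgebra

x = [3.0, -4.0, 1.0]
p = 3
# Diagonal(abs.(x))^p has abs.(x).^p on its diagonal; tr sums them.
lhs = tr(Diagonal(abs.(x))^p)^(1/p)
rhs = sum(abs.(x) .^ p)^(1/p)
# Both agree with the textbook p-norm, norm(x, p).
```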
maleadt (Member):

The problem is that these operations are implemented using CUBLAS, so they are themselves restricted to the types CUBLAS supports (plain, basic C types). Ideally we'd have something more generic. What about #84 (comment) — is that not valid, or slower?

maleadt (Member):

Such a generic version could also go into GPUArrays.jl, while a specialized version for CUBLAS-supported types could live in CUDA.jl.

SomTambe (Author):

@maleadt When we use the method you mentioned in #84, it throws a scalar indexing warning, which wouldn't work when CUDA.allowscalar(false) is set. That is why I implemented it using CUBLAS-based operations.

SomTambe (Author):

@maleadt Could you elaborate on which types CUBLAS does not support?

maleadt (Member):

mapreduce is supported by CUDA.jl; I don't understand how that would trigger scalar indexing.

If you look at the CUBLAS docs, you'll see it's a C library that only supports a limited set of element types. That's why we have type unions like CUBLASFloat in CUDA.jl.
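For illustration, such a union looks roughly like the following. This is an approximation written as a hypothetical `MyBlasFloat`, not CUDA.jl's exact definition:

```julia
# Approximate sketch of a CUBLAS-style element-type union: CUBLAS is a C
# library, so only fixed-width float/complex types are supported.
const MyBlasFloat = Union{Float16, Float32, Float64, ComplexF32, ComplexF64}

# Dispatch-friendly check: true only for CUBLAS-representable element types.
is_blas_supported(::Type{T}) where {T} = T <: MyBlasFloat
```

Arbitrary Julia types such as BigFloat fall outside the union, which is why a generic (mapreduce-based) fallback is needed alongside the CUBLAS fast path.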

maleadt added the "needs tests" label Jan 30, 2021

SomTambe commented Feb 1, 2021

@maleadt I have added the tests. Do let me know if there is anything that needs to be changed.


codecov bot commented Feb 1, 2021

Codecov Report

Merging #688 (fcc1c26) into master (61c7879) will increase coverage by 0.03%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #688      +/-   ##
==========================================
+ Coverage   77.86%   77.89%   +0.03%     
==========================================
  Files         118      118              
  Lines        7119     7126       +7     
==========================================
+ Hits         5543     5551       +8     
+ Misses       1576     1575       -1     
Impacted Files           Coverage Δ
lib/cublas/linalg.jl     88.57% <100.00%> (+0.60%) ⬆️
lib/cudadrv/memory.jl    82.51% <0.00%> (+0.44%) ⬆️



SomTambe commented Mar 3, 2021

@maleadt I tried the mapreduce version as you suggested. Here is what I tried:
LinearAlgebra.norm(v::CuArray{T}, p::Integer) where T = mapreduce(x -> x^p, +, v; init=zero(T))^(1/p)

And this is the output I received:

julia> norm(a,3)
ERROR: LLVM error: Cannot select: 0x66598b50: f32 = fpow 0x6658aba8, 0x6658a8d0, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
  0x6658aba8: f32,ch = CopyFromReg 0x65e04818, Register:f32 %40, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
    0x6dca6988: f32 = Register %40
  0x6658a8d0: f32,ch = CopyFromReg 0x65e04818, Register:f32 %19, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
    0x6658aee8: f32 = Register %19
In function: _Z33julia_partial_mapreduce_grid_44259_identity2__7Float3216CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE4_1_2IS5_ES3_IS7_IS1_Li1ELi1EEEE
Stacktrace:
 [1] handle_error(::Cstring) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\core\context.jl:105
 [2] macro expansion at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\util.jl:114 [inlined]
 [3] LLVMTargetMachineEmitToMemoryBuffer(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType, ::Base.RefValue{Cstring}, ::Base.RefValue{Ptr{LLVM.API.LLVMOpaqueMemoryBuffer}}) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\lib\libLLVM_h.jl:3612
 [4] emit(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\targetmachine.jl:44
 [5] mcgen(::GPUCompiler.CompilerJob, ::LLVM.Module, ::LLVM.Function, ::LLVM.API.LLVMCodeGenFileType) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\mcgen.jl:74
 [6] macro expansion at C:\Users\tambe\.julia\packages\TimerOutputs\ZmKD7\src\TimerOutput.jl:206 [inlined]
 [7] macro expansion at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:252 [inlined]
 [8] macro expansion at C:\Users\tambe\.julia\packages\TimerOutputs\ZmKD7\src\TimerOutput.jl:206 [inlined]
 [9] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:248
 [10] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:39
 [11] compile at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:35 [inlined]
 [12] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:302
 [13] cufunction_compile(::GPUCompiler.FunctionSpec) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:297
 [14] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{typeof(CUDA.partial_mapreduce_grid),Tuple{typeof(identity),typeof(+),Float32,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}},Val{true},CuDeviceArray{Float32,2,1},Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuDeviceArray{Float32,1,1}}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\cache.jl:40
 [15] partial_mapreduce_grid at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:87 [inlined]
 [16] cached_compilation at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\cache.jl:65 [inlined]
 [17] cufunction(::typeof(CUDA.partial_mapreduce_grid), ::Type{Tuple{typeof(identity),typeof(+),Float32,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}},Val{true},CuDeviceArray{Float32,2,1},Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuDeviceArray{Float32,1,1}}}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:289
 [18] cufunction at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:286 [inlined]
 [19] macro expansion at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:100 [inlined]
 [20] mapreducedim!(::typeof(identity), ::typeof(+), ::CuArray{Float32,1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuArray{Float32,1}}}; init::Float32) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:192
 [21] _mapreduce(::var"#1#2"{Int64}, ::typeof(+), ::CuArray{Float32,1}; dims::Colon, init::Float32) at C:\Users\tambe\.julia\packages\GPUArrays\WV76E\src\host\mapreduce.jl:62
 [22] #mapreduce#15 at C:\Users\tambe\.julia\packages\GPUArrays\WV76E\src\host\mapreduce.jl:28 [inlined]
 [23] norm(::CuArray{Float32,1}, ::Int64) at .\REPL[9]:1
 [24] top-level scope at REPL[10]:1

This seems like a compiler-level error, since LLVM is mentioned in the error messages. How do I tackle this? Will my existing implementation work?

PS: Sorry for such a late reply! I got occupied with some other work, and then I had my exams 😢


maleadt commented Mar 5, 2021

That's #71, and is expected to be fixed by #750.
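Until that lands, one conceivable workaround (my own sketch, not from this PR or CUDA.jl) is to avoid emitting the fpow instruction entirely by rewriting |x|^p as exp(p·log|x|), which lowers to exp/log intrinsics instead:

```julia
# Hypothetical workaround: compute abs(x)^p via exp(p * log(abs(x))) so no
# fpow instruction is generated. log(0.0) == -Inf and exp(-Inf) == 0.0, so
# zero entries still contribute 0 to the sum.
pnorm(v, p) = mapreduce(x -> exp(p * log(abs(x))), +, v)^(1/p)
```

This is numerically a bit less accurate than a native power, so it is only a stopgap until the compiler fix is available.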


SomTambe commented Mar 5, 2021

@maleadt Sounds right, I shall wait for it to get merged and then rebase the changes I have made ;).

@SomTambe

Maybe I should start work on this now that the PR has been merged.

Labels
needs tests Tests are requested.