
Add p-norm functionality. #688

Closed
wants to merge 7 commits into from

Conversation

@SomTambe SomTambe commented Jan 29, 2021

What does this PR do?

I will add the upcoming modifications here.

One other thing I should point out: this still does not work for norm(x::CuArray, Inf), which gives the same old scalar getindex error. We might need a better approach for that case, because LinearAlgebra.opnorm() has also not been implemented for CUDA.jl.
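For reference, the Inf-norm is just a maximum of absolute values, so in principle it can be phrased as a single reduction rather than anything requiring scalar indexing. A minimal sketch (my own illustration, not this PR's code; `inf_norm` is a made-up helper name — on a CuArray this would dispatch to CUDA.jl's mapreduce):

```julia
# Sketch: the Inf-norm as one reduction, avoiding scalar getindex.
# Works on any array type that supports mapreduce.
inf_norm(v) = mapreduce(abs, max, v; init=zero(real(eltype(v))))
```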

    return LinearAlgebra.norm(x)
end
if p > 2
    return LinearAlgebra.tr(LinearAlgebra.Diagonal(abs.(x))^p)^(1/p)
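The trace trick above works because Diagonal(abs.(x))^p simply raises each diagonal entry to the p-th power, so the trace recovers sum(abs.(x).^p), i.e. the p-norm raised to p. A CPU-side check of that identity (illustrative only, not part of the PR):

```julia
using LinearAlgebra

x = [3.0, -4.0, 1.0]
p = 3
# Diagonal(abs.(x))^p has abs.(x).^p on its diagonal; tr sums them.
lhs = tr(Diagonal(abs.(x))^p)^(1/p)
rhs = sum(abs.(x) .^ p)^(1/p)
# Both agree with the textbook p-norm, norm(x, p).
```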
maleadt (Member):

The problem is that these operations are implemented using CUBLAS, so they are themselves restricted to the types CUBLAS supports (plain, basic C types). Ideally we'd have something more generic. What about #84 (comment) — is that not valid, or slower?

maleadt (Member):

Such a generic version could also go into GPUArrays.jl, while a specialized version for CUBLAS-supported types could live in CUDA.jl.

SomTambe (Author):

@maleadt When we use the method you mentioned in #84, it throws a scalar indexing warning, which wouldn't work when CUDA.allowscalar(false) is set. That is why I implemented it using CUBLAS-based operations.

SomTambe (Author):

@maleadt Could you elaborate on which types CUBLAS does not support?

maleadt (Member):

mapreduce is supported by CUDA.jl; I don't understand how that would trigger scalar indexing.

If you look at the CUBLAS docs, you'll see it's a C library that only supports a limited set of element types. That's why we have type unions like CUBLASFloat in CUDA.jl.
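For illustration, such a union looks roughly like the following. This is an approximation written as a hypothetical `MyBlasFloat`, not CUDA.jl's exact definition:

```julia
# Approximate sketch of a CUBLAS-style element-type union: CUBLAS is a C
# library, so only fixed-width float/complex types are supported.
const MyBlasFloat = Union{Float16, Float32, Float64, ComplexF32, ComplexF64}

# Dispatch-friendly check: true only for CUBLAS-representable element types.
is_blas_supported(::Type{T}) where {T} = T <: MyBlasFloat
```

Arbitrary Julia types such as BigFloat fall outside the union, which is why a generic (mapreduce-based) fallback is needed alongside the CUBLAS fast path.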

maleadt added the "needs tests" label Jan 30, 2021

SomTambe commented Feb 1, 2021

@maleadt I have added the tests. Do let me know if there is anything that needs to be changed.


codecov bot commented Feb 1, 2021

Codecov Report

Merging #688 (fcc1c26) into master (61c7879) will increase coverage by 0.03%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #688      +/-   ##
==========================================
+ Coverage   77.86%   77.89%   +0.03%     
==========================================
  Files         118      118              
  Lines        7119     7126       +7     
==========================================
+ Hits         5543     5551       +8     
+ Misses       1576     1575       -1     
Impacted Files           Coverage Δ
lib/cublas/linalg.jl     88.57% <100.00%> (+0.60%) ⬆️
lib/cudadrv/memory.jl    82.51% <0.00%> (+0.44%) ⬆️



SomTambe commented Mar 3, 2021

@maleadt I tried the mapreduce version as you suggested. Here is what I tried:
LinearAlgebra.norm(v::CuArray{T}, p::Integer) where T = mapreduce(x -> x^p, +, v; init=zero(T))^(1/p)

And this is the output I received:

julia> norm(a,3)
ERROR: LLVM error: Cannot select: 0x66598b50: f32 = fpow 0x6658aba8, 0x6658a8d0, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
  0x6658aba8: f32,ch = CopyFromReg 0x65e04818, Register:f32 %40, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
    0x6dca6988: f32 = Register %40
  0x6658a8d0: f32,ch = CopyFromReg 0x65e04818, Register:f32 %19, math.jl:913 @[ REPL[9]:1 @[ broadcast.jl:648 @[ broadcast.jl:621 @[ broadcast.jl:575 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:80 @[ C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:117 ] ] ] ] ] ]
    0x6658aee8: f32 = Register %19
In function: _Z33julia_partial_mapreduce_grid_44259_identity2__7Float3216CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE4_1_2IS5_ES3_IS7_IS1_Li1ELi1EEEE
Stacktrace:
 [1] handle_error(::Cstring) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\core\context.jl:105
 [2] macro expansion at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\util.jl:114 [inlined]
 [3] LLVMTargetMachineEmitToMemoryBuffer(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType, ::Base.RefValue{Cstring}, ::Base.RefValue{Ptr{LLVM.API.LLVMOpaqueMemoryBuffer}}) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\lib\libLLVM_h.jl:3612
 [4] emit(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType) at C:\Users\tambe\.julia\packages\LLVM\7Q46C\src\targetmachine.jl:44
 [5] mcgen(::GPUCompiler.CompilerJob, ::LLVM.Module, ::LLVM.Function, ::LLVM.API.LLVMCodeGenFileType) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\mcgen.jl:74
 [6] macro expansion at C:\Users\tambe\.julia\packages\TimerOutputs\ZmKD7\src\TimerOutput.jl:206 [inlined]
 [7] macro expansion at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:252 [inlined]
 [8] macro expansion at C:\Users\tambe\.julia\packages\TimerOutputs\ZmKD7\src\TimerOutput.jl:206 [inlined]
 [9] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:248
 [10] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:39
 [11] compile at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\driver.jl:35 [inlined]
 [12] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:302
 [13] cufunction_compile(::GPUCompiler.FunctionSpec) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:297
 [14] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{typeof(CUDA.partial_mapreduce_grid),Tuple{typeof(identity),typeof(+),Float32,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}},Val{true},CuDeviceArray{Float32,2,1},Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuDeviceArray{Float32,1,1}}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\cache.jl:40
 [15] partial_mapreduce_grid at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:87 [inlined]
 [16] cached_compilation at C:\Users\tambe\.julia\packages\GPUCompiler\uTpNx\src\cache.jl:65 [inlined]
 [17] cufunction(::typeof(CUDA.partial_mapreduce_grid), ::Type{Tuple{typeof(identity),typeof(+),Float32,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}},Val{true},CuDeviceArray{Float32,2,1},Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuDeviceArray{Float32,1,1}}}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:289
 [18] cufunction at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:286 [inlined]
 [19] macro expansion at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\compiler\execution.jl:100 [inlined]
 [20] mapreducedim!(::typeof(identity), ::typeof(+), ::CuArray{Float32,1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},var"#1#2"{Int64},Tuple{CuArray{Float32,1}}}; init::Float32) at C:\Users\tambe\.julia\packages\CUDA\wTQsK\src\mapreduce.jl:192
 [21] _mapreduce(::var"#1#2"{Int64}, ::typeof(+), ::CuArray{Float32,1}; dims::Colon, init::Float32) at C:\Users\tambe\.julia\packages\GPUArrays\WV76E\src\host\mapreduce.jl:62
 [22] #mapreduce#15 at C:\Users\tambe\.julia\packages\GPUArrays\WV76E\src\host\mapreduce.jl:28 [inlined]
 [23] norm(::CuArray{Float32,1}, ::Int64) at .\REPL[9]:1
 [24] top-level scope at REPL[10]:1

This seems like a compiler-level error, since LLVM is mentioned in the error messages. How do I tackle this? Will my existing implementation work?

PS: Sorry for such a late reply! I got occupied with some other work, and then I had my exams 😢


maleadt commented Mar 5, 2021

That's #71, and is expected to be fixed by #750.
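Until that lands, one conceivable workaround (my own sketch, not from this PR or CUDA.jl) is to avoid emitting the fpow instruction entirely by rewriting |x|^p as exp(p·log|x|), which lowers to exp/log intrinsics instead:

```julia
# Hypothetical workaround: compute abs(x)^p via exp(p * log(abs(x))) so no
# fpow instruction is generated. log(0.0) == -Inf and exp(-Inf) == 0.0, so
# zero entries still contribute 0 to the sum.
pnorm(v, p) = mapreduce(x -> exp(p * log(abs(x))), +, v)^(1/p)
```

This is numerically a bit less accurate than a native power, so it is only a stopgap until the compiler fix is available.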


SomTambe commented Mar 5, 2021

@maleadt Sounds right, I shall wait for it to get merged and then rebase the changes I have made ;).

@SomTambe

Maybe I should start work on this now that the PR has been merged.

Labels
needs tests Tests are requested.