Incorrect return value of tuple with compile_shlib #101

Open
baggepinnen opened this issue Mar 20, 2023 · 7 comments

Comments

@baggepinnen
Contributor

baggepinnen commented Mar 20, 2023

The following code works as expected when calling compile, but returns the wrong result when using compile_shlib:

using StaticArrays, LinearAlgebra, StaticCompiler

T = Float32

Base.@ccallable function controller1(xt::Tuple{Float32,Float32}, ut::Tuple{Float32})::Tuple{Float32,Float32}
    T = Float32
    A_ = @SMatrix T[1 1; 0 1]
    B_ = @SMatrix T[0; 1]
    x = SVector(xt)
    u = SVector(ut)
    xp = A_ * x #+ B_ * u
    xp.data
end

x = @SVector randn(T, 2) 
u = @SVector randn(T, 1) 

x′ = controller1(x.data, u.data) # test
@code_warntype controller1(x.data, u.data) # checks out

argtypes_controller1 = Tuple{typeof(x.data), typeof(u.data)}
controller1_compiled, path_controller1 = compile(controller1, argtypes_controller1, "controller1") 
x′ = controller1_compiled(x.data, u.data) # Works fine

path_controller1 = compile_shlib(controller1, argtypes_controller1, "controller1")

function c_step(x, u)
    Libc.Libdl.dlopen(path_controller1) do lib
        fn = Libc.Libdl.dlsym(lib, :julia_controller1)
        @ccall $(fn)(x::Tuple{Float32, Float32}, u::Tuple{Float32})::Tuple{Float32, Float32}
    end
end

x′ = c_step(x.data, u.data)
julia> x′ = controller1_compiled(x.data, u.data) # Works fine
(-0.22692525f0, -1.4554836f0)

julia> x′ = c_step(x.data, u.data)
(0.0f0, 0.0f0)

If I uncomment the rest of the line xp = A_ * x #+ B_ * u, it still works with compile, but I get a segfault with compile_shlib instead.


julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
  Threads: 12 on 24 virtual cores
@baggepinnen
Contributor Author

A much smaller example:

using LinearAlgebra, StaticCompiler

function controller3(xt::Tuple{Float64,Float64})::Tuple{Float64,Float64}
    xt
end

x = (randn(Float64, 2)...,)
x′ = controller3(x) # test
@code_warntype controller3(x) # checks out

argtypes_controller3 = Tuple{typeof(x)}
controller3_compiled, path_controller3 = compile(controller3, argtypes_controller3, "controller3") 
x′ = controller3_compiled(x) # Works fine

path_controller3 = compile_shlib(controller3, argtypes_controller3, "controller3")

function c_step(x)
    Libc.Libdl.dlopen(path_controller3) do lib
        fn = Libc.Libdl.dlsym(lib, :julia_controller3)
        ccall(fn, Tuple{Float64, Float64}, (Tuple{Float64, Float64}, ), x)
    end
end

x′ = c_step(x)

@brenhinkeller
Collaborator

brenhinkeller commented Mar 20, 2023

So as you may know, compile_shlib and compile_executable have quite a few more limitations than compile, because they don't link to libjulia.

Among other things, this means that while you can use types and dispatch as much as you want within your compiled function as long as everything's type-stable and inlined (since then your types all get compiled away), the same is not true if you try to return a Julia type from a function in a shlib. That shlib is just machine code, so has no awareness of Julia types, and while it may compile and return something if you tell it to return a Julia type, that something may not be what you expect.

Machine code of course cannot ever actually return a Julia type, only a native type (float, int/uint, bool, or pointer)! So if you want to compile something to native machine code and have it return an object of a Julia type (even something immutable, like a tuple), you'll have to figure out how Julia really does this under the hood. @ccall appears to be trying to do this for you, but evidently failing (I would guess due to hard-coding some pointer that is valid when compiled but not valid when later used).

The simplest way around this is to wrap all your Julia-typed objects in Refs (or anything else that lets you get a pointer to them) and pass around the pointers to both your inputs and your outputs as arguments -- for example:
https://github.com/brenhinkeller/StaticTools.jl#compiled-sodylib-shared-libraries
https://github.com/brenhinkeller/StaticTools.jl#calling-compiled-julia-library-from-julia
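
As a rough illustration of the pointer-passing pattern described above, here is a minimal sketch in plain Julia. The name controller3_ptr! and the Int32 status code are hypothetical choices made for this example, and the compile_shlib step is deliberately left out: the function is simply called in-process, so this only shows the calling pattern and does not verify behaviour inside a compiled shared library.

```julia
# Hypothetical pointer-based variant of controller3: the input and output
# tuples are passed by address, so the function itself only takes pointers
# and returns a native Int32 status code.
function controller3_ptr!(out::Ptr{NTuple{2,Float64}}, xin::Ptr{NTuple{2,Float64}})::Int32
    x = unsafe_load(xin)                      # read the input tuple
    unsafe_store!(out, (x[1] + x[2], x[2]))   # xp = [1 1; 0 1] * x, written out by hand
    return Int32(0)
end

x = Ref((1.5, 2.0))     # input wrapped in a Ref so it has an address
out = Ref((0.0, 0.0))   # caller-owned slot for the result

# Keep both Refs alive while raw pointers to them are in use.
status = GC.@preserve x out begin
    px = Base.unsafe_convert(Ptr{NTuple{2,Float64}}, x)
    pout = Base.unsafe_convert(Ptr{NTuple{2,Float64}}, out)
    controller3_ptr!(pout, px)
end

result = out[]
println(result)
```

With a compiled library, the same two pointers would presumably be passed through ccall as Ptr arguments, with the result again read back from out[] after the call.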

@brenhinkeller
Collaborator

See also #100

@baggepinnen
Contributor Author

Yeah, I might have put too much hope into ccall understanding how to convert my tuple :/ Thanks for clarifying! What do you think it would take to "teach" ccall that an NTuple{N, T} maps to a C array of the corresponding C version of T, like NTuple{2, Float64} => double var[2]?
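
For what it's worth, the in-memory layout part of that mapping can be checked from plain Julia. The sketch below is only an illustration under that narrow scope: it looks at how an NTuple{2, Float64} is stored, and says nothing about how ccall passes or returns such a value.

```julia
t = (1.0, 2.0)

# A homogeneous tuple of Float64 is a plain-bits type with no padding,
# so its size is that of two C doubles stored back to back.
plain = isbitstype(typeof(t))
nbytes = sizeof(t)

# Reinterpret the address of the tuple as a pointer to its first element
# and read the two values the way C code would index double var[2].
r = Ref(t)
vals = GC.@preserve r begin
    p = Ptr{Float64}(Base.unsafe_convert(Ptr{NTuple{2,Float64}}, r))
    (unsafe_load(p, 1), unsafe_load(p, 2))
end

println((plain, nbytes, vals))
```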

@baggepinnen
Contributor Author

And related, would it be possible to detect that the user is making such an error (using a Julia type) and throw a helpful error message?

@brenhinkeller
Collaborator

So my guess is that ccall understands the memory layout of the tuple, but is looking for it in the wrong place...

As was part of the issue in #100, Julia often needs a place to put things when a function returns. In that case, this was causing calls to the GC to be added, but I suspect the exact same underlying problem exists here -- except in this case I suspect Julia is solving it differently (because there are no errors about missing "gc" / "alloc" functions). What I suspect is happening instead is that the Julia compiler is actually inserting a hard-coded pointer rather than inserting a call to the GC.

One way to check may be looking at the @code_llvm output for the function in question and looking for a hard-coded memory address. These are actually pretty common in Julia code; quite a number of Julia functions will compile to LLVM IR that simply hard-codes a pointer location. And as long as you're within the same Julia session, this could be a very efficient way of telling the code exactly where to look for something. However, as soon as you quit the Julia session where you did the compilation (or possibly even before then, if the memory in question gets GC'd!), that memory location will be invalid and you'll get wrong results and/or segfaults.
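
A possible way to automate that inspection is sketched below. It captures the IR as a string and searches it for literal addresses. The inttoptr pattern is a heuristic assumed for this example, not an official check, so an empty result does not prove that the code is safe to ship in a shared library.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Same shape as controller3 from the report above.
controller3(xt::Tuple{Float64,Float64}) = xt

# Capture the optimized LLVM IR as a string instead of printing it.
ir = sprint(io -> code_llvm(io, controller3, Tuple{Tuple{Float64,Float64}}))

# Heuristic (an assumption): absolute addresses baked in at compile time
# usually show up as `inttoptr (i64 <number> ...)` constants in the IR.
suspicious = [strip(l) for l in split(ir, '\n') if occursin(r"inttoptr \(i64 \d+", l)]

println(isempty(suspicious) ? "no literal addresses found" : join(suspicious, '\n'))
```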

An error message for this is a great idea -- I'll try adding it as a warning for now so folks can still play around: #102

@jpsamaroo
Collaborator

Looking at @code_llvm, returning a Tuple uses the sret calling convention, which means that the first argument to the function is a slot allocated on the stack that the result will be stored to (and then the function just "returns" nothing). If you tell ccall that you're returning a Tuple, it will probably assume an sret calling convention, but I'm not sure of that.
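
To see which convention a given function gets, one option is to pull the define line out of the IR and look for the sret attribute. In this sketch, signature_line, ret32 and ret64 are hypothetical names introduced for illustration. No expected output is shown because the answer may depend on the Julia version, the platform, and the size of the returned tuple.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Two identity functions mirroring the Float32 and Float64 cases in this thread.
ret32(xt::Tuple{Float32,Float32}) = xt
ret64(xt::Tuple{Float64,Float64}) = xt

# Return the `define ...` line of the optimized IR, which carries the
# return type and the argument attributes (including sret, if present).
function signature_line(f, argtypes)
    ir = sprint(io -> code_llvm(io, f, argtypes; debuginfo=:none))
    lines = split(ir, '\n')
    idx = findfirst(l -> startswith(l, "define"), lines)
    return idx === nothing ? "" : String(lines[idx])
end

for (f, T) in ((ret32, Float32), (ret64, Float64))
    sig = signature_line(f, Tuple{Tuple{T,T}})
    println(nameof(f), ": sret in signature = ", occursin("sret", sig))
    println("  ", sig)
end
```

If the signature does carry sret, the caller has to supply the output slot as the first argument, which fits the pointer-passing workaround suggested earlier in the thread.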
