RFC: Introduce TypedCallable #55111

Draft

@topolarity topolarity commented Jul 13, 2024

TypedCallable provides a wrapper for callable objects, with the following benefits:

  1. Enforced type-stability (for concrete AT/RT types)
  2. Fast calling convention (frequently < 10 ns / call)
  3. Normal Julia dispatch semantics (sees new Methods, etc.) + invoke_latest
  4. Pre-compilation support
    (including --trim compatibility, see #55047 "add --trim option for generating smaller binaries")

It can be used like this:

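```julia
using BenchmarkTools  # provides @btime

# A concretely typed vector of callbacks taking (Int, Int) and returning Bool
const callbacks = @TypedCallable{(::Int,::Int)->Bool}[]

# Wrap any function as a TypedCallable with the matching signature
register_callback!(callbacks, f::F) where {F<:Function} =
    push!(callbacks, @TypedCallable f(::Int,::Int)::Bool)

register_callback!(callbacks, (x,y)->(x == y))
register_callback!(callbacks, (x,y)->(x != y))

@btime callbacks[rand(1:2)](1,1)
# Calling a random (or runtime-known) callback is fast:
#  15.104 ns (0 allocations: 0 bytes)
```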

This is very similar to the existing FunctionWrappers.jl, but there are a few key differences:

  • Better type support: TypedCallable supports the full range of Julia types (incl. Varargs), and it has access to all of Julia's "internal" calling conventions so calls are fast (and allocation-free) for a wider range of input types.
  • Improved dispatch handling: The @cfunction functionality used by FunctionWrappers has several dispatch bugs, which cause wrappers to occasionally not see new Methods. These bugs are fixed (or soon to be fixed) for TypedCallable.
  • Pre-compilation support including for juliac / --trim

By the way, many of the improvements here are actually thanks to the OpaqueClosure introduced by @Keno. This type just builds on top of OpaqueClosure to provide an interface with Julia's usual dispatch semantics.
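
For readers who have not used OpaqueClosure directly, here is a minimal sketch of the building block mentioned above. It assumes a Julia version that ships the experimental `Base.Experimental.@opaque` macro; the name `is_equal` is made up for illustration, and the API is experimental and may change.

```julia
# Base.Experimental.@opaque builds a Core.OpaqueClosure (experimental API).
# The argument types are declared up front in the lambda's signature.
is_equal = Base.Experimental.@opaque (x::Int, y::Int) -> x == y

# The result is a Core.OpaqueClosure, not an ordinary anonymous function
is_equal isa Core.OpaqueClosure

# It is called like any other callable
is_equal(1, 1)
is_equal(1, 2)
```

An OpaqueClosure on its own is tied to the world in which it was created, which is why a wrapper is needed to recover the usual "sees new Methods" behavior described in this PR.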

TODO:

Co-authored-by: Gabriel Baraldi <[email protected]>

# Args is a Tuple{Vararg{Union{Slot{T},Some{T}}}} where Slot{T} represents
# an uncurried argument slot, and Some{T} represents an argument to curry.
@noinline @generated function Core.OpaqueClosure(Args::Tuple, ::Slot{RT}) where RT
Member

This seems unnecessarily complicated. Why not have AT be passthrough and specify (nreq, isva) as a Val?

Member Author

@topolarity topolarity Jul 13, 2024

The idea was to support syntaxes like @TypedCallable add_node(self, ::Node)::Nothing or @TypedCallable show(self)::Nothing where we close over more than just the first argument

Member

@vtjnash vtjnash Jul 13, 2024

It is also banned in precompile, since @generated is not permitted to return a new :opaque_closure object. Can we do without this hand-generated complexity? Stuff like Args.parameters is typically not actually recommended in a generated function either, as it returns something with incorrect type identity (makes the transform not pure). I remember doing something like make(f, AT, RT) = (Base.compilerbarrier(:const, Base.Experimental.@opaque(AT->RT, (args...)->f(args...)))::Core.OpaqueClosure{AT,RT})

The Slot/Splat seems to be just a partial re-implementation of lambdas, but seems a bit less reliable since it has none of the normal lowering, and makes it so that the call is not a subtype of its argument signature?

Member Author

> It is also banned in precompile, since @generated is not permitted to return a new :opaque_closure object.

This opts out of PartialOpaque support via #54734, so that this is allowed in pre-compile.

Member

It is not inference that is banned, it is the construct itself, since it allocates new state (a Method) which is forbidden during pure operations (a generator)

Member Author

Ah I see - I thought that was only forbidden during the execution of the generator, but I guess it also applies to the side effects of lowering the generated expression?
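
As background for the purity point in this thread, here is a minimal sketch of a generated function that stays within the rules: the generator only inspects argument types and returns an expression, without creating any new global state. The helper name `nfields_of` is hypothetical, and using `fieldcount` instead of reaching into `.parameters` is a choice made for this sketch, not something prescribed in the discussion.

```julia
# Hypothetical helper, for illustration only.
@generated function nfields_of(x::Tuple)
    # Inside the generator, `x` is bound to the argument's *type*,
    # e.g. Tuple{Int64, Float64, String}, not to the value.
    n = fieldcount(x)
    # The generator returns an expression; here just a literal number.
    return :($n)
end

nfields_of((1, 2.0, "three"))
```

The expression returned here is a plain constant, so lowering it has no side effects. The concern raised above is that an expression containing an :opaque_closure is different, because the construct itself allocates a Method.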

Member

Keno commented Jul 13, 2024

Seems generally sane.

whose invalidations you may have no control over.
"""
mutable struct TypedCallable{AT,RT}
@atomic oc::Base.RefValue{Core.OpaqueClosure{AT,RT}}
Member

Why the extra ref?

Member Author

This is to keep the atomic operations small, since otherwise the OC is inlined and we start emitting jl_(un)lock_value, etc.

Member

Should we just refuse to inline @atomic annotated structs that are larger than our max atomic size?

Member Author

Yeah, I'd be in support of that - #51495 (comment) is also related

Member

The main advantage of that is we could do atomic reads without needing to take the lock, so reading may be more scalable. Changing the implementation to use a seqlock would also fix that, still without requiring the extra allocation of this. The test in #51495 benchmarked a store of a large object against a case that did not use a large object, so it wasn't directly comparing equivalent things.
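
The pattern debated in this thread can be sketched as follows. This is an illustration under assumptions, not the PR's code: `Holder` is a hypothetical type that only mirrors the quoted field layout, and it requires Julia 1.7 or later for `@atomic` fields.

```julia
# Hypothetical type for illustration; mirrors the field layout quoted above.
mutable struct Holder{T}
    # The atomic field holds a reference, so the atomic operation acts on
    # a pointer-sized value no matter how large T is
    @atomic ref::Base.RefValue{T}
end

h = Holder{NTuple{4,Int}}(Ref((1, 2, 3, 4)))

# Publishing a new value swaps in a freshly allocated Ref
@atomic h.ref = Ref((5, 6, 7, 8))

# Readers load the reference atomically, then dereference it
current = (@atomic h.ref)[]
```

The cost of this design is the extra allocation per update that the reviewers mention; the alternatives raised in the thread (refusing to inline oversized atomic fields, or a seqlock) aim to avoid it.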


TypedCallable can also be used as an "invalidation barrier", since the caller of a
TypedCallable is not affected by any invalidations of its callee(s). This doesn't
completely cure the original invalidation, but it stops it from propagating all the
Member

Maybe useful to note that this will also block other information propagation? I am thinking constant-propagation, effects, escape-analysis, etc.
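
To make the barrier idea concrete, here is a minimal sketch using a hypothetical `Barrier` wrapper. It is not the PR's TypedCallable and has none of its performance properties; it only shows that when a callee is hidden behind an untyped field and the result is type-asserted, the caller learns the declared return type and nothing else about the callee.

```julia
# Hypothetical stand-in for illustration only; this is NOT the PR's TypedCallable.
struct Barrier{AT<:Tuple,RT}
    f::Any   # the callee is stored untyped, so inference cannot see through it
end

function (b::Barrier{AT,RT})(args...) where {AT,RT}
    typed_args = args::AT            # reject calls that do not match AT
    return b.f(typed_args...)::RT    # dynamic call, result asserted to be RT
end

eq = Barrier{Tuple{Int,Int},Bool}(==)

# The caller only ever learns "this returns a Bool"
describe(b) = b(1, 1) ? "same" : "different"

describe(eq)

# Inspect what inference concludes for the caller
Base.return_types(describe, (typeof(eq),))
```

Because `describe` is compiled against the wrapper's signature alone, redefining the wrapped function would not require recompiling it. By the same token, the compiler cannot fold `1 == 1` to a constant or reason about the callee's effects through the wrapper.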
