Support arbitrary bitwidth integers #45486
While I would like arbitrary integers for something I eventually want to do, my understanding is that exposing them is a fairly dangerous gamble, because the LLVM backends may not properly handle integers that are not multiples of 8 bits (or at least that is my recollection of part of the discussion leading up to #35526, which was unfortunately lost to the Slack-hole). As such, they open up more places for bugs and segfaults to creep in. |
We definitely shouldn't do this before we see how it plays out in other languages. For sizes larger than 64 bits, I think this might not have benefits over an efficient bigint, and for smaller sizes, dealing with non-byte-aligned objects will be a total pain to implement. |
I don't know if we would actually be responsible for aligning the data in this case. From what was mentioned in #39315, it sounds like the LLVM backend should do that for us as it sees fit (e.g. it should ingest the non-standard integers and convert them into a form usable by the backend). The main issue is that LLVM backends might not have support for doing that, or might not do it properly. For the use case I have in mind, I wouldn't want any alignment done before passing to the backend: my eventual goal is to take the LLVM IR, feed it into the OneAPI or Vitis HLS toolchains, and get hardware out of it. For the hardware I don't want padding, because that padding would negate the benefit of using a smaller integer type in the first place. |
We are about to see how it plays out in C, I believe. Let's see how n2709, proposing `_BitInt(N)`, fares. |
Following this conversation for https://github.com/JuliaStrings/InlineStrings.jl. |
As the Project Editor for C who integrated these changes (and has a thousand more things they need to do to stuff them into the C standard), please go ahead. Please add this. Please go even further beyond than the rest of us have ever dreamed of going. |
Thanks for the encouragement @workingjubilee and @ThePhD. It's time to hack this up and see where we get stuck. I hope it is as easy as removing some artificial limitations. cc: @rfourquet

```julia
julia> using BitIntegers

julia> BitIntegers.@define_integers 24
@uint24_str (macro with 1 method)

julia> x = uint24"1"
0x000001

julia> @code_llvm identity(x)
; @ operators.jl:513 within `identity`
define i24 @julia_identity_371(i24 zeroext %0) #0 {
top:
  ret i24 %0
}

julia> BitIntegers.@define_integers 12
ERROR: invalid number of bits in primitive type Int12
Stacktrace:
 [1] top-level scope
   @ ~/.julia/packages/BitIntegers/6M5fx/src/BitIntegers.jl:60
```
|
The immediate obstacle is the bit-width check at lines 1540 to 1541 in f64463d.
We can work around this by calling `jl_new_primitivetype` directly:

```julia
julia> raw_new_primitive_type(name, nbits, mod = Main, supertype = Unsigned, parameters = Core.svec()) =
           @ccall jl_new_primitivetype(
               pointer_from_objref(name)::Ptr{Nothing},
               pointer_from_objref(mod)::Ptr{Nothing},
               pointer_from_objref(supertype)::Ptr{Nothing},
               pointer_from_objref(parameters)::Ptr{Nothing},
               nbits::Csize_t
           )::Ref{DataType}
raw_new_primitive_type (generic function with 5 methods)

julia> const foobar = raw_new_primitive_type(:foobar, 17)
foobar

julia> typeof(foobar)
DataType

julia> sizeof(foobar)
3

julia> x = Core.checked_trunc_uint(foobar, 5);

julia> @code_llvm identity(x)
; @ operators.jl:526 within `identity`
; Function Attrs: uwtable
define i24 @julia_identity_567(i24 zeroext %0) #0 {
top:
  ret i24 %0
}
```

The fundamental issue is that the layout there tracks sizes in bytes (lines 512 to 531 in f64463d).
While we may still need the size in bytes, we may want to track how many bits are unused (0 - 7). Thus we need to steal 3 bits from somewhere, perhaps from the |
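As a rough sketch of the bookkeeping being discussed (the struct and field names below are invented for illustration, not Julia's actual layout representation), the unused-bit count is always in 0-7 and so fits in 3 bits:

```julia
# Hypothetical sketch of the extra layout bookkeeping discussed above.
# These names are illustrative only; Julia's real layout struct differs.
struct SketchLayout
    size_bytes::UInt32   # allocated size, rounded up to whole bytes
    unused_bits::UInt8   # 0-7 trailing padding bits; fits in 3 bits
end

# For an n-bit primitive type, the two fields would be derived as:
sketch_layout(nbits) = SketchLayout(cld(nbits, 8), 8 * cld(nbits, 8) - nbits)

sketch_layout(17)  # SketchLayout(0x00000003, 0x07): 3 bytes, 7 unused bits
sketch_layout(24)  # SketchLayout(0x00000003, 0x00): 3 bytes, 0 unused bits
```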
We also could just (at least for an initial version) only allow sizes that are multiples of 8. I'm not sure how many people need a 57-bit integer. (4-bit and 12-bit would be somewhat nice for DNA/HDR images and such, but rounding up to the byte isn't exactly unreasonable.) |
We already do this:

```julia
julia> primitive type foobar <: Unsigned 24 end

julia> primitive type foobar40 <: Unsigned 40 end
```

With BitIntegers.jl we get a fully functional integer:

```julia
julia> BitIntegers.@define_integers 24
@uint24_str (macro with 1 method)

julia> uint24"4096"
0x001000

julia> int24"4096"
4096

julia> int24"4096" + uint24"4096"
0x002000
```

This issue is about getting those 4- and 12-bit integers:

```julia
julia> primitive type UInt4 <: Unsigned 4 end
ERROR: invalid number of bits in primitive type UInt4
Stacktrace:
 [1] top-level scope
   @ REPL[129]:1

julia> primitive type UInt12 <: Unsigned 12 end
ERROR: invalid number of bits in primitive type UInt12
Stacktrace:
 [1] top-level scope
   @ REPL[130]:1
```
|
It's also worth considering older color formats, which may e.g. pack 5 bits each for R, G, and B. Obviously those pack into 16 bits nicely, possibly using the extra bit for a mask, so it's not that hard to manually unpack, do the math, mask, and repack. But allowing people to not have to write such code, reimplementing it over and over, is what these types are for. |
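For concreteness, here is a minimal sketch of the manual unpack/mask/repack dance described above (helper names are invented for illustration, assuming an RGB555 layout with the unused bit at the top):

```julia
# Illustrative only: manual handling of a packed RGB555 pixel.
unpack_rgb555(px::UInt16) = ((px >> 10) & 0x1f, (px >> 5) & 0x1f, px & 0x1f)

function pack_rgb555(r, g, b)
    # Mask each channel back to 5 bits before repacking.
    UInt16(((r & 0x1f) << 10) | ((g & 0x1f) << 5) | (b & 0x1f))
end

# Halve the brightness of a pixel, channel by channel.
function halve(px::UInt16)
    r, g, b = unpack_rgb555(px)
    pack_rgb555(r >> 1, g >> 1, b >> 1)
end

halve(0x7fff)  # 0x3def: each 5-bit channel 0b11111 -> 0b01111
```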
Are those 15 bits packed together such that 8 pixels would only take up 15 bytes (120 bits)? My main foray into this is camera hardware that uses 12-bit or 14-bit analog-to-digital conversion and then transfers those packed bits into a raw binary file. One could use SIMD-based shuffles to unpack the camera frame data into UInt16, but there were some applications where we just wanted to sample a few values. There are two distinct issues I see here:
1. Arithmetic on integers of non-standard bit widths (scalar semantics and precision).
2. Packed storage of those integers, so that e.g. an array of 12-bit values does not round each element up to 16 bits.
My current sense is that the compiler, LLVM specifically, knows how to do item 1 above, specifically regarding precision. Item 2 is interesting, and is helped by item 1, but needs more specification (a sketch of the bit extraction it involves follows this comment); I would be surprised if the compiler knows how to do it. I want to take advantage of existing compiler features, particularly if they are already in use by clang for C23 `_BitInt` support. |
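To make item 2 concrete, here is a minimal sketch (assuming little-endian, tightly packed 12-bit samples; the names are invented for illustration) of sampling a single value out of a packed camera buffer without unpacking the whole frame:

```julia
# Illustrative only: read the i-th 12-bit unsigned sample from a tightly
# packed byte buffer (two samples per 3 bytes, little-endian bit order).
function sample12(buf::Vector{UInt8}, i::Integer)
    bitoff = (i - 1) * 12          # bit offset of the sample
    byte, shift = divrem(bitoff, 8)
    # Read two bytes, which always cover the 12 bits, then shift and mask.
    lo = UInt16(buf[byte + 1])
    hi = UInt16(buf[byte + 2])
    ((lo | (hi << 8)) >> shift) & 0x0fff
end

buf = UInt8[0xab, 0xcd, 0xef]      # two packed samples: 0xdab and 0xefc
sample12(buf, 1)  # 0x0dab
sample12(buf, 2)  # 0x0efc
```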
Feel free to make a PR to change that to bits instead of bytes internally. Be aware the compiler is free to add padding, and will therefore likely round up to bytes for performance. |
As I understand it, the purpose of non-byte-aligned types is dealing with very large arrays, so the scalar representation doesn't matter as much as what happens when you have a million of them in an array. |
While I am interested in bitpacking applications, that's not really what N2763 is focused on. Where it does talk about arrays, it seems to round up to the nearest byte, although this is a platform-specific implementation detail.
That is a
We may still want to store and access the byte size because of what I mentioned above. What we may want to add is a separate bit-alignment field. |
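As a back-of-the-envelope comparison (not something either proposal specifies), byte-rounded versus bit-packed storage for a large array:

```julia
# Storage needed for n elements of an nbits-wide integer type.
bytes_rounded(n, nbits) = n * cld(nbits, 8)   # each element rounded up to whole bytes
bytes_packed(n, nbits)  = cld(n * nbits, 8)   # elements packed back to back

bytes_rounded(1_000_000, 12)  # 2000000 bytes
bytes_packed(1_000_000, 12)   # 1500000 bytes
```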
Wait, so if this datatype always rounds up to the nearest byte in storage, why do we care about it at all? |
From my reading of this thread, we are missing a definition of the scope where the arbitrary bit-width integers reside, and at what level everything happens. My original thought for this type of feature was that the Julia compiler would just generate the appropriate LLVM IR and let the backends deal with it. @vtjnash, what do you mean by "the compiler is free to add padding"?
Are you referring to the Julia compiler adding padding before (or during) its LLVM passes, or to padding that might be added in the LLVM backend during native code generation? In reply to @oscardssmith's two comments:
CPUs aren't the only backends that can benefit from these arbitrary integers, and those other backends wouldn't want extra padding/alignment added to values. For instance, newer GPUs are starting to include Int4 support in some of their cores, so they want native 4-bit integers without padding them to 8 bits (@maleadt might be able to clarify how that actually works, but I would imagine it might want an LLVM |
Indeed, this is my thought as well, but there are still some questions about what a Julia array of these looks like. Previously, C23's `_BitInt` existed as Clang's `_ExtInt` extension: https://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html |
The problem with pushing this to LLVM is that, for the places where this actually matters (e.g. representation within arrays), Julia needs to explicitly choose a layout for the type. |
Ok, fine, let's talk about arrays. Zig is ahead of us in this department. Zig has normal arrays, with a memory layout where each element is rounded up to a whole number of bytes; via Godbolt, we can see this for arrays of arbitrary bit-width integers. Zig also has a packed alternative. I would recommend pursuing a similar pattern (a rough sketch of what that could look like in Julia follows below). The main question I have with the implementation is do we modify the |
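A rough sketch of that kind of packed pattern in Julia (the type and all names here are invented for illustration, assuming a little-endian bit layout; not a proposed API):

```julia
# Illustrative sketch: a vector of n-bit unsigned values packed into bytes.
struct PackedUIntVector <: AbstractVector{UInt64}
    data::Vector{UInt8}
    nbits::Int
    len::Int
end

function PackedUIntVector(values::AbstractVector{<:Integer}, nbits::Integer)
    @assert 1 <= nbits <= 56                       # keeps the 8-byte reads/writes below simple
    data = zeros(UInt8, cld(length(values) * nbits, 8) + 8)  # slack for unaligned 8-byte access
    for (i, v) in pairs(values)
        bitoff = (i - 1) * nbits
        byte, shift = divrem(bitoff, 8)
        chunk = (UInt64(v) & ((UInt64(1) << nbits) - 1)) << shift
        for k in 0:7                               # scatter the shifted chunk into bytes
            data[byte + k + 1] |= UInt8((chunk >> (8k)) & 0xff)
        end
    end
    PackedUIntVector(data, Int(nbits), length(values))
end

Base.size(p::PackedUIntVector) = (p.len,)

function Base.getindex(p::PackedUIntVector, i::Int)
    @boundscheck checkbounds(p, i)
    bitoff = (i - 1) * p.nbits
    byte, shift = divrem(bitoff, 8)
    word = zero(UInt64)
    for k in 0:7                                   # gather 8 bytes covering the element
        word |= UInt64(p.data[byte + k + 1]) << (8k)
    end
    (word >> shift) & ((UInt64(1) << p.nbits) - 1)
end

v = PackedUIntVector([1, 2, 3, 4095], 12)
v[4]             # 0x0000000000000fff
sizeof(v.data)   # 6 bytes of payload plus slack, vs 8 bytes for a Vector{UInt16}
```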
LLVM currently has support for arbitrary bitwidth integers.
https://llvm.org/docs/LangRef.html#integer-type
https://reviews.llvm.org/rG5f0903e9bec97e67bf34d887bcbe9d05790de934
https://reviews.llvm.org/rG6c75ab5f66b4
https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329?u=programmerjake
Rust:
https://internals.rust-lang.org/t/pre-rfc-arbitrary-bit-width-integers/15603
rust-lang/rfcs#2581
Zig:
https://ziglang.org/documentation/master/#Primitive-Types
C23:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf