
Support arbitrary bitwidth integers #45486

Open
mkitti opened this issue May 27, 2022 · 22 comments
Labels
feature Indicates new feature / enhancement requests types and dispatch Types, subtyping and method dispatch

Comments

@imciner2
Contributor

While I would like arbitrary integers for something I eventually want to do, my understanding is that exposing them is a fairly dangerous gamble, because the LLVM backends may not properly handle integers that are not multiples of 8 bits (or at least that is my recollection of part of the discussion leading to #35526, but that discussion was unfortunately lost to the Slack-hole). As such, they open up more places for bugs and segfaults to creep in.

@oscardssmith
Member

We definitely shouldn't do this before we see how it plays out in other languages. For sizes larger than 64 bits, I think this might not have benefits over an efficient bigint, and for smaller sizes, dealing with non-byte-aligned objects will be a total pain to implement.

@imciner2
Contributor

I don't know if we would actually be responsible for aligning the data in this case. From what was mentioned in #39315, it sounds like the LLVM backend should do that for us as it sees fit (e.g. it should ingest the non-standard integers and convert them into a usable form in the backend). The main issue is that LLVM backends might not support doing that, or might not do it properly.

For the use case I have in mind, I wouldn't want any alignment done before passing to the backend, since my eventual goal is to take the LLVM IR and feed it into the oneAPI or Vitis HLS toolchains and get hardware out of it. For the hardware I don't want padding, because that padding would negate the benefit of using a smaller integer type in the first place.

@mkitti
Contributor Author

mkitti commented May 30, 2022

We are about to see how it plays out in C, I believe. Let's see if N2709, which proposes _BitInt(N), makes it into the next C standard, C23. That, I think, will be our cue to proceed on this.

@quinnj
Member

quinnj commented Jun 10, 2022

Following this conversation for https://github.com/JuliaStrings/InlineStrings.jl.

@workingjubilee

workingjubilee commented Feb 18, 2023

An update of N2709 was accepted: N2763.
The last round of _BitInt(N) fixes that I am aware of are in N3035.
You can read N3088 which is the latest C23 draft and includes the latest spec for _BitInt(N), as I understand it.

@ThePhD

ThePhD commented Feb 18, 2023

As the Project Editor for C who integrated these changes (and has a thousand more things they need to do to stuff them into the C standard), please go ahead.

Please add this.

Please go even further beyond than the rest of us have ever dreamed of going.

@mkitti
Contributor Author

mkitti commented Feb 18, 2023

Thanks for the encouragement @workingjubilee and @ThePhD . It's time to hack this up and see where we get stuck.

I hope it is as easy as removing some artificial limitations for primitive type in Julia itself and in BitIntegers.jl.

cc: @rfourquet

julia> using BitIntegers

julia> BitIntegers.@define_integers 24
@uint24_str (macro with 1 method)

julia> x = uint24"1"
0x000001

julia> @code_llvm identity(x)
;  @ operators.jl:513 within `identity`
define i24 @julia_identity_371(i24 zeroext %0) #0 {
top:
  ret i24 %0
}

julia> BitIntegers.@define_integers 12
ERROR: invalid number of bits in primitive type Int12
Stacktrace:
 [1] top-level scope
   @ ~/.julia/packages/BitIntegers/6M5fx/src/BitIntegers.jl:60

@mkitti
Contributor Author

mkitti commented Feb 19, 2023

The immediate obstacle is that Core._primitivetype will throw an error for bit counts that are not multiples of 8.

julia/src/builtins.c

Lines 1540 to 1541 in f64463d

if (nb < 1 || nb >= (1 << 23) || (nb & 7) != 0)
    jl_errorf("invalid number of bits in primitive type %s",

We can work around this by calling jl_new_primitivetype directly, but this will basically just round up to the nearest byte-multiple.

julia> raw_new_primitive_type(name, nbits, mod = Main, supertype = Unsigned, parameters = Core.svec()) =
           @ccall jl_new_primitivetype(
               pointer_from_objref(name)::Ptr{Nothing},
               pointer_from_objref(mod)::Ptr{Nothing},
               pointer_from_objref(supertype)::Ptr{Nothing},
               pointer_from_objref(parameters)::Ptr{Nothing},
               nbits::Csize_t
           )::Ref{DataType}
raw_new_primitive_type (generic function with 5 methods)


julia> const foobar = raw_new_primitive_type(:foobar, 17)
foobar

julia> typeof(foobar)
DataType

julia> sizeof(foobar)
3

julia> x = Core.checked_trunc_uint(foobar, 5);

julia> @code_llvm identity(x)
;  @ operators.jl:526 within `identity`
; Function Attrs: uwtable
define i24 @julia_identity_567(i24 zeroext %0) #0 {
top:
  ret i24 %0
}

The fundamental issue is that jl_datatype_layout_t tracks the size in bytes.

julia/src/julia.h

Lines 512 to 531 in f64463d

typedef struct {
    uint32_t size;
    uint32_t nfields;
    uint32_t npointers; // number of pointers embedded inside
    int32_t first_ptr; // index of the first pointer (or -1)
    uint16_t alignment; // strictest alignment over all fields
    uint16_t haspadding : 1; // has internal undefined bytes
    uint16_t fielddesc_type : 2; // 0 -> 8, 1 -> 16, 2 -> 32, 3 -> foreign type
    uint16_t padding : 13;
    // union {
    //     jl_fielddesc8_t field8[nfields];
    //     jl_fielddesc16_t field16[nfields];
    //     jl_fielddesc32_t field32[nfields];
    // };
    // union { // offsets relative to data start in words
    //     uint8_t ptr8[npointers];
    //     uint16_t ptr16[npointers];
    //     uint32_t ptr32[npointers];
    // };
} jl_datatype_layout_t;

While we may still need the size in bytes, we may also want to track how many bits are unused (0–7). That means we need to steal 3 bits from somewhere, perhaps from the padding field?

@oscardssmith
Member

We could also just (at least for an initial version) only allow sizes that are multiples of 8. I'm not sure how many people need a 57-bit integer. (4- and 12-bit would be somewhat nice for DNA/HDR images and such, but rounding up to the byte isn't exactly unreasonable.)

@mkitti
Contributor Author

mkitti commented Feb 19, 2023

We also could just (at least for an initial version) only allow multiple of 8 sizes

We already do this.

julia> primitive type foobar <: Unsigned 24 end

julia> primitive type foobar40 <: Unsigned 40 end

With BitIntegers.jl we get a fully functional integer:

julia> BitIntegers.@define_integers 24
@uint24_str (macro with 1 method)

julia> uint24"4096"
0x001000

julia> int24"4096"
4096

julia> int24"4096" + uint24"4096"
0x002000

This issue is about getting those 4 and 12 bit integers.

julia> primitive type UInt4 <: Unsigned 4 end
ERROR: invalid number of bits in primitive type UInt4
Stacktrace:
 [1] top-level scope
   @ REPL[129]:1

julia> primitive type UInt12 <: Unsigned 12 end
ERROR: invalid number of bits in primitive type UInt12
Stacktrace:
 [1] top-level scope
   @ REPL[130]:1

@workingjubilee

It's also worth considering older color formats, which may e.g. pack 5 bits per R, G, and B.

Obviously, those pack into 16 bits nicely, possibly using the extra bit for a mask, so it's not that hard to manually unpack, do the math, mask, repack, etc. But allowing people not to have to write such code, reimplementing it over and over, is what these types are for.

@mkitti
Contributor Author

mkitti commented Feb 19, 2023

Are those 15 bits packed together such that 8 pixels would only take up 15 bytes (120 bits)?

My main foray into this is camera hardware that uses 12-bit or 14-bit analog-to-digital conversion and then transfers those packed bits into a raw binary file.

One could use SIMD based shuffles to unpack the camera frame data into uint16, but there were some applications where we just wanted to sample a few values.

There are two distinct issues I see here:

  1. Creating integer types with a certain precision, offset, padding, byte order, and signedness. Here I'm mostly concerned about precision. The actual size of the integer may still be an integral number of bytes: we could have a 12-bit or a 15-bit precision integer that takes up two bytes.
  2. Packing bit integers into bytes and unpacking them. Two 12-bit integers only need to take up three bytes, for example.

My current sense is that the compiler, LLVM specifically, knows how to do item 1 above, specifically regarding precision.

Item 2 is interesting, helped by item 1, but needs more specification. I would be surprised if the compiler knows how to do this.

I want to take advantage of existing compiler features, particularly if they are in use by clang for C23 BitInt support.

@vtjnash
Member

vtjnash commented Feb 19, 2023

Feel free to make a PR to change that to bits instead of bytes internally. Be aware the compiler is free to add padding, and will therefore likely round up to bytes for performance.

@oscardssmith
Member

As I understand, the purpose of non byte aligned types is for dealing with very large arrays, so the scalar representation doesn't matter as much as what happens when you have 1 million of them in an array.

@mkitti
Contributor Author

mkitti commented Feb 19, 2023

As I understand, the purpose of non byte aligned types is for dealing with very large arrays, so the scalar representation doesn't matter as much as what happens when you have 1 million of them in an array.

While I am interested in bitpacking applications, that's not really what N2763 is focused on. Where it does talk about arrays, it seems to round up to the nearest byte, although this is a platform-specific implementation detail.

_BitInt(N) types align with existing calling conventions. They have the same size and alignment as the smallest basic type that can contain them. Types that are larger than __int64_t are conceptually treated as struct of register size chunks. The number of chunks is the smallest number that can contain the type. With the Clang implementation on Intel64 platforms, _BitInt types are bit-aligned to the next greatest power-of-2 up to 64 bits: the bit alignment A is min(64, next power-of-2(>=N)). The size of these types is the smallest multiple of the alignment greater than or equal to N. Formally, let M be the smallest integer such that A*M >= N. The size of these types for the purposes of layout and sizeof is the number of bits aligned to this calculated alignment, A*M. This permits the use of these types in allocated arrays using the common sizeof(Array)/sizeof(ElementType) pattern. The authors will discuss the ABI requirements with the different ABI groups.

That is, a _BitInt(7) would still have sizeof(_BitInt(7)) == 1 byte. An array of these would not save space versus an Int8 or UInt8, given the bit alignment and layout.

Feel free to make a PR to change that to bits instead of bytes internally. Be aware the compiler is free to add padding, and will therefore likely round up to bytes for performance.

We may still want to store and access the byte size because of what I mentioned above. What we may want to store in addition is a separate bit alignment.

@oscardssmith
Member

Wait, so if this datatype always rounds up to the nearest byte in storage, why do we care about it at all?

@mkitti
Contributor Author

mkitti commented Feb 20, 2023

  1. Rounding up to the nearest byte in storage may be platform specific behavior.

  2. From N2763, Introduction:

An integer type with an explicit bit-width allows programmers to portably express their intent in code
more clearly. Today, the programmer must pick an integer datatype of the next larger size and manually
perform mask and shifting operations. However, this is error prone because integer widths vary from
platform to platform and exact-width types from stdint.h are optional, it is not always obvious from
code inspection when a mask or a shift operation is incorrect, implicit conversion can lead to surprising
behavior, and the resulting code is harder for static analyzers and optimizers to reason about. By
allowing the programmer to express the specific bit-width they need for their problem domain directly
from the type system, the programmer can write less code that is more likely to be correct and easier
for tooling and humans to reason about.

@imciner2
Contributor

From my reading of this thread, we are missing a definition of where the arbitrary-bitwidth integers reside and at what level everything happens.

My original thought for this type of feature was that the Julia compiler would just generate the appropriate LLVM i<bit> types for the requested size (e.g. for an int that is 7 bits, it would generate LLVM IR with i7 as the variable type). This would push most of these semantics down to the LLVM backend level, so things like padding would be determined by each target.

@vtjnash, what do you mean by

Be aware the compiler is free to add padding, and will therefore likely round up to bytes for performance.

Are you referring to the Julia compiler adding padding before (or during) its LLVM passes, or to padding that might be added in the LLVM backend during native code generation?

In reply to @oscardssmith's two comments:

As I understand, the purpose of non byte aligned types is for dealing with very large arrays, so the scalar representation doesn't matter as much as what happens when you have 1 million of them in an array.

wait so if this datatype always rounds up to the nearest byte in storage, why do we care about it at all?

CPUs aren't the only backends that can benefit from these arbitrary integers, and those other backends wouldn't want extra padding/alignment added to values. For instance, newer GPUs are starting to include Int4 support in some of their cores, so they want native 4-bit integers without padding to 8 bits (@maleadt might be able to clarify how that actually works, but I would imagine it might want an LLVM i4 as the type for those signals). Also, my original use case for arbitrary integer sizes was FPGA HLS backends, where saying I want a 5-bit integer means I really only want 5 signal lines, so padding it to 8 lines (by using an 8-bit integer) negates any performance/area/power savings of my design choice.

@mkitti
Contributor Author

mkitti commented Feb 20, 2023

My original thoughts for this type of feature was that the Julia compiler would just generate the appropriate LLVM i<bit> types for the requested size (e.g. for an int that is 7 bits it would generate LLVM IR with i7 as the variable type). This would push most of these semantics to the LLVM backend level, so things like padding would be determined by each target.

Indeed, this is my thought as well, but there are still some questions about what a Julia array of these looks like.

Previously C23's _BitInt was called _ExtInt in LLVM. Here's an older blog post from Erich Keane of LLVM / clang:

https://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html

The patch as accepted and committed into LLVM solves most of the above problems by providing the _ExtInt class of types. These types translate directly into the corresponding LLVM-IR integer types. The _ExtInt keyword is a type-specifier (like int) that accepts a required integral constant expression parameter representing the number of bits to be used. More succinctly: _ExtInt(7) is a signed integer type using 7 bits.

@oscardssmith
Member

The problem with pushing this to LLVM is that for the places this actually matters (e.g. representation within arrays) Julia needs to explicitly choose a layout for the type.

@mkitti
Contributor Author

mkitti commented Feb 20, 2023

Ok, fine, let's talk about arrays. Zig is ahead of us in this department.

Zig has normal arrays with a memory layout where each element occupies a multiple of a byte. Via Godbolt, we can see that a six-element array of i9 takes up 12 bytes: Godbolt link

Zig also has a PackedIntArray in its standard library. This is the packed configuration.
https://ziglang.org/documentation/master/std/#A;std:PackedIntArray

I would recommend pursuing a similar pattern. Vector{UInt12} should allocate 2 bytes per element. Bit packing should probably be left to packages that could utilize specialized SIMD routines for packing and unpacking.

The main question I have about the implementation is: do we modify the size field of jl_datatype_layout_t to represent bits, or do we take another three bits from the padding field to indicate the bit alignment?

@nsajko nsajko added types and dispatch Types, subtyping and method dispatch feature Indicates new feature / enhancement requests labels Jul 5, 2024
Development

No branches or pull requests

8 participants