`Box<T, A: BuildAlloc>` #12

scottjmaddox · 2019-05-04T00:17:06Z

The current push to associate an allocator to boxes (among other types) is targeting something like Box<T, A: Alloc>. This limits the design space of custom allocators, though, because it forces one of two approaches for accessing the Alloc:

Use a single static global state (or pointer or reference to the state).
Embed a pointer or reference to the state inside every box.

There are many kinds of allocators that these limitations rule out. For example, perhaps you want to have multiple independent "Arena" or "Region" allocators that each allocate from a dedicated slab of memory. Since you want to have multiple independent allocators, you cannot (efficiently) use approach 1. So you're forced to use approach 2, which is extremely inefficient, requiring a whole extra pointer to be stored in each Box.

One approach for avoiding this limitation would be something like the following:

pub struct Box<T: ?Sized, A: BuildAlloc>(Unique<T>, A);

pub trait BuildAlloc {
    type Alloc: Alloc;

    /// Build an `Alloc` given a `ptr` pointer to memory with the given
    /// `layout` that was allocated by the associated allocator.
    ///
    /// # Safety
    ///
    /// This function is unsafe because undefined behavior can result
    /// if the caller does not ensure all of the following:
    ///
    /// * `ptr` must denote a block of memory currently allocated via
    ///   this allocator,
    ///
    /// * `layout` must *fit* that block of memory,
    ///
    /// * In addition to fitting the block of memory `layout`, the
    ///   alignment of the `layout` must match the alignment used
    ///   to allocate that block of memory.
    unsafe fn build_alloc(&mut self, ptr: NonNull<u8>, layout: Layout) -> Self::Alloc;
}

Note that BuildAlloc provides the method build_alloc, which accepts &mut self and a pointer and layout for memory that was allocated by the associated allocator, and returns an Alloc implementation. This gives many options for locating the allocator's state and generating an Alloc. Approach 1 (global state) is of course still possible, and so is approach 2 (embed a pointer in every box). But this also makes approach 3 possible:

Calculate a pointer to the allocator state from a pointer to heap memory allocated by that allocator.

This makes many more clever and efficient kinds of allocators possible, such as an allocator that allocates all 8-byte allocations from aligned 1 MiB blocks, and stashes a pointer to its own state at the beginning of each of those 1 MiB blocks. Such an allocator can implement BuildAlloc to check the size of the allocated memory and if it is 8-bytes then mask off the lowest 10 bits from ptr to get a (1 MiB aligned) pointer to the allocator's state, from which it can construct an Alloc.

To clarify, BuildAlloc makes it possible to calculate a pointer to the allocator's state from the Box itself:

impl<T: ?Sized, A: BuildAlloc>  Box<T, A> {
    fn get_alloc(&self) -> A::Alloc {
        self.1.build_alloc(
            NonNull::from(self.0).cast::<u8>(),
            Layout::for_value(self.as_ref())
        )
    }
}

Since the Box provides a pointer to memory allocated by the allocator, the allocator can use its control of the pointers it returns and the memory around those pointers to efficiently locate its own state and generate an Alloc. This makes many more kinds of clever and efficient allocators possible.

Relevant comments from PR #58457: Associate an allocator to boxes:

Associate an allocator to boxes rust#58457 (comment)
Associate an allocator to boxes rust#58457 (comment)
Associate an allocator to boxes rust#58457 (comment)
Associate an allocator to boxes rust#58457 (comment)
- this comment has the most complete usage example, although it defines AllocFrom (old name of BuildAlloc) to have a method for each container type, when really we just need a single method that accepts a pointer to heap memory

Edited 2019-05-04: Rename AllocFrom to BuildAlloc for consistency with BuildHasher.
Edited 2019-05-07: Adjust build_alloc signature to match dealloc.

The text was updated successfully, but these errors were encountered:

TimDiekmann · 2019-05-04T00:30:39Z

So if I understand correctly, AllocFrom is related to Alloc just like BuildHasher is related to Hasher?

scottjmaddox · 2019-05-04T00:39:54Z

So if I understand correctly, AllocFrom is related to Alloc just like BuildHasher is related to Hasher?

Almost exactly. The method signature is different, though, because the whole point is to give the allocator access to a pointer to its own heap memory from which to generate its Alloc implementation.

Calling it BuildAlloc for consistency would be perfectly reasonable, though.

glandium · 2019-05-04T09:17:04Z

I do have this very use-case, actually. And it didn't strike me that it could be solved this way, and I must say I like it. But there are details to figure out. Like... how do you get the AllocFrom value you attach to the Box from the Alloc.alloc? Because Box::new_in would have to take an Alloc as input, and the AllocFrom would need to be created somehow.

(Note that last iteration in rust-lang/rust#58457 doesn't have bounds on the Box type itself at all. They're all at the implementation level. Which actually gives some additional flexibility)

SimonSapin · 2019-05-04T10:05:25Z

Aren’t bounds on the type definition required to allow impl Drop to have the same bounds?

TimDiekmann · 2019-05-04T11:00:03Z

Calling it BuildAlloc for consistency would be perfectly reasonable, though.

This is exactly what I had in mind. Do you want to change the issue and the title to use BuildAlloc instead of AllocFrom?

scottjmaddox · 2019-05-04T22:47:25Z

how do you get the AllocFrom value you attach to the Box from the Alloc.alloc? Because Box::new_in would have to take an Alloc as input, and the AllocFrom would need to be created somehow.

Yes, the type signature for new_in would need to be something like this:

impl<T: ?Sized, A: BuildAlloc>  Box<T, A> {
    pub fn new_in(x: T, a: A::Alloc) -> Box<T, A>;
}

And, as you point out, you need some way to create a BuildAlloc from an Alloc. So a new method on Alloc will need to be added, something like get_build_alloc, and an associated type, type BuildAlloc: BuildAlloc.

I wasn't sure if such cyclic associated types are allowed, but it looks like they are.

Edit: fix new_in type sig (remove &self, add x: T).

glandium · 2019-05-04T22:58:08Z

Aren’t bounds on the type definition required to allow impl Drop to have the same bounds?

There's an interesting problem there... Drop currently does nothing. It's actually implemented by the compiler, which then goes on generating calls to box_free on its own.

glandium · 2019-05-04T23:01:18Z

So a new method on Alloc will need to be added, something like get_build_alloc

Are there use cases where different allocations could have a different BuildAlloc that would point back to the same Alloc? Those would be left out with this approach.

scottjmaddox · 2019-05-05T07:31:40Z

Are there use cases where different allocations could have a different BuildAlloc that would point back to the same Alloc? Those would be left out with this approach.

None that I can think of. That doesn't mean they don't exist, though. Allocators traditionally use a single global free function that accepts a pointer to the memory to be freed and calculates a pointer to the allocator from that (or from global static state). So all of the history of allocator development has assumed the equivalent of a single BuildAlloc per allocator.

Ultimately, you can put whatever logic you want in BuildAlloc. So it's more a question of can what you want to do be done more efficiently with multiple BuildAllocs per Alloc. And the answer is probably "maybe". But if that's the only way to support BuildAlloc with the current type system, then it's still a huge improvement over the limitations inherent to Box<T, A: Alloc>. Don't let perfect be the enemy of good, and whatnot.

TimDiekmann · 2019-05-05T19:02:19Z

How could BuildAlloc play together with #9?

gnzlbg · 2019-05-05T20:58:31Z

Since you want to have multiple independent allocators, you cannot (efficiently) use approach 1.

I'm not sure why this is the case. If you put whatever you need in a static, or in a thread local, you can just have a ZST as an allocator, whose methods access it.

scott-maddox · 2019-05-06T21:26:48Z

Since you want to have multiple independent allocators, you cannot (efficiently) use approach 1.

I'm not sure why this is the case. If you put whatever you need in a static, or in a thread local, you can just have a ZST as an allocator, whose methods access it.

But they're not really multiple independent allocators then, are they? They're one allocator that pretends to be multiple allocators. And there's an overhead associated with that. If the allocators are truly separate, then you could, for example, safely make them !Send and !Sync and do away with all thread synchronization overhead (and avoid thread local storage overhead).

gnzlbg · 2019-05-07T07:25:49Z

But they're not really multiple independent allocators then, are they?

This is what I meant:

struct A; // my allocator type
fn foo() {
    static A0: A = A; // allocator 0
    static A1: A = A; // allocator 1
    #[derive(alloc_handle(A0))] struct AH0; // ZST handle to 0
    // for exposition only, the derive expands to this:
    impl Alloc for AH0 {
        fn alloc(...) -> ... { A0.alloc(...) }
        // ...
    }
    #[derive(alloc_handle(A1))] struct AH1; // ZST handle to 1
    let b0: Box<T, AH0>; // box using A0
    let b0: Box<T, AH0>; // another box using A0
    let b1: Box<T, AH1>;  // box using A1
    // ^^ all boxes are pointer sized cause the handle is a ZST
}

Whether you can have multiple handles to the same allocator or not, depends on your allocator impl. Whether you want to have one or multiple allocators, with different handles, is kind of up to you.

If you can put your allocators in statics, or thread locals, then you can have zero-sized handles to them. If you cannot, then your handle needs to have some state to find the allocator, but I don't see any way around that.

SimonSapin · 2019-05-07T14:16:44Z

@gnzlbg This uses different Rust types for different handles, so there’s a bounded number of such types determined at compile-time.

This is not what people mean by “multiple instances” of an allocator. For example:

fn library_entry_point(some_input: Foo) {
    let handle = ArenaAllocator::new();
    // …
    let b = Box::new_in(something, handle.clone());
    let v = Vec::new_in(handle.clone());
    // …
}

A program that uses this library could spawn a large number of threads that each have their own separate arena. At compile time you don’t how many threads there’s gonna be, so you can have a Rust type for each of them.

For v to be able to tell which of these arenas it belongs to, handle cannot be zero-size. Or, conversely, if the handle is zero-size then the allocator must have global state that is shared with all handles of the same type.

gnzlbg · 2019-05-07T15:15:05Z

@SimonSapin makes sense, I was misunderstanding which problem this solves, BuildAlloc feels useful.

All Alloc methods that need to identify an allocation take a pointer to an allocation (e.g. a NonNull<u8>) and its Layout. Why is BuildAlloc::build_alloc:

fn build_alloc<T: ?Sized>(&self, heap_ptr: *const T) -> Self::Alloc;
// nitpick: heap_ptr does not necessarily point to the heap

generic over T instead of

unsafe fn build_alloc(&self, ptr: *const u8, layout: Layout) -> Self::Alloc;

? How can the proposed API be safe ?

The OP mentions:

Such an allocator can implement BuildAlloc to check the size of the allocated memory using mem::size_of()

but Layout::size() would do the same, and just because somebody passes build_alloc a pointer to T does not mean that the allocation was performed with a Layout that matches the size_of::<T>, or even with the same allocator for that matter. Assuming that would be unsound, so this API probably also needs to be unsafe.

Counter example, Vec<T> could pass a ptr: *const T to build_alloc, but mem::size_of::<T> has likely nothing to do with the Layout that Vec<T> used to perform its allocation. For this API to be used correctly there, Vec<T> would need to construct a new type internally, that has the same size as the dynamic allocation it refers to. The only way I can imagine this could be done, is with a gigant match expression with isize::MAX match arms, each of which need to instantiate a different type. With today's toolchain this is impossible.

scottjmaddox · 2019-05-07T20:31:16Z

@gnzlbg You're right, it should be unsafe fn build_alloc(&self, ptr: NonNull<u8>, layout: Layout) -> Self::Alloc;, in order to match Dealloc. I'll update the OP.

The reason I had proposed making the API safe is because the implementer must agree to ensure it is safe. For example, the implementer must not de-reference the heap_ptr (unless it somehow knows it can do so safely). However, we need to pass the Layout separately to support all possible allocations, and thus the interface has the same potential for misuse as Dealloc and thus must be marked unsafe.

scottjmaddox · 2019-05-07T20:39:21Z

I also can't think of any reason build_alloc shouldn't take &mut self. You are guaranteed to have a &mut self during dealloc and realloc. If you are cloning a &Box, you are guaranteed to be able to clone the BuildAlloc which you could then get a &mut self to.

There might be some use cases where this could come in handy; maybe tracking the number of reallocations within the BuildAlloc?

So I'll change that as well.

TimDiekmann · 2019-10-11T11:47:08Z

@scottjmaddox You may take a look at https://github.com/TimDiekmann/alloc-wg? I have used BuildAlloc as parameter for Box and RawVec. Is this the way you had in mind?
ZST allocators like Global and System implements both, the builder and the allocator.

Ekleog · 2020-05-12T22:35:16Z

I've just come across this issue. As it suggests adding a new function that makes it possible to have better control over how exactly the allocation is performed, would it be possible to take advantage of this method addition to make the method fallible, so that it's also possible to control how to behave when the allocation fails? (One example use case would be using an arena allocator for some untrusted data coming from the network, and wanting to abort just this specific client's instance if something OOMs here)

SimonSapin · 2020-05-12T22:48:31Z

Fallibility is a separate "axis" from using a provided allocator v.s. the global one. All four combinations are potentially useful.

Ekleog · 2020-05-12T23:36:18Z

They are, but only the version that takes both an allocator and is fallible is required to be implement all the other ones: it's easy to do Box::new_in(t, alloc).unwrap_oom() or Box::new_in(t, std::alloc::GLOBAL_ALLOCATOR) (assuming here a function like unwrap_oom that'd do the proper handle_alloc_error on Result<T, AllocErr>, and a global variable that allows read-only access to the global allocator).

Now, if you want to make all four versions it's great too :) what I want to point out is just that if you plan on adding a single function for the time being, then it'd probably be better if it were the fallible one. (Also, I'm not sure we want to encourage the whole allocating ecosystem to provide 4 functions for each use case by setting that example in libstd)

SimonSapin · 2020-05-13T04:49:11Z

I should have mentioned the RFC https://rust-lang.github.io/rfcs/2116-alloc-me-maybe.html and its tracking issue rust-lang/rust#48043 for try_reserve on collections both have discussion of what the APIs should look like, for both this minimal set of methods and a future, more comprehensive fallible methods (try_push, try_extend, …)

Yes Box::try_new_in is the most basic/powerful combination that others can be defined on top of, and trying to reduce multiplicative explosion of the number of methods is a good point, but exposing it still involves making API design decisions.

TimDiekmann · 2020-08-19T10:15:48Z

I like to resolve the issues which suggest a change to AllocRef before I start implementing it for collections. I have played around with BuildAllocRef a lot some time ago and I liked the concept in general. However, for most use cases, it's probably overkill. On the other hand, the ordinary Rust user will not implement many allocators himself, so this should not be such a big problem.

To go further with this proposal, I think we need to work out an interface where each signature should be fully defined. Currently, it's a broad idea, of where to build AllocRef and where to pass BuildAllocRef. For testing purposes we either could use a branch on the alloc-wg crate or I try to keep things updated upstream.

I'll throw in some points here:

We probably need an associative type on both AllocRef and BuildAllocRef.
For collections we want the generic parameter to be A: AllocRef instead of B: BuildAllocRef, as type inference would not work otherwise (see Type annotations required everywhere TimDiekmann/alloc-wg#5).
What do we store inside containers (BuildAllocRef vs AllocRef)? What to we want to pass to (e.g.) Vec::new_in?
for BuildAllocRef:
- Is it efficient enough to build AllocRef every time we need it?
for AllocRef:
- How we get a BuildAllocRef from AllocRef? I think this is needed for cloning for example.

TimDiekmann · 2020-08-19T17:35:22Z

I have tested around a bit again with BuildAllocRef using more or less exactly the code from the OP. You can see the code in the buildalloc-test branch at alloc-wg. You have to clone it manually and create a doc to view it, as I haven't uploaded it somewhere yet.

Some notes on the implementation:

For alloc, there is no pointer. Even if you have a Layout available (the one you want to use for alloc), it's feeling unintuitive. The idea was, that we pass NonNull::dangling(), but a new method for creating an AllocRef may be useful. Maybe build_alloc_ref and get_alloc_ref, or build_alloc_ref and create_alloc_ref, or new_alloc_ref and build_alloc_ref?
I haven't tested around implementing BuildAllocRef for other types than ZSTs yet.
Every test passed without any changes
No modification to AllocRef was needed, which allows us to play around with it upstream easily!

Amanieu · 2020-08-21T14:55:31Z

I think we can defer BuildAlloc until later in a forward-compatible way. For now just define Box as Box<T, A: AllocRef>. Later we can extend it like this:

// All AllocRef implement BuildAllocRef
trait AllocRef: BuildAllocRef {}

// We provide a blanket implementation of BuildAllocRef for all existing AllocRef
impl<T: AllocRef> BuildAllocRef for T {
    // Just returns Self
}

// Relax the A bound to BuildAllocRef. This should be a backwards-compatible change.
struct Box<T, A: BuildAllocRef> {}

TimDiekmann · 2020-08-21T15:10:56Z

The blanket implementation is a great idea! As no modifications to AllocRef are needed as far as I have tested, this would work I guess.

Amanieu · 2020-09-06T23:20:40Z

One thing that I feel hasn't been fully addressed is how you can construct a Box<T, BuildAllocRef> where the BuildAllocRef is a ZST but the AllocRef isn't. This seems important as this is the main use case for having BuildAllocRef at all.

TimDiekmann added A-Alloc Trait Issues regarding the Alloc trait Discussion Issues containing a specific idea, which needs to be discussed in the WG labels May 4, 2019

scottjmaddox changed the title ~~Box<T, A: AllocFrom>~~ Box<T, A: BuildAlloc> May 4, 2019

gnzlbg mentioned this issue May 8, 2019

Separate dealloc from Alloc into other trait #9

Closed

TimDiekmann mentioned this issue Oct 1, 2019

Attempt to combine several proposals #28

Closed

TimDiekmann added Proposal and removed Discussion Issues containing a specific idea, which needs to be discussed in the WG labels Oct 17, 2019

TimDiekmann mentioned this issue Dec 20, 2019

Proof Of Concept Crate: Simplistic Bump Allocator rust-gamedev/wg#71

Open

TimDiekmann mentioned this issue Sep 6, 2020

Change AllocRef to take &self #55

Closed

TimDiekmann added the disposition-postpone label Sep 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Box<T, A: BuildAlloc>` #12

`Box<T, A: BuildAlloc>` #12

scottjmaddox commented May 4, 2019 •

edited

Loading

TimDiekmann commented May 4, 2019

scottjmaddox commented May 4, 2019 •

edited

Loading

glandium commented May 4, 2019

SimonSapin commented May 4, 2019

TimDiekmann commented May 4, 2019

scottjmaddox commented May 4, 2019 •

edited

Loading

glandium commented May 4, 2019

glandium commented May 4, 2019

scottjmaddox commented May 5, 2019

TimDiekmann commented May 5, 2019

gnzlbg commented May 5, 2019

scott-maddox commented May 6, 2019

gnzlbg commented May 7, 2019 •

edited

Loading

SimonSapin commented May 7, 2019

gnzlbg commented May 7, 2019 •

edited

Loading

scottjmaddox commented May 7, 2019

scottjmaddox commented May 7, 2019

TimDiekmann commented Oct 11, 2019

Ekleog commented May 12, 2020

SimonSapin commented May 12, 2020

Ekleog commented May 12, 2020

SimonSapin commented May 13, 2020

TimDiekmann commented Aug 19, 2020 •

edited

Loading

TimDiekmann commented Aug 19, 2020 •

edited

Loading

Amanieu commented Aug 21, 2020

TimDiekmann commented Aug 21, 2020

Amanieu commented Sep 6, 2020

Box<T, A: BuildAlloc> #12

Box<T, A: BuildAlloc> #12

Comments

scottjmaddox commented May 4, 2019 • edited Loading

TimDiekmann commented May 4, 2019

scottjmaddox commented May 4, 2019 • edited Loading

glandium commented May 4, 2019

SimonSapin commented May 4, 2019

TimDiekmann commented May 4, 2019

scottjmaddox commented May 4, 2019 • edited Loading

glandium commented May 4, 2019

glandium commented May 4, 2019

scottjmaddox commented May 5, 2019

TimDiekmann commented May 5, 2019

gnzlbg commented May 5, 2019

scott-maddox commented May 6, 2019

gnzlbg commented May 7, 2019 • edited Loading

SimonSapin commented May 7, 2019

gnzlbg commented May 7, 2019 • edited Loading

scottjmaddox commented May 7, 2019

scottjmaddox commented May 7, 2019

TimDiekmann commented Oct 11, 2019

Ekleog commented May 12, 2020

SimonSapin commented May 12, 2020

Ekleog commented May 12, 2020

SimonSapin commented May 13, 2020

TimDiekmann commented Aug 19, 2020 • edited Loading

TimDiekmann commented Aug 19, 2020 • edited Loading

Amanieu commented Aug 21, 2020

TimDiekmann commented Aug 21, 2020

Amanieu commented Sep 6, 2020

`Box<T, A: BuildAlloc>` #12

`Box<T, A: BuildAlloc>` #12

scottjmaddox commented May 4, 2019 •

edited

Loading

scottjmaddox commented May 4, 2019 •

edited

Loading

scottjmaddox commented May 4, 2019 •

edited

Loading

gnzlbg commented May 7, 2019 •

edited

Loading

gnzlbg commented May 7, 2019 •

edited

Loading

TimDiekmann commented Aug 19, 2020 •

edited

Loading

TimDiekmann commented Aug 19, 2020 •

edited

Loading