Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FrozenMap unsound if Hash panics in such a way as to prevent HashMap rehashing to succeed. #48

Open
steffahn opened this issue Jul 16, 2023 · 19 comments

Comments

@steffahn
Copy link

steffahn commented Jul 16, 2023

It’s a bit tricky to reproduce, but here’s a way that seems to reliably do it, as of now:

use std::{
    collections::HashMap,
    panic::{catch_unwind, AssertUnwindSafe},
    sync::Mutex,
};

#[derive(PartialEq, Eq, Debug)]
struct H(u32);

impl std::hash::Hash for H {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        if PANIC_ON.lock().unwrap().as_ref() == Some(&self.0) {
            panic!();
        }
        0_u32.hash(state);
    }
}

static PANIC_ON: Mutex<Option<u32>> = Mutex::new(None);

fn main() {
    let mut map = HashMap::new();
    for i in 1..=28 {
        map.insert(H(i), String::new());
    }
    for i in 1..=27 {
        map.remove(&H(i));
    }
    *map.get_mut(&H(28)).unwrap() = String::from("Hello World!");

    let map = elsa::FrozenMap::from(map);

    let hello_world = map.get(&H(28)).unwrap();

    println!("exists: {hello_world}");

    let _ = catch_unwind(AssertUnwindSafe(|| {
        *PANIC_ON.lock().unwrap() = Some(28);
        map.insert(H(1), String::new());
    }));

    println!("gone: {hello_world}");
}

(run this code online on rustexplorer.com)

See here and here in the hashbrown source-code for more information.

@Manishearth
Copy link
Owner

Ugh, that's unfortunate. I guess FrozenMap would have to use an internal hashmap implementation that stores the hashes or something. (Basically, have CachedHash<T: Hash> {x: T, y: u64} that hashes to the cached u64)

Would it be possible to get hashbrown to assign a garbage hash to such entries instead? This would still be correct according to the garbage-in-garbage-out pattern of HashMap behavior when the user writes silly implementations of Hash and Eq.

It would be nice if we could rely on the invariant that insert-only maps never drop.

@Manishearth Manishearth changed the title HashMap::insert *can* drop values, hence FrozenMap is unsound FrozenMap unsound if Hash panics in such a way as to prevent HashMap reallocation from recomputing hashes. Jul 16, 2023
@Manishearth
Copy link
Owner

Another way to fix this would be to catch_unwind I guess; or potentially seed insert with a destructor bomb. Unsure what the perf implications of that are, and potentially people may not want the abort from the destructor bomb. Thoughts?

@steffahn
Copy link
Author

steffahn commented Jul 16, 2023

Would it be possible to get hashbrown to assign a garbage hash to such entries instead? This would still be correct according to the garbage-in-garbage-out pattern of HashMap behavior when the user writes silly implementations of Hash and Eq.

I agree that this is somewhat surprising behavior from HashMap, hence it’s be nice if it could be improved on that end.

Also – though I haven’t quite thought through / experimented with what happens if the same thing is forced (through a silly BuildHasher) on an IndexMap (which uses HashMap<usize> internally, storing its keys and values in a separate Vec) – I believe that IndexMap most likely does not have the same problem, as there should probably be no way a call to insert can end up in an element of that Vec getting dropped.

@Manishearth
Copy link
Owner

rust-lang/hashbrown#444

@steffahn
Copy link
Author

If only one could somehow safely convert &mut HashMap<K, V> into &mut HashMap<Newtype<K>, Newtype<V>>, then it’d pe possible to make sure that during FrozenMap::insert, the HashMap-access happens through the &mut HashMap<Newtype<K>, Newtype<V>>-view where Newtype implenents Drop in an aborting manner.

@Manishearth
Copy link
Owner

That may be sound for repr(transparent) I think

@Manishearth
Copy link
Owner

Manishearth commented Jul 16, 2023

Yeah, one of the problems is that we expose methods that give you access to the underlying hashmap. Unfortunate. Not strongly opposed to changing

@steffahn
Copy link
Author

steffahn commented Jul 16, 2023

That may be sound for repr(transparent) I think

AFAIK that isn’t the case:

For example, as far as I’m aware…

pub struct HashMap<K, V, S = DefaultHashBuilder, A: Allocator + Clone = Global> {
    pub(crate) hash_builder: S,
    pub(crate) table: RawTable<(K, V), A>,
}

being repr(Rust) means that e.g. it’s absolutely legal for the compiler to switch the order of the fields hash_builder and table between the type HashMap<K, V> and HashMap<Newtype<K>, Newtype<V>>, even if Newtype is repr(transparent).

@Manishearth
Copy link
Owner

Actually, one way to fix this would be a destructor guard type in insert that calls mem::forget(mem::take(self)) or something.

Unless there is a chance that other user code can run between the panicky hash and HashMap::insert() ending? Looks like that code basically gives up the moment this happens, so it ought to be fine, but I haven't reviewed it.


re: transparent: Yeah, so .... given that transparent() was introduced for interop with other languages there's a very very good chance that any field reordering will have to avoid it as well (because people are going to be interacting with repr(Rust) in interop). But yes, that's not settled. Might be a quick RFC to say "repr(transparent) means it behaves identically in all repr contexts, including when used as a field". But you're right that this isn't settled right now.

@steffahn
Copy link
Author

steffahn commented Jul 16, 2023

Actually, one way to fix this would be a destructor guard type in insert that calls mem::forget(mem::take(self)) or something.

Usually panics don’t get converted in memory leaks, right? But at least I agree this does sound like it would improve soundness.

Edit: Actually… no. The values in the map get dropped while unwinding, before HashMap::insert finishes.

@Manishearth
Copy link
Owner

Oh, wait, yeah, that wouldn't do anything. Never mind.

@steffahn
Copy link
Author

By the way, your new title was inaccurate, as the problem only possibly arises on the non-reallocating rehashing ;-)

This also makes this issue harder to run into. E.g. the use-case of starting up with an empty FrozenMap and only ever adding entries to it can never run into the issue; re-hashing without re-allocating only possibly happens if there are DELETED entries in the map.

@steffahn steffahn changed the title FrozenMap unsound if Hash panics in such a way as to prevent HashMap reallocation from recomputing hashes. FrozenMap unsound if Hash panics in such a way as to prevent HashMap rehashing to succeed. Jul 17, 2023
@steffahn
Copy link
Author

steffahn commented Jul 17, 2023

Another way to fix this would be to catch_unwind I guess; or potentially seed insert with a destructor bomb. Unsure what the perf implications of that are, and potentially people may not want the abort from the destructor bomb. Thoughts?

It should probably be possible to create self-referencing uses of FrozenMap where a subsequent destructor call within the insert could already unsoundly access data that was dropped in a previous one… or in itself, for that matter.

That means that at the time of catching or drop-bombing the panic leaving the HashMap::insert, it’s too late. (It might still be an improvement to avoid more of the possible unintentionally unsound use-cases.)

Edit: Here’s a demo:

/*
[dependencies]
elsa = "1.8.1"
thread_local = "1.1.7"
*/

use std::{
    collections::HashMap,
    sync::{Mutex, OnceLock},
};

use elsa::FrozenMap;
use thread_local::ThreadLocal;

#[derive(PartialEq, Eq, Debug)]
struct H(u32);

impl std::hash::Hash for H {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        if PANIC_ON.lock().unwrap().as_ref() == Some(&self.0) {
            panic!();
        }
        0_u32.hash(state);
    }
}

struct V(Mutex<Option<&'static str>>);

static PANIC_ON: Mutex<Option<u32>> = Mutex::new(None);

fn main() {
    let mut map = HashMap::new();
    for i in 1..=28 {
        map.insert(H(i), Box::new((String::new(), None)));
    }
    for i in 1..=27 {
        map.remove(&H(i));
    }
    *map.get_mut(&H(28)).unwrap() = Box::new((
        String::from("Hello World!"),
        Some(V(Mutex::new(None::<&str>))),
    ));

    let mut map = FrozenMap::from(map);

    static MAP: OnceLock<ThreadLocal<FrozenMap<H, Box<(String, Option<V>)>>>> = OnceLock::new();
    let map: &'static FrozenMap<_, _> = MAP
        .get_or_init(ThreadLocal::new)
        .get_or(|| std::mem::take(&mut map));

    let last = map.get(&H(28)).unwrap();
    let hello_world = &last.0[..];
    *last.1.as_ref().unwrap().0.lock().unwrap() = Some(hello_world);

    impl Drop for V {
        fn drop(&mut self) {
            let hello_world = self.0.lock().unwrap().unwrap();
            println!("gone: {hello_world}");
        }
    }

    println!("exists: {hello_world}");

    *PANIC_ON.lock().unwrap() = Some(28);
    map.insert(H(1), Box::new((String::new(), None)));
}

@Manishearth
Copy link
Owner

By the way, your new title was inaccurate, as the problem only possibly arises on the non-reallocating rehashing ;-)

Ah, so it's basically only when you construct a FrozenMap out of a HashMap that has experienced deletion. Hm.

@steffahn
Copy link
Author

Right, or if you modify it using the AsMut to do the deletions.

@Manishearth
Copy link
Owner

heh, so my preferred fix (cache the hash) has the side effect of removing all of the APIs that cause this problem in the first place, nice.

(But i'd really prefer to avoid removing those APIs)

@Manishearth
Copy link
Owner

If we hide away the internal implementation, we can harden it with more UnsafeCells if we want so that #50 goes from being "probably not a problem" to "definitely not a problem"

@aminya
Copy link
Contributor

aminya commented Aug 10, 2023

Which data structures are affected by this?

@Manishearth
Copy link
Owner

Anything with a hashmap

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants