Add a deterministic constructor for RandomState
#523
Comments
If you already control all those things then can't you inject the same seeds in getrandom syscalls, instantiate the hashmaps in the same order etc?
Oh, I gave the wrong impression here. Loom and Shuttle focus on controlling concurrency by providing drop-in replacements for synchronization primitives. For example, instead of There's a different world where this functionality could be provided with a more heavyweight approach like binary rewriting or replacing syscalls, but that comes with different trade-offs.
In that case, why not provide a drop-in
That only works when I own all the relevant code. The type will be
I'm a little confused then. From your explanation of the
By "the same" I assume you mean the possibility to run into a type mismatch? Well, in theory yes, but practically no. I've not encountered synchronization primitives in the API of a 3p library. For
The most comprehensive way to do this might be to compile your own Rust standard library which seeds hash maps deterministically. Then you won't have to hunt down hashmaps in dependencies because all are covered already.
How is the API relevant? To get the determinism you would also have to replace hash maps, rand and synchronization in their implementations, not just the public interface, no?
Proposal
Problem statement
The default hasher for `std::collections::{HashMap, HashSet}` (via the `S` type parameter) is `RandomState`, whose constructor generates fresh random keys for every instance. That's a good default for security, but problematic when hash collections must behave deterministically. Although in principle it is possible to use other deterministic hashers, there are use cases that effectively require `RandomState`. The problem is that today there is no way to deterministically construct `RandomState` values.
Motivating examples or use cases
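As a minimal illustration of the problem (a sketch, not code from the proposal): the helper `iteration_order` and the alias `FixedState` below are hypothetical names. The sketch builds the same map twice with `RandomState` and twice with a fixed-key hasher, and compares the iteration orders.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault};

// Hypothetical alias: a BuildHasher that always creates a DefaultHasher
// via Default, i.e. with fixed keys instead of random ones.
type FixedState = BuildHasherDefault<DefaultHasher>;

/// Builds a map with the given hasher state and returns its keys
/// in iteration order.
fn iteration_order<S: BuildHasher>(state: S) -> Vec<u32> {
    let mut map = HashMap::with_hasher(state);
    for k in 0..16u32 {
        map.insert(k, ());
    }
    map.keys().copied().collect()
}

fn main() {
    // Every RandomState::new() carries different keys, so these two
    // orders are free to differ from each other and from run to run.
    let a = iteration_order(RandomState::new());
    let b = iteration_order(RandomState::new());
    println!("RandomState: {:?}", a);
    println!("RandomState: {:?}", b);

    // With fixed keys, the same insertions give the same order.
    let c = iteration_order(FixedState::default());
    let d = iteration_order(FixedState::default());
    assert_eq!(c, d);
    println!("FixedState:  {:?}", c);
}
```

The fixed-key variant is deterministic, but its type is `HashMap<u32, (), FixedState>` rather than `HashMap<u32, ()>`, which is where the incompatibility described in this section comes from.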
My main interest is testing complex concurrent systems using tools like Loom and Shuttle. In these use cases it is crucial for the programs under test to be deterministic, or, rather, that all nondeterminism (e.g., scheduling choices, `rand` calls, etc.) is controlled by the tool. Otherwise test failures are not reproducible, which is frustrating and significantly diminishes the utility these testing tools can offer.
One unexpected source of nondeterminism is the standard library's hash collections with their default hasher, `RandomState`. This manifests as a different iteration order in every execution. In first-party (1P) code it is possible to replace all hash collections with ones that use a deterministic hasher (potentially behind a feature flag for testing, to keep the default behavior in production). However, since this results in a different type, we run into a type incompatibility with third-party (3P) libraries that have hash collections with `RandomState` hardcoded in their public interface. Sadly that's a reality; see for example aws_sdk_dynamodb (and this related discussion in the smithy-rs code generator).
Solution sketch
The proposal is to introduce a new deterministic constructor for `RandomState`. My concrete proposal in rust-lang/rust#135578 is to introduce a parameter-less constructor and keep the internal representation (currently a pair of `u64`s) private. However, if experts think that the internal representation can be exposed, that's fine too.
Alternatives
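For reference, the workaround available in 1P code today can be sketched as follows. This is a minimal illustration, not code from the proposal: `DetHashMap`, `third_party_api`, and `into_default_hasher` are hypothetical names, and `third_party_api` merely stands in for a library function that hardcodes `RandomState` in its signature.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Hypothetical 1P alias: a hash map whose hasher uses fixed keys.
type DetHashMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

// Hypothetical stand-in for a 3P API. `HashMap<String, u32>` is
// shorthand for `HashMap<String, u32, RandomState>`.
fn third_party_api(items: HashMap<String, u32>) -> usize {
    items.len()
}

// Crossing the boundary means rebuilding the map with RandomState,
// which brings the randomized iteration order back.
fn into_default_hasher(map: DetHashMap<String, u32>) -> HashMap<String, u32> {
    map.into_iter().collect()
}

fn main() {
    let mut ours: DetHashMap<String, u32> = DetHashMap::default();
    ours.insert("a".to_string(), 1);
    ours.insert("b".to_string(), 2);

    // third_party_api(ours); // rejected by the compiler: mismatched types

    let theirs = into_default_hasher(ours);
    assert_eq!(third_party_api(theirs), 2);
}
```

The conversion compiles, but the map that reaches the 3P code is seeded randomly again, so the workaround does not make the overall program deterministic.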
Today there is no safe way to construct deterministic `RandomState` values. Folks might take the shameful path of transmuting `(u64, u64)` into `RandomState`.
It would be even nicer if hash collections (and possibly other parts of the standard library) could just be made deterministic via a compiler option, so that one can just write `HashMap::new()` instead of having to write `HashMap::with_hasher(RandomState::deterministic())` with the solution described above. I'm happy to entertain a discussion in that direction, but I imagine that this will be a much larger effort.