Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: switch jaq-json hasher to foldhash #237

Merged
merged 1 commit into from
Nov 22, 2024
Merged

Conversation

tranzystorekk
Copy link
Contributor

This change replaces the hasher used in jaq-json with foldhash.

My personal motivation is that foldhash seems to be entering the ecosystem as an upcoming state-of-the-art, but there are other considerations:

  • The original motivation for ahash way back in 0.5.0 was 6114f30 and now foldhash has zero runtime dependencies, which allows dropping zerocopy from Cargo.lock
  • Benchmarks show that foldhash is at worst comparable to ahash, so no apparent perf harm in the replacement

I am aware that this change might feel insubstantial or arbitrary, so please treat it as a loose suggestion and feel free to close immediately if uninterested :)

@01mf02
Copy link
Owner

01mf02 commented Nov 19, 2024

Hi @tranzystorekk, thanks for your PR!

I ran a quick benchmark:

$ ./bench.sh ./jaq-ahash ./jaq-foldhash 
{"name": "empty", "n": 512, "time": {"./jaq-ahash": [0.43], "./jaq-foldhash": [0.39]}}
{"name": "bf-fib", "n": 13, "time": {"./jaq-ahash": [0.45], "./jaq-foldhash": [0.45]}}
{"name": "defs", "n": 100000, "time": {"./jaq-ahash": [0.06], "./jaq-foldhash": [0.06]}}
{"name": "upto", "n": 8192, "time": {"./jaq-ahash": [0.00, 0.00, 0.00], "./jaq-foldhash": [0.00, 0.00, 0.00]}}
{"name": "reduce-update", "n": 16384, "time": {"./jaq-ahash": [0.01, 0.01, 0.01], "./jaq-foldhash": [0.01, 0.01, 0.01]}}
{"name": "reverse", "n": 1048576, "time": {"./jaq-ahash": [0.05, 0.04, 0.04], "./jaq-foldhash": [0.05, 0.04, 0.04]}}
{"name": "sort", "n": 1048576, "time": {"./jaq-ahash": [0.13, 0.12, 0.12], "./jaq-foldhash": [0.12, 0.13, 0.12]}}
{"name": "group-by", "n": 1048576, "time": {"./jaq-ahash": [0.52, 0.55, 0.55], "./jaq-foldhash": [0.55, 0.54, 0.54]}}
{"name": "min-max", "n": 1048576, "time": {"./jaq-ahash": [0.21, 0.21, 0.22], "./jaq-foldhash": [0.21, 0.22, 0.22]}}
{"name": "add", "n": 1048576, "time": {"./jaq-ahash": [0.48, 0.48, 0.50], "./jaq-foldhash": [0.47, 0.49, 0.48]}}
{"name": "kv", "n": 131072, "time": {"./jaq-ahash": [0.12, 0.13, 0.12], "./jaq-foldhash": [0.12, 0.13, 0.12]}}
{"name": "kv-update", "n": 131072, "time": {"./jaq-ahash": [0.13, 0.14, 0.13], "./jaq-foldhash": [0.14, 0.14, 0.13]}}
{"name": "kv-entries", "n": 131072, "time": {"./jaq-ahash": [0.60, 0.60, 0.60], "./jaq-foldhash": [0.58, 0.59, 0.58]}}
{"name": "ex-implode", "n": 1048576, "time": {"./jaq-ahash": [0.53, 0.51, 0.51], "./jaq-foldhash": [0.53, 0.54, 0.53]}}
{"name": "reduce", "n": 1048576, "time": {"./jaq-ahash": [0.77, 0.78, 0.78], "./jaq-foldhash": [0.78, 0.80, 0.78]}}
{"name": "try-catch", "n": 1048576, "time": {"./jaq-ahash": [0.29, 0.29, 0.29], "./jaq-foldhash": [0.29, 0.30, 0.31]}}
{"name": "repeat", "n": 1048576, "time": {"./jaq-ahash": [0.15, 0.16, 0.15], "./jaq-foldhash": [0.16, 0.15, 0.15]}}
{"name": "from", "n": 1048576, "time": {"./jaq-ahash": [0.33, 0.31, 0.31], "./jaq-foldhash": [0.31, 0.32, 0.31]}}
{"name": "last", "n": 1048576, "time": {"./jaq-ahash": [0.04, 0.04, 0.04], "./jaq-foldhash": [0.04, 0.04, 0.04]}}
{"name": "pyramid", "n": 524288, "time": {"./jaq-ahash": [0.37, 0.34, 0.35], "./jaq-foldhash": [0.34, 0.36, 0.35]}}
{"name": "tree-contains", "n": 23, "time": {"./jaq-ahash": [0.07, 0.07, 0.07], "./jaq-foldhash": [0.07, 0.07, 0.07]}}
{"name": "tree-flatten", "n": 17, "time": {"./jaq-ahash": [0.80, 0.79, 0.79], "./jaq-foldhash": [0.79, 0.77, 0.79]}}
{"name": "tree-update", "n": 17, "time": {"./jaq-ahash": [0.71, 0.71, 0.77], "./jaq-foldhash": [0.71, 0.71, 0.70]}}
{"name": "tree-paths", "n": 17, "time": {"./jaq-ahash": [0.43, 0.45, 0.44], "./jaq-foldhash": [0.47, 0.48, 0.46]}}
{"name": "to-fromjson", "n": 65536, "time": {"./jaq-ahash": [0.04, 0.04, 0.04], "./jaq-foldhash": [0.04, 0.04, 0.04]}}
{"name": "ack", "n": 7, "time": {"./jaq-ahash": [0.61, 0.65, 0.63], "./jaq-foldhash": [0.59, 0.58, 0.56]}}
{"name": "range-prop", "n": 128, "time": {"./jaq-ahash": [0.36, 0.37, 0.37], "./jaq-foldhash": [0.35, 0.36, 0.35]}}
{"name": "cumsum", "n": 1048576, "time": {"./jaq-ahash": [0.28, 0.29, 0.28], "./jaq-foldhash": [0.29, 0.30, 0.29]}}
{"name": "cumsum-xy", "n": 1048576, "time": {"./jaq-ahash": [0.48, 0.50, 0.49], "./jaq-foldhash": [0.49, 0.50, 0.50]}}

The most relevant ones are probably "kv*", and we see very little difference there.

I'm a bit divided about this change: Having fewer dependencies is always nice. :) However, the given hasher shows up in the public API of jaq-json, which means that it cannot be updated without breaking the API.

As evidence for the Rust ecosystem shifting to foldhash, I see hashbrown migrating to it. Is there any other evidence that supports your observation?
I also found: https://thaliaarchi.github.io/notes/pl/langs/rust/foldhash_vs_rustc-hash.html
(Perhaps @thaliaarchi would also like to comment on this topic?)

I also did not test foldhash without the std feature, but this would need to be done too, because jaq-json should not depend on std by default.

Just throwing in an idea: While looking at hashbrown right now, I saw that it has a feature default-hasher. This might be also a nice idea for jaq-json --- making jaq_json::Val generic over its hasher and providing a default one only if default-hasher is enabled. I would gladly accept a PR for such a change, because it would even more effectively decrease the number of dependencies. ;)

I tend to be more conservative with jaq changing dependencies nowadays. Intuitively, I would not change the hasher right now, but let me think a bit about it.

@thaliaarchi
Copy link

Since both hashbrown and rustc have switched to foldhash, I'm quite convinced that it's suitable for general-purpose, but fast hashing. As noted in my comparison that you found, rustc uses a variant of foldhash which simplifies it for the data typical to a compiler and it even improves upon the old FxHash. I've switched to foldhash in my projects.

As for your concern about dependencies, I think the default-hasher default feature you describe would be a good solution. If library users care about the implementation, they can disable the default hasher and supply their own. That would be a good solution for users who want a stable hasher. I suspect the churn would be significantly less than you expect.

@01mf02 01mf02 merged commit 79fb1bc into 01mf02:main Nov 22, 2024
1 check passed
@01mf02
Copy link
Owner

01mf02 commented Nov 22, 2024

Thanks for your input, @thaliaarchi. I have thus decided to use foldhash. API stability is not such a big issue for this change after all, because it only affects jaq-json, but neither jaq-core nor jaq-std. Thanks also again to @tranzystorekk for your PR!

@tranzystorekk tranzystorekk deleted the foldhash branch November 22, 2024 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants