`array::zip` in combination with `array::map` optimises very poorly #103555
Comments
I suspect the issue here is that
Perhaps we could massage the code into a shape that the optimizer can disentangle, but I think it would be better to offer something like your
This is missing an actual reproducer... Tried a few sizes here: https://rust.godbolt.org/z/Kv7sTcTec. I assume the report is about the case where this doesn't get unrolled, like with N=128?
Part of this is the classic "too many copies" trio
But some of it is probably just that, being eager, it won't optimize unless it fully unrolls.
Remove array_zip

`[T; N]::zip` is "eager" but most zips are mapped. This causes poor optimization in generated code. This is a fundamental design issue and "zip" is "prime real estate" in terms of function names, so let's free it up again.

- FCP concluded in rust-lang/rust#80094 (comment)
- Closes rust-lang/rust#80094
- Closes rust-lang/rust#103555

Could use review to make sure we aren't losing any essential codegen tests.

r? `@scottmcm`
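To make the "eager" point concrete, here is a minimal sketch contrasting the two shapes. `zip_eager` is a hypothetical stand-in for the removed method, written with `std::array::from_fn`, and is not the standard library's implementation; `add_eager` and `add_lazy` are likewise illustrative names. The sketch only shows where the intermediate array of pairs comes from and makes no claim about the code either version compiles to.

```rust
// Hypothetical stand-in for the removed, unstable `[T; N]::zip`.
// It eagerly materialises an intermediate array of pairs.
fn zip_eager<T: Copy, U: Copy, const N: usize>(a: [T; N], b: [U; N]) -> [(T, U); N] {
    std::array::from_fn(|i| (a[i], b[i]))
}

// Eager shape: the whole `[(u32, u32); N]` exists before `map` runs,
// so the optimizer has to see through an extra array to fuse the two steps.
fn add_eager<const N: usize>(a: [u32; N], b: [u32; N]) -> [u32; N] {
    zip_eager(a, b).map(|(x, y)| x.wrapping_add(y))
}

// Lazy shape: iterator adapters pair elements one at a time and the
// result is written straight into the output array.
fn add_lazy<const N: usize>(a: [u32; N], b: [u32; N]) -> [u32; N] {
    let mut out = [0u32; N];
    for (o, (x, y)) in out.iter_mut().zip(a.iter().zip(b.iter())) {
        *o = x.wrapping_add(*y);
    }
    out
}
```

Both functions compute the same result; the difference is only whether an array of tuples is built in between.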
Today for no reason in particular I was curious about the behaviour of arrays when zipped together and then mapped into a single array, roughly:
I wrote some tests for this and disassembled them in Godbolt to see what was going on behind the scenes, and wow, there was a whole lot going on that seemed like it didn't need to be. So I wrote something kinda like what's in the stdlib for mapping and zipping arrays, just without the iterators, to see if I could do any better. Turns out I absolutely can:
Maybe the design isn't as optimal (or as safe) as it could be, but it performs perfectly well in benchmarks (read: as fast as an unguarded version and anywhere between 3-25* times faster than the `zip_with` definition shown at the top of this issue). Here's a rough table of execution times averaged over the primitive ops (`+`, `-`, `*`, `&`, `|`, `^`):
Not really liking how things are looking for `.zip().map()`. So, question then is: is there something in Rust that needs to change to let these optimisations happen, or do I need to look at putting this in a library somewhere (or maybe stdlib wink wink nudge nudge?)?
*The 25x comes from the `[u8; 8]` benchmarks. `zip_with` benches at 11.230ns, and `zip_with_guarded` benches at 452.593ps, which seems suspicious but I can't seem to find anything wrong with the result.
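The snippets from the report are not reproduced in this copy. For orientation only, here is a minimal sketch of what a single-pass, index-based `zip_with` could look like on stable Rust. It assumes `Copy` element types and leans on `std::array::from_fn`; it is not the reporter's `zip_with` or `zip_with_guarded`, and the timings quoted above do not apply to it.

```rust
// Illustrative only: combine two arrays element-wise in one pass,
// without building an intermediate array of pairs.
// Assumes `T` and `U` are `Copy` so elements can be read by index.
fn zip_with<T, U, R, const N: usize>(
    a: [T; N],
    b: [U; N],
    mut f: impl FnMut(T, U) -> R,
) -> [R; N]
where
    T: Copy,
    U: Copy,
{
    // `from_fn` calls the closure once per index, in order, and
    // assembles the results into a `[R; N]`.
    std::array::from_fn(|i| f(a[i], b[i]))
}
```

A call such as `zip_with(a, b, |x, y| x ^ y)` then covers the primitive-op cases listed in the report. Supporting non-`Copy` elements would need a different design (for example, moving values out of the inputs), which this sketch deliberately leaves out.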