Ramses-speedup #4720
@yt-fido test this please.
This looks awesome. Great idea and catch!
# Alias buffer into dictionary
tmp = {}
for i, field in enumerate(fields):
    tmp[field] = buffer[:, :, i]
This is a copy, isn't it? Because it's unrolling?
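On the copy question: under NumPy's indexing rules, combining slices with an integer index is basic indexing, which returns a view rather than a copy. A minimal check, assuming `buffer` is a plain C-ordered 3-D `ndarray` (the shape and field names below are made up for illustration):

```python
import numpy as np

# Hypothetical stand-in for the reader's buffer, with fields on the last axis
fields = ["Density", "Pressure", "Metallicity"]
buffer = np.zeros((4, 2, len(fields)), dtype="float64")

# Alias buffer into dictionary, as in the snippet above
tmp = {}
for i, field in enumerate(fields):
    tmp[field] = buffer[:, :, i]

# Each entry shares memory with buffer: no data was copied
shares = all(np.shares_memory(tmp[f], buffer) for f in fields)

# Writing through the alias is visible in the original array
tmp["Pressure"][0, 0] = 42.0
written_through = bool(buffer[0, 0, 1] == 42.0)

# The views are strided, not contiguous, because the field axis is last
contiguous = bool(tmp["Density"].flags["C_CONTIGUOUS"])
```

The flip side is that each view is strided, so downstream code that needs contiguous data may still make a copy later on.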
Nice! I think it might be interesting at a future time to explore the impact of using a memory map, to see if that can farm out the skipping/paging/seeking to the OS.
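A rough idea of what such a memory-map experiment could look like, using Python's standard `mmap` module on a throwaway file. The flat layout (fixed-size float64 fields stored back to back, with no Fortran record markers) is an assumption made to keep the sketch short; real RAMSES files are more involved, and `read_field` is a hypothetical helper.

```python
import mmap
import os
import struct
import tempfile

N_FIELDS = 5   # hypothetical number of fields in the file
N_CELLS = 8    # hypothetical number of float64 values per field
FIELD_BYTES = N_CELLS * 8

# Write a toy file in which every value of field k equals k
with tempfile.NamedTemporaryFile(delete=False) as f:
    for k in range(N_FIELDS):
        f.write(struct.pack(f"<{N_CELLS}d", *([float(k)] * N_CELLS)))
    path = f.name

def read_field(mm, k):
    """Slice field k out of the mapping; no explicit seek is issued."""
    start = k * FIELD_BYTES
    return struct.unpack(f"<{N_CELLS}d", mm[start:start + FIELD_BYTES])

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        field3 = read_field(mm, 3)
    finally:
        mm.close()

os.remove(path)
```

Whether this beats explicit seeks would have to be measured; the sketch only shows that the skipping logic reduces to offset arithmetic on the mapping.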
Perhaps this is a silly question, but have we evaluated the speed costs of computing the (absolute) offsets ahead-of-time for all fields, then doing absolute seeks sorted by the position in the file? Rather than doing the cumulative seek-by-field-size?
At the moment, the file is already being read in order with a combination of absolute (https://github.com/yt-project/yt/pull/4720/files#diff-74cc1bfe029e80ddd26aba7f83f08aae2cd98986ec28ae0e745be83ce0658f64R203) and relative (https://github.com/yt-project/yt/pull/4720/files#diff-74cc1bfe029e80ddd26aba7f83f08aae2cd98986ec28ae0e745be83ce0658f64R216) seeks, but all of them should follow one another. Is this what you were referring to?
I know we already sort by that -- I just meant to eliminate some of the overhead in computing the field skips etc by precomputing the exact position offsets and seeking directly to them.
Oh, I see. Let me illustrate to check whether I understood your point correctly:
Following your comment, I think I optimized the hell out of the reader by putting together the first absolute seek + first relative seek (1 + 2).
Not quite, and I also don't want to side-track you too much. My idea was simply to compute the absolute offset of each field, but I am now realizing that one big speedup we'd miss would be reading stride 1 data of size greater than a single field, whereas with my question it would require reading a single field at once (even if the seeks were not minimized). So forget it -- it's pointless!
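To make the trade-off concrete, here is a small sketch of both ideas: absolute offsets computed up front, and adjacent requested fields merged into contiguous runs so that a single read covers several fields. The field sizes and requested indices are made up, and the function names are illustrative rather than part of yt.

```python
from itertools import accumulate

def absolute_offsets(field_sizes, base=0):
    """Byte offset of each field, given the sizes in file order."""
    return [base + o for o in accumulate([0] + field_sizes[:-1])]

def contiguous_runs(wanted):
    """Group field indices into (first, count) runs of neighbours."""
    runs = []
    for idx in sorted(wanted):
        if runs and idx == runs[-1][0] + runs[-1][1]:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            runs.append((idx, 1))
    return runs

sizes = [64] * 8                      # hypothetical: 8 fields of 64 bytes
offsets = absolute_offsets(sizes)
runs = contiguous_runs([1, 2, 3, 6])  # fields requested by the caller

# One (offset, nbytes) read per run instead of one per field
reads = [(offsets[first], sum(sizes[first:first + n])) for first, n in runs]
```

Here four requested fields collapse into two reads, which is the stride-1 benefit that per-field absolute seeks would give up.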
PR Summary
Reading RAMSES files requires a lot of random access. In particular, when the outputs contain many fields, performance drops drastically if you're skipping most of them. In this PR, I precompute how many "jumps" one needs to do ahead of time, and then do all jumps at once.
The code should be functionally equivalent but results in fewer reads.
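As an illustration of the idea (not the PR's actual implementation), the sketch below contrasts skipping unwanted fields one jump at a time with accumulating the skip distance and jumping once before each read. The layout (fixed-size fields, no Fortran record markers), the use of seek calls as the cost being counted, and the helper names are all assumptions.

```python
import io

FIELD_BYTES = 16  # hypothetical fixed field size

class CountingFile(io.BytesIO):
    """In-memory file that counts how many times seek is called."""
    def __init__(self, data):
        self.n_seeks = 0
        super().__init__(data)

    def seek(self, pos, whence=io.SEEK_SET):
        self.n_seeks += 1
        return super().seek(pos, whence)

def read_naive(f, n_fields, wanted):
    out = {}
    for k in range(n_fields):
        if k in wanted:
            out[k] = f.read(FIELD_BYTES)
        else:
            f.seek(FIELD_BYTES, io.SEEK_CUR)  # one jump per skipped field
    return out

def read_merged(f, n_fields, wanted):
    out, pending = {}, 0
    for k in range(n_fields):
        if k in wanted:
            if pending:
                f.seek(pending, io.SEEK_CUR)  # all pending jumps at once
                pending = 0
            out[k] = f.read(FIELD_BYTES)
        else:
            pending += FIELD_BYTES            # just accumulate the distance
    return out

# Toy file: field k is FIELD_BYTES copies of the byte k
data = b"".join(bytes([k]) * FIELD_BYTES for k in range(10))
wanted = {4, 9}
f1, f2 = CountingFile(data), CountingFile(data)
naive = read_naive(f1, 10, wanted)
merged = read_merged(f2, 10, wanted)
```

Both functions return the same bytes; in this toy case the merged version issues one jump per requested field instead of one per skipped field.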
Performances
[Timing table, in seconds, with a column for main; values not recoverable]
To estimate the timings, the following code is run, with the slice as reported in the table above. This is run on Dial 3.