Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ramses-speedup #4720

Merged
merged 10 commits into from
Nov 8, 2023
Merged

Ramses-speedup #4720

merged 10 commits into from
Nov 8, 2023

Conversation

cphyc
Copy link
Member

@cphyc cphyc commented Oct 31, 2023

PR Summary

RAMSES files require lots of random access in the file. In particular, when you have a lot of fields in the outputs, the performance drastically decreases if you're skipping most of them. In this PR, I precompute how many "jumps" one needs to do ahead of time, and then do all jumps at once.

The code should be functionally equivalent but results in fewer reads.

Performances

slice Timing on main [s] Timing with this PR [s]
all 52 23
::2 43 17
::4 39 12
:4 35 9

To estimate the timings, the following code is being run, with the slice as reported in the table above. This is run on Dial 3.

import yt

# This is a (very) large simulation - 500Gib/output, 110 fields on-disk
p = "/lustre/dirac3/home/dc-cadi1/dp265/dc-katz1/MEGATRON/PRODUCTION_CP/output_00048"

# Only load the innermost part of the simulation (sufficient for benchmarking)
bbox = [[0.499]*3, [0.501]*3]

ds = yt.load(p, bbox=bbox)
ad = ds.all_data()

# Retain only the hydro fields + level
fields = [
    ("index", "grid_level"),
    *((ft, fn) for (ft, fn) in ds.field_list if ft == "ramses")
]

from time import time
before = time()
ad.get_data(fields[<slice>])
after = time()
print(f"Took {after-before:.2s}s")

@cphyc cphyc added code frontends Things related to specific frontends performance labels Oct 31, 2023
@cphyc
Copy link
Member Author

cphyc commented Nov 1, 2023

@yt-fido test this please.

@cphyc cphyc marked this pull request as ready for review November 1, 2023 11:01
@cphyc cphyc added the enhancement Making something better label Nov 1, 2023
Copy link
Member

@matthewturk matthewturk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome. Great idea and catch!

yt/frontends/ramses/io_utils.pyx Outdated Show resolved Hide resolved
yt/frontends/ramses/io_utils.pyx Outdated Show resolved Hide resolved
yt/frontends/ramses/io_utils.pyx Show resolved Hide resolved
# Alias buffer into dictionary
tmp = {}
for i, field in enumerate(fields):
tmp[field] = buffer[:, :, i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a copy isn't it? Because it's unrolling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be a copy! The whole business is to avoid copying data over and over again.
image

@matthewturk
Copy link
Member

Nice! I think it might be interesting at a future time to explore the impact of using a memory map, to see if that can farm out the skipping/paging/seeking to the OS.

@cphyc
Copy link
Member Author

cphyc commented Nov 2, 2023

@yt-fido test this please

1 similar comment
@cphyc
Copy link
Member Author

cphyc commented Nov 2, 2023

@yt-fido test this please

matthewturk
matthewturk previously approved these changes Nov 2, 2023
@matthewturk
Copy link
Member

Perhaps this is a silly question, but have we evaluated the speed costs of computing the (absolute) offsets ahead-of-time for all fields, then doing absolute seeks sorted by the position in the file? Rather than doing the cumulative seek-by-field-size?

@cphyc
Copy link
Member Author

cphyc commented Nov 3, 2023

At the moment, the file is already being read in order with a combination of absolute (https://github.com/yt-project/yt/pull/4720/files#diff-74cc1bfe029e80ddd26aba7f83f08aae2cd98986ec28ae0e745be83ce0658f64R203) and relative (https://github.com/yt-project/yt/pull/4720/files#diff-74cc1bfe029e80ddd26aba7f83f08aae2cd98986ec28ae0e745be83ce0658f64R216) seeks, but all of them should follow one another. Is this what you were referring to?

@matthewturk
Copy link
Member

I know we already sort by that -- I just meant to eliminate some of the overhead in computing the field skips etc by precomputing the exact position offsets and seeking directly to them.

@cphyc
Copy link
Member Author

cphyc commented Nov 3, 2023

Oh, I see. Let me illustrate to know if I understood your point correctly:

- iterate over levels
  - iterate over CPUs
	  1.   do an absolute seek
	  2.   do a relative seek to the first field to be read
	  3.   eventually read the ones immediately after
	  4.   do a relative seek to the next batch of fields to be read
	  5.   eventually read the ones immediately after
	  6.   [...]
 	  n.   do a relative seek to the last batch of fields to be read
	  n+1. eventually, read the ones immediately after

Following your comment, I think I optimized the hell out of the reader by putting together the first absolute seek + first relative seek (1 + 2).
With 7a953bc, I'm now limiting the number of jumps to its minimum. We could compute ahead of time the value of the skip_len, but essentially each entry will have a unique value, so we would just be displacing the computation from being inline to being out of the loop (with small extra memory footprint). Does this make sense?

@matthewturk
Copy link
Member

Not quite, and I also don't want to side-track you too much. My idea was simply to compute the absolute of each field, but I am now realizing that one big speedup we'd miss would be reading stride 1 data of size greater than a single field, whereas with my question it would require reading a single field at once (even if the seeks were not minimized). So forget it -- it's pointless!

@cphyc
Copy link
Member Author

cphyc commented Nov 3, 2023

@yt-fido test this please

1 similar comment
@cphyc
Copy link
Member Author

cphyc commented Nov 3, 2023

@yt-fido test this please

@cphyc cphyc mentioned this pull request Nov 6, 2023
@cphyc cphyc mentioned this pull request Nov 7, 2023
@neutrinoceros neutrinoceros merged commit de317c9 into yt-project:main Nov 8, 2023
@neutrinoceros neutrinoceros added this to the 4.4.0 milestone Nov 8, 2023
@cphyc cphyc deleted the ramses-speedup branch November 8, 2023 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code frontends Things related to specific frontends enhancement Making something better performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants