Ramses-speedup #4720

cphyc · 2023-10-31T18:44:52Z

PR Summary

Reading RAMSES outputs requires a lot of random access within each file. In particular, when the outputs contain many fields, performance drops drastically if most of them are skipped. In this PR, I precompute how many "jumps" are needed ahead of time, and then do them all at once.

The code should be functionally equivalent but results in fewer reads.
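As a minimal sketch of the idea (not the Cython code in this PR), assume the fields are stored back to back as equal-sized records. The jumps can then be planned once from the indices of the requested fields. The helper `plan_jumps` and the field count of 12 are hypothetical, chosen only for illustration.

```python
def plan_jumps(wanted, n_fields):
    """Return a list of (n_skip, n_read) pairs, in units of fields.

    `wanted` holds the indices of the fields to read; every other field
    in range(n_fields) is skipped. Consecutive skips are merged into one
    jump and consecutive reads into one batch.
    """
    wanted = sorted(set(wanted))
    if wanted and (wanted[0] < 0 or wanted[-1] >= n_fields):
        raise IndexError("field index out of range")

    plan = []
    cursor = 0  # index of the next field in the file
    i = 0
    while i < len(wanted):
        start = wanted[i]
        # Extend the batch while the requested indices are contiguous.
        j = i
        while j + 1 < len(wanted) and wanted[j + 1] == wanted[j] + 1:
            j += 1
        n_read = j - i + 1
        plan.append((start - cursor, n_read))
        cursor = start + n_read
        i = j + 1
    return plan


# 12 fields on disk, keep only the first four (analogous to a ":4" slice).
print(plan_jumps(range(4), 12))         # [(0, 4)]
# Every fourth field (analogous to "::4"): one jump per field read.
print(plan_jumps(range(0, 12, 4), 12))  # [(0, 1), (3, 1), (3, 1)]
```

With such a plan, the number of seeks depends on the number of gaps between requested fields rather than on the number of skipped fields.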

Performance

| slice | Timing on `main` [s] | Timing with this PR [s] |
|-------|----------------------|-------------------------|
| all   | 52                   | 23                      |
| ::2   | 43                   | 17                      |
| ::4   | 39                   | 12                      |
| :4    | 35                   | 9                       |

The timings were obtained by running the following code on Dial 3, with `<slice>` replaced by the slice reported in the table above.

import yt

# This is a (very) large simulation - 500 GiB/output, 110 fields on-disk
p = "/lustre/dirac3/home/dc-cadi1/dp265/dc-katz1/MEGATRON/PRODUCTION_CP/output_00048"

# Only load the innermost part of the simulation (sufficient for benchmarking)
bbox = [[0.499]*3, [0.501]*3]

ds = yt.load(p, bbox=bbox)
ad = ds.all_data()

# Retain only the hydro fields + level
fields = [
    ("index", "grid_level"),
    *((ft, fn) for (ft, fn) in ds.field_list if ft == "ramses")
]

from time import time
before = time()
ad.get_data(fields[<slice>])
after = time()
print(f"Took {after - before:.2f}s")

cphyc · 2023-11-01T10:36:19Z

@yt-fido test this please.

matthewturk

This looks awesome. Great idea and catch!

matthewturk · 2023-11-01T17:00:45Z

yt/frontends/ramses/io_utils.pyx

+            # Alias buffer into dictionary
+            tmp = {}
+            for i, field in enumerate(fields):
+                tmp[field] = buffer[:, :, i]


This is a copy, isn't it? Because it's unrolling?

cphyc · 2023-11-01

It should not be a copy! The whole business is to avoid copying data over and over again.
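To illustrate the view-versus-copy distinction with the standard library only: a strided slice of a `memoryview` aliases the underlying storage instead of copying it. The buffer in the PR is a Cython/NumPy array, and NumPy basic slicing such as `buffer[:, :, i]` returns a view in the same way. The field names and sizes below are made up, and the flat layout is a 1-D stand-in for the real 3-D buffer.

```python
from array import array

fields = ["density", "pressure", "metallicity"]  # hypothetical names
n_cells = 4

# One flat buffer holding every field for every cell, with the field index
# varying fastest (the 1-D analogue of a buffer whose last axis is the field).
buffer = array("d", [0.0] * (n_cells * len(fields)))
flat = memoryview(buffer)

# Alias the buffer into a dictionary: each entry is a strided view.
tmp = {field: flat[i::len(fields)] for i, field in enumerate(fields)}

# Writing through a view changes the shared buffer, so no copy was made.
tmp["pressure"][2] = 7.5
print(buffer[2 * len(fields) + 1])    # 7.5
print(tmp["pressure"].obj is buffer)  # True
```

A quick check of this kind (write through the alias, read from the original) is an easy way to confirm that a slicing expression does not silently copy.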

matthewturk · 2023-11-01T17:02:09Z

Nice! I think it might be interesting at a future time to explore the impact of using a memory map, to see if that can farm out the skipping/paging/seeking to the OS.
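A rough sketch of what the memory-map idea could look like, using Python's `mmap` on a throwaway file rather than a real RAMSES output. The layout (fixed-size float64 fields stored back to back, no record markers) is an assumption made for illustration only.

```python
import mmap
import struct
import tempfile

n_fields, n_cells = 6, 8
field_nbytes = n_cells * 8  # float64 values

with tempfile.TemporaryFile() as f:
    # Write a toy file: field k holds the value float(k) in every cell.
    for k in range(n_fields):
        f.write(struct.pack(f"<{n_cells}d", *([float(k)] * n_cells)))
    f.flush()

    # Map the whole file read-only. Unwanted fields are skipped by plain
    # offset arithmetic; the OS pages data in when it is accessed, so no
    # explicit seek calls are needed.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        wanted = [1, 4]
        values = {
            k: struct.unpack_from(f"<{n_cells}d", mm, k * field_nbytes)
            for k in wanted
        }

print(values[4][0])  # 4.0
```

Whether this would beat explicit seeks on a parallel file system would have to be measured; the sketch only shows the access pattern.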

cphyc · 2023-11-02T09:14:13Z

@yt-fido test this please

cphyc · 2023-11-02T09:31:28Z

@yt-fido test this please

matthewturk · 2023-11-02T21:54:06Z

Perhaps this is a silly question, but have we evaluated the speed costs of computing the (absolute) offsets ahead-of-time for all fields, then doing absolute seeks sorted by the position in the file? Rather than doing the cumulative seek-by-field-size?

cphyc · 2023-11-03T09:00:07Z

At the moment, the file is already being read in order with a combination of absolute (https://github.com/yt-project/yt/pull/4720/files#diff-74cc1bfe029e80ddd26aba7f83f08aae2cd98986ec28ae0e745be83ce0658f64R203) and relative (https://github.com/yt-project/yt/pull/4720/files#diff-74cc1bfe029e80ddd26aba7f83f08aae2cd98986ec28ae0e745be83ce0658f64R216) seeks, but all of them should follow one another. Is this what you were referring to?

matthewturk · 2023-11-03T14:07:42Z

I know we already sort by that -- I just meant to eliminate some of the overhead in computing the field skips etc by precomputing the exact position offsets and seeking directly to them.
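The suggestion can be sketched as follows, assuming the size in bytes of every field is known up front. The helpers `absolute_offsets` and `read_fields` are hypothetical and operate on an in-memory toy file, not on the RAMSES reader.

```python
import io
from itertools import accumulate


def absolute_offsets(field_sizes, base=0):
    """Byte offset of the start of each field, given their sizes in bytes."""
    return list(accumulate(field_sizes, initial=base))[:-1]


def read_fields(f, wanted, field_sizes, base=0):
    """Read the requested fields with one absolute seek per field."""
    offsets = absolute_offsets(field_sizes, base)
    out = {}
    for k in sorted(wanted):  # ascending order keeps the access sequential
        f.seek(offsets[k])
        out[k] = f.read(field_sizes[k])
    return out


# Toy file: a 4-byte header followed by three fields of different sizes.
sizes = [3, 5, 2]
f = io.BytesIO(b"HEAD" + b"aaa" + b"bbbbb" + b"cc")
print(absolute_offsets(sizes, base=4))        # [4, 7, 12]
print(read_fields(f, [2, 0], sizes, base=4))  # {0: b'aaa', 2: b'cc'}
```

Note that this version issues one seek and one read per field, even when several requested fields are adjacent on disk.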

cphyc · 2023-11-03T14:53:30Z

Oh, I see. Let me illustrate, to check that I understood your point correctly:

- iterate over levels
  - iterate over CPUs
    1.   do an absolute seek
    2.   do a relative seek to the first field to be read
    3.   possibly read the fields immediately after it
    4.   do a relative seek to the next batch of fields to be read
    5.   possibly read the fields immediately after it
    6.   [...]
    n.   do a relative seek to the last batch of fields to be read
    n+1. possibly read the fields immediately after it

Following your comment, I think I optimized the hell out of the reader by putting together the first absolute seek + first relative seek (1 + 2).
With 7a953bc, I'm now limiting the number of jumps to its minimum. We could compute the value of `skip_len` ahead of time, but essentially each entry will have a unique value, so we would just be moving the computation from inside the loop to outside it (with a small extra memory footprint). Does this make sense?
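The steps listed above, with the first absolute and relative seeks merged, can be sketched for a single (level, CPU) block as follows. This is an illustration on an in-memory file with equal-sized fields, not the Cython reader; `read_batches` and its arguments are hypothetical.

```python
import io


def read_batches(f, base, field_nbytes, batches):
    """Read batches of contiguous fields from one (level, CPU) block.

    `batches` is a list of (n_skip, n_read) pairs in units of fields.
    The first skip is folded into the absolute seek, so the block costs
    at most one seek and exactly one read per batch.
    """
    chunks = []
    n_seeks = 0
    for i, (n_skip, n_read) in enumerate(batches):
        if i == 0:
            # Steps 1 + 2 merged: jump straight to the first wanted field.
            f.seek(base + n_skip * field_nbytes)
            n_seeks += 1
        elif n_skip:
            # Relative seek over the unwanted fields between two batches.
            f.seek(n_skip * field_nbytes, io.SEEK_CUR)
            n_seeks += 1
        # Contiguous fields are fetched with a single read.
        chunks.append(f.read(n_read * field_nbytes))
    return chunks, n_seeks


# Toy block: a 2-byte header, then six 2-byte fields "f0".."f5".
data = b"HH" + b"".join(f"f{k}".encode() for k in range(6))
chunks, n_seeks = read_batches(
    io.BytesIO(data), base=2, field_nbytes=2, batches=[(1, 2), (2, 1)]
)
print(chunks, n_seeks)  # [b'f1f2', b'f5'] 2
```

Here three fields are read with two seeks and two reads, whereas seeking to each field individually would take three of each.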

matthewturk · 2023-11-03T14:56:32Z

Not quite, and I also don't want to side-track you too much. My idea was simply to compute the absolute offset of each field, but I am now realizing that we'd then miss one big speedup: reading stride-1 data spanning more than a single field in one go. My suggestion would require reading a single field at a time (even if the seeks were not minimized). So forget it -- it's pointless!

cphyc · 2023-11-03T15:09:19Z

@yt-fido test this please

cphyc · 2023-11-03T15:15:07Z

@yt-fido test this please

cphyc added 4 commits October 31, 2023 18:30

Fast seek through file

4419777

Make performance-critical loop C-only

bad7f66

Also make reading offset faster

ada5dda

Make performance-critical functions cpdef'ed

50711d0

cphyc added the code frontends and performance labels Oct 31, 2023

cphyc marked this pull request as ready for review November 1, 2023 11:01

cphyc added 2 commits November 1, 2023 11:04

Fix the docstring

253f0ea

Avoid repetitive allocations

57fef2f

cphyc force-pushed the ramses-speedup branch from 6ce863f to 57fef2f Compare November 1, 2023 12:15

cphyc added the enhancement label Nov 1, 2023

matthewturk reviewed Nov 1, 2023

cphyc added 2 commits November 1, 2023 17:23

Make comment more obvious

d33e2b1

Making the skip_len nogil, just to be safe

a16bda2

matthewturk previously approved these changes Nov 2, 2023

Preallocate given maximal size

b3eef0d

cphyc dismissed matthewturk’s stale review via b3eef0d November 3, 2023 14:37

Spare the first jump

7a953bc

cphyc mentioned this pull request Nov 6, 2023

Read AMR domains lazily #4734

Merged

matthewturk approved these changes Nov 7, 2023

cphyc mentioned this pull request Nov 7, 2023

Fast seek through file #4736

Merged

neutrinoceros merged commit de317c9 into yt-project:main Nov 8, 2023

neutrinoceros added this to the 4.4.0 milestone Nov 8, 2023

cphyc deleted the ramses-speedup branch November 8, 2023 17:54
