Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add TreeRingRadialFunction class #479

Merged
merged 4 commits into from
Jul 17, 2024
Merged

Conversation

jchiang87
Copy link
Collaborator

This avoids collisions when building the tree ring model under multiprocessing

# Make a dict indexed by det_name (a string that looks like Rxx_Syy)
self.info = {}
if not defer_load:
if only_dets:
logger.info("Reading in det_names: %s", only_dets)
self.fill_dict(only_dets=only_dets)

def fill_dict(self, only_dets=None):
"""Read the tree rings file and save the information therein to a dict, self.info.
def _read_info_blocks(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the call to this function should still respect the defer_load directive rather than run always in the constructor. It seems to me that the relevant change that fixed the problem is making the TreeRingRadialFunction class, which could store different A, B, etc. for different detectors, so they don't clobber each other if one of them is still in LookupTable.from_func while another one comes along and writes new values to those variables.

In fact I probably wouldn't bother separating out step of reading the blocks from the fill_dict function. I think it would be fine to use the old loop, but just don't store A,B,etc. as member variables. Just pass them as arguments to the TreeRingRadialFunction constructor.

The race condition stems from using member variables as temporary variables between reading them from the file and using them in building the radial function. That was just poor design. (Presumably by me.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR, the current code in main is actually pretty close to as-is from Craig's original version. As far as keeping the reading of the data in _fill_dict, I had considered that, but currently each CCD will try to read the info file separately. 189 concurrent reads of that file probably isn't a big deal, but it still seemed better to read the data just once at the outset, since the amount of data to save isn't that much. Also unpacking the A, B, cfreqs, etc. in the TreeRingRadialFunction class seemed a lot cleaner than doing all that in the calling function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside of this version is that if you are only running a single CCD (or a few), then it will still read all 189 CCDs worth of data at the start. I thought that was the main run time, not making the LookupTables, but maybe I'm wrong about that. Anyway, that was the point of the defer_load option, so it wouldn't take forever to setup all the tree ring stuff at the start in the main process.
I get that the implementation of that was bad, but I'm not convinced we need to completely bail on the deferment to fix it.

Copy link
Collaborator Author

@jchiang87 jchiang87 Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think reading the file is the issue. The current code in main also reads the entire file even for single CCD sims. The only addition here is that the data are partitioned into blocks, which I have trouble believing is that costly. I really think that the part that needs to be deferred is the evaluation of the tree ring function to make the lookup tables, which involves evaluating that sum over 20 sines and 20 cosines for ~2700 radial points.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Maybe so. I guess we can go with this, and if it seems to be too slow for some system or use case, we can consider moving the I/O back into fill_dict. This is the "see if users complain" style of regression testing.

Copy link
Collaborator Author

@jchiang87 jchiang87 Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a check, I added some instrumentation to print out the execution time for various parts of the code. Running at USDF, the ._read_info_blocks() function takes 2e-4 seconds for LsstComCamSim and 3e-3 seconds for LsstCam. Each evaluation of just the func = galsim.LookupTable.from_func(...) part in .fill_dict takes 0.17 seconds, so defer_load = True still saves a lot of time.

- name: Install test deps
run: |
conda install -y pytest pytest-xdist
pip install pytest-durations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above two are repeated below. I think you want them there, not here. But at least don't install them twice.

@jchiang87 jchiang87 marked this pull request as ready for review July 17, 2024 03:02
@jchiang87 jchiang87 merged commit a4c2d97 into main Jul 17, 2024
3 checks passed
@jchiang87 jchiang87 deleted the u/jchiang/tree_ring_multiproc_fix branch July 17, 2024 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants