-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite abscal.match_times() to be much faster when dealing with very long lists of modelfiles #982
Conversation
… long lists of modelfiles
Errors appear to be related to the pending herasim PR: HERA-Team/hera_sim#332 |
@jsdillon I think it might be useful to try to merge the capabilities of this function with the one I wrote for LST-binning: Line 1473 in fe6902c
Both of them essentially match times with a binary search. The one I wrote also offers the capability of reading the times from the filenames instead of reading the data from the HDF5 file itself (but doesn't require this to run), as well as taking advantage of the fact that almost always, the list of "model files" are separated by a constant amount of time, so a pretty good guess can be made after reading the first file as to which to file to jump to. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #982 +/- ##
=======================================
Coverage 97.20% 97.20%
=======================================
Files 31 31
Lines 11161 11198 +37
=======================================
+ Hits 10849 10885 +36
- Misses 312 313 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I found a small bug in the function and fixed it, but would love to get your take on it, @jsdillon. It was breaking the Validation runs. While looking at this, I realized that my function for LST binning does do roughly the same thing, but it is sufficiently different such that it's difficult to merge them (it operates on JDs instead of LSTs and directly uses the fact that the files are UVH5 files). |
@steven-murray if you think this is ready, let's get it merged in |
@jsdillon it would be good to have a test that would have failed without the fix. |
@steven-murray I've added some tests that cover that line |
Between lustre's general slowness and the huge number of abscal model files for validation (>8000), it was taking 10s of minutes to find the data's matching model files in LST for abscal.
This cuts it down to a few seconds by performing a binary search and only actually reading O(logN) files' metadata.