Merge pull request #302 from roman-corgi/develop
Merge develop back into main
semaphoreP authored Feb 14, 2025
2 parents 267354a + 07f4f4d commit 0f575a0
Showing 53 changed files with 3,634 additions and 615 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -53,7 +53,7 @@ coverage.xml
*.log

# Sphinx documentation
-docs/_build/
+docs/source/_build/

# PyBuilder
target/
26 changes: 8 additions & 18 deletions README.md
@@ -95,11 +95,11 @@ def example_step(dataset, calib_data, tuneable_arg=1, another_arg="test"):

The inside of the function can be nearly anything you want, but the function signature and the start/end of the function should follow a few rules.

-* Each function should include a docstring that descibes what the function is doing, what the inputs (including units if appropriate) are and what the outputs (also with units). The dosctrings should be [goggle style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
+* Each function should include a docstring that describes what the function is doing, what the inputs (including units if appropriate) are and what the outputs (also with units). The docstrings should be [Google style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
* The input dataset should always be the first input
* Additional arguments and keywords exist only if you need them; many relevant parameters might already be in Dataset headers. A pipeline step may take just a single argument (the input dataset) if that is all it needs.
* All additional function arguments/keywords should only consist of the following types: int, float, str, or a class defined in corgidrp.Data.
-  * (Long explaination for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly by textfiles. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
+  * (Long explanation for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly by text files. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
* The first line of the function generally should be creating a copy of the dataset (which will be the output dataset). This way, the output dataset is not the same instance as the input dataset. This will make it easier to ensure reproducibility.
* The function should always end with updating the header and (typically) the data of the output dataset. The history of running this pipeline step should be recorded in the header.
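The rules above can be sketched as a minimal, hypothetical pipeline step. The `Dataset` class here is a toy stand-in (not corgidrp's real class), so the sketch is self-contained:

```python
import copy

class Dataset:
    """Toy stand-in for corgidrp.data.Dataset, for illustration only."""
    def __init__(self, frames, header=None):
        self.frames = list(frames)
        self.header = dict(header or {})

def example_step(input_dataset, tuneable_arg=1):
    """Scale each frame by tuneable_arg and record the step in the header.

    Args:
        input_dataset (Dataset): the input dataset (always the first argument).
        tuneable_arg (int): illustrative extra argument of an allowed type.

    Returns:
        Dataset: a new dataset with scaled frames and updated history.
    """
    # Rule: start by copying, so the output is not the same instance as the input
    dataset = copy.deepcopy(input_dataset)
    dataset.frames = [f * tuneable_arg for f in dataset.frames]
    # Rule: end by recording the processing history in the header
    dataset.header["HISTORY"] = "example_step(tuneable_arg={0})".format(tuneable_arg)
    return dataset
```

The copy-first convention is what makes the step reproducible: the caller's dataset is never mutated.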

@@ -133,7 +133,7 @@ End-to-end testing refers to processing data as one would when we get the real d
- if you need to create mock L1 data, please do it in the script as well.
- See the existing tests in `tests/e2e_tests/` for how to structure this script. You should only need to write a single script.
4. Test that the script runs successfully on your local machine and produces the expected output. Debug as necessary. When appropriate, test your results against those obtained from the II&T/TVAC pipeline using the same input data.
-5. Determine how resource intensive your recipe is. There are many ways to do this, but Linux users can run `/usr/bin/time -v python your_e2e_test.py` and Mac userse can run `/usr/bin/time -l -h -p python <your_e2e_test.py>`. Record elapsed (wall clock) time, the percent of CPU this job got (only if parallelization was used), and total memory used (labelled "Maximum resident set size").
+5. Determine how resource intensive your recipe is. There are many ways to do this, but Linux users can run `/usr/bin/time -v python your_e2e_test.py` and Mac users can run `/usr/bin/time -l -h -p python <your_e2e_test.py>`. Record elapsed (wall clock) time, the percent of CPU this job got (only if parallelization was used), and total memory used (labelled "Maximum resident set size").
6. Document your recipe on the "Corgi-DRP Implementation Document" on Confluence (see the big table in Section 2.0). You should fill out an entire row with your recipe. Under addition notes, note if your recipe took significant run time (> 1 minute) and significant memory (> 1 GB).
7. PR!
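If `/usr/bin/time` is unavailable, a rough Python-side sketch of the same measurement (wall-clock time plus peak memory of a child process) can use the Unix-only `resource` module; the child command below is just a placeholder:

```python
import resource
import subprocess
import sys
import time

start = time.perf_counter()
# Placeholder child process; substitute your e2e test script here
subprocess.run([sys.executable, "-c", "x = [0] * 10_000"], check=True)
elapsed = time.perf_counter() - start

usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print(f"wall clock: {elapsed:.2f} s")
# Note: ru_maxrss is reported in kilobytes on Linux but bytes on macOS
print(f"peak RSS of children: {usage.ru_maxrss}")
```

This does not replace the CPU-percentage figure `/usr/bin/time -v` reports, so prefer the command-line tool when it is available.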

@@ -175,21 +175,11 @@ Before creating a pull request, review the design Principles below. Use the Gith

## FAQ

 * Does my pipeline function need to save files?
   * Files will be saved by a higher level pipeline code. As long as you output an object that's an instance of a `corgidrp.Data` class, it will have a `save()` function that will be used.
 * Can I create new data classes?
   * Yes, you can feel free to make new data classes. Generally, they should be a subclass of the `Image` class, and you can look at the `Dark` class as an example. Each calibration type should have its own `Image` subclass defined. Talk with Jason and Max to discuss how your class should be implemented!
-  * You do not necessarily need to write a copy function for subclasses of the `Image` class. If you need to copy calibration objects at all you can import and apply the copy module of python, see example:
-    ```
-    import copy
-    flatfield = data.Flatfield('flatfield.fits')
-    #reference copy
-    flatfield_copy = copy.copy(flatfield)
-    #deep data copy
-    flatfield_copy = copy.deepcopy(flatfield)
-    ```
+  * You do not necessarily need to write a copy function for subclasses of the `Image` class. If you need to copy calibration objects at all, you can use the copy function of the Image class.
* What python version should I develop in?
* Python 3.12
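The shallow vs. deep copy behavior mentioned in the FAQ can be seen with a toy stand-in class (not corgidrp's real `Image`):

```python
import copy

class Image:
    """Toy stand-in for an Image-like class, for illustration only."""
    def __init__(self, data):
        self.data = list(data)

flat = Image([1.0, 2.0])
shallow = copy.copy(flat)    # new object, but .data is the same list
deep = copy.deepcopy(flat)   # new object with an independent .data

deep.data[0] = 99.0          # does not touch flat.data
```

A deep copy is the safe choice when the copy's pixel data will be modified.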

10 changes: 4 additions & 6 deletions corgidrp/caldb.py
@@ -168,7 +168,7 @@ def _get_values_from_entry(self, entry, is_calib=True):
drp_version,
obsid,
naxis1,
-naxis2,
+naxis2
]

# rest are ext_hdr keys we can copy
@@ -243,7 +243,7 @@ def remove_entry(self, entry, to_disk=True):

def get_calib(self, frame, dtype, to_disk=True):
"""
-Outputs the best calibration file of the given type for the input sciene frame.
+Outputs the best calibration file of the given type for the input science frame.
Args:
frame (corgidrp.data.Image): an image frame to request a calibration for. If None is passed in, looks for the
@@ -288,14 +288,12 @@ def get_calib(self, frame, dtype, to_disk=True):
 options = calibdf.loc[
     (
         (calibdf["EXPTIME"] == frame_dict["EXPTIME"])
-        & (calibdf["NAXIS1"] == frame_dict["NAXIS1"])
-        & (calibdf["NAXIS2"] == frame_dict["NAXIS2"])
     )
 ]

 if len(options) == 0:
-    raise ValueError("No valid Dark with EXPTIME={0} and dimension ({1},{2})"
-                     .format(frame_dict["EXPTIME"], frame_dict["NAXIS1"], frame_dict["NAXIS2"]))
+    raise ValueError("No valid Dark with EXPTIME={0}"
+                     .format(frame_dict["EXPTIME"]))

# select the one closest in time
result_index = np.abs(options["MJD"] - frame_dict["MJD"]).argmin()
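The lookup in this hunk, filter the calibration table on matching EXPTIME and then take the entry closest in time (MJD), can be sketched with a toy table; pandas is assumed available, and the `PATH` column and file names are illustrative:

```python
import numpy as np
import pandas as pd

# Toy calibration database with three Dark entries
calibdf = pd.DataFrame({
    "EXPTIME": [5.0, 5.0, 10.0],
    "MJD": [60000.0, 60010.0, 60005.0],
    "PATH": ["dark_a.fits", "dark_b.fits", "dark_c.fits"],
})
frame = {"EXPTIME": 5.0, "MJD": 60008.0}

# Filter on exposure time, mirroring the options = calibdf.loc[...] selection
options = calibdf.loc[calibdf["EXPTIME"] == frame["EXPTIME"]]
if len(options) == 0:
    raise ValueError("No valid Dark with EXPTIME={0}".format(frame["EXPTIME"]))

# Pick the entry closest in time, mirroring the argmin over |MJD difference|
best = options.iloc[np.abs(options["MJD"] - frame["MJD"]).argmin()]
print(best["PATH"])  # dark_b.fits (MJD 60010 is closest to 60008)
```

Note that `Series.argmin()` returns a positional index, which is why `iloc` (not `loc`) is used on the filtered table.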
110 changes: 62 additions & 48 deletions corgidrp/calibrate_kgain.py
@@ -12,7 +12,7 @@
from corgidrp.detector import slice_section, detector_areas

# Dictionary with constant kgain calibration parameters
-kgain_params = {
+kgain_params_default = {
# ROI constants
'rowroi1': 9,
'rowroi2': 1000,
@@ -30,34 +30,30 @@
'signal_bins_N': 400,
}
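The key checks performed by `check_kgain_params` below follow a generic required-key validation pattern, sketched here standalone (the function name `check_params` is illustrative, not corgidrp's):

```python
# Keys that must be present and numeric in a kgain parameter dictionary
REQUIRED_KEYS = ('rowroi1', 'rowroi2', 'colroi1', 'colroi2',
                 'rn_bins1', 'rn_bins2', 'max_DN_val', 'signal_bins_N')

def check_params(params):
    """Raise if any required key is missing or holds a non-numeric value."""
    for key in REQUIRED_KEYS:
        if key not in params:
            raise ValueError('Missing parameter: {0}.'.format(key))
        if not isinstance(params[key], (float, int)):
            raise TypeError('{0} is not a number'.format(key))
```

Looping over a key tuple keeps the checks in one place, at the cost of losing the per-key explicitness of the unrolled version.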

-def check_kgain_params(
-    ):
-    """ Checks integrity of kgain parameters in the dictionary kgain_params. """
-    if 'offset_colroi1' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
-    if 'offset_colroi2' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+def check_kgain_params(kgain_params):
+    """ Checks integrity of kgain parameters in the dictionary kgain_params.
+    Args:
+        kgain_params (dict): Dictionary of parameters used for calibrating the k gain.
+    """

     if 'rowroi1' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: rowroi1.')
     if 'rowroi2' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: rowroi2.')
     if 'colroi1' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: colroi1.')
     if 'colroi2' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: colroi2.')
     if 'rn_bins1' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: rn_bins1.')
     if 'rn_bins2' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: rn_bins2.')
     if 'max_DN_val' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: max_DN_val.')
     if 'signal_bins_N' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: signal_bins_N.')

-    if not isinstance(kgain_params['offset_colroi1'], (float, int)):
-        raise TypeError('offset_colroi1 is not a number')
-    if not isinstance(kgain_params['offset_colroi2'], (float, int)):
-        raise TypeError('offset_colroi2 is not a number')
     if not isinstance(kgain_params['rowroi1'], (float, int)):
         raise TypeError('rowroi1 is not a number')
     if not isinstance(kgain_params['rowroi2'], (float, int)):
@@ -281,38 +277,39 @@ def calibrate_kgain(dataset_kgain,
n_cal=10, n_mean=30, min_val=800, max_val=3000, binwidth=68,
make_plot=True,plot_outdir='figures', show_plot=False,
logspace_start=-1, logspace_stop=4, logspace_num=200,
-verbose=False, detector_regions=None):
+verbose=False, detector_regions=None, kgain_params=None):
"""
Given an array of frame stacks for various exposure times, each sub-stack
having at least 5 illuminated pupil L1 SCI-size frames having the same
exposure time. The frames are bias-subtracted, and in addition, if EM gain
is >1 for the input data for calibrate_kgain, EM gain division is also needed.
It also creates a mean pupil array from a separate stack of
frames of uniform exposure time. The mean pupil array is scaled to the mean
of each stack and statistics (mean and std dev) are calculated for bins from
the frames in it. kgain (e-/DN) is calculated from the means and variances
kgain (e-/DN) is calculated from the means and variances
within the defined minimum and maximum mean values. A photon transfer curve
is plotted from the std dev and mean values from the bins.
Args:
-        dataset_kgain (corgidrp.Dataset): Dataset with a set of of EXCAM illuminated
-            pupil L1 SCI frames (counts in DN) having a range of exp times.
-            datset_cal contains a set of subset of frames, and all subsets must have
-            the same number of frames, which is a minimum of 5. The frames in a subset
-            must all have the same exposure time. There must be at least 10 subsets
-            (More than 20 sub-stacks recommended. The mean signal in the pupil region should
-            span from about 100 to about 10000 DN.
-            In addition, dataset_kgain contains a set of at least 30 frames used to
-            build a mean frame. All the frames must have the same exposure time,
-            such that the net mean counts in the pupil region is a few thousand DN
-            (2000 to 4000 DN recommended;
-            notice that unity EM gain is recommended when k-gain is the primary desired
-            product, since it is known more accurately than non-unity values. This
-            mean frame is used to select pixels with similar illumination for
-            calculating variances (since the pupil illumination is not perfectly uniform).
-            All data must be obtained under the same positioning of the pupil
-            relative to the detector. These frames are identified with the kewyord
-            'OBSTYPE'='MNFRAME' (TBD).
+        dataset_kgain (corgidrp.Dataset): The frames in the dataset are
+            bias-subtracted. The dataset contains frames belonging to two different
+            sets -- Mean frame and a large array of unity gain frames.
+            Mean frame: Unity gain frames with constant exposure time. These frames
+            are used to create a mean pupil image. The mean frame is used to select
+            pixels in each frame of the large array of unity gain frames (see next)
+            to calculate its mean signal. In general, it is expected that at least
+            30 frames or more will be taken for this set. In TVAC, 30 frames, each
+            with an exposure time of 5.0 sec were taken.
+            Large array of unity gain frames: Set of unity gain frames with subsets
+            of equal exposure times. Data for each subset should be taken sequentially:
+            Each subset must have at least 5 frames. All frames for a subset are taken
+            before moving to the next subset. Two of the subsets have the same (repeated)
+            exposure time. These two subsets are not contiguous: The first subset is
+            taken near the start of the data collection and the second one is taken
+            at the end of the data collection (see TVAC example below). The mean
+            signal of these two subsets is used to correct for illumination
+            brightness/sensor sensitivity drifts for all the frames in the whole set,
+            depending on when the frames were taken. There should be no other repeated
+            exposure time among the subsets. In TVAC, a total of 110 frames were taken
+            within this category. The 110 frames consisted of 22 subsets, each with
+            5 frames. All 5 frames had the same exposure time. The exposure times in
+            TVAC in seconds were, each repeated 5 times to collect 5 frames in each
+            subset -- 0.077, 0.770, 1.538, 2.308, 3.077, 3.846, 4.615, 5.385, 6.154,
+            6.923, 7.692, 8.462, 9.231, 10.000, 11.538, 10.769, 12.308, 13.077,
+            13.846, 14.615, 15.385, and 1.538 (again).
n_cal (int):
Minimum number of sub-stacks used to calibrate K-Gain. The default value
is 10.
@@ -343,12 +340,29 @@
detector_regions (dict): a dictionary of detector geometry properties.
Keys should be as found in detector_areas in detector.py. Defaults to
that dictionary.
kgain_params (dict): (Optional) Dictionary containing row and col specifications
for the region of interest (indicated by 'rowroi1','rowroi2','colroi1',and 'colroi2').
The 'roi' needs one square region specified, and 'back' needs two square regions,
where a '1' ending indicates the smaller of two values, and a '2' ending indicates the larger
of two values. The coordinates of the square region are specified by matching
up as follows: (rowroi1, colroi1), (rowroi2, colroi1), etc.
Also must contain:
'rn_bins1': lower bound of counts histogram for fitting of read noise
'rn_bins2': upper bound of counts histogram for fitting of read noise
'max_DN_val': maximum DN value to be included in photon transfer curve (PTC)
'signal_bins_N': number of bins in the signal variables of PTC curve
Defaults to kgain_params_default included in this file.
Returns:
corgidrp.data.KGain: kgain estimate from the least-squares fit to the photon
transfer curve (in e-/DN). The expected value of kgain for EXCAM with
flight readout sequence should be between 8 and 9 e-/DN
"""
if kgain_params is None:
kgain_params = kgain_params_default

check_kgain_params(kgain_params)

if detector_regions is None:
detector_regions = detector_areas

