Merge pull request #302 from roman-corgi/develop
Merge develop back into main
semaphoreP authored Feb 14, 2025
2 parents 267354a + 07f4f4d commit 0f575a0
Showing 53 changed files with 3,634 additions and 615 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -53,7 +53,7 @@ coverage.xml
*.log

# Sphinx documentation
-docs/_build/
+docs/source/_build/

# PyBuilder
target/
26 changes: 8 additions & 18 deletions README.md
@@ -95,11 +95,11 @@ def example_step(dataset, calib_data, tuneable_arg=1, another_arg="test"):

The inside of the function can be nearly anything you want, but the function signature and the start/end of the function should follow a few rules.

-* Each function should include a docstring that descibes what the function is doing, what the inputs (including units if appropriate) are and what the outputs (also with units). The dosctrings should be [goggle style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
+* Each function should include a docstring that describes what the function is doing, what the inputs (including units if appropriate) are and what the outputs (also with units). The docstrings should be [Google style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
* The input dataset should always be the first input
* Additional arguments and keywords exist only if you need them; many relevant parameters might already be in Dataset headers. A pipeline step may take just a single argument (the input dataset) if that is all it needs.
* All additional function arguments/keywords should only consist of the following types: int, float, str, or a class defined in corgidrp.Data.
-  * (Long explaination for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly by textfiles. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
+  * (Long explanation for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly by text files. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
* The first line of the function generally should be creating a copy of the dataset (which will be the output dataset). This way, the output dataset is not the same instance as the input dataset. This will make it easier to ensure reproducibility.
* The function should always end with updating the header and (typically) the data of the output dataset. The history of running this pipeline step should be recorded in the header.
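The rules above can be sketched as a minimal, hypothetical pipeline step. The `Dataset` class here is a toy stand-in (not corgidrp's real class), so the sketch is self-contained:

```python
import copy

class Dataset:
    """Toy stand-in for corgidrp.data.Dataset, for illustration only."""
    def __init__(self, frames, header=None):
        self.frames = list(frames)
        self.header = dict(header or {})

def example_step(input_dataset, tuneable_arg=1):
    """Scale each frame by tuneable_arg and record the step in the header.

    Args:
        input_dataset (Dataset): the input dataset (always the first argument).
        tuneable_arg (int): illustrative extra argument of an allowed type.

    Returns:
        Dataset: a new dataset with scaled frames and updated history.
    """
    # Rule: start by copying, so the output is not the same instance as the input
    dataset = copy.deepcopy(input_dataset)
    dataset.frames = [f * tuneable_arg for f in dataset.frames]
    # Rule: end by recording the processing history in the header
    dataset.header["HISTORY"] = "example_step(tuneable_arg={0})".format(tuneable_arg)
    return dataset
```

The copy-first convention is what makes the step reproducible: the caller's dataset is never mutated.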

@@ -133,7 +133,7 @@ End-to-end testing refers to processing data as one would when we get the real d
- if you need to create mock L1 data, please do it in the script as well.
- See the existing tests in `tests/e2e_tests/` for how to structure this script. You should only need to write a single script.
4. Test that the script runs successfully on your local machine and produces the expected output. Debug as necessary. When appropriate, test your results against those obtained from the II&T/TVAC pipeline using the same input data.
-5. Determine how resource intensive your recipe is. There are many ways to do this, but Linux users can run `/usr/bin/time -v python your_e2e_test.py` and Mac userse can run `/usr/bin/time -l -h -p python <your_e2e_test.py>`. Record elapsed (wall clock) time, the percent of CPU this job got (only if parallelization was used), and total memory used (labelled "Maximum resident set size").
+5. Determine how resource intensive your recipe is. There are many ways to do this, but Linux users can run `/usr/bin/time -v python your_e2e_test.py` and Mac users can run `/usr/bin/time -l -h -p python <your_e2e_test.py>`. Record elapsed (wall clock) time, the percent of CPU this job got (only if parallelization was used), and total memory used (labelled "Maximum resident set size").
6. Document your recipe on the "Corgi-DRP Implementation Document" on Confluence (see the big table in Section 2.0). You should fill out an entire row with your recipe. Under addition notes, note if your recipe took significant run time (> 1 minute) and significant memory (> 1 GB).
7. PR!
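If `/usr/bin/time` is unavailable, a rough Python-side sketch of the same measurement (wall-clock time plus peak memory of a child process) can use the Unix-only `resource` module; the child command below is just a placeholder:

```python
import resource
import subprocess
import sys
import time

start = time.perf_counter()
# Placeholder child process; substitute your e2e test script here
subprocess.run([sys.executable, "-c", "x = [0] * 10_000"], check=True)
elapsed = time.perf_counter() - start

usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print(f"wall clock: {elapsed:.2f} s")
# Note: ru_maxrss is reported in kilobytes on Linux but bytes on macOS
print(f"peak RSS of children: {usage.ru_maxrss}")
```

This does not replace the CPU-percentage figure `/usr/bin/time -v` reports, so prefer the command-line tool when it is available.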

@@ -175,21 +175,11 @@ Before creating a pull request, review the design Principles below. Use the Gith

## FAQ

 * Does my pipeline function need to save files?
   * Files will be saved by a higher level pipeline code. As long as you output an object that's an instance of a `corgidrp.Data` class, it will have a `save()` function that will be used.
 * Can I create new data classes?
   * Yes, you can feel free to make new data classes. Generally, they should be a subclass of the `Image` class, and you can look at the `Dark` class as an example. Each calibration type should have its own `Image` subclass defined. Talk with Jason and Max to discuss how your class should be implemented!
-  * You do not necessarily need to write a copy function for subclasses of the `Image` class. If you need to copy calibration objects at all you can import and apply the copy module of python, see example:
-    ```
-    import copy
-    flatfield = data.Flatfield('flatfield.fits')
-    #reference copy
-    flatfield_copy = copy.copy(flatfield)
-    #deep data copy
-    flatfield_copy = copy.deepcopy(flatfield)
-    ```
+  * You do not necessarily need to write a copy function for subclasses of the `Image` class. If you need to copy calibration objects at all, you can use the copy function of the Image class.
* What python version should I develop in?
* Python 3.12
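The shallow vs. deep copy behavior mentioned in the FAQ can be seen with a toy stand-in class (not corgidrp's real `Image`):

```python
import copy

class Image:
    """Toy stand-in for an Image-like class, for illustration only."""
    def __init__(self, data):
        self.data = list(data)

flat = Image([1.0, 2.0])
shallow = copy.copy(flat)    # new object, but .data is the same list
deep = copy.deepcopy(flat)   # new object with an independent .data

deep.data[0] = 99.0          # does not touch flat.data
```

A deep copy is the safe choice when the copy's pixel data will be modified.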

10 changes: 4 additions & 6 deletions corgidrp/caldb.py
@@ -168,7 +168,7 @@ def _get_values_from_entry(self, entry, is_calib=True):
drp_version,
obsid,
naxis1,
-naxis2,
+naxis2
]

# rest are ext_hdr keys we can copy
@@ -243,7 +243,7 @@ def remove_entry(self, entry, to_disk=True):

def get_calib(self, frame, dtype, to_disk=True):
"""
-Outputs the best calibration file of the given type for the input sciene frame.
+Outputs the best calibration file of the given type for the input science frame.
Args:
frame (corgidrp.data.Image): an image frame to request a calibration for. If None is passed in, looks for the
@@ -288,14 +288,12 @@ def get_calib(self, frame, dtype, to_disk=True):
 options = calibdf.loc[
     (
         (calibdf["EXPTIME"] == frame_dict["EXPTIME"])
-        & (calibdf["NAXIS1"] == frame_dict["NAXIS1"])
-        & (calibdf["NAXIS2"] == frame_dict["NAXIS2"])
     )
 ]

 if len(options) == 0:
-    raise ValueError("No valid Dark with EXPTIME={0} and dimension ({1},{2})"
-                     .format(frame_dict["EXPTIME"], frame_dict["NAXIS1"], frame_dict["NAXIS2"]))
+    raise ValueError("No valid Dark with EXPTIME={0}"
+                     .format(frame_dict["EXPTIME"]))

# select the one closest in time
result_index = np.abs(options["MJD"] - frame_dict["MJD"]).argmin()
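The lookup in this hunk, filter the calibration table on matching EXPTIME and then take the entry closest in time (MJD), can be sketched with a toy table; pandas is assumed available, and the `PATH` column and file names are illustrative:

```python
import numpy as np
import pandas as pd

# Toy calibration database with three Dark entries
calibdf = pd.DataFrame({
    "EXPTIME": [5.0, 5.0, 10.0],
    "MJD": [60000.0, 60010.0, 60005.0],
    "PATH": ["dark_a.fits", "dark_b.fits", "dark_c.fits"],
})
frame = {"EXPTIME": 5.0, "MJD": 60008.0}

# Filter on exposure time, mirroring the options = calibdf.loc[...] selection
options = calibdf.loc[calibdf["EXPTIME"] == frame["EXPTIME"]]
if len(options) == 0:
    raise ValueError("No valid Dark with EXPTIME={0}".format(frame["EXPTIME"]))

# Pick the entry closest in time, mirroring the argmin over |MJD difference|
best = options.iloc[np.abs(options["MJD"] - frame["MJD"]).argmin()]
print(best["PATH"])  # dark_b.fits (MJD 60010 is closest to 60008)
```

Note that `Series.argmin()` returns a positional index, which is why `iloc` (not `loc`) is used on the filtered table.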
110 changes: 62 additions & 48 deletions corgidrp/calibrate_kgain.py
@@ -12,7 +12,7 @@
from corgidrp.detector import slice_section, detector_areas

# Dictionary with constant kgain calibration parameters
-kgain_params = {
+kgain_params_default = {
# ROI constants
'rowroi1': 9,
'rowroi2': 1000,
@@ -30,34 +30,30 @@
'signal_bins_N': 400,
}
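The key checks performed by `check_kgain_params` below follow a generic required-key validation pattern, sketched here standalone (the function name `check_params` is illustrative, not corgidrp's):

```python
# Keys that must be present and numeric in a kgain parameter dictionary
REQUIRED_KEYS = ('rowroi1', 'rowroi2', 'colroi1', 'colroi2',
                 'rn_bins1', 'rn_bins2', 'max_DN_val', 'signal_bins_N')

def check_params(params):
    """Raise if any required key is missing or holds a non-numeric value."""
    for key in REQUIRED_KEYS:
        if key not in params:
            raise ValueError('Missing parameter: {0}.'.format(key))
        if not isinstance(params[key], (float, int)):
            raise TypeError('{0} is not a number'.format(key))
```

Looping over a key tuple keeps the checks in one place, at the cost of losing the per-key explicitness of the unrolled version.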

-def check_kgain_params(
-    ):
-    """ Checks integrity of kgain parameters in the dictionary kgain_params. """
-    if 'offset_colroi1' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
-    if 'offset_colroi2' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+def check_kgain_params(kgain_params):
+    """ Checks integrity of kgain parameters in the dictionary kgain_params.
+    Args:
+        kgain_params (dict): Dictionary of parameters used for calibrating the k gain.
+    """

     if 'rowroi1' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: rowroi1.')
     if 'rowroi2' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: rowroi2.')
     if 'colroi1' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: colroi1.')
     if 'colroi2' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: colroi2.')
     if 'rn_bins1' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: rn_bins1.')
     if 'rn_bins2' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: rn_bins2.')
     if 'max_DN_val' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: max_DN_val.')
     if 'signal_bins_N' not in kgain_params:
-        raise ValueError('Missing parameter in directory pointer YAML file.')
+        raise ValueError('Missing parameter: signal_bins_N.')

-    if not isinstance(kgain_params['offset_colroi1'], (float, int)):
-        raise TypeError('offset_colroi1 is not a number')
-    if not isinstance(kgain_params['offset_colroi2'], (float, int)):
-        raise TypeError('offset_colroi2 is not a number')
     if not isinstance(kgain_params['rowroi1'], (float, int)):
         raise TypeError('rowroi1 is not a number')
     if not isinstance(kgain_params['rowroi2'], (float, int)):
@@ -281,38 +277,39 @@ def calibrate_kgain(dataset_kgain,
n_cal=10, n_mean=30, min_val=800, max_val=3000, binwidth=68,
make_plot=True,plot_outdir='figures', show_plot=False,
logspace_start=-1, logspace_stop=4, logspace_num=200,
-verbose=False, detector_regions=None):
+verbose=False, detector_regions=None, kgain_params=None):
"""
Given an array of frame stacks for various exposure times, each sub-stack
having at least 5 illuminated pupil L1 SCI-size frames having the same
exposure time. The frames are bias-subtracted, and in addition, if EM gain
is >1 for the input data for calibrate_kgain, EM gain division is also needed.
It also creates a mean pupil array from a separate stack of
frames of uniform exposure time. The mean pupil array is scaled to the mean
of each stack and statistics (mean and std dev) are calculated for bins from
the frames in it. kgain (e-/DN) is calculated from the means and variances
kgain (e-/DN) is calculated from the means and variances
within the defined minimum and maximum mean values. A photon transfer curve
is plotted from the std dev and mean values from the bins.
Args:
-        dataset_kgain (corgidrp.Dataset): Dataset with a set of of EXCAM illuminated
-            pupil L1 SCI frames (counts in DN) having a range of exp times.
-            datset_cal contains a set of subset of frames, and all subsets must have
-            the same number of frames, which is a minimum of 5. The frames in a subset
-            must all have the same exposure time. There must be at least 10 subsets
-            (More than 20 sub-stacks recommended. The mean signal in the pupil region should
-            span from about 100 to about 10000 DN.
-            In addition, dataset_kgain contains a set of at least 30 frames used to
-            build a mean frame. All the frames must have the same exposure time,
-            such that the net mean counts in the pupil region is a few thousand DN
-            (2000 to 4000 DN recommended;
-            notice that unity EM gain is recommended when k-gain is the primary desired
-            product, since it is known more accurately than non-unity values. This
-            mean frame is used to select pixels with similar illumination for
-            calculating variances (since the pupil illumination is not perfectly uniform).
-            All data must be obtained under the same positioning of the pupil
-            relative to the detector. These frames are identified with the kewyord
-            'OBSTYPE'='MNFRAME' (TBD).
+        dataset_kgain (corgidrp.Dataset): The frames in the dataset are
+            bias-subtracted. The dataset contains frames belonging to two different
+            sets -- Mean frame and a large array of unity gain frames.
+            Mean frame: Unity gain frames with constant exposure time. These frames
+            are used to create a mean pupil image. The mean frame is used to select
+            pixels in each frame of the large array of unity gain frames (see next)
+            to calculate its mean signal. In general, it is expected that at least
+            30 frames or more will be taken for this set. In TVAC, 30 frames, each
+            with an exposure time of 5.0 sec were taken.
+            Large array of unity gain frames: Set of unity gain frames with subsets
+            of equal exposure times. Data for each subset should be taken sequentially:
+            Each subset must have at least 5 frames. All frames for a subset are taken
+            before moving to the next subset. Two of the subsets have the same (repeated)
+            exposure time. These two subsets are not contiguous: The first subset is
+            taken near the start of the data collection and the second one is taken
+            at the end of the data collection (see TVAC example below). The mean
+            signal of these two subsets is used to correct for illumination
+            brightness/sensor sensitivity drifts for all the frames in the whole set,
+            depending on when the frames were taken. There should be no other repeated
+            exposure time among the subsets. In TVAC, a total of 110 frames were taken
+            within this category. The 110 frames consisted of 22 subsets, each with
+            5 frames. All 5 frames had the same exposure time. The exposure times in
+            TVAC in seconds were, each repeated 5 times to collect 5 frames in each
+            subset -- 0.077, 0.770, 1.538, 2.308, 3.077, 3.846, 4.615, 5.385, 6.154,
+            6.923, 7.692, 8.462, 9.231, 10.000, 11.538, 10.769, 12.308, 13.077,
+            13.846, 14.615, 15.385, and 1.538 (again).
n_cal (int):
Minimum number of sub-stacks used to calibrate K-Gain. The default value
is 10.
@@ -343,12 +340,29 @@
detector_regions (dict): a dictionary of detector geometry properties.
Keys should be as found in detector_areas in detector.py. Defaults to
that dictionary.
kgain_params (dict): (Optional) Dictionary containing row and col specifications
for the region of interest (indicated by 'rowroi1','rowroi2','colroi1',and 'colroi2').
The 'roi' needs one square region specified, and 'back' needs two square regions,
where a '1' ending indicates the smaller of two values, and a '2' ending indicates the larger
of two values. The coordinates of the square region are specified by matching
up as follows: (rowroi1, colroi1), (rowroi2, colroi1), etc.
Also must contain:
'rn_bins1': lower bound of counts histogram for fitting of read noise
'rn_bins2': upper bound of counts histogram for fitting of read noise
'max_DN_val': maximum DN value to be included in photon transfer curve (PTC)
'signal_bins_N': number of bins in the signal variables of PTC curve
Defaults to kgain_params_default included in this file.
Returns:
corgidrp.data.KGain: kgain estimate from the least-squares fit to the photon
transfer curve (in e-/DN). The expected value of kgain for EXCAM with
flight readout sequence should be between 8 and 9 e-/DN
"""
if kgain_params is None:
kgain_params = kgain_params_default

check_kgain_params(kgain_params)

if detector_regions is None:
detector_regions = detector_areas

