Commit
Tried to solve some conflicts in mocks.py
LisaAltinier committed Jan 10, 2025
2 parents 1a168f8 + 6208510 commit a2a41b9
Showing 119 changed files with 84,610 additions and 2,076 deletions.
5 changes: 5 additions & 0 deletions .gitattributes
@@ -1 +1,6 @@
tests/test_data/FluxMap1024.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/medcombined-neptune_band_1.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/medcombined-neptune_band_4.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/medcombined-uranus_band_4.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/medcombined-uranus_band_1.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/mock_northup.fits filter=lfs diff=lfs merge=lfs -text
8 changes: 6 additions & 2 deletions .github/workflows/python-app.yml
@@ -5,9 +5,9 @@ name: CI tests

on:
push:
branches: [ "main" ]
branches: [ "main", "develop" ]
pull_request:
branches: [ "main" ]
branches: [ "main", "develop" ]

permissions:
contents: read
@@ -19,6 +19,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up Git LFS
run: |
git lfs install
git lfs pull
- name: Set up Python 3.12
uses: actions/setup-python@v3
with:
6 changes: 6 additions & 0 deletions .gitignore
@@ -22,6 +22,8 @@ var/
*.egg-info/
.installed.cfg
*.egg
src/
# II&T code for e2e tests may install into src/ if a package with the same name is already installed in your environment

# PyInstaller
# Usually these files are written by a python script from a template
@@ -61,6 +63,7 @@ tests/simdata/*
tests/walker_output/*
tests/testcalib/*
tests/e2e_tests/*/*.json
tests/ops_output/*.json

# editor files
.vscode/*
@@ -80,3 +83,6 @@ tests/e2e_tests/*/*.json
# Output figures
figures/
tests/figures/
tests/e2e_tests/*/*.png
tests/e2e_tests/*_output
tests/test_data/simastrom/guesses.csv
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -5,3 +5,4 @@ include tests/test_data/metadata.yaml
include tests/test_data/metadata_eng.yaml
include tests/test_data/nonlin_sample.csv
include tests/test_data/nonlin_sample.fits
include corgidrp/data/JWST_CALFIELD2020.csv
75 changes: 64 additions & 11 deletions README.md
@@ -15,7 +15,19 @@ That configuration directory will be used to locate things on your computer such

### For Developers

Large binary files (used in tests) are stored in Git LFS. You may need to run `git lfs pull` after checking out the repository to download the latest large binary files, or the unit tests may fail.
Large binary files (used in tests) are stored in Git LFS. [Install Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) if it isn't already installed. You may need to run `git lfs pull` after checking out the repository to download the latest large binary files, or the unit tests may fail.

To run the existing end-to-end tests, you also need the II&T code, which is used directly for comparing results. This also requires Git LFS to be installed first. Then install the II&T code by doing the following while in the top-level folder:

```
pip install -r requirements_e2etests.txt corgidrp
```

This will install the II&T repositories `cal` and `proc_cgi_frame`.

### Troubleshooting

If you run into any issues with things in the `.corgidrp` directory not being found properly when you run the pipeline, such as a DetectorParams file, caldb, or configuration settings, your corgidrp configuration may be in a bad state. Report the bug to our GitHub issue tracker, including both the error message and the state of your `.corgidrp` folder. If you don't want to wait for us to troubleshoot the bug and deploy a fix, you can probably resolve the issue by completely deleting your `.corgidrp` folder and rerunning the code (the code will automatically remake it). This, however, means you will lose any changes you've made to your settings, as well as your calibration database.
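
If you opt for the full reset, a minimal sketch along these lines will do it, assuming the default `~/.corgidrp` location (note again that this deletes your settings and your calibration database):

```
import os
import shutil

# assumes the default configuration location of ~/.corgidrp
config_folder = os.path.join(os.path.expanduser("~"), ".corgidrp")

# WARNING: this removes your settings and your calibration database
if os.path.isdir(config_folder):
    shutil.rmtree(config_folder)

import corgidrp  # importing the package remakes the folder with defaults
```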

## How to Contribute

@@ -83,11 +95,11 @@ def example_step(dataset, calib_data, tuneable_arg=1, another_arg="test"):

The function body can be nearly anything you want, but the function signature and the start/end of the function should follow a few rules (a sketch putting them together appears after this list).

* Each function should include a docstring that descibes what the function is doing, what the inputs (including units if appropriate) are and what the outputs (also with units). The dosctrings should be [goggle style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
* Each function should include a docstring that describes what the function is doing, what the inputs are (including units if appropriate), and what the outputs are (also with units). The docstrings should be [google style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
* The input dataset should always be the first input
* Additional arguments and keywords exist only if you need them--many relevant parameters may already be in the Dataset headers. A pipeline step can have just a single argument (the input dataset) if that is all it needs.
* All additional function arguments/keywords should only consist of the following types: int, float, str, or a class defined in corgidrp.Data.
* (Long explaination for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly by textfiles. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
* (Long explanation for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly in text files. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
* The first line of the function generally should be creating a copy of the dataset (which will be the output dataset). This way, the output dataset is not the same instance as the input dataset. This will make it easier to ensure reproducibility.
* The function should always end with updating the header and (typically) the data of the output dataset. The history of running this pipeline step should be recorded in the header.
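
Putting these rules together, a minimal sketch of a step function looks like the following (the `copy()` and history-update calls illustrate the conventions above; check existing step functions for the exact corgidrp API):

```
def example_step(dataset, calib_data, tuneable_arg=1, another_arg="test"):
    """
    One-line description of what this step does.

    Args:
        dataset (corgidrp.data.Dataset): the input dataset to process
        calib_data (corgidrp.data.Dark): a calibration product (illustrative)
        tuneable_arg (int): an example tuneable parameter
        another_arg (str): another example parameter

    Returns:
        corgidrp.data.Dataset: the processed dataset
    """
    # first line: copy the input so the output is a separate instance
    processed_dataset = dataset.copy()

    # ... actual processing of processed_dataset goes here ...

    # end: update the data/headers and record this step in the history
    # (hypothetical call; mirror the convention used by existing steps)
    processed_dataset.update_after_processing_step(
        "example_step: tuneable_arg={0}".format(tuneable_arg))

    return processed_dataset
```

Keeping the copy-first and history-last pattern means each step's output is a fresh object whose processing is reproducible from its recorded arguments.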

@@ -116,12 +128,12 @@ End-to-end testing refers to processing data as one would when we get the real d
1. Write a recipe that produces the desired processed data product starting from L1 data. You will need to determine the series of step functions that need to be run, and what kind of arguments should be modified (e.g., whether prescan columns pixels should be cropped). Refer to the existing recipes in `corgidrp/recipe_templates` as examples and double check all the necessary steps in the FDD.
2. Obtain TVAC L1 data from our Box folder (ask Alex Greenbaum or Jason if you don't have access). For some situations (e.g., boresight), there may not be appropriate TVAC data. In those cases, write a piece of code that uses the images from TVAC to provide realistic noise and add it to mock data (i.e., the ones generated for the unit testing) to create mock L1 data.
3. Write an end-to-end test that processes the L1 data through the new recipe you created using the corgidrp.walker framework
- You will probably need to modify the `corgidrp.walker.guess_template()` function to add logic for determining when to use your recipe based on header keywords (e.g., OBSTYPE). Ask Jason, who developed this framework, if it is not clear what should be done.
- You will probably need to modify the `corgidrp.walker.guess_template()` function to add logic for determining when to use your recipe based on header keywords (e.g., VISTYPE); a hypothetical sketch of this kind of dispatch appears after this list. Ask Jason, who developed this framework, if it is not clear what should be done.
- Your recipe may require other calibration files. For now, create them as part of the setup process in the script (see `tests/e2e_tests/l1_to_l2b_e2e.py` for examples of how to do this for each type of calibration)
- If you need to create mock L1 data, please do that in the script as well.
- See the existing tests in `tests/e2e_tests/` for how to structure this script. You should only need to write a single script.
4. Test that the script runs successfully on your local machine and produces the expected output. Debug as necessary. When appropriate, test your results against those obtained from the II&T/TVAC pipeline using the same input data.
5. Determine how resource intensive your recipe is. There are many ways to do this, but Mac/Linux users can run `/usr/bin/time -v python your_e2e_test.py`. Record "the percent of CPU this job got", "Elapsed (wall clock) time", and "Maximum resident set size (kbytes)".
5. Determine how resource intensive your recipe is. There are many ways to do this, but Linux users can run `/usr/bin/time -v python your_e2e_test.py` and Mac users can run `/usr/bin/time -l -h -p python <your_e2e_test.py>`. Record elapsed (wall clock) time, the percent of CPU this job got (only if parallelization was used), and total memory used (labelled "Maximum resident set size").
6. Document your recipe on the "Corgi-DRP Implementation Document" on Confluence (see the big table in Section 2.0). You should fill out an entire row with your recipe. Under additional notes, note if your recipe took significant run time (> 1 minute) and significant memory (> 1 GB).
7. PR!
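
As a purely hypothetical sketch of the `guess_template()` dispatch mentioned in step 3 (the VISTYPE values, template filenames, and header access below are invented for illustration; the real function may differ):

```
def guess_template(dataset):
    """Guess which recipe template to use, based on header keywords (sketch)."""
    # hypothetical header access; mirror how the real walker reads headers
    vistype = dataset[0].pri_hdr['VISTYPE']

    if vistype == "BORESITE":           # invented keyword value
        recipe_filename = "boresight_cal.json"
    elif vistype == "DARK":             # invented keyword value
        recipe_filename = "dark_current.json"
    else:
        recipe_filename = "l1_to_l2b.json"  # fall back to a default recipe
    return recipe_filename
```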

@@ -163,11 +175,11 @@ Before creating a pull request, review the design Principles below. Use the Gith

## FAQ

* Does my pipeline function need to save files?
* Files will be saved by a higher level pipeline code. As long as you output an object that's an instance of a `corgidrp.Data` class, it will have a `save()` function that will be used.
* Can I create new data classes?
* Yes, you can feel free to make new data classes. Generally, they should be a subclass of the `Image` class, and you can look at the `Dark` class as an example. Each calibration type should have its own `Image` subclass defined. Talk with Jason and Max to discuss how your class should be implemented!

* Does my pipeline function need to save files?
* Files will be saved by higher-level pipeline code. As long as you output an object that's an instance of a `corgidrp.Data` class, it will have a `save()` function that will be used.
* Can I create new data classes?
* Yes, you can feel free to make new data classes. Generally, they should be a subclass of the `Image` class, and you can look at the `Dark` class as an example. Each calibration type should have its own `Image` subclass defined. Talk with Jason and Max to discuss how your class should be implemented!
* You do not necessarily need to write a copy function for subclasses of the `Image` class. If you need to copy calibration objects at all, you can use the copy function of the Image class. (A sketch of such a subclass appears after this FAQ.)
* What python version should I develop in?
* Python 3.12

@@ -183,4 +195,45 @@

* Where do I save FITS files or other data files I need to use for my tests?
* Auxiliary data to run tests should be stored in the tests/test_data folder
* If they are larger than 1 MB, they should be stored using `git lfs`. Ask Jason about setting up git lfs (as of writing, we have not set up git lfs yet).
* If they are larger than 1 MB, they should be stored using `git lfs` (see the Git LFS instructions in the "For Developers" section above). Ask Jason if you need help setting up git lfs.
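
To make the new-data-class answer concrete, here is a hedged sketch of what a calibration subclass might look like (the constructor signature is an assumption; mirror the real `Dark` class rather than this sketch):

```
from corgidrp.data import Image

class MyNewCalibration(Image):
    """A hypothetical calibration product; structure is illustrative only."""

    def __init__(self, data_or_filepath, pri_hdr=None, ext_hdr=None):
        # corgidrp data classes can be created simply by passing in a filepath,
        # which is what lets recipes record them as step-function arguments
        super().__init__(data_or_filepath, pri_hdr=pri_hdr, ext_hdr=ext_hdr)
        # no custom copy() needed: the Image copy function is reused (see FAQ)
```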

## Change Log

**v1.1.2**
* Flat field correction marks pixels divided by 0 as bad

**v1.1.1**
* Fix unit test that wasn't cleaning up environment properly

**v1.1**
* Bug fix so that corgidrp classes can be pickled
* New corgidrp.ops interface
* Improved agreement with II&T pipeline in updated e2e tests
* Ability to embed just the illuminated part of the detector back into a full engineering frame 
* Updated DRP throughout to handle recently updated data header specification

**v1.0**
* First official pipeline release!
* Step functions to produce the necessary calibration files for analog L1 to L2b processing implemented and tested
* Step function to produce boresight calibration implemented and tested
* Automated data processing handling for analog L1/L2 calibration files and for boresight calibration
* End-to-end testing demonstrating analog L1/L2 calibration files and boresight calibration file can be produced from realistic/real L1 data files

**v0.2.1**
* Update end-to-end tests to handle updated test data filenames

**v0.2**
* All L1 to L2b step functions implemented and tested
* Automated data processing for analog L1 to L2b
* End-to-end testing for analog L1 to L2b processing
* Minor bug fixes throughout

**v0.1.2**
* Added ability to change paths for analog L1 to L2a end-to-end test from command line
* v0.1.1 was a partial version of this release and should not be used.

**v0.1**
* First preliminary release of pipeline including step functions (see next bullet), calibration database, walker (pipeline automation framework)
* All L1 to L2a step functions implemented and tested
* Automated data processing for analog L1 to L2a
* End-to-end test demonstrating analog L1 to L2a processing
57 changes: 31 additions & 26 deletions corgidrp/__init__.py
@@ -3,7 +3,7 @@
import pathlib
import configparser

__version__ = "0.1.2"
__version__ = "1.1.2"
version = __version__ # temporary backwards compatibility

#### Create a configuration file for the corgidrp if it doesn't exist.
@@ -25,28 +25,31 @@ def create_config_dir():
if not os.path.isdir(config_folder):
os.mkdir(config_folder)

# make default calibrations folder
default_cal_dir = os.path.join(config_folder, "default_calibs")
if not os.path.exists(default_cal_dir):
os.mkdir(default_cal_dir)

# write config
config_filepath = os.path.join(config_folder, "corgidrp.cfg")
if not os.path.exists(config_filepath):
config = configparser.ConfigParser()
config["PATH"] = {}
config["PATH"]["caldb"] = os.path.join(config_folder, "corgidrp_caldb.csv") # location to store caldb
config["PATH"]["default_calibs"] = default_cal_dir
config["DATA"] = {}
config["DATA"]["track_individual_errors"] = "False"
# overwrite with old settings if needed
if oldconfig is not None:
config["PATH"]["caldb"] = oldconfig["PATH"]["caldb"]

with open(config_filepath, 'w') as f:
config.write(f)

print("corgidrp: Configuration file written to {0}. Please edit if you want things stored in different locations.".format(config_filepath))
# make default calibrations folder if it doesn't exist
default_cal_dir = os.path.join(config_folder, "default_calibs")
if not os.path.exists(default_cal_dir):
os.mkdir(default_cal_dir)

# write config if it doesn't exist
config_filepath = os.path.join(config_folder, "corgidrp.cfg")
if not os.path.exists(config_filepath):
config = configparser.ConfigParser()
config["PATH"] = {}
config["PATH"]["caldb"] = os.path.join(config_folder, "corgidrp_caldb.csv") # location to store caldb
config["PATH"]["default_calibs"] = default_cal_dir
config["DATA"] = {}
config["DATA"]["track_individual_errors"] = "False"
config["WALKER"] = {}
config["WALKER"]["skip_missing_cal_steps"] = "False"
config["WALKER"]["jit_calib_id"] = "False"
# overwrite with old settings if needed
if oldconfig is not None:
config["PATH"]["caldb"] = oldconfig["PATH"]["caldb"]

with open(config_filepath, 'w') as f:
config.write(f)

print("corgidrp: Configuration file written to {0}. Please edit if you want things stored in different locations.".format(config_filepath))
create_config_dir()

_bool_map = {"true" : True, "false" : False}
@@ -58,6 +61,8 @@ def create_config_dir():
config.read(config_filepath)

## pipeline settings
caldb_filepath = config.get("PATH", "caldb", fallback=None)
default_cal_dir = config.get("PATH", "default_calibs", fallback=None)
track_individual_errors = _bool_map[config.get("DATA", "track_individual_errors").lower()]
caldb_filepath = config.get("PATH", "caldb", fallback=None) # path to calibration db
default_cal_dir = config.get("PATH", "default_calibs", fallback=None) # path to default calibrations directory
track_individual_errors = _bool_map[config.get("DATA", "track_individual_errors", fallback='false').lower()] # save each individual error component separately?
skip_missing_cal_steps = _bool_map[config.get("WALKER", "skip_missing_cal_steps", fallback='false').lower()] # skip steps, instead of crashing, when suitable calibration file cannot be found
jit_calib_id = _bool_map[config.get("WALKER", "jit_calib_id", fallback='false').lower()] # identify AUTOMATIC calibration files right before a step executes, rather than when the recipe is first generated
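
Given the settings shown above, one way to flip a new WALKER option from outside the pipeline is a few lines of standard-library configparser (the path assumes the default `~/.corgidrp` location):

```
import configparser
import os

config_filepath = os.path.join(os.path.expanduser("~"), ".corgidrp", "corgidrp.cfg")
config = configparser.ConfigParser()
config.read(config_filepath)

# skip steps with missing calibrations instead of crashing
config["WALKER"]["skip_missing_cal_steps"] = "True"

with open(config_filepath, 'w') as f:
    config.write(f)
```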
