Commit
Tried to solve some conflicts in mocks.py
LisaAltinier committed Jan 10, 2025
2 parents 1a168f8 + 6208510 commit a2a41b9
Showing 119 changed files with 84,610 additions and 2,076 deletions.
5 changes: 5 additions & 0 deletions .gitattributes
@@ -1 +1,6 @@
tests/test_data/FluxMap1024.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/medcombined-neptune_band_1.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/medcombined-neptune_band_4.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/medcombined-uranus_band_4.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/medcombined-uranus_band_1.fits filter=lfs diff=lfs merge=lfs -text
tests/test_data/mock_northup.fits filter=lfs diff=lfs merge=lfs -text
8 changes: 6 additions & 2 deletions .github/workflows/python-app.yml
@@ -5,9 +5,9 @@ name: CI tests

on:
push:
branches: [ "main" ]
branches: [ "main", "develop" ]
pull_request:
branches: [ "main" ]
branches: [ "main", "develop" ]

permissions:
contents: read
@@ -19,6 +19,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up Git LFS
run: |
git lfs install
git lfs pull
- name: Set up Python 3.12
uses: actions/setup-python@v3
with:
6 changes: 6 additions & 0 deletions .gitignore
@@ -22,6 +22,8 @@ var/
*.egg-info/
.installed.cfg
*.egg
src/
# II&T code for e2e tests may install into src/ if a package with the same name is already installed in your environment

# PyInstaller
# Usually these files are written by a python script from a template
@@ -61,6 +63,7 @@ tests/simdata/*
tests/walker_output/*
tests/testcalib/*
tests/e2e_tests/*/*.json
tests/ops_output/*.json

# editor files
.vscode/*
@@ -80,3 +83,6 @@ tests/e2e_tests/*/*.json
# Output figures
figures/
tests/figures/
tests/e2e_tests/*/*.png
tests/e2e_tests/*_output
tests/test_data/simastrom/guesses.csv
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -5,3 +5,4 @@ include tests/test_data/metadata.yaml
include tests/test_data/metadata_eng.yaml
include tests/test_data/nonlin_sample.csv
include tests/test_data/nonlin_sample.fits
include corgidrp/data/JWST_CALFIELD2020.csv
75 changes: 64 additions & 11 deletions README.md
@@ -15,7 +15,19 @@ That configuration directory will be used to locate things on your computer such

### For Developers

Large binary files (used in tests) are stored in Git LFS. You may need to run `git lfs pull` after checking out the repository to download the latest large binary files, or the unit tests may fail.
Large binary files (used in tests) are stored in Git LFS. [Install Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) if it isn't already installed. You may need to run `git lfs pull` after checking out the repository to download the latest large binary files, or the unit tests may fail.

To run the existing end-to-end tests, you also need the II&T code, which is used directly for comparing results. This also requires Git LFS to be installed first. Then install the II&T code by doing the following while in the top-level folder:

```
pip install -r requirements_e2etests.txt corgidrp
```

This will install the II&T repositories `cal` and `proc_cgi_frame`.

### Troubleshooting

If you run into any issues with things in the `.corgidrp` directory not being found properly when you run the pipeline, such as a DetectorParams file, caldb, or configuration settings, your corgidrp configuration may be in a bad state. Report the bug to our GitHub issue tracker, including both the error message and the state of your `.corgidrp` folder. If you don't want to wait for us to troubleshoot the bug and deploy a fix, you can probably resolve the issue by completely deleting your `.corgidrp` folder and rerunning the code (the code will automatically remake it). This, however, means you will lose any changes you've made to your settings, as well as your calibration database.
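
If you opt for the full reset, a minimal sketch along these lines will do it, assuming the default `~/.corgidrp` location (note again that this deletes your settings and your calibration database):

```
import os
import shutil

# assumes the default configuration location of ~/.corgidrp
config_folder = os.path.join(os.path.expanduser("~"), ".corgidrp")

# WARNING: this removes your settings and your calibration database
if os.path.isdir(config_folder):
    shutil.rmtree(config_folder)

import corgidrp  # importing the package remakes the folder with defaults
```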

## How to Contribute

@@ -83,11 +95,11 @@ def example_step(dataset, calib_data, tuneable_arg=1, another_arg="test"):

The function body can be nearly anything you want, but the function signature and the start/end of the function should follow a few rules (a sketch putting them together appears after this list).

* Each function should include a docstring that descibes what the function is doing, what the inputs (including units if appropriate) are and what the outputs (also with units). The dosctrings should be [goggle style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
* Each function should include a docstring that describes what the function is doing, what the inputs are (including units if appropriate), and what the outputs are (also with units). The docstrings should be [google style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
* The input dataset should always be the first input
* Additional arguments and keywords exist only if you need them--many relevant parameters may already be in the Dataset headers. A pipeline step can have just a single argument (the input dataset) if that is all it needs.
* All additional function arguments/keywords should only consist of the following types: int, float, str, or a class defined in corgidrp.Data.
* (Long explaination for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly by textfiles. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
* (Long explanation for the curious: The reason for this is that pipeline steps can be written out as text files. Int/float/str are easily represented succinctly in text files. All classes in corgidrp.Data can be created simply by passing in a filepath. Therefore, all pipeline steps have easily recordable arguments for easy reproducibility.)
* The first line of the function generally should be creating a copy of the dataset (which will be the output dataset). This way, the output dataset is not the same instance as the input dataset. This will make it easier to ensure reproducibility.
* The function should always end with updating the header and (typically) the data of the output dataset. The history of running this pipeline step should be recorded in the header.
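
Putting these rules together, a minimal sketch of a step function looks like the following (the `copy()` and history-update calls illustrate the conventions above; check existing step functions for the exact corgidrp API):

```
def example_step(dataset, calib_data, tuneable_arg=1, another_arg="test"):
    """
    One-line description of what this step does.

    Args:
        dataset (corgidrp.data.Dataset): the input dataset to process
        calib_data (corgidrp.data.Dark): a calibration product (illustrative)
        tuneable_arg (int): an example tuneable parameter
        another_arg (str): another example parameter

    Returns:
        corgidrp.data.Dataset: the processed dataset
    """
    # first line: copy the input so the output is a separate instance
    processed_dataset = dataset.copy()

    # ... actual processing of processed_dataset goes here ...

    # end: update the data/headers and record this step in the history
    # (hypothetical call; mirror the convention used by existing steps)
    processed_dataset.update_after_processing_step(
        "example_step: tuneable_arg={0}".format(tuneable_arg))

    return processed_dataset
```

Keeping the copy-first and history-last pattern means each step's output is a fresh object whose processing is reproducible from its recorded arguments.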

@@ -116,12 +128,12 @@ End-to-end testing refers to processing data as one would when we get the real d
1. Write a recipe that produces the desired processed data product starting from L1 data. You will need to determine the series of step functions that need to be run, and what kind of arguments should be modified (e.g., whether prescan columns pixels should be cropped). Refer to the existing recipes in `corgidrp/recipe_templates` as examples and double check all the necessary steps in the FDD.
2. Obtain TVAC L1 data from our Box folder (ask Alex Greenbaum or Jason if you don't have access). For some situations (e.g., boresight), there may not be appropriate TVAC data. In those cases, write a piece of code that uses the images from TVAC to provide realistic noise and add it to mock data (i.e., the ones generated for the unit testing) to create mock L1 data.
3. Write an end-to-end test that processes the L1 data through the new recipe you created using the corgidrp.walker framework
- You will probably need to modify the `corgidrp.walker.guess_template()` function to add logic for determining when to use your recipe based on header keywords (e.g., OBSTYPE). Ask Jason, who developed this framework, if it is not clear what should be done.
- You will probably need to modify the `corgidrp.walker.guess_template()` function to add logic for determining when to use your recipe based on header keywords (e.g., VISTYPE); a hypothetical sketch of this kind of dispatch appears after this list. Ask Jason, who developed this framework, if it is not clear what should be done.
- Your recipe may require other calibration files. For now, create them as part of the setup process in the script (see `tests/e2e_tests/l1_to_l2b_e2e.py` for examples of how to do this for each type of calibration)
- If you need to create mock L1 data, please do that in the script as well.
- See the existing tests in `tests/e2e_tests/` for how to structure this script. You should only need to write a single script.
4. Test that the script runs successfully on your local machine and produces the expected output. Debug as necessary. When appropriate, test your results against those obtained from the II&T/TVAC pipeline using the same input data.
5. Determine how resource intensive your recipe is. There are many ways to do this, but Mac/Linux users can run `/usr/bin/time -v python your_e2e_test.py`. Record "the percent of CPU this job got", "Elapsed (wall clock) time", and "Maximum resident set size (kbytes)".
5. Determine how resource intensive your recipe is. There are many ways to do this, but Linux users can run `/usr/bin/time -v python your_e2e_test.py` and Mac users can run `/usr/bin/time -l -h -p python <your_e2e_test.py>`. Record elapsed (wall clock) time, the percent of CPU this job got (only if parallelization was used), and total memory used (labelled "Maximum resident set size").
6. Document your recipe on the "Corgi-DRP Implementation Document" on Confluence (see the big table in Section 2.0). You should fill out an entire row with your recipe. Under additional notes, note if your recipe took significant run time (> 1 minute) and significant memory (> 1 GB).
7. PR!
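
As a purely hypothetical sketch of the `guess_template()` dispatch mentioned in step 3 (the VISTYPE values, template filenames, and header access below are invented for illustration; the real function may differ):

```
def guess_template(dataset):
    """Guess which recipe template to use, based on header keywords (sketch)."""
    # hypothetical header access; mirror how the real walker reads headers
    vistype = dataset[0].pri_hdr['VISTYPE']

    if vistype == "BORESITE":           # invented keyword value
        recipe_filename = "boresight_cal.json"
    elif vistype == "DARK":             # invented keyword value
        recipe_filename = "dark_current.json"
    else:
        recipe_filename = "l1_to_l2b.json"  # fall back to a default recipe
    return recipe_filename
```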

@@ -163,11 +175,11 @@ Before creating a pull request, review the design Principles below. Use the Gith

## FAQ

* Does my pipeline function need to save files?
* Files will be saved by a higher level pipeline code. As long as you output an object that's an instance of a `corgidrp.Data` class, it will have a `save()` function that will be used.
* Can I create new data classes?
* Yes, you can feel free to make new data classes. Generally, they should be a subclass of the `Image` class, and you can look at the `Dark` class as an example. Each calibration type should have its own `Image` subclass defined. Talk with Jason and Max to discuss how your class should be implemented!

* Does my pipeline function need to save files?
* Files will be saved by higher-level pipeline code. As long as you output an object that's an instance of a `corgidrp.Data` class, it will have a `save()` function that will be used.
* Can I create new data classes?
* Yes, you can feel free to make new data classes. Generally, they should be a subclass of the `Image` class, and you can look at the `Dark` class as an example. Each calibration type should have its own `Image` subclass defined. Talk with Jason and Max to discuss how your class should be implemented!
* You do not necessarily need to write a copy function for subclasses of the `Image` class. If you need to copy calibration objects at all, you can use the copy function of the Image class. (A sketch of such a subclass appears after this FAQ.)
* What python version should I develop in?
* Python 3.12

@@ -183,4 +195,45 @@

* Where do I save FITS files or other data files I need to use for my tests?
* Auxiliary data to run tests should be stored in the tests/test_data folder
* If they are larger than 1 MB, they should be stored using `git lfs`. Ask Jason about setting up git lfs (as of writing, we have not set up git lfs yet).
* If they are larger than 1 MB, they should be stored using `git lfs` (see the Git LFS instructions in the "For Developers" section above). Ask Jason if you need help setting up git lfs.
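
To make the new-data-class answer concrete, here is a hedged sketch of what a calibration subclass might look like (the constructor signature is an assumption; mirror the real `Dark` class rather than this sketch):

```
from corgidrp.data import Image

class MyNewCalibration(Image):
    """A hypothetical calibration product; structure is illustrative only."""

    def __init__(self, data_or_filepath, pri_hdr=None, ext_hdr=None):
        # corgidrp data classes can be created simply by passing in a filepath,
        # which is what lets recipes record them as step-function arguments
        super().__init__(data_or_filepath, pri_hdr=pri_hdr, ext_hdr=ext_hdr)
        # no custom copy() needed: the Image copy function is reused (see FAQ)
```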

## Change Log

**v1.1.2**
* Flat field correction marks pixels divided by 0 as bad

**v1.1.1**
* Fix unit test that wasn't cleaning up environment properly

**v1.1**
* Bug fix so that corgidrp classes can be pickled
* New corgidrp.ops interface
* Improved agreement with II&T pipeline in updated e2e tests
* Ability to embed just the illuminated part of the detector back into a full engineering frame 
* Updated DRP throughout to handle recently updated data header specification

**v1.0**
* First official pipeline release!
* Step functions to produce the necessary calibration files for analog L1 to L2b processing implemented and tested
* Step function to produce boresight calibration implemented and tested
* Automated data processing handling for analog L1/L2 calibration files and for boresight calibration
* End-to-end testing demonstrating analog L1/L2 calibration files and boresight calibration file can be produced from realistic/real L1 data files

**v0.2.1**
* Update end-to-end tests to handle updated test data filenames

**v0.2**
* All L1 to L2b step functions implemented and tested
* Automated data processing for analog L1 to L2b
* End-to-end testing for analog L1 to L2b processing
* Minor bug fixes throughout

**v0.1.2**
* Added ability to change paths for analog L1 to L2a end-to-end test from command line
* v0.1.1 was a partial version of this release and should not be used.

**v0.1**
* First preliminary release of pipeline including step functions (see next bullet), calibration database, walker (pipeline automation framework)
* All L1 to L2a step functions implemented and tested
* Automated data processing for analog L1 to L2a
* End-to-end test demonstrating analog L1 to L2a processing
57 changes: 31 additions & 26 deletions corgidrp/__init__.py
@@ -3,7 +3,7 @@
import pathlib
import configparser

__version__ = "0.1.2"
__version__ = "1.1.2"
version = __version__ # temporary backwards compatibility

#### Create a configuration file for the corgidrp if it doesn't exist.
@@ -25,28 +25,31 @@ def create_config_dir():
if not os.path.isdir(config_folder):
os.mkdir(config_folder)

# make default calibrations folder
default_cal_dir = os.path.join(config_folder, "default_calibs")
if not os.path.exists(default_cal_dir):
os.mkdir(default_cal_dir)

# write config
config_filepath = os.path.join(config_folder, "corgidrp.cfg")
if not os.path.exists(config_filepath):
config = configparser.ConfigParser()
config["PATH"] = {}
config["PATH"]["caldb"] = os.path.join(config_folder, "corgidrp_caldb.csv") # location to store caldb
config["PATH"]["default_calibs"] = default_cal_dir
config["DATA"] = {}
config["DATA"]["track_individual_errors"] = "False"
# overwrite with old settings if needed
if oldconfig is not None:
config["PATH"]["caldb"] = oldconfig["PATH"]["caldb"]

with open(config_filepath, 'w') as f:
config.write(f)

print("corgidrp: Configuration file written to {0}. Please edit if you want things stored in different locations.".format(config_filepath))
# make default calibrations folder if it doesn't exist
default_cal_dir = os.path.join(config_folder, "default_calibs")
if not os.path.exists(default_cal_dir):
os.mkdir(default_cal_dir)

# write config if it doesn't exist
config_filepath = os.path.join(config_folder, "corgidrp.cfg")
if not os.path.exists(config_filepath):
config = configparser.ConfigParser()
config["PATH"] = {}
config["PATH"]["caldb"] = os.path.join(config_folder, "corgidrp_caldb.csv") # location to store caldb
config["PATH"]["default_calibs"] = default_cal_dir
config["DATA"] = {}
config["DATA"]["track_individual_errors"] = "False"
config["WALKER"] = {}
config["WALKER"]["skip_missing_cal_steps"] = "False"
config["WALKER"]["jit_calib_id"] = "False"
# overwrite with old settings if needed
if oldconfig is not None:
config["PATH"]["caldb"] = oldconfig["PATH"]["caldb"]

with open(config_filepath, 'w') as f:
config.write(f)

print("corgidrp: Configuration file written to {0}. Please edit if you want things stored in different locations.".format(config_filepath))
create_config_dir()

_bool_map = {"true" : True, "false" : False}
@@ -58,6 +61,8 @@ def create_config_dir():
config.read(config_filepath)

## pipeline settings
caldb_filepath = config.get("PATH", "caldb", fallback=None)
default_cal_dir = config.get("PATH", "default_calibs", fallback=None)
track_individual_errors = _bool_map[config.get("DATA", "track_individual_errors").lower()]
caldb_filepath = config.get("PATH", "caldb", fallback=None) # path to calibration db
default_cal_dir = config.get("PATH", "default_calibs", fallback=None) # path to default calibrations directory
track_individual_errors = _bool_map[config.get("DATA", "track_individual_errors", fallback='false').lower()] # save each individual error component separately?
skip_missing_cal_steps = _bool_map[config.get("WALKER", "skip_missing_cal_steps", fallback='false').lower()] # skip steps, instead of crashing, when suitable calibration file cannot be found
jit_calib_id = _bool_map[config.get("WALKER", "jit_calib_id", fallback='false').lower()] # identify AUTOMATIC calibration files right before a step executes, rather than when the recipe is first generated
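
Given the settings shown above, one way to flip a new WALKER option from outside the pipeline is a few lines of standard-library configparser (the path assumes the default `~/.corgidrp` location):

```
import configparser
import os

config_filepath = os.path.join(os.path.expanduser("~"), ".corgidrp", "corgidrp.cfg")
config = configparser.ConfigParser()
config.read(config_filepath)

# skip steps with missing calibrations instead of crashing
config["WALKER"]["skip_missing_cal_steps"] = "True"

with open(config_filepath, 'w') as f:
    config.write(f)
```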
