Commit
Merge pull request #69 from UBC-MDS/66-add-functionality-to-read-and-export-the-checklist-in-csv-format

feat: Add functionalities to read and export the checklist in CSV format
JohnShiuMK authored May 16, 2024
2 parents aca5013 + 970d8aa commit 09cfe1d
Showing 2 changed files with 311 additions and 98 deletions.
226 changes: 141 additions & 85 deletions checklist/checklist.yaml
@@ -4,221 +4,277 @@ Title: Checklist for Tests in Machine Learning Projects
# TODO: To be filled in later...
Description: Description about the project and its context
Test Areas:
- ID: "1"
Topic: General
Description: >-
The following items describe best practices for all tests to be written.
Tests:
- ID: "1.1"
Title: Write Descriptive Test Names
Requirement: >-
Each test function should have a clear, descriptive name that
accurately reflects the test's purpose and the specific functionality
or scenario it examines.
Explanation: >-
If our tests are narrow and sufficiently descriptive, the test name
itself may give us enough information to start debugging. This also
helps us to identify what is being tested inside the function.
References:
- trenk2014
- winters2024
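As an illustration of this item, here is a minimal pytest-style sketch (the function under test and all names are hypothetical): the test name states the unit, the scenario, and the expected outcome, so a failure report alone points at the problem.

```python
def clip_outliers(values, lower, upper):
    """Toy function under test: clamp each value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# The name encodes unit + scenario + expectation, so a red test
# tells us where to look before we even open the code.
def test_clip_outliers_replaces_values_above_upper_bound_with_upper_bound():
    assert clip_outliers([1, 5, 99], lower=0, upper=10) == [1, 5, 10]
```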

- ID: "1.2"
Title: Keep Tests Focused
Requirement: >-
Each test should focus on a single scenario, using only one set of
mock data and testing one specific behavior or outcome to ensure
clarity and isolate issues.
Explanation: >-
If we test multiple scenarios in a single test, it is hard to identify
exactly what went wrong. Keeping one scenario in a single test helps
us to isolate problematic scenarios.
References:
- yu2018
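A sketch of keeping tests focused (hypothetical names, plain asserts instead of `pytest.raises` to stay self-contained): the happy path and the error path each get their own single-scenario test, rather than one test covering both.

```python
def normalize(xs):
    """Toy function under test: scale values to sum to 1."""
    total = sum(xs)
    if total == 0:
        raise ValueError("cannot normalize all-zero input")
    return [x / total for x in xs]

# One scenario per test: the happy path...
def test_normalize_scales_values_to_sum_to_one():
    assert sum(normalize([2.0, 2.0])) == 1.0

# ...and the error path are tested separately, so a failure
# isolates exactly one behavior.
def test_normalize_rejects_all_zero_input():
    try:
        normalize([0.0, 0.0])
        assert False, "expected ValueError"
    except ValueError:
        pass
```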

- ID: "1.3"
Title: Prefer Narrow Assertions in Unit Tests
Requirement: >-
Assertions within tests should be focused and narrow. Ensure you are
only testing relevant behaviors of complex objects and not including
unrelated assertions.
Explanation: >-
If we have overly wide assertions (such as depending on every field of
a complex output proto), the test may fail for many unimportant
reasons. False positives are the opposite of actionable.
References:
- kent2024
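A small sketch of a narrow assertion (toy report object, hypothetical names): the test asserts only the field it is about, so unrelated changes to the complex output do not break it.

```python
def build_report(data):
    """Toy function returning a complex object; tests should assert
    only the fields relevant to the behavior under test."""
    return {"rows": len(data), "columns": ["a", "b"], "generated_by": "v1.2.3"}

def test_report_counts_rows():
    report = build_report([{"a": 1, "b": 2}])
    # Narrow: check only the row count this test is about, rather than
    # the whole dict (which would also fail if e.g. the version string
    # changes, for reasons unrelated to row counting).
    assert report["rows"] == 1
```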

- ID: "1.4"
Title: Keep Cause and Effect Clear
Requirement: >-
Keep any modifications to objects and the corresponding assertions
close together in your tests to maintain readability and clearly show
the cause-and-effect relationship.
Explanation: >-
Refrain from using large global test data structures shared across
multiple unit tests. This will allow for clear identification of each
test's setup and the cause and effect.
References:
- yu2017
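A sketch of keeping cause and effect adjacent (hypothetical names): the test data, the modification, and the assertion all sit in the test body, instead of relying on a shared global fixture defined far away.

```python
def apply_discount(cart, rate):
    """Toy function under test: reduce the cart total by a rate."""
    cart["total"] -= cart["total"] * rate
    return cart

def test_apply_discount_reduces_total_by_rate():
    # Local setup (cause) and assertion (effect) are a few lines apart,
    # so the reader never has to hunt for where the data came from.
    cart = {"total": 100.0}
    apply_discount(cart, rate=0.25)
    assert cart["total"] == 75.0
```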

- ID: "2"
Topic: Data Presence
Description: >-
The following items describe tests that need to be done for testing the
presence of data. This area of tests mainly concerns whether the reading
and saving operations behave as expected, and that any unexpected
behavior does not pass silently.
Tests:
- ID: "2.1"
Title: Ensure Data File Loads as Expected
Requirement: >-
Ensure that data-loading functions correctly load files when they
exist and match the expected format, handle non-existent files
appropriately, and return the expected results.
Explanation: >-
Reading data is a common scenario encountered in ML projects. This
item ensures that the data exists and can be loaded with expected
format, and gracefully exit when unable to load the data.
References:
- msise2023
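A self-contained sketch of this item using the standard library (a hypothetical `load_rows` loader stands in for the project's real data-loading function): one test covers the existing-file path, one covers the missing-file path.

```python
import csv
import os
import tempfile

def load_rows(path):
    """Hypothetical loader: read a CSV into a list of dicts, raising
    FileNotFoundError for missing files rather than failing silently."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def test_load_rows_reads_existing_file_in_expected_format():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.csv")
        with open(path, "w", newline="") as f:
            f.write("x,y\n1,2\n")
        assert load_rows(path) == [{"x": "1", "y": "2"}]

def test_load_rows_raises_on_missing_file():
    try:
        load_rows("no/such/file.csv")
        assert False, "expected FileNotFoundError"
    except FileNotFoundError:
        pass
```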

- ID: "2.2"
Title: Ensure Saving Data/Figures Function Works as Expected
Requirement: >-
Verify that functions for saving data and figures perform write
operations correctly, checking that the operation succeeds and the
content matches the expected format.
Explanation: >-
Writing operations create artifacts at different stages of the
analysis. Making sure the artifacts are created as expected ensures
that the artifacts we obtained at the end of the analysis would be
consistent and reproducible.
References:
- msise2023
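A sketch of testing a write operation (a hypothetical `save_metrics` artifact writer; real projects would do the same for saved figures or model files): the test checks both that the write succeeded and that the content round-trips to the expected value.

```python
import json
import os
import tempfile

def save_metrics(metrics, path):
    """Hypothetical writer: persist a metrics dict as JSON."""
    with open(path, "w") as f:
        json.dump(metrics, f)

def test_save_metrics_writes_expected_content():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "metrics.json")
        save_metrics({"accuracy": 0.9}, path)
        assert os.path.exists(path)  # the write operation succeeded
        with open(path) as f:
            # the saved content matches the expected format and values
            assert json.load(f) == {"accuracy": 0.9}
```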

- ID: "3"
Topic: Data Quality
Description: >-
The following items describe tests that need to be done for testing the
quality of data. This area of tests mainly concerns whether the data
supplied is in the expected format and whether null values or outliers
are handled, to make sure that the data processing pipeline is robust.
Tests:
- ID: "3.1"
Title: Files Contain Data
Requirement: >-
Ensure all data files are non-empty and contain the necessary data
required for further analysis or processing tasks.
Explanation: >-
This checklist item is crucial as it confirms the presence of usable
data within the files. It prevents errors in later stages of the
project by ensuring data is available from the start.
References:
- msise2023
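A minimal sketch of a non-emptiness check (a temporary file stands in for the project's checked-in data file): assert the file has non-zero size and contains rows beyond the header.

```python
import os
import tempfile

def test_data_file_is_not_empty():
    # In a real project the path would point at the actual data file;
    # here a temporary file stands in for it.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "train.csv")
        with open(path, "w") as f:
            f.write("x,y\n1,2\n")
        assert os.path.getsize(path) > 0  # file is non-empty
        with open(path) as f:
            # header plus at least one data row
            assert len(f.readlines()) > 1
```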

- ID: "3.2"
Title: Data in the Expected Format
Requirement: >-
Verify that the data to be ingested matches the format expected by
processing algorithms (like pd.DataFrame for CSVs or np.array for
images) and adheres to the expected schema.
Explanation: >-
Ensuring that data and images are in the correct format is essential
for compatibility with processing tools and algorithms, which may not
handle unexpected formats gracefully.
References:
- msise2023
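A sketch of a schema check (hypothetical `check_schema` validator over plain dicts; with pandas one would check `df.columns` and dtypes the same way): every ingested row must carry exactly the expected columns.

```python
def check_schema(rows, expected_columns):
    """Hypothetical validator: every row must have exactly the
    expected set of columns."""
    return all(set(row) == set(expected_columns) for row in rows)

def test_rows_match_expected_schema():
    rows = [{"age": 31, "income": 55000}, {"age": 47, "income": 72000}]
    assert check_schema(rows, ["age", "income"])
    # A row missing a column should be flagged, not silently accepted.
    assert not check_schema([{"age": 31}], ["age", "income"])
```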

- ID: "3.3"
Title: Data Does Not Contain Null Values or Outliers
Requirement: >-
Check that data files are free from unexpected null values and
identify any outliers that could affect the analysis. Tests should
explicitly state if null values are part of expected data.
Explanation: >-
Null values can lead to errors or inaccurate computations in many
data processing applications, while outliers can distort statistical
analyses and models. As such, these values should be checked before
the data is ingested.
References:
- msise2023
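A sketch of null and outlier checks on a toy column (the range check is a stand-in; a real project might use z-scores or the IQR rule instead):

```python
def test_age_column_has_no_nulls_or_outliers():
    ages = [31, 47, 29, 52, 38]  # toy stand-in for a data column
    # Nulls: this test states explicitly that None is NOT expected here.
    assert all(a is not None for a in ages)
    # Outliers: a simple domain-knowledge range check on plausible ages.
    assert all(0 < a < 120 for a in ages)
```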

- ID: "4"
Topic: Data Ingestion
Description: >-
The following items describe tests that need to be done for testing
whether the data is ingested properly.
Tests:
- ID: "4.1"
Title: Cleaning and Transformation Functions Work as Expected
Requirement: >-
Test that a fixed input to a function or model produces the expected
output, focusing on one verification per test to ensure predictable
behavior.
Explanation: >-
Fixed input and output during the data cleaning and transformation
routines should be tested so that no unexpected transformation is
introduced during these steps.
References:
- msise2023
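A sketch of a fixed input/output test for a cleaning step (hypothetical `drop_missing` function): one verification, a fixed input, and an exactly specified expected output.

```python
def drop_missing(rows):
    """Hypothetical cleaning step: remove rows containing None."""
    return [r for r in rows if None not in r.values()]

def test_drop_missing_removes_rows_with_none():
    fixed_input = [{"x": 1}, {"x": None}, {"x": 3}]
    # One verification: the fixed input maps to exactly this output,
    # so no unexpected transformation can slip in unnoticed.
    assert drop_missing(fixed_input) == [{"x": 1}, {"x": 3}]
```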

- ID: "5"
Topic: Model Fitting
Description: >-
The following items describe tests that need to be done for testing the
model fitting process. The unit tests written for this section usually
mock model load and model predictions similarly to mocking file access.
Tests:
- ID: "5.1"
Title: Validate Model Input and Output Compatibility
Requirement: >-
Confirm that the model accepts inputs of the correct shapes and types
and produces outputs that meet the expected shapes and types without
any errors.
Explanation: >-
Ensuring that inputs and outputs conform to expected specifications
is critical for the correct functioning of the model in a production
environment.
References:
- msise2023
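A sketch of an input/output compatibility test, using a toy classifier class in place of the real (possibly mocked) model, as the section description suggests:

```python
class ToyClassifier:
    """Hypothetical stand-in for a trained model with a
    predict_proba-style API."""
    n_features = 2
    n_classes = 3

    def predict_proba(self, X):
        if any(len(row) != self.n_features for row in X):
            raise ValueError("wrong input shape")
        # Uniform probabilities: enough to exercise shape contracts.
        return [[1.0 / self.n_classes] * self.n_classes for _ in X]

def test_model_accepts_and_returns_expected_shapes():
    model = ToyClassifier()
    out = model.predict_proba([[0.1, 0.2], [0.3, 0.4]])
    assert len(out) == 2  # one prediction row per input sample
    assert all(len(p) == model.n_classes for p in out)
    assert all(isinstance(p[0], float) for p in out)  # expected type
```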

- ID: "5.2"
Title: Check Model is Learning During Fit
Requirement: >-
For parametric models, ensure that the model's weights update
correctly per training iteration. For non-parametric models, verify
that the data fits correctly into the model.
Explanation: >-
Making sure the training process is indeed training the model is
crucial, as a model that has not been trained is not fitted to any
data and its performance would suffer.
References:
- msise2023
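A sketch of the parametric case on a toy one-parameter model (hypothetical `fit_one_epoch`): snapshot the weights, run one training iteration, and assert they changed.

```python
import copy

def fit_one_epoch(weights, data, lr=0.1):
    """Toy gradient step on a 1-parameter least-squares model y = w*x."""
    w = weights["w"]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    weights["w"] = w - lr * grad
    return weights

def test_weights_change_after_one_training_iteration():
    weights = {"w": 0.0}
    before = copy.deepcopy(weights)
    fit_one_epoch(weights, data=[(1.0, 2.0), (2.0, 4.0)])
    # If the weights never move, the model is not actually training.
    assert weights["w"] != before["w"]
```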

- ID: "5.3"
Title: Ensure Model Output Shape Aligns with Expectation
Requirement: >-
Ensure the shape of the model's output aligns with the expected
structure based on the task, such as matching the number of labels in
a classification task.
Explanation: >-
Correct output alignment confirms that the model is accurately
interpreting the input data and making predictions that are sensible
given the context.
References:
- jordan2020
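A sketch for the classification example named in the requirement (hypothetical `predict_proba` head): the output must have one row per sample and one score per label.

```python
def predict_proba(X, n_labels):
    """Hypothetical classifier head: one probability per label per sample."""
    return [[1.0 / n_labels] * n_labels for _ in X]

def test_output_shape_matches_number_of_labels():
    n_labels = 4  # known number of classes for the task
    probs = predict_proba([[0.0], [1.0], [2.0]], n_labels=n_labels)
    assert len(probs) == 3  # one row per input sample
    assert all(len(p) == n_labels for p in probs)  # one score per label
```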

- ID: "5.4"
Title: Ensure Model Output Aligns with Task Trained
Requirement: >-
Verify that the model's output values are appropriate for its task,
such as outputting probabilities that sum to 1 for classification
tasks.
Explanation: >-
This ensures that the model's output is interpretable and relevant to
the task it was trained for.
References:
- jordan2020
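A sketch of the probabilities-sum-to-1 check from the requirement, using a toy softmax in place of the real model's output layer:

```python
import math

def softmax(scores):
    """Toy softmax: converts raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def test_classifier_outputs_form_a_probability_distribution():
    probs = softmax([2.0, 1.0, 0.1])
    # For a classification task, outputs should sum to 1 (up to
    # floating-point tolerance) and each lie in [0, 1].
    assert abs(sum(probs) - 1.0) < 1e-9
    assert all(0.0 <= p <= 1.0 for p in probs)
```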

- ID: "5.5"
Title: Validate Loss Reduction on Gradient Update
Requirement: >-
If using gradient descent for training, verify that a single gradient
step on a batch of data results in a decrease in the model's training
loss.
Explanation: >-
A decrease in training loss after a gradient update demonstrates that
the training step is working and the model is able to fit the data.
References:
- jordan2020
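A sketch of this check on a toy least-squares model (hypothetical `mse` and `gradient_step` helpers): compute the loss, take one gradient step on a batch, and assert the loss decreased.

```python
def mse(w, data):
    """Mean squared error of the toy model y = w*x on a batch."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient_step(w, data, lr=0.05):
    """One gradient-descent update on the batch."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def test_single_gradient_step_decreases_training_loss():
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy batch, true w = 2
    w0 = 0.0
    w1 = gradient_step(w0, data)
    assert mse(w1, data) < mse(w0, data)  # loss went down after one step
```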

- ID: "5.6"
Title: Check for Data Leakage
Requirement: >-
Confirm that there is no leakage of data between training, validation,
and testing sets, or across cross-validation folds, to ensure the
integrity of the splits.
Explanation: >-
Data leakage can compromise the model's ability to generalize to
unseen data, making it crucial to ensure datasets are properly
segregated.
References:
- jordan2020
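A sketch of a leakage check over a hypothetical `split` helper: assert the train and test index sets are disjoint and that together they cover the full dataset.

```python
def split(indices, train_frac=0.8):
    """Hypothetical splitter returning disjoint train/test index sets."""
    cut = int(len(indices) * train_frac)
    return set(indices[:cut]), set(indices[cut:])

def test_train_and_test_sets_do_not_overlap():
    train, test = split(list(range(100)))
    assert train.isdisjoint(test)           # no leakage between splits
    assert train | test == set(range(100))  # and no sample was dropped
```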


- ID: "6"
Topic: Model Evaluation
Description: >-
The following items describe tests that need to be done for testing the
model evaluation process.
Tests:
- ID: "6.1"
Title: Dummy Title
Requirement: >-
This is a dummy item and there is nothing needed to act on.
Explanation: >-
This is a dummy item to show how the checklist items would be stored
inside this YAML file. It serves no other purpose.
References:
- http://128.0.0.1
- UBC-MDS

- ID: "7"
Topic: Artifact Testing
Description: >-
The following items describe tests that need to be done for testing any
artifacts that are created from the project.
Tests:
- ID: "7.1"
Title: Dummy Title
Requirement: >-
This is a dummy item and there is nothing needed to act on.
Explanation: >-
This is a dummy item to show how the checklist items would be stored
inside this YAML file. It serves no other purpose.
References: