[Issue 3867] Cleanup logging during extract and add failure rate logging (#4033)

## Summary
Fixes #3867

### Time to review: __2 mins__

## Changes proposed
> What was added, updated, or removed in this PR.

Removed noise from analytics logging that occurs during the extract
phase of the ETL workflow. This PR makes log statements more concise,
removing the noise and moving the details of each exception to
debug-level logging.

This PR also adds calculation and logging of the validation failure rate
that occurs when marshaling data from the GraphQL response payload into
internal data structures.
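The rate itself is simple arithmetic; a hedged sketch matching the rounding used in this commit's diff (the function name is illustrative, not part of the codebase):

```python
def failure_rate_percent(fail: int, count: int) -> float:
    """Percentage of records that failed validation, rounded to 3 decimal places."""
    if count == 0:
        return 0.0  # guard against division by zero when no records were extracted
    return round(100 * fail / count, 3)

# e.g. 1 failure out of 8 records -> 12.5
```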

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

In the extract phase of the ETL workflow, some rows fail validation, and
the logs are much too noisy about that. Also, there is no clear
indication of the ratio of records that fail validation.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.

**Logs BEFORE**
<img width="997" alt="logs-before"
src="https://github.com/user-attachments/assets/54413c2f-6e4a-4dc9-9824-2df5af6742ef"
/>

**Logs AFTER**
<img width="963" alt="logs-after"
src="https://github.com/user-attachments/assets/abe6fd05-544f-48dd-a8a4-570c5ea5e14d"
/>

**Failure Rate Log Message**
<img width="621" alt="failure-rate"
src="https://github.com/user-attachments/assets/9eb40508-56c0-43aa-879e-7a570bf50d8e"
/>
DavidDudas-Intuitial authored Feb 27, 2025
1 parent d756be3 commit 5ed0231
Showing 1 changed file with 16 additions and 6 deletions.
analytics/src/analytics/integrations/github/main.py (16 additions, 6 deletions)

```diff
@@ -27,12 +27,17 @@ def transform_project_data(
 ) -> list[dict]:
     """Pluck and reformat relevant fields for each item in the raw data."""
     transformed_data = []
+    count = 0
+    fail = 0
 
     for i, item in enumerate(raw_data):
+        count += 1
         try:
             # Filter out invalid content from boards local user may not have permission to
             if item.get("content") is None:
-                logger.info("Row %d is missing the 'content' key, skipping.", i)
+                message = f"project item {i} has no content; skipping"
+                logger.info(message)
+                logger.debug(item)
                 continue
 
             # Validate and parse the raw item
@@ -73,13 +78,18 @@ def transform_project_data(
             transformed_data.append(transformed)
 
         except ValidationError as err:
-            logger.info(
-                "**** Skipping project row %d, skipping. **** Error: %s",
-                i,
-                err,
-            )
+            fail += 1
+            message = f"project item {i} cannot be validated; skipping"
+            logger.info(message)
+            logger.debug(err)
+            logger.debug(item)
             continue
 
+    if count > 0:
+        failure_rate = round(100 * fail / count, 3)
+        message = f"validation failure rate: {failure_rate} %"
+        logger.info(message)
+
     return transformed_data
 
 
```
