[Issue 3867] Cleanup logging during extract and add failure rate logging (#4033)

## Summary
Fixes #3867

### Time to review: __2 mins__

## Changes proposed
> What was added, updated, or removed in this PR.

Removed noise from analytics logging that occurs during the extract
phase of the ETL workflow. This PR makes log statements more concise,
removing the noise and moving the details of each exception to
debug-level logging.

This PR also adds calculation and logging of the validation failure rate
that occurs when marshaling data from the GraphQL response payload into
internal data structures.
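The rate itself is simple arithmetic; a hedged sketch matching the rounding used in this commit's diff (the function name is illustrative, not part of the codebase):

```python
def failure_rate_percent(fail: int, count: int) -> float:
    """Percentage of records that failed validation, rounded to 3 decimal places."""
    if count == 0:
        return 0.0  # guard against division by zero when no records were extracted
    return round(100 * fail / count, 3)

# e.g. 1 failure out of 8 records -> 12.5
```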

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

In the extract phase of the ETL workflow, some rows fail validation, and
the logs are much too noisy about that. Also, there is no clear
indication of the ratio of records that fail validation.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.

**Logs BEFORE**
<img width="997" alt="logs-before"
src="https://github.com/user-attachments/assets/54413c2f-6e4a-4dc9-9824-2df5af6742ef"
/>

**Logs AFTER**
<img width="963" alt="logs-after"
src="https://github.com/user-attachments/assets/abe6fd05-544f-48dd-a8a4-570c5ea5e14d"
/>

**Failure Rate Log Message**
<img width="621" alt="failure-rate"
src="https://github.com/user-attachments/assets/9eb40508-56c0-43aa-879e-7a570bf50d8e"
/>
DavidDudas-Intuitial authored Feb 27, 2025
1 parent d756be3 commit 5ed0231
Showing 1 changed file with 16 additions and 6 deletions.
analytics/src/analytics/integrations/github/main.py (16 additions, 6 deletions)

```diff
@@ -27,12 +27,17 @@ def transform_project_data(
 ) -> list[dict]:
     """Pluck and reformat relevant fields for each item in the raw data."""
     transformed_data = []
+    count = 0
+    fail = 0
 
     for i, item in enumerate(raw_data):
+        count += 1
         try:
             # Filter out invalid content from boards local user may not have permission to
             if item.get("content") is None:
-                logger.info("Row %d is missing the 'content' key, skipping.", i)
+                message = f"project item {i} has no content; skipping"
+                logger.info(message)
+                logger.debug(item)
                 continue
 
             # Validate and parse the raw item
@@ -73,13 +78,18 @@ def transform_project_data(
             transformed_data.append(transformed)
 
         except ValidationError as err:
-            logger.info(
-                "**** Skipping project row %d, skipping. **** Error: %s",
-                i,
-                err,
-            )
+            fail += 1
+            message = f"project item {i} cannot be validated; skipping"
+            logger.info(message)
+            logger.debug(err)
+            logger.debug(item)
             continue
 
+    if count > 0:
+        failure_rate = round(100 * fail / count, 3)
+        message = f"validation failure rate: {failure_rate} %"
+        logger.info(message)
+
     return transformed_data
 
 
```
