
Add records correcting FERC 1 calculations that are off #2620

Merged
merged 9 commits from calc-correction-records into xbrl_meta_reshape on Jun 2, 2023

Conversation

zaneselvans
Member

PR Overview

  • Fix a bad reference to old XBRL Metadata asset names in our FERC 1 ETL debugging notebook.
  • Add a generic {xbrl_factoid}_correction component to all calculations in the process_xbrl_metadata() method of the FERC 1 table transformers.
  • Add a step to the check_table_calculations method that calculates the difference between the reported totals and our calculated totals and adds corrective records so that these totals match.
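The corrective-record step described above can be sketched roughly like this (column names such as `reported_value`, `calculated_value`, and `correction` are illustrative, not necessarily PUDL's actual schema, and the real reconciliation logic in `check_table_calculations` may differ):

```python
import numpy as np
import pandas as pd

def add_correction_records(df: pd.DataFrame) -> pd.DataFrame:
    """Append a correction record for each row whose reported and
    calculated totals aren't close, so the totals reconcile.

    Illustrative sketch only: PUDL's actual column names and
    correction logic may differ.
    """
    # Only rows that fail np.isclose() get corrections.
    off_df = df[~np.isclose(df["reported_value"], df["calculated_value"])].copy()
    corrections = off_df.copy()
    # The correction is whatever amount makes calculated match reported.
    corrections["correction"] = off_df["reported_value"] - off_df["calculated_value"]
    corrections["xbrl_factoid"] = off_df["xbrl_factoid"] + "_correction"
    corrections["record_id"] = pd.NA  # corrections have no source record
    corrections["xbrl_row_type"] = "correction"
    return pd.concat([df, corrections], ignore_index=True)
```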

Questions:

  • It seemed wrong to have any value of record_id in the correction records since they have no source record, so I set it to NA. Are there any downstream consequences for this?
  • It seemed wrong to say that the correction records had xbrl_row_type value of reported_value or calculated_value so I set them to correction and updated the ENUM constraint on that field.
  • What set of calculations do we want to create correction records for? Right now they're only being generated for the ones that fail the np.isclose() test (and end up in off_df), but we say "match exactly" in some places, and there are lots of records which are off by just one dollar -- none of which will get corrections in the current setup. That seems fine, even though there will technically still be some difference between the calculated and reported values.
  • Do we want to change the name of check_table_calculations to correct_table_calculations or something else that indicates we're actually altering the database tables?
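The ENUM change in the second question above might look roughly like this simplified sketch (the set-based validation shown here is illustrative and not PUDL's actual metadata machinery; only the `xbrl_row_type` values come from the discussion):

```python
# Simplified sketch of extending an ENUM constraint so the new
# "correction" row type is allowed alongside the existing values.
XBRL_ROW_TYPES: set[str] = {"reported_value", "calculated_value"}

# After the change described above:
XBRL_ROW_TYPES = XBRL_ROW_TYPES | {"correction"}

def validate_row_type(value: str) -> str:
    """Reject any xbrl_row_type not in the allowed ENUM values."""
    if value not in XBRL_ROW_TYPES:
        raise ValueError(f"invalid xbrl_row_type: {value!r}")
    return value
```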

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing (run tests locally to debug failures).
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests
  • Include unit tests for new functions and classes.
  • Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@zaneselvans zaneselvans requested a review from cmgosnell June 2, 2023 00:50
@zaneselvans zaneselvans added this to the 2023 Spring milestone Jun 2, 2023
@zaneselvans zaneselvans added the following labels on Jun 2, 2023: ferc1 (anything having to do with FERC Form 1), data-repair (interpolating or extrapolating data that we don't actually have), metadata (anything having to do with the content, formatting, or storage of metadata; mostly datapackages), rmi, and xbrl (related to the FERC XBRL transition)
Member

@cmgosnell cmgosnell left a comment


one suggestion for fillna-ing & a nit/musing about a slight simplification. but overall this looks great

It seemed wrong to have any value of record_id in the correction records since they have no source record, so I set it to NA. Are there any downstream consequences for this?

I'm nervous about downstream PK/merge problems, even though we don't typically use the record_id as a PK or a post-transform merge key, so maybe it'll be fine 🤷🏻. I'm personally not opposed to the idea of using the record_id of the calculated value, because this is truly the source.

It seemed wrong to say that the correction records had xbrl_row_type value of reported_value or calculated_value so I set them to correction and updated the ENUM constraint on that field.

this seems like a good change.

What set of calculations we want to create correction records for? Right now they're only being generated for the ones that fail the np.isclose() test (and end up in off_df), but we say "match exactly" in some places, and there are lots of records which are off by just one dollar -- none of which will get corrections in the current setup, which seems fine, even though there will technically still be some difference between the calculated and reported values.

I am of two minds about this. I like the way you did it. But also having a correction be applied across the board no matter what seems reasonable and maybe more informative to users (i.e. getting the $1 or mostly $0 corrections).

Do we want to change the name of check_table_calculations to correct_table_calculations or something else that indicates we're actually altering the database tables?

this might be a good idea. it's a lil weird bc we are indeed mostly checking, but it's a good edit.

src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
Base automatically changed from check_calcs_in_transformers to xbrl_meta_reshape June 2, 2023 13:34
@zaneselvans
Member Author

I am of two minds about this. I like the way you did it. But also having a correction be applied across the board no matter what seems reasonable and maybe more informative to users (i.e. getting the $1 or mostly $0 corrections).

If we correct these apparent rounding errors, there will be a lot of corrections, and they'll be correcting values that are off by only 1 part in a million. But they won't always show up, because there will be many cases in which nothing was off by enough to trip up np.isclose() -- and in those cases no corrections of any kind would be generated.

Unless we store these dollar values as decimal numbers (rather than floating point -- a decimal number is just an integer number of cents, with the baked-in assumption that there are 2 digits after the decimal point), the only reasonable way to compare them is with something like np.isclose(). I think correcting only those values that np.arentclose() will be clearer -- they'll be real corrections of real reporting issues, rather than a ton of fixes, most of which aren't important.
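To illustrate the tolerance question: with np.isclose()'s default relative tolerance (rtol=1e-05), a million-dollar total that is off by one dollar still counts as "close" and would never receive a correction, while the same $1 discrepancy on a small value would be flagged:

```python
import numpy as np

# A $1 discrepancy on a million-dollar total is within np.isclose()'s
# default relative tolerance (rtol=1e-05), so no correction is generated:
assert np.isclose(1_000_000.0, 1_000_001.0)

# The same $1 discrepancy on a $100 value exceeds the relative
# tolerance, so it would be flagged and corrected:
assert not np.isclose(100.0, 101.0)

# Exact float equality is too brittle for sums of many values,
# which is why the comparison uses np.isclose() in the first place:
assert 0.1 + 0.2 != 0.3
assert np.isclose(0.1 + 0.2, 0.3)
```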

I'm personally not opposed to the idea of using the record_id of the calculated value because this is truly the source.

I can see how this isn't crazy, but it could also end up creating additional tables with non-unique record_id values for reasons that have nothing to do with reshaping the data. It doesn't appear to right now though. I really want to change this to source_record_id across the board so folks don't ever try and use it as a unique ID for the tables you find it in but that's another story. I just don't want anyone to think that these correction values come from any original data source, which is kinda what having a record_id indicates to me.

Should I go ahead and rename check_table_calculations to something that indicates both checking and correction is happening?

And do we truly always want corrections to be added?

@cmgosnell
Member

using np.arentclose() as the threshold for whether to make a correction sounds grand to me

Should I go ahead and rename check_table_calculations to something that indicates both checking and correction is happening?

Sure!

And do we truly always want corrections to be added?

If we end up not wanting to we can always add another param into CheckTableCalculations

I renamed check_table_calculations / CheckTableCalculations to reconcile_table_calculations / ReconcileTableCalculations because we're not just checking the values now -- we're altering the data in many cases to include correction records, and accounting reconciliation has a more active "make these things match" connotation.

Also now using `fillna()` when generating the correction records, so that if we're correcting a calculated value that's NA we actually get a real correction (we're already treating NA values as 0.0 in the aggregations, so this is internally self-consistent).
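The `fillna()` behavior described above can be sketched like this (the series names and the subtraction are illustrative, not the actual PUDL implementation):

```python
import numpy as np
import pandas as pd

reported = pd.Series([100.0, 80.0])
calculated = pd.Series([90.0, np.nan])  # second calculation produced no value

# Without fillna, an NA calculated value propagates: 80.0 - NA -> NA,
# so the record would get no usable correction.
naive = reported - calculated

# Treating NA as 0.0 (consistent with how NAs are already handled in
# the aggregations) yields a real correction for the second record.
corrected = reported - calculated.fillna(0.0)
```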
@zaneselvans zaneselvans self-assigned this Jun 2, 2023
@codecov

codecov bot commented Jun 2, 2023

Codecov Report

Patch coverage: 94.4% and no project coverage change.

Comparison is base (fe4fc9e) 87.0% compared to head (274acad) 87.1%.

Additional details and impacted files
@@                Coverage Diff                @@
##           xbrl_meta_reshape   #2620   +/-   ##
=================================================
  Coverage               87.0%   87.1%           
=================================================
  Files                     84      84           
  Lines                   9861    9878   +17     
=================================================
+ Hits                    8588    8605   +17     
  Misses                  1273    1273           
Impacted Files Coverage Δ
src/pudl/metadata/fields.py 100.0% <ø> (ø)
src/pudl/metadata/resources/ferc1.py 100.0% <ø> (ø)
src/pudl/transform/params/ferc1.py 100.0% <ø> (ø)
src/pudl/transform/ferc1.py 96.6% <94.4%> (-0.2%) ⬇️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.

@zaneselvans zaneselvans merged commit f96ee5b into xbrl_meta_reshape Jun 2, 2023
@zaneselvans zaneselvans deleted the calc-correction-records branch June 5, 2023 19:35