Add records correcting FERC 1 calculations that are off #2620
one suggestion for fillna-ing & a nit/musing about a slight simplification. but overall this looks great
It seemed wrong to have any value of record_id in the correction records since they have no source record, so I set it to NA. Are there any downstream consequences for this?
I'm nervous about downstream PK/merge problems, even though we don't typically use the record_id as a pk or a post-transform merge key, so maybe it'll be fine 🤷🏻. I'm personally not opposed to the idea of using the record_id of the calculated value because this is truly the source.
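To make the merge concern concrete, here is a minimal sketch with made-up toy tables (every column name other than record_id is hypothetical, and this is not the PUDL code). It relies on documented pandas behavior: rows whose merge key is null are matched against each other, so several correction records with an NA record_id can fan out in a merge on that column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two source records plus two correction records
# whose record_id has been set to NA, as proposed in the PR.
values = pd.DataFrame(
    {
        "record_id": ["r1", "r2", np.nan, np.nan],
        "xbrl_factoid": ["a", "b", "a_correction", "b_correction"],
    }
)
# Hypothetical second table that also contains a null record_id.
notes = pd.DataFrame(
    {
        "record_id": ["r1", np.nan],
        "note": ["from source", "no source"],
    }
)

# pandas matches null keys with each other, so both correction rows
# pick up the "no source" note.
merged = values.merge(notes, on="record_id", how="inner")
n_null_matches = int(merged["record_id"].isna().sum())

# The NA values also count as duplicates, so record_id is no longer unique.
has_duplicate_ids = bool(values["record_id"].duplicated().any())
```

If record_id is never used as a key after the transform step, none of this bites, which matches the "maybe it'll be fine" reading above.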
It seemed wrong to say that the correction records had xbrl_row_type value of reported_value or calculated_value so I set them to correction and updated the ENUM constraint on that field.
this seems like a good change.
What set of calculations we want to create correction records for? Right now they're only being generated for the ones that fail the np.isclose() test (and end up in off_df), but we say "match exactly" in some places, and there are lots of records which are off by just one dollar -- none of which will get corrections in the current setup, which seems fine, even though there will technically still be some difference between the calculated and reported values.
I am of two minds about this. I like the way you did it. But also having a correction be applied across the board no matter what seems reasonable and maybe more informative to users (i.e. getting the $1 or mostly $0 corrections).
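A small sketch of why the off-by-one-dollar records do not get corrections. It assumes NumPy's default tolerances (rtol=1e-05, atol=1e-08), which may not be what the PR actually uses, and the column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative values only; the column names are assumptions.
df = pd.DataFrame(
    {
        "reported_value": [1_000_000.0, 100.0, 5_000.0],
        "calculated_value": [1_000_001.0, 101.0, 5_000.0],
    }
)

# np.isclose(a, b) tests |a - b| <= atol + rtol * |b|.
# With the defaults, a $1 gap on $1,000,000 is within tolerance (limit ~$10),
# while a $1 gap on $100 is not (limit ~$0.001).
is_close = np.isclose(df["calculated_value"], df["reported_value"])

# Only rows that fail the test would receive a correction record.
off_df = df[~is_close]
```

Because the tolerance is relative, "off by one dollar" is ignored for large totals but flagged for small ones, so corrections would appear only where the gap is large compared with the reported value.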
Do we want to change the name of check_table_calculations to correct_table_calculations or something else that indicates we're actually altering the database tables?
this might be a good idea. it's a lil weird bc we are indeed mostly checking but it's a good edit.
If we correct these apparent rounding errors, there will be a lot of corrections, and they'll be correcting values that are off by only 1 part in a million. But they won't always show up, because there will be many cases in which nothing was off by enough to trip up
Unless we store these dollar values as decimal numbers (rather than floating point -- decimal numbers are just an integer number of cents with a baked-in assumption that you have 2 digits after the decimal point) the only reasonable way to compare them is with something like
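The floating point versus decimal point above can be sketched with the standard library's decimal module and a tolerance-based check such as np.isclose(), the test named earlier in this thread. The dollar amounts are made up for illustration.

```python
from decimal import Decimal

import numpy as np

# Binary floating point cannot represent most cent amounts exactly,
# so exact equality fails even for simple sums.
float_equal = (0.10 + 0.20) == 0.30

# Decimal arithmetic on the same amounts is exact.
decimal_equal = (Decimal("0.10") + Decimal("0.20")) == Decimal("0.30")

# Storing integer cents is another exact representation.
cents_equal = (10 + 20) == 30

# With floats, a tolerance-based comparison is the practical option.
close_enough = bool(np.isclose(0.10 + 0.20, 0.30))
```

With float columns, exact equality is unreliable, so "match exactly" can only mean "match within a tolerance" unless the values are stored as decimals or integer cents.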
I can see how this isn't crazy, but it could also end up creating additional tables with non-unique
Should I go ahead and rename
And do we truly always want corrections to be added?
using np.arentclose() as the threshold for whether to make a correction sounds grand to me
Sure!
If we end up not wanting to we can always add another param into
I renamed check_table_calculations / CheckTableCalculations to reconcile_table_calculations / ReconcileTableCalculations because we're not just checking the values now -- we're altering the data in many cases, to include correction records, and accounting reconciliation has a more active "make these things match" connotation. Also now using `fillna()` when generating the correction records, so that if we're correcting a calculated value that's NA we actually get a real correction (we're treating NA values as 0.0 in the aggregations already so this is internally self-consistent).
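A minimal sketch of why fillna() matters here. It assumes a correction is computed as reported minus calculated, and the column names are illustrative, not taken from the PUDL code.

```python
import numpy as np
import pandas as pd

# Illustrative rows: the second one has an NA calculated value.
df = pd.DataFrame(
    {
        "reported_value": [100.0, 50.0, 20.0],
        "calculated_value": [90.0, np.nan, 20.0],
    }
)

# Without fillna(), NA propagates and the second row gets no usable correction.
naive = df["reported_value"] - df["calculated_value"]

# Treating NA as 0.0, consistent with how the aggregations treat it,
# yields a real correction for every row.
correction = df["reported_value"].fillna(0.0) - df["calculated_value"].fillna(0.0)

# Sanity check: calculated value plus correction reproduces the reported value.
reconciled = df["calculated_value"].fillna(0.0) + correction
```

Here the second row gets a correction of 50.0 instead of NA, so the reconciled total matches the reported value.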
Codecov Report
Coverage diff for #2620 against xbrl_meta_reshape:

            xbrl_meta_reshape    #2620    +/-
Coverage    87.0%                87.1%
Files       84                   84
Lines       9861                 9878     +17
Hits        8588                 8605     +17
Misses      1273                 1273
PR Overview
- Adds a `{xbrl_factoid}_correction` component to all calculations in the `process_xbrl_metadata()` method of the FERC 1 table transformers.
- The `check_table_calculations` method calculates the difference between the reported totals and our calculated totals and adds corrective records so that these totals match.

Questions:
- It seemed wrong to have any value of `record_id` in the correction records since they have no source record, so I set it to NA. Are there any downstream consequences for this?
- It seemed wrong to say that the correction records had `xbrl_row_type` value of `reported_value` or `calculated_value` so I set them to `correction` and updated the ENUM constraint on that field.
- What set of calculations do we want to create correction records for? Right now they're only being generated for the ones that fail the `np.isclose()` test (and end up in `off_df`), but we say "match exactly" in some places, and there are lots of records which are off by just one dollar -- none of which will get corrections in the current setup, which seems fine, even though there will technically still be some difference between the calculated and reported values.
- Do we want to change the name of `check_table_calculations` to `correct_table_calculations` or something else that indicates we're actually altering the database tables?