Change SEC 10-K table schemas to fix FK errors and use quarterly naming. #4046
@zschira can you provide more extensive table descriptions for these new tables?
It looks like the offending
bad_utility_ids = [
    3579,  # Cirro Group, Inc. in Texas
]
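A minimal sketch of how an exclusion list like this can clear a foreign key violation, assuming the excluded ID has no match in the parent utility table. The table contents, the `find_fk_violations` helper, and the choice to drop the offending rows are all hypothetical; only the ID 3579 and the company name come from the diff, and this is not the check that `pudl_check_fks` itself performs.

```python
import pandas as pd

# Hypothetical parent table standing in for the EIA utility entity table.
utilities = pd.DataFrame({"utility_id_eia": [195, 7140]})

# Hypothetical child table standing in for SEC companies matched to EIA utilities.
sec_matches = pd.DataFrame(
    {
        "company_name": ["Company A", "Company B", "Cirro Group, Inc."],
        "utility_id_eia": [195, 7140, 3579],
    }
)


def find_fk_violations(child: pd.DataFrame, parent: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return child rows whose non-null key value is absent from the parent."""
    has_key = child[key].notna()
    orphaned = ~child[key].isin(parent[key])
    return child[has_key & orphaned]


bad_utility_ids = [3579]  # the only value taken from the diff

# Before the fix: the unmatched ID shows up as a violation.
violations = find_fk_violations(sec_matches, utilities, "utility_id_eia")

# Assumed handling: drop rows with known-bad IDs, then re-check.
cleaned = sec_matches[~sec_matches["utility_id_eia"].isin(bad_utility_ids)]
remaining = find_fk_violations(cleaned, utilities, "utility_id_eia")
```

Here `violations` holds the single row for ID 3579 and `remaining` is empty. Nulling the ID instead of dropping the row would satisfy the same check while keeping the company record.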
This ID is probably a rogue utility_id_eia that only exists in the EIA-861 data; it was the only such utility that ended up matching an SEC company. It should go away if we remove the experimental EIA-861 ID harvesting upstream, as @katie-lamb said she intends to.
"core_sec10k__quarterly_filings",
"core_sec10k__quarterly_exhibit_21_company_ownership",
"core_sec10k__quarterly_company_information",
"out_sec10k__parents_and_subsidiaries",
Exclude these because they don't have the same (annual) time frequency as the entity table.
- def core_sec10k__company_information() -> pd.DataFrame:
+ def core_sec10k__quarterly_company_information() -> pd.DataFrame:
Add time frequency prefix.
- "core_sec10k__filings": {
+ "core_sec10k__quarterly_filings": {
Rename these to include time frequency prefix.
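The renaming pattern in these hunks can be sketched as a small helper that inserts the frequency right after the double underscore. The function name `add_frequency_prefix` is hypothetical and is not part of PUDL; the two old/new name pairs in the usage lines are the ones shown in the diffs.

```python
def add_frequency_prefix(table_name: str, frequency: str = "quarterly") -> str:
    """Insert a time-frequency prefix after the '<layer>_<source>__' part of a table name."""
    prefix, sep, rest = table_name.partition("__")
    if not sep:
        raise ValueError(f"Expected a double underscore in table name: {table_name!r}")
    if rest.startswith(f"{frequency}_"):
        # Already carries the prefix; leave it unchanged.
        return table_name
    return f"{prefix}__{frequency}_{rest}"


renamed_filings = add_frequency_prefix("core_sec10k__filings")
renamed_company_info = add_frequency_prefix("core_sec10k__company_information")
```

With these inputs the helper yields `core_sec10k__quarterly_filings` and `core_sec10k__quarterly_company_information`, and calling it again on an already renamed table returns the name unchanged.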
Branch build seems to have succeeded last night! (with the exception of the Zenodo Sandbox flakeout, which is unrelated)
This looks good to me, thanks for stepping in with the foreign key fixes! I wonder if we should just merge @katie-lamb's metadata into this branch once we finish review, then merge everything into main? I guess it doesn't make too much difference if we get both PRs done today.
I think we might have more work to do on the documentation, and it'd be good to get the nightly build ETL passing with the quarterly updates in the works, so I say we merge this one and work on the docs / metadata independently.
Overview
[...] core_eia860__scd_utilities table (annual). [...] utility_id_eia.
What did you change?
[...] utility_id_eia
Documentation
Tasks
Testing
[...] sec10k tables locally and ran pudl_check_fks and got the same failure as the nightly builds.
[...] pudl_check_fks again and got just 3 records violating the constraint.
To-do list