Missing delivery utilities in sales_eia861 #2636
Comments
Hi @christiantfong thanks for pointing this out, it definitely looks wrong! I will investigate and let you know what is happening.
@christiantfong I believe I've fixed the bug in #2637. We were using an incomplete set of columns as the primary key for the table, and so a reshaping operation that depended on knowing the PK columns was dropping some records. I've created an issue to check that something similar isn't happening anywhere else in the offending function: #2638
Closing this as fixed by #2637. @christiantfong assuming the nightly builds succeed tonight, the complete EIA-861 sales data should appear in the fresh
(Just following up to actually close this, but @christiantfong do let us know if you run into any other unexpected troubles with the sales data!)
The `sales_eia861` and `demand_response_eia861` tables each have a handful of duplicate primary keys due to NA values in the `balancing_authority_code_eia` column. Quantify and log the extent of this problem, and consolidate the data in the duplicated records if they constitute less than 0.5% of all records in the table. This check would also have caught the incorrect primary key columns reported in #2636 and fixed in #2637. Because there were so few duplicate records, I decided to just consolidate them all (with a hard limit on the fraction of records that could be consolidated) rather than requiring that the only duplication be due to the BA Code column. Closes #2638
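The check described above can be sketched roughly as follows. This is an illustration only, not the code from #2637 or #2638: the function name, the `sales_mwh` value column, and the choice to consolidate by summing non-null values are all assumptions made for the example. Only the 0.5% limit and the `balancing_authority_code_eia` column come from the description.

```python
from collections import defaultdict

# The 0.5% limit from the description above.
MAX_DUPE_FRACTION = 0.005


def consolidate_duplicate_keys(records, pk_cols, value_cols,
                               max_fraction=MAX_DUPE_FRACTION):
    """Merge records that share a primary key, if they are rare enough.

    Hypothetical helper: records is a list of dicts. Returns a tuple of
    (consolidated_records, duplicated_fraction).
    """
    # Group records by their primary key. A None (NA) key component is
    # treated like any other value, so rows differing only by a missing
    # BA code collide here.
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[col] for col in pk_cols)
        groups[key].append(rec)

    # Quantify and log how many records share a key with another record.
    n_dupes = sum(len(g) for g in groups.values() if len(g) > 1)
    fraction = n_dupes / len(records) if records else 0.0
    print(f"{n_dupes} of {len(records)} records ({fraction:.2%}) "
          "share a primary key")

    # Hard limit: refuse to consolidate if duplication is widespread,
    # which usually means the primary key columns are wrong.
    if fraction >= max_fraction:
        raise ValueError(
            f"{fraction:.2%} of records have duplicate primary keys; "
            f"limit is {max_fraction:.2%}"
        )

    # Assumption: consolidate by summing the non-null values per column.
    consolidated = []
    for key, group in groups.items():
        merged = dict(zip(pk_cols, key))
        for col in value_cols:
            present = [rec[col] for rec in group if rec[col] is not None]
            merged[col] = sum(present) if present else None
        consolidated.append(merged)
    return consolidated, fraction
```

With a correct primary key, only a handful of records collide and they are merged. With an incomplete key, such as the one behind #2636, most records collide and the function raises instead of silently dropping or merging data.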
Describe the bug
Hi all! I've been pulling data from the sales to ultimate customer sheet of 861, and I realized that, at least for the 2021 data, there is only a single delivery utility (`service_type == delivery`) in the data: Rockland Electric Co (`utility_id_eia == 16213`). In the actual EIA 861 form, however, there are over 70 utilities when I filter for delivery under service type. Is this data missing, or should I be pulling from another data source?
Bug Severity
How badly is this bug affecting you?
Because I need the delivery utilities in 861, I have to manually download the 861 Excel forms and use those for now.
To Reproduce
Steps to reproduce the behavior -- ideally including a code snippet that causes the error to appear.
(This is in R, but I'm basically just pulling the `sales_eia861` table, filtering for 2021, and then filtering for utilities with the delivery `service_type`)
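A rough Python equivalent of that check, as a sketch only. It assumes the downloaded `pudl.sqlite` file has a `sales_eia861` table with a `report_date` column stored as an ISO date string; that column name and the function name are assumptions, while `service_type` and `utility_id_eia` are the columns mentioned above.

```python
import sqlite3


def count_delivery_utilities(conn, year):
    """Count distinct utilities reporting delivery service in a given year.

    Assumes report_date is an ISO-formatted date string (an assumption).
    """
    query = """
        SELECT COUNT(DISTINCT utility_id_eia)
        FROM sales_eia861
        WHERE service_type = 'delivery'
          AND strftime('%Y', report_date) = ?
    """
    (count,) = conn.execute(query, (str(year),)).fetchone()
    return count
```

Usage would look something like `count_delivery_utilities(sqlite3.connect("pudl.sqlite"), 2021)`, where the path points at the local copy of the database.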
Is the bug related to the software / database? If so, please attach the `settings.yml` file you're using to specify which data to load, and make a note of where in the ETL process the error is happening.
N/A
Have you found an error or inconsistency in the data that PUDL brings together? If so, what is the data source, year, plant_id, etc -- how can we find the data you're looking at, and what is the nature of the error or inconsistency.
I haven't thoroughly checked all years of the data, but it seems like a few delivery utilities appear in other years, just not the full set.
See above
This is after the data is imported, when I am trying to use it
Expected behavior
A clear and concise description of what you expected to happen, or what you expected the data to look like.
All the utilities should show up in `sales_eia861`, including the utilities that are "delivery" by `service_type`
Software Environment?
Operating System. (e.g. MacOS 14.5, Ubuntu 22.04, Windows Subsystem for Linux v2)
MacOS 13.3.1
Python version and distribution (e.g. Anaconda Python 3.10.6)
R 4.2.2
How did you install PUDL?
`git clone`: what branch are you using (probably `main` or `dev`)
`pip`, `conda` or `mamba`: what version did you install?
I downloaded the SQLite with the following link: https://s3.us-west-2.amazonaws.com/intake.catalyst.coop/dev/pudl.sqlite
Additional context
Add any other context about the problem here.
N/A