Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing delivery utilities in sales_eia861 #2636

Closed
christiantfong opened this issue Jun 6, 2023 · 4 comments · Fixed by #2637
Closed

Missing delivery utilities in sales_eia861 #2636

christiantfong opened this issue Jun 6, 2023 · 4 comments · Fixed by #2637
Assignees
Labels
bug Things that are just plain broken. eia861 Anything having to do with EIA Form 861
Milestone

Comments

@christiantfong
Copy link

Describe the bug

Hi all! I've been pulling data from the sales to ultimate customer sheet of 861, and I realized that, at least for the 2021 data, there is only a single delivery utility (service_type == delivery) in the data, which is Rockland Electric Co (utility_id_eia == 16213). Whereas in the actual EIA 861 form, there are over 70 utilities when I filter for delivery under service type. Is this data missing, or should I be pulling from another data source?

Bug Severity

How badly is this bug affecting you?

  • High: This bug is preventing me from using PUDL.
    Because I need the delivery utilities in 861, I am required to manually download the 861 excel forms and use that for now.

To Reproduce

Steps to reproduce the behavior -- ideally including a code snippet that causes the error to appear.
(This is in R, but I'm basically just pulling the sales_eia861 sheet, filtering for 2021, and then filtering for delivery service_type utilities)

file = "pudl.sqlite"
pudl <- dbConnect(SQLite(), file)

sales_ult_cust_861 <- dbReadTable(pudl, "sales_eia861") %>%
  filter(year(report_date) == 2021)

delivery_utilities <- sales_ult_cust_861 %>% 
  filter(service_type == "delivery")
  • Is the bug related to the software / database? If so, please attach the settings.yml file you're using to specify which data to load, and make a note of where in the ETL process the error is happening.
    N/A

  • Have you found an error or inconsistency in the data that PUDL brings together? If so, what is the data source, year, plant_id, etc
    I haven't thoroughly checked all years of the data, but it seems like a few delivery utilities appear in other years, but not the full amount of the data

-- how can we find the data you're looking at, and what is the nature of the error or inconsistency.
See above

  • Does the problem happen when you're starting up the database (importing data) or does it happen later when you're trying to use the database?
    This is after the data is imported when I am trying to use it

Expected behavior

A clear and concise description of what you expected to happen, or what you expected the data to look like.
All the utilities to show up in sales_eia861, including the utilities that are "delivery" by "service_type"

Software Environment?

  • Operating System. (e.g. MacOS 14.5, Ubuntu 22.04, Windows Subsystem for Linux v2)
    MacOS 13.3.1

  • Python version and distribution (e.g. Anaconda Python 3.10.6)
    R 4.2.2

  • How did you install PUDL?

Additional context

Add any other context about the problem here.
N/A

@christiantfong christiantfong added the bug Things that are just plain broken. label Jun 6, 2023
@zaneselvans
Copy link
Member

Hi @christiantfong thanks for pointing this out, it definitely looks wrong! I will investigate and let you know what is happening.

@zaneselvans zaneselvans added the eia861 Anything having to do with EIA Form 861 label Jun 6, 2023
@zaneselvans zaneselvans linked a pull request Jun 6, 2023 that will close this issue
8 tasks
@zaneselvans zaneselvans moved this from New to In progress in Catalyst Megaproject Jun 6, 2023
@zaneselvans zaneselvans self-assigned this Jun 6, 2023
@zaneselvans
Copy link
Member

@christiantfong I believe I've fixed the bug in #2637. We were using an incomplete set of columns as the primary key for the table, and so a reshaping operation that depended on knowing the PK columns was dropping some records. I've created an issue to check that something similar isn't happening anywhere else in the offending function: #2638

@zaneselvans zaneselvans added this to the 2023 Spring milestone Jun 6, 2023
@zaneselvans
Copy link
Member

Closing this as fixed by #2637.

@christiantfong assuming the nightly builds succeed tonight, the complete EIA-861 sales data should appear in the fresh pudl.sqlite that can be downloaded from AWS Open Data here tomorrow (Friday) morning.

@e-belfer
Copy link
Member

e-belfer commented Jun 8, 2023

(Just following up to actually close this, but @christiantfong do let us know if you run into any other unexpected troubles with the sales data!)

@e-belfer e-belfer closed this as completed Jun 8, 2023
@github-project-automation github-project-automation bot moved this from In progress to Done in Catalyst Megaproject Jun 8, 2023
zaneselvans added a commit that referenced this issue Jun 8, 2023
The `sales_eia861` and `demand_response_eia861` tables each have a
handful of duplicate primary keys due to NA values in the
`balancing_authority_code_eia` column. Quantify and log the extent of
this problem, and consolidate the data in the duplicated records if they
constitute less than 0.5% of all records in the table.

This check would also have caught the incorrect primary key columns
reported in #2636 and fixed in #2637.

Because there were so few duplicate records, I decided to just
consolidate them all (with a hard limit on the fraction of records that
could be consolidated) rather that requiring that the only duplication
be due to the BA Code column.

Closes #2638
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Things that are just plain broken. eia861 Anything having to do with EIA Form 861
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants