Create simple SQL view assets #2445

bendnorman · 2023-03-23T00:45:00Z

PR Overview

This PR contains:

A SQL view called denorm_plants_utils_ferc1, which performs the same simple inner join that pudl.output.ferc1.plants_utils_ferc1() does in pandas. PudlTabl.pu_ferc1() has been updated to pull the SQL view from the database instead of calling plants_utils_ferc1().
A helper function called sql_asset_factory, which returns an asset that sends SQL code to the db to be executed. The SQL code is stored in pudl.output.sql.
An example Python output table called denorm_utilities_eia860, which is just a dagsterified version of pudl.output.eia860.utilities_eia860().
Metadata for denorm_utilities_eia860 and denorm_plants_utils_ferc1.
A new parameter for SQLiteIOManager that allows you to exclude certain table schemas from being created in the database. I made this change because we want the IO manager to throw an error if we haven’t created the resource metadata for a view, but we also don’t want it to create a table schema in the db for the view.
A notebook that documents the output table conversion process.
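
For readers who want a feel for the view factory without installing Dagster, here is a minimal sketch. It assumes a plain closure in place of a Dagster asset and an in-memory SQLite database; `make_sql_view_builder`, the `demo_*` tables, and their columns are hypothetical and are not the code in this PR.

```python
import sqlite3
from typing import Callable


def make_sql_view_builder(name: str, sql: str) -> Callable[[sqlite3.Connection], str]:
    """Return a function that creates the view `name` by running `sql`.

    Hypothetical stand-in for the PR's factory, which returns a Dagster asset.
    """

    def build_view(conn: sqlite3.Connection) -> str:
        # Drop any stale definition so the builder can be re-run safely.
        conn.execute(f'DROP VIEW IF EXISTS "{name}"')
        conn.execute(sql)
        conn.commit()
        return name

    build_view.__name__ = name
    return build_view


# Illustrative SQL only; in the PR the statements are stored in pudl.output.sql.
VIEW_SQL = """
CREATE VIEW demo_plants_utils AS
SELECT p.plant_id, p.plant_name, u.utility_id, u.utility_name
FROM demo_plants AS p
INNER JOIN demo_utilities AS u ON p.utility_id = u.utility_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demo_utilities (utility_id INTEGER PRIMARY KEY, utility_name TEXT);
    CREATE TABLE demo_plants (plant_id INTEGER PRIMARY KEY, plant_name TEXT, utility_id INTEGER);
    INSERT INTO demo_utilities VALUES (1, 'Utility A');
    INSERT INTO demo_plants VALUES (10, 'Plant X', 1), (11, 'Plant Y', 2);
    """
)

build_demo_view = make_sql_view_builder("demo_plants_utils", VIEW_SQL)
build_demo_view(conn)
rows = conn.execute("SELECT * FROM demo_plants_utils ORDER BY plant_id").fetchall()
```

Because the join is an inner join, the plant whose utility_id has no match in demo_utilities does not appear in the view.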

PR Checklist

Merge the most recent version of the branch you are merging into (probably dev).
All CI checks are passing. Run tests locally to debug failures.
Make sure you've included good docstrings.
For major data coverage & analysis changes, run data validation tests.
Include unit tests for new functions and classes.
Defensive data quality/sanity checks in analyses & data processing functions.
Update the release notes and reference the PR and related issues.
Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

review-notebook-app · 2023-03-23T00:45:05Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

src/pudl/output/ferc1.py

bendnorman · 2023-03-23T00:48:10Z

src/pudl/output/helpers.py

+from dagster import AssetsDefinition, asset
+
+
+def sql_asset_factory(


There might be a better location for this function.

This does seem like a weird place for it. How about pudl.output.sql or pudl.output.sql_views?

Do we want to think about how the assets should ultimately be organized and start populating that hierarchy with the new assets that we're creating? Something based on Dagster's recommended project layout? What would the sub-packages under pudl.assets be?

It seems like the layout becomes much more flexible once we're using assets and referring to them in the global asset namespace, rather than calling functions and passing all the dataframes around in function calls.

I think assets in the same asset group should live in the same module so that we can call load_assets_from_modules([module], group_name="group_name").

With our current project structure we could either have one large output table asset group or asset groups for each data source: "denorm_eia860", "denorm_ferc1". At this point in the data processing, it might not make sense to be grouping by datasource because the data has been harvested and we are combining tables from different datasources.

@zaneselvans how would you categorize our output table groups?

We want the IO Manager to throw an error if we haven't created resource metadata for a view but we also don't want a table schema created for the view in the database.
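
A rough sketch of that behavior, assuming a plain dictionary of resource metadata and a boolean flag per resource. The `RESOURCES` mapping, the `create_table` flag, and both helper functions are hypothetical illustrations, not the IO manager's real interface.

```python
import sqlite3

# Hypothetical resource metadata: every table or view needs an entry,
# but only entries flagged True get a table schema created for them.
RESOURCES = {
    "demo_plants": {
        "columns": {"plant_id": "INTEGER", "plant_name": "TEXT"},
        "create_table": True,
    },
    "demo_plants_view": {
        "columns": {"plant_id": "INTEGER", "plant_name": "TEXT"},
        "create_table": False,
    },
}


def create_schemas(conn, resources):
    """Create tables only for resources flagged for schema creation."""
    created = []
    for name, meta in resources.items():
        if not meta["create_table"]:
            continue  # Views are defined by SQL later, not by a table schema.
        cols = ", ".join(
            f'"{col}" {sql_type}' for col, sql_type in meta["columns"].items()
        )
        conn.execute(f'CREATE TABLE "{name}" ({cols})')
        created.append(name)
    return created


def check_has_metadata(name, resources):
    """Raise if a name has no metadata, whether it is a table or a view."""
    if name not in resources:
        raise ValueError(f"No resource metadata defined for {name!r}")


conn = sqlite3.connect(":memory:")
created = create_schemas(conn, RESOURCES)
existing = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

The view still passes the metadata check because it has an entry, while a name with no entry at all raises.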

codecov · 2023-03-24T01:24:06Z

Codecov Report

Patch coverage: 97.2% and no project coverage change.

Comparison is base (060b72d) 87.2% compared to head (2364040) 87.2%.

Additional details and impacted files

@@          Coverage Diff          @@
##             dev   #2445   +/-   ##
=====================================
  Coverage   87.2%   87.2%           
=====================================
  Files         81      84    +3     
  Lines       9511    9537   +26     
=====================================
+ Hits        8302    8325   +23     
- Misses      1209    1212    +3

Impacted Files	Coverage Δ
src/pudl/metadata/resources/eia.py	`100.0% <ø> (ø)`
src/pudl/metadata/resources/eia860.py	`100.0% <ø> (ø)`
src/pudl/metadata/resources/epacems.py	`100.0% <ø> (ø)`
src/pudl/metadata/resources/ferc1.py	`100.0% <ø> (ø)`
src/pudl/io_managers.py	`90.0% <50.0%> (-0.5%)`	⬇️
src/pudl/metadata/classes.py	`86.3% <100.0%> (ø)`
src/pudl/output/denorm_eia.py	`100.0% <100.0%> (ø)`
src/pudl/output/denorm_ferc1.py	`100.0% <100.0%> (ø)`
src/pudl/output/eia860.py	`72.9% <100.0%> (+0.1%)`	⬆️
src/pudl/output/ferc1.py	`98.6% <100.0%> (+<0.1%)`	⬆️
... and 2 more

... and 1 file with indirect coverage changes


bendnorman

@zaneselvans I still need to make some changes but the structure is there and we have an output view and table in the database. Would appreciate some feedback on the questions I flagged.

src/pudl/etl/denormalized_assets.py

src/pudl/io_managers.py

src/pudl/metadata/resources/ferc1.py

src/pudl/output/ferc1.py

src/pudl/output/pudltabl.py

src/pudl/output/ferc1.py

src/pudl/output/eia860.py


zaneselvans

Lots of comments, mostly about what feels like some proliferating brittle complexity and wondering how we are going to organize the new structures we're creating so that we can keep track of them easily.

src/pudl/etl/__init__.py

src/pudl/io_managers.py

src/pudl/metadata/resources/eia860.py

src/pudl/output/sql/denorm_plants_utils_ferc1.sql

src/pudl/output/helpers.py

src/pudl/output/pudltabl.py

test/unit/io_managers_test.py

src/pudl/metadata/classes.py

src/pudl/output/sql/denorm_plants_utils_ferc1.sql

src/pudl/output/denorm_ferc.py

src/pudl/output/__init__.py

bendnorman · 2023-04-04T22:24:44Z

Assuming we're going to punt the decision on how to organize the data dictionaries, I think this is ready to go.

zaneselvans

There are only two hard problems in computer science: data types, naming things, and off-by-one errors.

Mainly:

With the table names, should we standardize on "utilities" rather than sometimes using "utils" (even though we're preserving "utils" in the PudlTabl methods for backwards compatibility)?
Should we / can we use enforce_dtypes() when reading tables out of the DB for use in building other tables, to avoid later dtype shenanigans? If the table is in the DB then it should have metadata defined for it, right?
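
To make the dtype question concrete, here is a stdlib-only sketch of coercing values on read according to metadata. It assumes a simple mapping from column names to Python types; `SCHEMA` and `enforce_types` are hypothetical and are not PUDL's enforce_dtypes().

```python
import sqlite3

# Hypothetical metadata: column name -> Python type expected downstream.
SCHEMA = {"utility_id": int, "utility_name": str, "capacity_mw": float}


def enforce_types(rows, columns, schema):
    """Coerce each non-null value to the type its column's metadata declares."""
    typed = []
    for row in rows:
        typed.append(
            {
                col: (None if val is None else schema[col](val))
                for col, val in zip(columns, row)
            }
        )
    return typed


conn = sqlite3.connect(":memory:")
# No declared column types, so SQLite stores values as they were inserted.
conn.execute("CREATE TABLE demo_utilities (utility_id, utility_name, capacity_mw)")
conn.execute("INSERT INTO demo_utilities VALUES ('7', 'Utility A', 12)")
conn.execute("INSERT INTO demo_utilities VALUES (8, 'Utility B', NULL)")

cursor = conn.execute("SELECT * FROM demo_utilities ORDER BY rowid")
columns = [d[0] for d in cursor.description]
records = enforce_types(cursor.fetchall(), columns, SCHEMA)
```

Applying the coercion at read time means every downstream table starts from the types the metadata promises, including nulls passed through untouched.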

src/pudl/metadata/resources/ferc1.py

src/pudl/output/pudltabl.py

src/pudl/output/sql/denorm_plants_utils_ferc1.sql

zaneselvans · 2023-04-06T01:04:42Z

src/pudl/output/eia860.py

+    logger.warning(
+        "pudl.output.eia860.utilities_eia860() will be deprecated in a future version of PUDL."
+        " In the future, call the PudlTabl.utils_eia860() method or pull the denorm_utilities_eia table"
+        " directly from the pudl.sqlite database."
+    )


We don't need to fix this to merge, but I think we should probably avoid directing folks to use the PudlTabl outputs since we intend to deprecate that too -- if we're giving them directions for the future, it should probably be to use the DB tables directly.
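
One possible wording that points only at the database tables is sketched below. Note that it swaps in the standard `warnings` module in place of the `logger.warning` call shown in the diff, and that `utilities_demo` is a hypothetical placeholder function, not a PUDL function.

```python
import warnings


def utilities_demo():
    """Hypothetical legacy output function that is slated for removal."""
    warnings.warn(
        "utilities_demo() will be deprecated in a future version. "
        "Read the denormalized table directly from the SQLite database instead.",
        FutureWarning,
        stacklevel=2,
    )
    return []


# Capture the warning so the message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = utilities_demo()
```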

Create sql view factory and plants_utils_ferc1 view

08e1f9f

bendnorman marked this pull request as draft March 23, 2023 00:45

bendnorman requested a review from zaneselvans March 23, 2023 00:45

bendnorman commented Mar 23, 2023

bendnorman linked an issue Mar 23, 2023 that may be closed by this pull request

Create data structure for dagster SQL views #2264

Closed

bendnorman added 2 commits March 23, 2023 13:52

Add param to SQLiteIOManager to exclude tables from schema creation

024a247

We want the IO Manager to throw an error if we haven't created resource metadata for a view but we also don't want a table schema created for the view in the database.

Add example python denorm asset

22a8a25

bendnorman commented Mar 24, 2023

bendnorman marked this pull request as ready for review March 24, 2023 19:26

bendnorman marked this pull request as draft March 24, 2023 19:27

Add denorm_utilities_eia860 to PudlTabl, add sql_asset_factory test, …

2b75bda

…flesh out output table conversion notebook

bendnorman requested a review from jdangerx March 24, 2023 21:55

zaneselvans reviewed Mar 27, 2023

Base automatically changed from dagster-eia861 to dev March 27, 2023 19:14

This was referenced Mar 28, 2023

Use enforce_schema() and read-chunking in PudlSQLiteIOManager. #2459

Merged

Create PudlSQLiteIOManager to accept a Package object #2466

Merged

bendnorman added 2 commits March 30, 2023 09:55

Merge dev into init-sql-views

e255b18

Merge PudlSQLiteIOManager enforce_schema changes into init-sql-views

425c4e3

bendnorman commented Apr 1, 2023

src/pudl/metadata/classes.py (outdated)

bendnorman commented Apr 1, 2023

src/pudl/output/sql/denorm_plants_utils_ferc1.sql (outdated)

Include sql files in MANIFEST and some other clean up

5d7c95b

zaneselvans added this to the 2023Q2 milestone Apr 4, 2023

bendnorman added 2 commits April 4, 2023 08:43

Create new modules for denorm assets

b27a95a

Drop 860 from denorm asset name

4749611

zaneselvans reviewed Apr 4, 2023

src/pudl/output/denorm_ferc.py (outdated)

zaneselvans reviewed Apr 4, 2023

src/pudl/output/__init__.py (outdated)

Rename ferc to ferc1

1c02e32

Merge dev into init-sql-views

dd46383

zaneselvans requested changes Apr 4, 2023

bendnorman marked this pull request as ready for review April 5, 2023 21:50

Make table name and include_in_database attribute more verbose

bea1e62

zaneselvans reviewed Apr 6, 2023

zaneselvans self-requested a review April 6, 2023 01:25

zaneselvans approved these changes Apr 6, 2023

Update denorm_plants_utilities_ferc1.sql file name

2364040

bendnorman merged commit c8782d5 into dev Apr 6, 2023

bendnorman deleted the init-sql-views branch April 6, 2023 06:36
