Create data structure for dagster SQL views #2264
Labels: dagster (Issues related to our use of the Dagster orchestrator), inframundo, sqlite (Issues related to interacting with sqlite databases)
We are planning on converting a lot of our output table pandas code to SQL views so the output tables can be persisted in the database. To do this, we'll create dagster assets that depend on upstream assets (normalized tables or other output tables) and execute SQL code. Here is an example:
The `non_argument_deps` argument allows you to specify upstream dependencies of the asset without having to actually load the upstream tables as dataframes. We don't need the tables as dataframes because this view asset is just executing a SQL query. It's important to add the upstream dependencies so views aren't created before the component tables are in the database! This asset returns a string that the `pudl_sqlite_io_manager` executes as SQL.

Storing hundreds of lines of SQL as block-quoted strings in Python is not ideal! We should be saving these view creation statements in .SQL files to take advantage of formatting and syntax highlighting.
We could create a SQL file for each view in a directory in the output sub-package.
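As a rough illustration of the one-file-per-view layout, here is a minimal sketch of a loader. The helper name `load_view_sql`, the `<view_name>.sql` naming convention, and the `sql_dir` parameter are assumptions for illustration, not existing PUDL code.

```python
from pathlib import Path


def load_view_sql(view_name: str, sql_dir: Path) -> str:
    """Return the view creation statement stored in <sql_dir>/<view_name>.sql.

    Hypothetical helper: assumes one .sql file per view, named after the view.
    """
    sql_path = sql_dir / f"{view_name}.sql"
    if not sql_path.is_file():
        # Fail loudly so a missing file is not mistaken for an empty statement.
        raise FileNotFoundError(f"No SQL file for view {view_name!r}: {sql_path}")
    return sql_path.read_text(encoding="utf-8")
```

Keying the file name to the view name means no separate mapping between views and files has to be maintained.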
Option 1
One option is to use a factory function that takes in the view name and the upstream asset names, reads the view statement from a SQL file, and returns the statement for the IO manager to execute.
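A minimal sketch of what such a factory might look like. The function names, the `sql_dir` parameter, and the `<view_name>.sql` file naming are hypothetical, and it assumes the IO manager is registered under the resource key `pudl_sqlite_io_manager`.

```python
from pathlib import Path
from typing import Callable, Set


def make_view_statement_fn(view_name: str, sql_dir: Path) -> Callable[[], str]:
    """Build a zero-argument function that returns the SQL stored for view_name."""

    def view_statement() -> str:
        # Read at execution time so edits to the .sql file are picked up on each run.
        return (sql_dir / f"{view_name}.sql").read_text(encoding="utf-8")

    return view_statement


def sql_view_asset_factory(view_name: str, non_argument_deps: Set[str], sql_dir: Path):
    """Wrap the statement function in a dagster asset (hypothetical helper)."""
    # Imported here so the file-reading helper above is usable without dagster.
    from dagster import asset

    return asset(
        name=view_name,
        # Upstream assets are declared as dependencies but never loaded as dataframes.
        non_argument_deps=non_argument_deps,
        # Assumption: the IO manager is registered under this resource key.
        io_manager_key="pudl_sqlite_io_manager",
    )(make_view_statement_fn(view_name, sql_dir))
```

Each view would then be declared with one call per view, passing its name and the set of upstream asset names.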
Option 2
We could also store the asset names and non_argument_deps in a pydantic model called SQLViewAsset. This way we could iterate through a set of SQLViewAsset objects to create the dagster assets. Something like this:
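One way this could look, as a sketch under assumptions: the field names, the `statement` method, the `create_sql_view_assets` helper, and the `<name>.sql` file convention are all hypothetical, as is the `pudl_sqlite_io_manager` resource key.

```python
from pathlib import Path
from typing import List, Set

from pydantic import BaseModel


class SQLViewAsset(BaseModel):
    """Metadata needed to build one SQL view asset (illustrative fields)."""

    name: str
    non_argument_deps: Set[str]

    def statement(self, sql_dir: Path) -> str:
        # Assumes the view's SQL lives in <sql_dir>/<name>.sql.
        return (sql_dir / f"{self.name}.sql").read_text(encoding="utf-8")


def create_sql_view_assets(views: List[SQLViewAsset], sql_dir: Path) -> list:
    """Create one dagster asset per SQLViewAsset (hypothetical helper)."""
    # Imported here so the model above can be used without dagster installed.
    from dagster import asset

    def build(view: SQLViewAsset):
        # A separate function per view avoids the late-binding closure pitfall
        # of defining the asset directly inside the loop.
        @asset(
            name=view.name,
            non_argument_deps=view.non_argument_deps,
            io_manager_key="pudl_sqlite_io_manager",  # assumed resource key
        )
        def _view() -> str:
            return view.statement(sql_dir)

        return _view

    return [build(view) for view in views]
```

Compared with Option 1, the view definitions become plain data that pydantic validates, so they could also be loaded from a config file rather than written as Python calls.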