Improve the INPUT workflow #974

Open · 15 tasks
clizbe opened this issue Dec 17, 2024 · 2 comments
Labels: epic Epic issues (collection of smaller tasks towards a goal)

clizbe commented Dec 17, 2024

The input workflow (raw data to model) needs improvement and consolidation. This will take many iterations, but an initial effort should improve the user experience in the short term as our community grows.
This epic focuses mainly on the user-facing features and workflow structure, not what's under the hood (so not every TulipaIO task should be listed here).

Workflow Overview

"Files"

These might not literally be files (they could live in DuckDB), but each is something you can imagine as a file.

  • Create a file that explicitly shows which features the user wants to enable
  • Define & document the User Format
  • Create a template of the User Format (possibly a repo, but I think through a DuckDB function)
    • Is this the best way? How much can we communicate through the templates other than the available columns?
    • Or maybe a template repo?
  • Decide & create how defaults will exist - file or code? I suggest code (or a repo file) that can be overwritten by a user file; see the sketch after this list.
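
As a rough illustration of the "defaults in code, overridden by a user file" option: a minimal sketch, assuming defaults live in a Dict and the user supplies an optional CSV with parameter/value columns. `PACKAGE_DEFAULTS` and `load_defaults` are hypothetical names, not existing TulipaIO API.

```julia
using CSV, DataFrames

# Hypothetical package-level defaults, kept in code.
const PACKAGE_DEFAULTS = Dict{String,Any}(
    "efficiency" => 1.0,
    "investment_cost" => 0.0,
)

# Start from the package defaults and let an optional user CSV
# (columns: parameter, value) overwrite individual entries.
function load_defaults(user_file = nothing)
    defaults = copy(PACKAGE_DEFAULTS)
    if user_file !== nothing && isfile(user_file)
        for row in eachrow(CSV.read(user_file, DataFrame))
            defaults[row.parameter] = row.value  # the user file wins
        end
    end
    return defaults
end
```

One nice property of this shape is that the user file only needs to list the parameters they actually want to change.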

Data Manipulation

Help the user process raw data into the User Format.
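
For instance, much of this reshaping could happen as SQL over the same DuckDB connection. A minimal sketch, assuming hypothetical raw and User Format table/column names (nothing here is existing TulipaIO API):

```julia
using DuckDB

connection = DBInterface.connect(DuckDB.DB)
# Stand-in for the user's imported raw table (hypothetical columns).
DBInterface.execute(
    connection,
    "CREATE TABLE raw_generators (plant_id VARCHAR, tech_type VARCHAR, max_mw DOUBLE)",
)
# Rename/reshape the raw columns into the (hypothetical) User Format columns.
DBInterface.execute(
    connection,
    """
    CREATE TABLE uf_asset AS
    SELECT plant_id AS asset, tech_type AS type, max_mw AS capacity
    FROM raw_generators
    """,
)
```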

UserFormat to ModelFormat

This should be (semi-)automatic.

  • Create function(s) to convert UF tables into MF tables
    • Once again, consider the work in the OBZ repo
  • (Probably) Create a wrapper function to convert all tables at once; see the sketch after this list
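
A minimal sketch of what such a wrapper could look like, assuming each UF table has a known SQL conversion into its MF counterpart; the table names, SQL, and function name are all hypothetical, not existing TulipaIO API:

```julia
using DuckDB

# Hypothetical per-table conversions; real ones would be more involved.
const UF_TO_MF_SQL = Dict(
    "uf_asset" => "CREATE TABLE mf_asset AS SELECT asset, type, capacity FROM uf_asset",
    "uf_flow"  => "CREATE TABLE mf_flow AS SELECT from_asset, to_asset FROM uf_flow",
)

# Run every UF -> MF conversion against the given DuckDB connection.
function userformat_to_modelformat(connection)
    for sql in values(UF_TO_MF_SQL)
        DBInterface.execute(connection, sql)
    end
    return connection
end
```

Keeping each per-table conversion as plain SQL would also make it easy to inspect and test independently of the wrapper.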

Other Important Stuff

Where and how should this happen? Once decided, move to (or create) that section.

Related

clizbe added the epic Epic issues (collection of smaller tasks towards a goal) label on Dec 17, 2024
clizbe self-assigned this on Dec 17, 2024

clizbe commented Dec 17, 2024

I've been trying to imagine how an analyst would start from scratch.
Below is pseudocode of my imagined workflow, using TIO to generate templates that include the columns the user needs.
We could also consider a template repo, but then I think the templates would be fixed rather than dynamic to the user's needs.

Not sure if this would all become a nightmare with versioning...

```julia
# Example Workflow (with imaginary parts)

import TulipaIO as TIO
using DuckDB
import TulipaEnergyModel as TEM

## Create Templates
# - I think it's necessary to first start something in DuckDB?
connection = DBInterface.connect(DuckDB.DB)
# - This (imaginary) function would create tables in DuckDB according to
#   the choices in the analysis_features 'file'
TIO.create_template_tables("analysis_features.csv", connection; tables = :all)
# - But these tables will be empty except for the selected column names,
#   so maybe they also need some more input, like assets or something

# - This (imaginary) function would be the same as create_template_tables but
#   works on an existing table to add/remove columns, theoretically allowing
#   the analyst to adjust their analysis features on the fly
TIO.change_feature_columns("analysis_features_v2.csv", connection)

## Import raw data

## Manipulate data into User Format
# - Function to map a column of raw data onto the corresponding UserFormat
#   column (maybe this could enable more automatic mapping scripts...)
TIO.map_column(col_raw, col_UF; key_raw, key_UF)
# - Function that fills in defaults - either auto defaults or ones specified
#   by the user
TIO.fill_defaults(; defaults_file)
# - Other stuff

## Scenarios?
# - Some way of setting a Base scenario and creating Alternatives?
# - For now it seems this is entirely on the user to run the workflow for
#   each scenario...

## Preprocessing?
# - TulipaClustering
# - Time partitions? Created somehow in templates?

## Convert to Model Format
TIO.userformat_to_modelformat(connection)

## Run Model
TEM.run_scenario(connection; optimizer, params...)
```
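
As one possible shape for the imaginary fill_defaults above: a minimal sketch that applies a per-column default with SQL COALESCE, reusing the connection from the workflow. The function name and the table/column names are hypothetical.

```julia
# Hypothetical sketch: replace missing values in one User Format column
# with a default value, using a prepared statement on the DuckDB connection.
function fill_default!(connection, table::String, column::String, default)
    DBInterface.execute(
        connection,
        "UPDATE $table SET $column = COALESCE($column, ?)",
        [default],
    )
end

# Example usage (assumes a uf_asset table with an efficiency column exists):
fill_default!(connection, "uf_asset", "efficiency", 1.0)
```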

This idea is taking shape for me, but I need feedback in case it's the wrong direction.
Let's discuss in person in the new year!


clizbe commented Dec 17, 2024

If we built a template repo, it could have a few basic files and a Pluto workflow that walks the user through the steps:

  1. Fork this repo
  2. Change the analysis_features and defaults files as necessary
  3. Open the Pluto notebook analysis template and follow the steps

And then maybe the Pluto notebook includes the template workflow above with instructions?
But I dunno if that's actually useful / maintainable.
