Improve the INPUT workflow #974

Open · 15 tasks
clizbe opened this issue Dec 17, 2024 · 2 comments
Labels: epic Epic issues (collection of smaller tasks towards a goal)

clizbe commented Dec 17, 2024

The input workflow (raw data to model) needs improvement and consolidation. This will take many iterations, but an initial effort should improve the user experience in the short term as our community grows.
This epic focuses mainly on the user-facing features and workflow structure, not what's under the hood (so not every TulipaIO task should be listed here).

Workflow Overview

"Files"

These might not literally be files (they could live in DuckDB), but each is something you can imagine as a file.

  • Create a file that explicitly shows which features the user wants to enable
  • Define & document the User Format
  • Create a template of the User Format (possibly a repo, but I think through a DuckDB function)
    • Is this the best way? How much can we communicate through the templates other than the available columns?
    • Or maybe a template repo?
  • Decide & create how defaults will exist - file or code? I suggest code (or a repo file) that can be overwritten by a user file; see the sketch after this list.
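
As a rough illustration of the "defaults in code, overridden by a user file" option: a minimal sketch, assuming defaults live in a Dict and the user supplies an optional CSV with parameter/value columns. `PACKAGE_DEFAULTS` and `load_defaults` are hypothetical names, not existing TulipaIO API.

```julia
using CSV, DataFrames

# Hypothetical package-level defaults, kept in code.
const PACKAGE_DEFAULTS = Dict{String,Any}(
    "efficiency" => 1.0,
    "investment_cost" => 0.0,
)

# Start from the package defaults and let an optional user CSV
# (columns: parameter, value) overwrite individual entries.
function load_defaults(user_file = nothing)
    defaults = copy(PACKAGE_DEFAULTS)
    if user_file !== nothing && isfile(user_file)
        for row in eachrow(CSV.read(user_file, DataFrame))
            defaults[row.parameter] = row.value  # the user file wins
        end
    end
    return defaults
end
```

One nice property of this shape is that the user file only needs to list the parameters they actually want to change.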

Data Manipulation

Help the user process raw data into the User Format.
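
For instance, much of this reshaping could happen as SQL over the same DuckDB connection. A minimal sketch, assuming hypothetical raw and User Format table/column names (nothing here is existing TulipaIO API):

```julia
using DuckDB

connection = DBInterface.connect(DuckDB.DB)
# Stand-in for the user's imported raw table (hypothetical columns).
DBInterface.execute(
    connection,
    "CREATE TABLE raw_generators (plant_id VARCHAR, tech_type VARCHAR, max_mw DOUBLE)",
)
# Rename/reshape the raw columns into the (hypothetical) User Format columns.
DBInterface.execute(
    connection,
    """
    CREATE TABLE uf_asset AS
    SELECT plant_id AS asset, tech_type AS type, max_mw AS capacity
    FROM raw_generators
    """,
)
```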

UserFormat to ModelFormat

This should be (semi-)automatic.

  • Create function(s) to convert UF tables into MF tables
    • Once again, consider the work in the OBZ repo
  • (Probably) Create a wrapper function to convert all tables at once; see the sketch after this list
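
A minimal sketch of what such a wrapper could look like, assuming each UF table has a known SQL conversion into its MF counterpart; the table names, SQL, and function name are all hypothetical, not existing TulipaIO API:

```julia
using DuckDB

# Hypothetical per-table conversions; real ones would be more involved.
const UF_TO_MF_SQL = Dict(
    "uf_asset" => "CREATE TABLE mf_asset AS SELECT asset, type, capacity FROM uf_asset",
    "uf_flow"  => "CREATE TABLE mf_flow AS SELECT from_asset, to_asset FROM uf_flow",
)

# Run every UF -> MF conversion against the given DuckDB connection.
function userformat_to_modelformat(connection)
    for sql in values(UF_TO_MF_SQL)
        DBInterface.execute(connection, sql)
    end
    return connection
end
```

Keeping each per-table conversion as plain SQL would also make it easy to inspect and test independently of the wrapper.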

Other Important Stuff

Where and how should this happen? Once decided, move to (or create) that section.

Related

clizbe added the epic Epic issues (collection of smaller tasks towards a goal) label on Dec 17, 2024
clizbe self-assigned this on Dec 17, 2024

clizbe commented Dec 17, 2024

I've been trying to imagine how an analyst would start from scratch.
Below is pseudocode of my imagined workflow, using TIO to generate templates that include the columns the user needs.
We could also consider a template repo, but then I think the templates would be fixed rather than dynamic to the user's needs.

Not sure if this would all become a nightmare with versioning...

```julia
# Example Workflow (with imaginary parts)

import TulipaIO as TIO
using DuckDB
import TulipaEnergyModel as TEM

## Create Templates
# - I think it's necessary to first start something in DuckDB?
connection = DBInterface.connect(DuckDB.DB)
# - This (imaginary) function would create tables in DuckDB according to
#   the choices in the analysis_features 'file'
TIO.create_template_tables("analysis_features.csv", connection; tables = :all)
# - But these tables will be empty except for the selected column names,
#   so maybe they also need some more input, like assets or something

# - This (imaginary) function would be the same as create_template_tables but
#   works on an existing table to add/remove columns, theoretically allowing
#   the analyst to adjust their analysis features on the fly
TIO.change_feature_columns("analysis_features_v2.csv", connection)

## Import raw data

## Manipulate data into User Format
# - Function to map a column of raw data onto the corresponding UserFormat
#   column (maybe this could enable more automatic mapping scripts...)
TIO.map_column(col_raw, col_UF; key_raw, key_UF)
# - Function that fills in defaults - either auto defaults or ones specified
#   by the user
TIO.fill_defaults(; defaults_file)
# - Other stuff

## Scenarios?
# - Some way of setting a Base scenario and creating Alternatives?
# - For now it seems this is entirely on the user to run the workflow for
#   each scenario...

## Preprocessing?
# - TulipaClustering
# - Time partitions? Created somehow in templates?

## Convert to Model Format
TIO.userformat_to_modelformat(connection)

## Run Model
TEM.run_scenario(connection; optimizer, params...)
```
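
As one possible shape for the imaginary fill_defaults above: a minimal sketch that applies a per-column default with SQL COALESCE, reusing the connection from the workflow. The function name and the table/column names are hypothetical.

```julia
# Hypothetical sketch: replace missing values in one User Format column
# with a default value, using a prepared statement on the DuckDB connection.
function fill_default!(connection, table::String, column::String, default)
    DBInterface.execute(
        connection,
        "UPDATE $table SET $column = COALESCE($column, ?)",
        [default],
    )
end

# Example usage (assumes a uf_asset table with an efficiency column exists):
fill_default!(connection, "uf_asset", "efficiency", 1.0)
```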

This idea is taking shape for me, but I need feedback in case it's the wrong direction.
Let's discuss in person in the new year!


clizbe commented Dec 17, 2024

If we built a template repo, it could have a few basic files and a Pluto workflow that walks the user through the steps:

  1. Fork this repo
  2. Change the analysis_features and defaults files as necessary
  3. Open the Pluto notebook analysis template and follow the steps

And then maybe the Pluto notebook includes the template workflow above with instructions?
But I dunno if that's actually useful / maintainable.
