You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am suggesting a major refactor of the repo, particularly (i) the way params are passed to notebooks and (ii) the ways notebooks can be used independently
The primary purpose of the notebooks in this repo is to be run daily and generate outputs for the public dash.
But I also think each of the ipynb files in this repo should be "viable" as a standalone notebook. Someone should be able to grab one of the ipynb files from this repo, pull it into their e-mission-server (dockerized or otherwise), and run it against their e-mission-server (dockerized or otherwise) without having to change a bunch of parameters in the notebook – perhaps ideally, not change any parameters and only specify a couple environment variables.
In other words, I think the notebooks from here should be able to be used in the way that the notebooks from https://github.com/e-mission/e-mission-eval-private-data can be used.
I think these changes would:
make it simpler for future contributors to understand the codebase
I have now worked on all major components of the e-mission platform, and I have found the public dash to be the least intuitive / steepest learning curve to start working on. Per README, it's intended to be "simple and stupid", but I suspect it has grown more complex over time it was originally conceived to be
I also think there is a lot of good code (scaffolding, plots) here that could be transferrable to other eda/viz projects with e-mission data, but is not organized in such a way that it can be used anywhere except this repo
make it easier to spot-check / test changes to the notebooks locally
we'd be able to open the notebook in VSCode / IDE of choice, set some env variables, and run notebooks locally without having to connect to the Jupyter notebook server
this may become even more relevant/useful if we start adding inline assert statements to all the notebooks as part of a testing strategy
Specific changes I suggest:
Simplify the parameters that notebooks receive and/or change params to environment variables. Currently, the notebooks receive:
Besides year and month, all of these are derived from the dynamic config. So why are we not just passing the entire config? Unpacking it into a bunch of different variables, with different names, makes it less clear what is going on, and makes it diverge from other components of the e-mission platform.
In fact, the notebooks should be able to just call eacd.get_dynamic_config themselves, rather than using this duplicated code from generate_plots.py:
f"and data collection URL {dynamic_config['server']['connectUrl'] if'server'indynamic_configelse'default'}")
Similarly, e-mission-common should have a function that handles custom label options retrieval vs. default label options from emcommon, which would replace this bit of code:
I am suggesting a major refactor of the repo, particularly (i) the way params are passed to notebooks and (ii) the ways notebooks can be used independently
The primary purpose of the notebooks in this repo is to be run daily and generate outputs for the public dash.
But I also think each of the ipynb files in this repo should be "viable" as a standalone notebook. Someone should be able to grab one of the ipynb files from this repo, pull it into their e-mission-server (dockerized or otherwise), and run it against their e-mission-server (dockerized or otherwise) without having to change a bunch of parameters in the notebook – perhaps ideally, not change any parameters and only specify a couple environment variables.
In other words, I think the notebooks from here should be able to be used in the way that the notebooks from https://github.com/e-mission/e-mission-eval-private-data can be used.
I think these changes would:
scaffolding
,plots
) here that could be transferrable to other eda/viz projects with e-mission data, but is not organized in such a way that it can be used anywhere except this repoassert
statements to all the notebooks as part of a testing strategySpecific changes I suggest:
Simplify the parameters that notebooks receive and/or change params to environment variables. Currently, the notebooks receive:
Besides
year
andmonth
, all of these are derived from the dynamic config. So why are we not just passing the entire config? Unpacking it into a bunch of different variables, with different names, makes it less clear what is going on, and makes it diverge from other components of the e-mission platform.In fact, the notebooks should be able to just call
eacd.get_dynamic_config
themselves, rather than using this duplicated code fromgenerate_plots.py
:em-public-dashboard/viz_scripts/bin/generate_plots.py
Lines 27 to 38 in 3777e48
Similarly,
e-mission-common
should have a function that handles custom label options retrieval vs. default label options from emcommon, which would replace this bit of code:em-public-dashboard/viz_scripts/bin/generate_plots.py
Lines 45 to 71 in 3777e48
break
scaffolding
andplots
into smaller, reusable pieces and relocate them toe-mission-server
ore-mission-common
The text was updated successfully, but these errors were encountered: