Make notebooks simpler and reusable across e-mission platform #187

JGreenlee · 2025-01-13T16:04:31Z

I am suggesting a major refactor of the repo, particularly (i) the way params are passed to notebooks and (ii) the ways notebooks can be used independently

The primary purpose of the notebooks in this repo is to be run daily and generate outputs for the public dash.
But I also think each of the ipynb files in this repo should be "viable" as a standalone notebook. Someone should be able to grab one of the ipynb files from this repo, pull it into their e-mission-server (dockerized or otherwise), and run it there without having to change a bunch of parameters in the notebook. Ideally, they would not change any parameters at all and would only specify a couple of environment variables.
In other words, I think the notebooks from here should be able to be used in the way that the notebooks from https://github.com/e-mission/e-mission-eval-private-data can be used.

I think these changes would:

- make it simpler for future contributors to understand the codebase
  - I have now worked on all major components of the e-mission platform, and I have found the public dash to be the least intuitive, with the steepest learning curve, to start working on. Per README, it's intended to be "simple and stupid", but I suspect it has grown more complex over time than it was originally conceived to be
  - I also think there is a lot of good code (scaffolding, plots) here that could be transferable to other eda/viz projects with e-mission data, but it is not organized in such a way that it can be used anywhere except this repo
- make it easier to spot-check / test changes to the notebooks locally
  - we'd be able to open the notebook in VSCode / IDE of choice, set some env variables, and run notebooks locally without having to connect to the Jupyter notebook server
  - this may become even more relevant/useful if we start adding inline assert statements to all the notebooks as part of a testing strategy

Specific changes I suggest:

Simplify the parameters that notebooks receive and/or change params to environment variables. Currently, the notebooks receive:

  year=year,
  month=month,
  program=args.program,
  study_type=dynamic_config['intro']['program_or_study'],
  mode_of_interest=mode_studied,
  include_test_users=dynamic_config.get('metrics', {}).get('include_test_users', False),
  labels = labels,
  use_imperial = dynamic_config.get('display_config', {}).get('use_imperial', True),
  sensed_algo_prefix=dynamic_config.get('metrics', {}).get('sensed_algo_prefix', "cleaned"),
  bluetooth_only = dynamic_config.get('tracking', {}).get('bluetooth_only', False),
  survey_info = dynamic_config.get('survey_info', {}),

Besides year and month, all of these are derived from the dynamic config. So why are we not just passing the entire config? Unpacking it into a bunch of different variables, with different names, makes it less clear what is going on, and makes it diverge from other components of the e-mission platform.
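As a rough sketch of what this could look like inside a notebook, the cell below takes the whole config dict and derives the values itself, reading only the year and month from environment variables. The names `NotebookParams`, `params_from_config`, `DASH_YEAR` and `DASH_MONTH` are hypothetical; the config keys and defaults are copied from the list above. `program`, `mode_of_interest` and `labels` are left out because their derivation is not shown in that list.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NotebookParams:
    """Values a notebook needs, derived from one dynamic config dict."""
    year: Optional[int]
    month: Optional[int]
    study_type: str
    include_test_users: bool
    use_imperial: bool
    sensed_algo_prefix: str
    bluetooth_only: bool
    survey_info: dict = field(default_factory=dict)


def params_from_config(dynamic_config, env=os.environ):
    """Build NotebookParams from the full config plus two env variables.

    DASH_YEAR / DASH_MONTH are hypothetical variable names; when unset,
    year and month are None.
    """
    def _int_or_none(name):
        value = env.get(name)
        return int(value) if value else None

    metrics = dynamic_config.get('metrics', {})
    return NotebookParams(
        year=_int_or_none('DASH_YEAR'),
        month=_int_or_none('DASH_MONTH'),
        study_type=dynamic_config['intro']['program_or_study'],
        include_test_users=metrics.get('include_test_users', False),
        use_imperial=dynamic_config.get('display_config', {}).get('use_imperial', True),
        sensed_algo_prefix=metrics.get('sensed_algo_prefix', "cleaned"),
        bluetooth_only=dynamic_config.get('tracking', {}).get('bluetooth_only', False),
        survey_info=dynamic_config.get('survey_info', {}),
    )
```

With something like this, the notebook's parameter cell would shrink to the config itself (or nothing at all, if the notebook fetches the config on its own).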

In fact, the notebooks should be able to just call eacd.get_dynamic_config themselves, rather than using this duplicated code from generate_plots.py:

em-public-dashboard/viz_scripts/bin/generate_plots.py

Lines 27 to 38 in 3777e48

    
    # Read and use parameters from the unified config file on the e-mission Github page
    download_url = "https://raw.githubusercontent.com/e-mission/nrel-openpath-deploy-configs/main/configs/" + STUDY_CONFIG + ".nrel-op.json"
    print("About to download config from %s" % download_url)
    r = requests.get(download_url)
    if r.status_code is not 200:
        print(f"Unable to download study config, status code: {r.status_code}")
        sys.exit(1)
    else:
        dynamic_config = json.loads(r.text)
        print(f"Successfully downloaded config with version {dynamic_config['version']} "\
            f"for {dynamic_config['intro']['translated_text']['en']['deployment_name']} "\
            f"and data collection URL {dynamic_config['server']['connectUrl'] if 'server' in dynamic_config else 'default'}")
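For illustration, a notebook-side helper along these lines might look like the sketch below. It is a stand-in written for this issue, not the implementation or signature of `eacd.get_dynamic_config`: the function name `load_dynamic_config`, the `fetch_text` parameter, and the assumption that the study name comes from a `STUDY_CONFIG` environment variable are all hypothetical. The URL pattern is the one used in the excerpt above.

```python
import json
import os
import urllib.request

# Same URL pattern as the generate_plots.py excerpt.
CONFIG_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/e-mission/nrel-openpath-deploy-configs/"
    "main/configs/{study}.nrel-op.json"
)


def _http_get_text(url):
    """Fetch a URL and return its body as text; raises on HTTP errors."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_dynamic_config(study=None, fetch_text=_http_get_text):
    """Return the parsed config for a study.

    `study` falls back to the STUDY_CONFIG environment variable (an
    assumption). `fetch_text` is injectable so the helper can be
    exercised without network access.
    """
    study = study or os.environ["STUDY_CONFIG"]
    url = CONFIG_URL_TEMPLATE.format(study=study)
    return json.loads(fetch_text(url))
```

A notebook could then start with `dynamic_config = load_dynamic_config()` and no longer depend on `generate_plots.py` having downloaded the config first.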

Similarly, e-mission-common should have a function that handles custom label options retrieval vs. default label options from emcommon, which would replace this bit of code:

em-public-dashboard/viz_scripts/bin/generate_plots.py

Lines 45 to 71 in 3777e48

    
    # dynamic_labels can  be referenced from
    # https://github.com/e-mission/nrel-openpath-deploy-configs/blob/main/label_options/example-study-label-options.json
    labels = { }
    async def load_default_label_options():
        labels = await emcu.read_json_resource("label-options.default.json")
        return labels
    # Check if the dynamic config contains dynamic labels 'label_options'
    # Parse through the dynamic_labels_url:
    if 'label_options' in dynamic_config:
        dynamic_labels_url = dynamic_config['label_options']
        req = requests.get(dynamic_labels_url)
        if req.status_code != 200:
            print(f"Unable to download dynamic_labels_url, status code: {req.status_code} for {STUDY_CONFIG}")
        else:
            labels = json.loads(req.text)
            print(f"Dynamic labels download was successful for nrel-openpath-deploy-configs: {STUDY_CONFIG}" )
    else:
        # load default labels from e-mission-common
        # https://raw.githubusercontent.com/JGreenlee/e-mission-common/refs/heads/master/src/emcommon/resources/label-options.default.json
        labels = asyncio.run(load_default_label_options())
        if not labels:
            print(f"Unable to load labels for : {STUDY_CONFIG}")
        else:
            print(f"Labels loading was successful for nrel-openpath-deploy-configs: {STUDY_CONFIG}")
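The shape of such a function could be as small as the sketch below. No function like this exists in e-mission-common as far as this issue states; `resolve_label_options`, `fetch_json` and `load_defaults` are hypothetical names. The two callables are passed in so the custom-vs-default decision is kept separate from how the JSON is actually fetched.

```python
def resolve_label_options(dynamic_config, fetch_json, load_defaults):
    """Return custom label options if the config points at them, else defaults.

    fetch_json(url) -> dict    downloads and parses the custom label options
    load_defaults() -> dict    returns the bundled default label options
    Both callables are supplied by the caller (hypothetical interface).
    """
    url = dynamic_config.get('label_options')
    labels = fetch_json(url) if url else load_defaults()
    if not labels:
        # Fail loudly instead of continuing with an empty dict.
        raise ValueError("Unable to load label options")
    return labels
```

In the current script, `load_defaults` could simply wrap the existing `asyncio.run(emcu.read_json_resource("label-options.default.json"))` call, and `fetch_json` could wrap the existing `requests.get` download.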

Break scaffolding and plots into smaller, reusable pieces and relocate them to e-mission-server or e-mission-common.
