[Issue 2130] Update analytics documentation #3562

Merged · 19 commits · Jan 21, 2025
66 changes: 21 additions & 45 deletions analytics/README.md
@@ -2,48 +2,24 @@

## Introduction

This is a command line interface (CLI) tool written in Python that is used to run analytics on operational data for the Simpler.Grants.gov initiative. For a more in-depth discussion of the tools used and the structure of the codebase, view the technical details for the analytics package.

## Project directory structure

Outlines the structure of the analytics codebase, relative to the root of the simpler-grants-gov repo.

```text
root
├── analytics
│   ├── src
│   │   └── analytics
│   │       ├── datasets        Create re-usable data interfaces for calculating metrics
│   │       ├── integrations    Integrate with external systems used to export data or metrics
│   │       └── metrics         Calculate the project's operational metrics
│   ├── tests
│   │   ├── integrations        Integration tests, mostly for src/analytics/integrations
│   │   ├── datasets            Unit tests for src/analytics/datasets
│   │   └── metrics             Unit tests for src/analytics/metrics
│   ├── config.py               Load configurations from environment vars or local .toml files
│   ├── settings.toml           Default configuration settings, tracked by git
│   ├── .secrets.toml           Gitignored file for secrets and configuration management
│   ├── Makefile                Frequently used commands for setup, development, and CLI usage
│   └── pyproject.toml          Python project configuration file
```

Comment on lines -11 to -22 (Collaborator): I know this file structure has changed a little, but would it be possible to update it rather than simply removing it?

Having a high-level overview of the main sub-packages and key root level files in a codebase is often helpful to understand how to contribute to it.

## Using the tool

Project maintainers and members of the public have a few options for interacting with the tool and the reports it produces. Read more about each option in the [usage guide](../documentation/analytics/usage.md):

1. [Viewing the reports in Slack](../documentation/analytics/usage.md#view-daily-reports-in-slack)
2. [Triggering reports from GitHub](../documentation/analytics/usage.md#trigger-a-report-from-github)
3. [Triggering reports from the command line](../documentation/analytics/usage.md#trigger-a-report-from-the-command-line)

## Contributing to the tool

Project maintainers or open source contributors are encouraged to contribute to the tool. Follow the guides linked below for more information:

1. [Technical overview](../documentation/analytics/technical-overview.md)
2. [Installation and development guide](../documentation/analytics/development.md)
- [Adding a new data source](../documentation/analytics/development.md#adding-a-new-dataset)
- [Adding a new metric](../documentation/analytics/development.md#adding-a-new-metric)
3. [Writing and running tests](../documentation/analytics/testing.md)
4. [Command line interface (CLI) user guide](../documentation/analytics/usage.md#using-the-command-line-interface)
5. [Description of existing metrics](../documentation/analytics/metrics/README.md)
This package encapsulates a data pipeline service. The service is responsible for extracting project data from GitHub and transforming the extracted data into rows in a data warehouse.

## Data Pipeline

The service in this package provides capabilities to satisfy the middle step (denoted as "ETL") in the following data flow diagram:

`SGG Project Data → GitHub → ETL → Postgres DW → Metabase → End User`

The service does not listen on a port or run as a daemon. Instead, it must be triggered manually, either via `make` commands on the command line or via a text-based interactive tool written in Python and referred to as the CLI.

In current practice, the service is triggered daily via an AWS Step Function (akin to a cron job) orchestrated with Terraform.
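
For orientation, here is a minimal conceptual sketch of that extract-transform-load flow in Python. Every name in it is a hypothetical placeholder for illustration, not the package's actual API; see the technical overview linked below for the real structure.

```python
# Conceptual sketch only: every name below is a hypothetical placeholder,
# not the analytics package's actual API.

def export_project_data() -> list[dict]:
    """Extract: pull raw project data (issues, sprints, etc.) from GitHub."""
    return [{"issue": 123, "sprint": "Sprint 7", "points": 3}]  # stub data

def transform_to_rows(raw: list[dict]) -> list[tuple]:
    """Transform: reshape the GitHub export into warehouse-ready rows."""
    return [(item["issue"], item["sprint"], item["points"]) for item in raw]

def load_rows(rows: list[tuple]) -> None:
    """Load: write the rows into the Postgres data warehouse."""
    for row in rows:
        print(f"INSERT ... VALUES {row}")  # stand-in for a real DB write

if __name__ == "__main__":
    load_rows(transform_to_rows(export_project_data()))
```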

## Developer Information

The service is open-source and can be installed and run in a local development environment, which is useful for project maintainers and/or open source contributors. Follow the links below for more information:

1. [Technical Overview](../documentation/analytics/technical-overview.md)
2. [Getting Started Guide for Developers](../documentation/analytics/development.md)
3. [Writing and Running Tests](../documentation/analytics/testing.md)
4. [Usage Guide: Data Pipeline Service & CLI](../documentation/analytics/usage.md)

203 changes: 55 additions & 148 deletions documentation/analytics/development.md
@@ -1,48 +1,23 @@
# Development <!-- omit in toc -->
# Getting Started Guide for Developers

> [!NOTE]
> All of the steps on this page should be run from the root of the [`analytics/`](../../analytics/) sub-directory
> All of the steps on this page should be run from the root of the [`analytics/`](../../analytics/) directory

<details>
<summary>Table of contents</summary>

- [Setting up the tool locally](#setting-up-the-tool-locally)
- [Docker vs Native](#docker-vs-native)
- [Running with Docker](#running-with-docker)
- [Running natively](#running-natively)
- [Configuring secrets](#configuring-secrets)
- [Prerequisites](#prerequisites)
- [Finding reporting channel ID](#finding-reporting-channel-id)
- [Finding slackbot token](#finding-slackbot-token)
- [Running the tool locally](#running-the-tool-locally)
- [Using the `make` commands](#using-the-make-commands)
- [Using the CLI tool](#using-the-cli-tool)
- [Common development tasks](#common-development-tasks)
- [Adding a new dataset](#adding-a-new-dataset)
- [Adding a new metric](#adding-a-new-metric)
- [Adding a new CLI entrypoint](#adding-a-new-cli-entrypoint)

</details>

## Setting up the tool locally

The following sections describe how to install and work with the analytics application on your own computer. If you don't need to run the application locally, view the [usage docs](usage.md) for other ways to monitor our operational metrics.
## Development Environment Setup

### Docker vs Native

This project runs itself inside of Docker by default. If you wish to run it natively, add PY_RUN_APPROACH=local to your environment variables. You can set this by either running `export PY_RUN_APPROACH=local` in your shell or adding it to your ~/.zshrc file (and running `source ~/.zshrc`).

After choosing your approach, follow the corresponding setup instructions:
This package runs in Docker by default, but can also be configured to run natively without Docker. Choose the option that's best for you, and then follow the instructions for that option:

- [Running with Docker](#running-with-docker)
- [Running natively](#running-natively)
- [Run with Docker](#run-with-docker)
- [Run Natively](#run-natively)

#### Running with Docker
#### Run with Docker

**Pre-requisites**

- Docker installed and running locally: `docker --version`
- Docker compose installed: `docker-compose --version`
- **Docker** installed and running locally: `docker --version`
- **Docker compose** installed: `docker-compose --version`

**Steps**

@@ -58,27 +33,23 @@ After choosing your approach, following the corresponding setup instructions:
- read:org
- admin:public_key
- project
3. Add `export GH_TOKEN=...` to your `zshrc` or similar
4. Set the slackbot token and the channel ID for Slack after following the instructions in [configuring secrets](#configuring-secrets). **Note:** replace the `...` with the value of these secrets:
```
export ANALYTICS_SLACK_BOT_TOKEN=...
export ANALYTICS_REPORTING_CHANNEL_ID=...
```
5. Run `make test-audit` to confirm the application is running correctly.
3. Add `GH_TOKEN=...` to your environment variables, e.g. in .zshrc or .bashrc
4. Run `make test-audit` to confirm the application is running correctly
5. Proceed to the next section to learn how to invoke commands

#### Running natively
#### Run Natively

**Pre-requisites**

- **Python version 3.12:** [pyenv](https://github.com/pyenv/pyenv#installation) is one popular option for installing Python,
or [asdf](https://asdf-vm.com/).
- **Poetry:** After installing and activating the right version of Python, [install poetry with the official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or alternatively use [pipx to install](https://python-poetry.org/docs/#installing-with-pipx).
- **Python version 3.12:** [pyenv](https://github.com/pyenv/pyenv#installation) is one popular option for installing Python, or [asdf](https://asdf-vm.com/)
- **Poetry:** [install poetry with the official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or alternatively use [pipx to install](https://python-poetry.org/docs/#installing-with-pipx)
@widal001 (Collaborator) commented on Jan 21, 2025: If you're running it natively you also need Postgres installed locally in order to have the psycopg_binary available on your computer.

(Screenshot attached: 2025-01-21 at 10:28 AM)

- **GitHub CLI:** [Install the GitHub CLI](https://github.com/cli/cli#installation)

**Steps**

1. Set up the project: `make setup` -- This will install the required packages and prompt you to authenticate with GitHub
2. Acquire a GitHub Token using one of the methods below
1. Add PY_RUN_APPROACH=local to your environment variables, e.g. in .zshrc or .bashrc
2. Set up the project: `make install` -- This will install the required packages and prompt you to authenticate with GitHub
3. Acquire a GitHub Token using one of the methods below
- Via AWS (Project Team)
- Retrieve GH_TOKEN from [AWS](https://us-east-1.console.aws.amazon.com/systems-manager/parameters/%252Fanalytics%252Fgithub-token/description?region=us-east-1&tab=Table#list_parameter_filters=Name:Contains:analytics%2Fgithub-token)
- Create your own in GitHub (Open Source)
@@ -89,122 +60,58 @@ After choosing your approach, following the corresponding setup instructions:
- read:org
- admin:public_key
- project
3. Add `export GH_TOKEN=...` to your `zshrc` or similar
4. Set the slackbot token and the channel ID for Slack after following the instructions in [configuring secrets](#configuring-secrets). **Note:** replace the `...` with the value of these secrets:
```
export ANALYTICS_SLACK_BOT_TOKEN=...
export ANALYTICS_REPORTING_CHANNEL_ID=...
```
5. Run `make test-audit` to confirm the application is running correctly.

### Configuring secrets

#### Prerequisites

In order to correctly set the value of the `slack_bot_token` and `reporting_channel_id` you will need:

1. To be a member of the Simpler.Grants.gov Slack workspace
2. To be a collaborator on the Sprint Reporting Bot Slack app

If you need to be added to the Slack workspace or to the list of collaborators for the app, contact a project maintainer.
4. Add `GH_TOKEN=...` to your environment variables, e.g. in .zshrc or .bashrc
5. Run `make test-audit` to confirm the application is running correctly
6. Proceed to the next section to learn how to invoke commands

#### Finding reporting channel ID
## Invoke Commands on the Service

1. In the Simpler.Grants.gov Slack workspace, navigate to the `#z_bot-sprint-reporting` channel. NB: Use the `#z_bot-analytics-ci-test` channel for testing.
2. Click on the name of the channel in the top left part of the screen.
3. Scroll down to the bottom of the resulting dialog box until you see `Channel ID`, and copy it.
### Using `make`

<img alt="Screenshot of dialog box with channel ID" src="../../analytics/static/screenshot-channel-id.png" height=500>
Several `make` commands are defined in the project [`Makefile`](../../analytics/Makefile). Commands can be invoked from the command line, as in the following examples:

#### Finding slackbot token

1. Go to [the dashboard](https://api.slack.com/apps) that displays the Slack apps for which you have collaborator access
2. Click on `Sprint Reporting Bot` to go to the settings for our analytics slackbot
3. From the side menu, select `OAuth & Permissions` and scroll down to the "OAuth tokens for your workspace" section
4. Copy the "Bot user OAuth token" which should start with `xoxb`

<img alt="Screenshot of slack app settings page with bot user OAuth token" src="../../analytics/static/screenshot-slackbot-token.png" width=750>

## Running the tool locally

While the [usage guide](usage.md) describes all of the options for running the `analytics` package locally, the following sections highlight some helpful commands to interact with the tool during development.

### Using the `make` commands

In earlier steps, you'll notice that we've configured a set of `make` commands that help streamline common developer workflows related to the `analytics` package. You can view the [`Makefile`](../../analytics/Makefile) for the full list of commands, but some common ones are also described below:

- `make install` - Checks that you have the prereqs installed, installs new dependencies, and prompts you to authenticate with the GitHub CLI.
- `make unit-test` - Runs the unit tests and prints a coverage report
- `make e2e-test` - Runs integration and end-to-end tests and prints a coverage report
- `make install` - Checks that prereqs are installed, installs new dependencies, and prompts for GitHub authentication
- `make unit-test` - Runs the unit tests and opens a coverage report in a web browser
- `make e2e-test` - Runs integration and end-to-end tests and opens a coverage report in a web browser
- `make lint` - Runs [linting and formatting checks](formatting-and-linting.md)
- `make sprint-reports-with-latest-data` - Runs the full analytics pipeline, which includes:
  - Exporting data from GitHub
  - Calculating the project's operational metrics
  - Either printing those metrics to the command line or posting them to Slack (if `ACTION=post-results` is passed)

### Using the CLI tool
### Using the CLI

The `analytics` package comes with a built-in CLI that you can use to discover the reporting features available. Start by simply typing `poetry run analytics --help` which will print out a list of available commands:
The package includes a CLI that can be used to discover the available commands. To run the CLI, type `poetry run analytics --help` at the command line, and the CLI should respond with a list of available commands.

![Screenshot of passing the --help flag to CLI entry point](../../analytics/static/screenshot-cli-help.png)

Additional guidance on working with the CLI tool can be found in the [usage guide](usage.md#using-the-command-line-interface).
## Example Development Tasks
Comment (Collaborator): Shouldn't block merging this, but it would be good to add instructions for accessing the postgres DB using Docker and psql.


## Common development tasks
### How To Add New Dataset

### Adding a new dataset
1. Create a new python file in `src/analytics/datasets/`
2. In that file, create a new class that inherits from the `BaseDataset`
3. Store the names of key columns as either class or instance attributes
4. If you need to combine multiple source files (or other datasets) to produce this dataset, consider creating a class method that can be used to instantiate this dataset from those sources
5. Create **at least** one unit test for each method that is implemented with the new class

1. Create a new python file in `src/analytics/datasets/`.
2. In that file, create a new class that inherits from the `BaseDataset`.
3. Store the names of key columns as either class or instance attributes.
4. If you need to combine multiple source files (or other datasets) to produce this dataset, consider creating a class method that can be used to instantiate this dataset from those sources.
5. Create **at least** one unit test for each method that is implemented with the new class.
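
To make these steps concrete, here is a minimal sketch of what a new dataset class might look like. The import path, constructor signature, and `from_source_files()` helper are illustrative assumptions, not the package's actual API; the real `BaseDataset` interface may differ.

```python
# Hypothetical sketch: the import path, constructor signature, and helper
# method below are illustrative assumptions, not the package's actual API.
import pandas as pd

from analytics.datasets.base import BaseDataset  # assumed module path


class SprintBoard(BaseDataset):
    """Example dataset joining issue data with sprint data."""

    # Store the names of key columns as class attributes
    ISSUE_COL = "issue_number"
    SPRINT_COL = "sprint_name"

    @classmethod
    def from_source_files(cls, issues_file: str, sprints_file: str) -> "SprintBoard":
        """Instantiate the dataset by combining multiple source files."""
        issues = pd.read_json(issues_file)
        sprints = pd.read_json(sprints_file)
        return cls(df=issues.merge(sprints, on=cls.ISSUE_COL))
```

Each method implemented on the class (here, `from_source_files()`) should get at least one unit test, per step 5 above.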
### How To Add New CLI Entrypoint

### Adding a new metric
1. Add a new function to [`cli.py`](../../analytics/src/analytics/cli.py)
2. Wrap this function with a [sub-command `typer` decorator](https://typer.tiangolo.com/tutorial/subcommands/single-file/)
3. If the function accepts parameters, [annotate those parameters](https://typer.tiangolo.com/tutorial/options/name/)
4. Add *at least* one unit test for the CLI entrypoint, optionally mocking potential side effects of calling the entrypoint
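
As a sketch of these steps, a new entrypoint might look like the following. The `metrics_app` sub-command group and the option names are illustrative assumptions; see [`cli.py`](../../analytics/src/analytics/cli.py) for the actual entrypoints.

```python
# Hypothetical sketch: `metrics_app` and the option names are illustrative
# assumptions; see cli.py for the package's actual entrypoints.
import typer

metrics_app = typer.Typer(help="Calculate operational metrics")


@metrics_app.command(name="sprint_burndown")
def sprint_burndown(
    # Annotate parameters so typer exposes them as CLI options
    sprint: str = typer.Option(..., "--sprint", help="Name of the sprint to report on"),
) -> None:
    """Calculate burndown for the given sprint."""
    typer.echo(f"Calculating burndown for sprint: {sprint}")


if __name__ == "__main__":
    metrics_app()
```

With the decorator above, the command would be invoked as `analytics calculate sprint_burndown --sprint "Sprint 7"`, assuming `metrics_app` is registered under a `calculate` sub-command.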

1. Create a new python file in `src/analytics/metrics/`.
2. In that file, create a new class that inherits from the `BaseMetric`.
3. Determine which dataset class this metric requires as an input. **Note:** If the metric requires a dataset that doesn't exist, review the steps to [add a dataset](#adding-a-new-dataset).
4. Implement the following methods on that class:
- `__init__()` - Instantiates the metric class and accepts any inputs needed to calculate the metric (e.g. `sprint` for `SprintBurndown`)
- `calculate()` - Calculates the metric and stores the output to a `self.results` attribute. **Tip:** It's often helpful to break the steps involved in calculating the metric into a series of private methods (i.e. methods that begin with an underscore, e.g. `_get_and_validate_sprint_name()`) that can be called from the main `calculate()` method.
- `get_stats()` - Calculates and returns key stats about the metric or input dataset. **Note:** Stats are different from metrics in that they represent single values and aren't meant to be visualized in a chart.
- `format_slack_message()` - Generate a string that will be included if the results are posted to Slack. This often includes a list of stats as well as the title of the metric.
5. Create *at least* one unit test for each of these methods to test them against a simplified input dataset to ensure the function has been implemented correctly. For more information review the [docs on testing](../../documentation/analytics/testing.md)
6. Follow the steps in [adding a new CLI entrypoint](#adding-a-new-cli-entrypoint) to expose this metric via the CLI.
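
The method list above maps naturally onto a class skeleton. Below is a minimal sketch, assuming an import path and simplified method bodies for illustration; the real `BaseMetric` interface may differ.

```python
# Hypothetical sketch: the import path and method bodies are illustrative
# assumptions, not the package's actual implementation.
from analytics.metrics.base import BaseMetric  # assumed module path


class SprintBurndown(BaseMetric):
    """Example metric tracking the issues remaining in a given sprint."""

    def __init__(self, dataset, sprint: str) -> None:
        # Accept any inputs needed to calculate the metric (e.g. `sprint`)
        self.dataset = dataset
        self.sprint = sprint
        self.results = None

    def calculate(self) -> None:
        # Store the output to a self.results attribute
        sprint_name = self._get_and_validate_sprint_name()
        self.results = self.dataset.df[self.dataset.df["sprint_name"] == sprint_name]

    def _get_and_validate_sprint_name(self) -> str:
        # Private helper called from the main calculate() method
        if not self.sprint:
            raise ValueError("A sprint name is required")
        return self.sprint

    def get_stats(self) -> dict:
        # Stats are single values, not data meant to be charted
        total = len(self.results) if self.results is not None else 0
        return {"total_issues": total}

    def format_slack_message(self) -> str:
        # Title of the metric plus key stats, for posting to Slack
        stats = self.get_stats()
        return f"Sprint burndown for {self.sprint}: {stats['total_issues']} issues"
```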
### How to Extend Analytics DB Schema

### Adding a new CLI entrypoint
1. Add a new migration file to [`integrations/etldb/migrations/versions/`](../../analytics/src/analytics/integrations/etldb/migrations/versions) and prefix file name with the next iteration number (ex: `0007_`)
2. Add valid Postgres SQL to the new integration file
3. Run the migration command: `make db-migrate`

1. Add a new function to [`cli.py`](../../analytics/src/analytics/cli.py)
2. Wrap this function with a [sub-command `typer` decorator](https://typer.tiangolo.com/tutorial/subcommands/single-file/). For example if you want to calculate sprint burndown with the entrypoint `analytics calculate sprint_burndown`, you'd use the decorator: `metrics_app.command(name="sprint_burndown")`
3. If the function accepts parameters, [annotate those parameters](https://typer.tiangolo.com/tutorial/options/name/).
4. Add *at least* one unit test for the CLI entrypoint, optionally mocking potential side effects of calling the entrypoint.

### Copying table from grants-db

1. Add a new SQL migration file in `src/analytics/integrations/etldb/migrations/versions` and prefix the file name with the next iteration number (ex: `0007`).
2. Use your database management system (ex: `pg_admin`, `db_beaver`...), right-click on the table you wish to copy, and select `SQL scripts`, then `request and copy original DDL`.
3. Paste the DDL into your new migration file. Fix any formatting issues; see previous migration files for reference.
4. Remove all references to schemas, roles, triggers, and the use of `default now()` for timestamp columns.

Example:
``` sql
create table if not exists opi.opportunity
(
...,
created_at timestamp with time zone default now() not null,
...
)
```
should be
``` sql
CREATE TABLE IF NOT EXISTS opportunity
(
    ...,
    created_at timestamp with time zone not null,
    ...
)
```

5. Run the migration via the `make db-migrate` command
### How To Run Linters

```bash
make lint
```

### How To Run Unit Tests

```bash
make unit-test
```