[Issue 2130] Follow-up additions to documentation (#3598)
## Summary
Fixes #2130

### Time to review: __2 mins__

## Changes proposed
> What was added, updated, or removed in this PR.

A few minor additions to the analytics documentation, as requested in
the comments on PR #3562

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.
DavidDudas-Intuitial authored Jan 22, 2025
1 parent d719b93 commit 4152ccd
Showing 2 changed files with 49 additions and 31 deletions.
24 changes: 23 additions & 1 deletion analytics/README.md
@@ -4,6 +4,28 @@

This package encapsulates a data pipeline service. The service is responsible for extracting project data from GitHub and transforming the extracted data into rows in a data warehouse.

## Project Directory Structure

The structure of the analytics codebase is outlined below, relative to the root of the `simpler-grants-gov` repo.

```text
root
├── analytics
│   ├── src
│   │   └── analytics
│   │       ├── datasets        Create re-usable data interfaces for calculating metrics
│   │       └── integrations    Integrate with external systems used to export data or metrics
│   ├── tests
│   │   ├── integrations        Integration tests, mostly for src/analytics/integrations
│   │   └── datasets            Unit tests for src/analytics/datasets
│   ├── config.py               Load configurations from environment vars or local .toml files
│   ├── settings.toml           Default configuration settings, tracked by git
│   ├── .secrets.toml           Gitignored file for secrets and configuration management
│   ├── Makefile                Frequently used commands for setup, development, and CLI usage
│   └── pyproject.toml          Python project configuration file
```

## Data Pipeline

The service in this package provides capabilities to satisfy the middle step (denoted as "ETL") in the following data flow diagram:
@@ -12,7 +12,7 @@ The service in this package provides capabilities to satisfy the middle step (de

The service does not listen on a port or run as a daemon. Instead, it must be triggered manually, via `Make` commands on the command line, or via a text-based interactive tool written in Python and referred to as the CLI.

In current practice, the service is triggered daily via an AWS Step Function (akin to a cron job) orchestrated with Terraform. This results in a daily update to the analytics data warehouse in Postgres, and a visible data refresh for viewers of SGG program-level metrics dashboards in Metabase.
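One run of the pipeline follows the classic extract-transform-load shape described above. A minimal sketch of that flow; the function names and row schema here are illustrative, not the package's real API:

```python
from dataclasses import dataclass


@dataclass
class IssueRow:
    """One row destined for the data warehouse (illustrative schema)."""
    issue_id: int
    title: str
    status: str


def extract(raw_export: list[dict]) -> list[dict]:
    """Extract: in practice, project data exported from GitHub."""
    return [item for item in raw_export if "id" in item]


def transform(items: list[dict]) -> list[IssueRow]:
    """Transform: normalize the raw export into warehouse rows."""
    return [
        IssueRow(issue_id=i["id"], title=i.get("title", ""), status=i.get("state", "unknown"))
        for i in items
    ]


def load(rows: list[IssueRow]) -> int:
    """Load: in practice, an INSERT into Postgres; here we just count rows."""
    return len(rows)
```

In the real service, the load step writes to the Postgres data warehouse that backs the Metabase dashboards.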

## Developer Information

56 changes: 26 additions & 30 deletions documentation/analytics/development.md
@@ -3,51 +3,37 @@
> [!NOTE]
> All of the steps on this page should be run from the root of the [`analytics/`](../../analytics/) directory
## Install Prerequisites

### Development Environment: Docker vs. Native

This package runs in Docker by default, but can also be configured to run natively without Docker. Choose the option that's best for you, and then install the prerequisites for that option:

- [Run with Docker](#run-with-docker)
- [Run Natively](#run-natively)

#### Run with Docker

**Prerequisites**

- **Docker** [Installation options](https://docs.docker.com/desktop/setup/install/mac-install/)
- **docker-compose** [Installation options](https://formulae.brew.sh/formula/docker-compose)

#### Run Natively

**Prerequisites**

- **Python version 3.12:** [pyenv](https://github.com/pyenv/pyenv#installation) is one popular option for installing Python, or [asdf](https://asdf-vm.com/)
- **Poetry:** [Install poetry with the official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or alternatively use [pipx to install](https://python-poetry.org/docs/#installing-with-pipx)
- **GitHub CLI:** [Install the GitHub CLI](https://github.com/cli/cli#installation)
- **Postgres:** [Installation options for macOS](https://www.postgresql.org/download/macosx/)
- **Psycopg:** [Installation options](https://www.psycopg.org/psycopg3/docs/basic/install.html)
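Once Postgres and Psycopg are installed, a quick connectivity sanity check from Python is sketched below. The host, user, and database values are examples, not the package's actual settings:

```python
def make_conninfo(host: str, port: int = 5432, user: str = "app", dbname: str = "app") -> str:
    """Build a libpq-style connection string for a local Postgres instance."""
    return f"host={host} port={port} user={user} dbname={dbname}"


# With psycopg (v3) installed and Postgres running, a quick check might look like:
#
#   import psycopg
#   with psycopg.connect(make_conninfo("localhost")) as conn:
#       print(conn.execute("SELECT version()").fetchone())
```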

### Install the Package

**Steps**

1. Install all prerequisites
2. Set up the project: `make install` -- This will install the required packages and prompt you to authenticate with GitHub
3. Acquire a GitHub Token using one of the methods below
- Via AWS (Project Team)
@@ -61,8 +61,10 @@ This package runs in Docker by default, but can also be configured to run native
- admin:public_key
- project
4. Add `GH_TOKEN=...` to your environment variables, e.g. in .zshrc or .bashrc
5. If running natively, add PY_RUN_APPROACH=local to your environment variables
6. Edit `local.env` and set the value of DB_HOST accordingly
7. Run `make test-audit` to confirm the application is running correctly
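The environment variables referenced in the steps above (`GH_TOKEN`, plus `PY_RUN_APPROACH` for native runs) can be sanity-checked before running `make test-audit`. A small hypothetical helper, not part of the package:

```python
import os


def missing_env_vars(native: bool = False) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    required = ["GH_TOKEN"]
    if native:
        # Only native (non-Docker) runs need PY_RUN_APPROACH=local
        required.append("PY_RUN_APPROACH")
    return [name for name in required if not os.environ.get(name)]
```

An empty list means the shell is configured; any names returned still need to be exported, e.g. in `.zshrc` or `.bashrc`.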


## Invoke Commands on the Service

@@ -83,6 +71,14 @@ The package includes a CLI that can be used to discover the available commands.

## Example Development Tasks

### How To Access the Dockerized Postgres DB from the macOS Terminal

1. Start the database container: `sudo docker-compose up -d`
2. Ensure the container is running: `docker-compose ps`
3. Get your IP address, which will be used in the next step: `ifconfig -u | grep 'inet ' | grep -v 127.0.0.1 | cut -d' ' -f2 | head -1` (this displays a value similar to `10.0.1.101`)
4. Launch `psql`, the terminal-based front-end to Postgres: `psql -h 10.0.1.101 -p 5432 -U app -W app` (use the IP address from the previous step as the value of the `-h` argument)
5. Type a PostgreSQL command, e.g. `\dt` to list tables.
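As an alternative to the `ifconfig` pipeline in step 3, the UDP-socket trick below asks the OS which local address it would use for outbound traffic; the target address is only used for route selection, and no packets are actually sent:

```python
import socket


def local_ip() -> str:
    """Return the machine's primary (non-loopback) IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # connect() on a UDP socket just selects a route; nothing is transmitted
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no outbound route found; fall back to loopback
```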

### How To Add New Dataset

1. Create a new python file in `src/analytics/datasets/`
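Fleshing out step 1 above as a hypothetical example: a new dataset module typically wraps a tabular export behind a small class so metrics can be computed from it. The `BaseDataset` name and its interface here are illustrative, not the package's actual base class:

```python
class BaseDataset:
    """Minimal stand-in for a shared dataset interface (illustrative)."""

    def __init__(self, rows: list[dict]):
        self.rows = rows


class SprintBurndown(BaseDataset):
    """Example dataset: open-issue counts per day, e.g. for a burndown metric."""

    def open_issues_by_day(self) -> dict[str, int]:
        counts: dict[str, int] = {}
        for row in self.rows:
            if row.get("state") == "open":
                counts[row["day"]] = counts.get(row["day"], 0) + 1
        return counts
```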
