[Issue 2130] Follow-up additions to documentation (#3598)
## Summary
Fixes #2130

### Time to review: __2 mins__

## Changes proposed
> What was added, updated, or removed in this PR.

A few minor additions to the analytics documentation, as requested in
the comments on PR #3562

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.
DavidDudas-Intuitial authored Jan 22, 2025
1 parent d719b93 commit 4152ccd
Showing 2 changed files with 49 additions and 31 deletions.
24 changes: 23 additions & 1 deletion analytics/README.md
@@ -4,6 +4,28 @@

This package encapsulates a data pipeline service. The service is responsible for extracting project data from GitHub and transforming the extracted data into rows in a data warehouse.

## Project Directory Structure

The structure of the analytics codebase is outlined below, relative to the root of the `simpler-grants-gov` repo.

```text
root
├── analytics
│   ├── src
│   │   └── analytics
│   │       ├── datasets        Create re-usable data interfaces for calculating metrics
│   │       └── integrations    Integrate with external systems used to export data or metrics
│   ├── tests
│   │   ├── integrations        Integration tests, mostly for src/analytics/integrations
│   │   └── datasets            Unit tests for src/analytics/datasets
│   ├── config.py               Load configurations from environment vars or local .toml files
│   ├── settings.toml           Default configuration settings, tracked by git
│   ├── .secrets.toml           Gitignored file for secrets and configuration management
│   ├── Makefile                Frequently used commands for setup, development, and CLI usage
│   └── pyproject.toml          Python project configuration file
```

## Data Pipeline

The service in this package provides capabilities to satisfy the middle step (denoted as "ETL") in the following data flow diagram:
@@ -12,7 +12,7 @@ The service in this package provides capabilities to satisfy the middle step (de

The service does not listen on a port or run as a daemon. Instead, it must be triggered manually, via `Make` commands on the command line, or via a text-based interactive tool written in Python and referred to as the CLI.

In current practice, the service is triggered daily via an AWS Step Function (akin to a cron job) orchestrated with Terraform. This results in a daily update to the analytics data warehouse in Postgres, and a visible data refresh for viewers of SGG program-level metrics dashboards in Metabase.
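One run of the pipeline follows the classic extract-transform-load shape described above. A minimal sketch of that flow; the function names and row schema here are illustrative, not the package's real API:

```python
from dataclasses import dataclass


@dataclass
class IssueRow:
    """One row destined for the data warehouse (illustrative schema)."""
    issue_id: int
    title: str
    status: str


def extract(raw_export: list[dict]) -> list[dict]:
    """Extract: in practice, project data exported from GitHub."""
    return [item for item in raw_export if "id" in item]


def transform(items: list[dict]) -> list[IssueRow]:
    """Transform: normalize the raw export into warehouse rows."""
    return [
        IssueRow(issue_id=i["id"], title=i.get("title", ""), status=i.get("state", "unknown"))
        for i in items
    ]


def load(rows: list[IssueRow]) -> int:
    """Load: in practice, an INSERT into Postgres; here we just count rows."""
    return len(rows)
```

In the real service, the load step writes to the Postgres data warehouse that backs the Metabase dashboards.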

## Developer Information

56 changes: 26 additions & 30 deletions documentation/analytics/development.md
@@ -3,51 +3,37 @@
> [!NOTE]
> All of the steps on this page should be run from the root of the [`analytics/`](../../analytics/) directory
## Install Prerequisites

### Development Environment: Docker vs. Native

This package runs in Docker by default, but can also be configured to run natively without Docker. Choose the option that's best for you, and then install the prerequisites for that option:

- [Run with Docker](#run-with-docker)
- [Run Natively](#run-natively)

#### Run with Docker

**Prerequisites**

- **Docker** [Installation options](https://docs.docker.com/desktop/setup/install/mac-install/)
- **docker-compose** [Installation options](https://formulae.brew.sh/formula/docker-compose)

#### Run Natively

**Prerequisites**

- **Python version 3.12:** [pyenv](https://github.com/pyenv/pyenv#installation) is one popular option for installing Python, or [asdf](https://asdf-vm.com/)
- **Poetry:** [Install poetry with the official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or alternatively use [pipx to install](https://python-poetry.org/docs/#installing-with-pipx)
- **GitHub CLI:** [Install the GitHub CLI](https://github.com/cli/cli#installation)
- **Postgres:** [Installation options for macOS](https://www.postgresql.org/download/macosx/)
- **Psycopg:** [Installation options](https://www.psycopg.org/psycopg3/docs/basic/install.html)
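Once Postgres and Psycopg are installed, a quick connectivity sanity check from Python is sketched below. The host, user, and database values are examples, not the package's actual settings:

```python
def make_conninfo(host: str, port: int = 5432, user: str = "app", dbname: str = "app") -> str:
    """Build a libpq-style connection string for a local Postgres instance."""
    return f"host={host} port={port} user={user} dbname={dbname}"


# With psycopg (v3) installed and Postgres running, a quick check might look like:
#
#   import psycopg
#   with psycopg.connect(make_conninfo("localhost")) as conn:
#       print(conn.execute("SELECT version()").fetchone())
```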

### Install the Package

**Steps**

1. Install all prerequisites
2. Set up the project: `make install` -- This will install the required packages and prompt you to authenticate with GitHub
3. Acquire a GitHub Token using one of the methods below
- Via AWS (Project Team)
@@ -61,8 +61,10 @@ This package runs in Docker by default, but can also be configured to run native
- admin:public_key
- project
4. Add `GH_TOKEN=...` to your environment variables, e.g. in .zshrc or .bashrc
5. If running natively, add PY_RUN_APPROACH=local to your environment variables
6. Edit `local.env` and set the value of DB_HOST accordingly
7. Run `make test-audit` to confirm the application is running correctly
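The environment variables referenced in the steps above (`GH_TOKEN`, plus `PY_RUN_APPROACH` for native runs) can be sanity-checked before running `make test-audit`. A small hypothetical helper, not part of the package:

```python
import os


def missing_env_vars(native: bool = False) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    required = ["GH_TOKEN"]
    if native:
        # Only native (non-Docker) runs need PY_RUN_APPROACH=local
        required.append("PY_RUN_APPROACH")
    return [name for name in required if not os.environ.get(name)]
```

An empty list means the shell is configured; any names returned still need to be exported, e.g. in `.zshrc` or `.bashrc`.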


## Invoke Commands on the Service

@@ -83,6 +71,14 @@ The package includes a CLI that can be used to discover the available commands.

## Example Development Tasks

### How To Access the Dockerized Postgres DB from the macOS Terminal

1. Start the database container: `sudo docker-compose up -d`
2. Ensure the container is running: `docker-compose ps`
3. Get your IP address, which will be used in the next step: `ifconfig -u | grep 'inet ' | grep -v 127.0.0.1 | cut -d' ' -f2 | head -1` (this displays a value similar to `10.0.1.101`)
4. Launch `psql`, the terminal-based front-end to Postgres: `psql -h 10.0.1.101 -p 5432 -U app -W app` (use the IP address from the previous step as the value of the `-h` argument)
5. Type a PostgreSQL command, e.g. `\dt` to list tables.
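As an alternative to the `ifconfig` pipeline in step 3, the UDP-socket trick below asks the OS which local address it would use for outbound traffic; the target address is only used for route selection, and no packets are actually sent:

```python
import socket


def local_ip() -> str:
    """Return the machine's primary (non-loopback) IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # connect() on a UDP socket just selects a route; nothing is transmitted
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no outbound route found; fall back to loopback
```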

### How To Add New Dataset

1. Create a new python file in `src/analytics/datasets/`
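Fleshing out step 1 above as a hypothetical example: a new dataset module typically wraps a tabular export behind a small class so metrics can be computed from it. The `BaseDataset` name and its interface here are illustrative, not the package's actual base class:

```python
class BaseDataset:
    """Minimal stand-in for a shared dataset interface (illustrative)."""

    def __init__(self, rows: list[dict]):
        self.rows = rows


class SprintBurndown(BaseDataset):
    """Example dataset: open-issue counts per day, e.g. for a burndown metric."""

    def open_issues_by_day(self) -> dict[str, int]:
        counts: dict[str, int] = {}
        for row in self.rows:
            if row.get("state") == "open":
                counts[row["day"]] = counts.get(row["day"], 0) + 1
        return counts
```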
