Temporal workflows (#61)
Changes:
* Add support for starting workflows using temporal via the oonipipeline CLI
* Refactor how workers are started in temporal
* Enable end to end workflow tests
hellais authored Apr 16, 2024
1 parent 546f40e commit 0132bfc
Showing 13 changed files with 637 additions and 221 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/test_oonipipeline.yml
@@ -44,6 +44,11 @@ jobs:
          sudo apt-get update
          sudo apt-get install -y clickhouse-server clickhouse-client
      - name: Install temporal
        run: |
          curl -sSf https://temporal.download/cli.sh | sh
          echo "$HOME/.temporalio/bin" >> $GITHUB_PATH
      - name: Run all tests
        run: hatch run cov
        working-directory: ./oonipipeline/
1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ coverage.xml
/output
/attic
/prof
/clickhouse-data
39 changes: 39 additions & 0 deletions oonipipeline/Readme.md
@@ -0,0 +1,39 @@
# OONI Pipeline v5

This is the fifth major iteration of the OONI Data Pipeline.

For historical context, these are the major revisions:
* `v0` - The "pipeline" is basically just writing the raw JSON files into a public `www` directory. Used until ~2013.
* `v1` - OONI Pipeline based on custom CLI scripts using mongodb as a backend. Used until ~2015.
* `v2` - OONI Pipeline based on [luigi](https://luigi.readthedocs.io/en/stable/). Used until ~2017.
* `v3` - OONI Pipeline based on [airflow](https://airflow.apache.org/). Used until ~2020.
* `v4` - OONI Pipeline based on custom scripts and systemd units (aka fastpath). Currently in use in production.
* `v5` - Next generation OONI Pipeline, which is what this readme covers. Expected to be in production by Q4 2024.

## Setup

To run the pipeline you should set up the following dependencies:
* [Temporal for python](https://learn.temporal.io/getting_started/python/dev_environment/)
* [Clickhouse](https://clickhouse.com/docs/en/install)
* [hatch](https://hatch.pypa.io/1.9/install/)


### Quick start

Start temporal dev server:
```
temporal server start-dev
```

Start clickhouse server:
```
mkdir -p clickhouse-data
clickhouse server
```

You can then start the desired workflow, for example to create signal observations for the US:
```
hatch run oonipipeline mkobs --probe-cc US --test-name signal --start-day 2024-01-01 --end-day 2024-01-02
```
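
When the same workflow has to be started for several countries or date ranges, it can help to build the command line programmatically. The following is a minimal sketch, not part of the pipeline itself: `mkobs_command` is a hypothetical helper, and the only flags it uses are the ones shown in the command above.

```python
import shlex
from datetime import date


def mkobs_command(probe_cc: str, test_name: str, start_day: date, end_day: date) -> list[str]:
    """Build the argv for the `mkobs` invocation shown in the quick start.

    Hypothetical helper: it only assembles the flags that appear in the
    readme and does not validate them against the real CLI.
    """
    return [
        "hatch", "run", "oonipipeline", "mkobs",
        "--probe-cc", probe_cc,
        "--test-name", test_name,
        "--start-day", start_day.isoformat(),
        "--end-day", end_day.isoformat(),
    ]


# Reproduce the example from the quick start as a shell-safe string.
cmd = mkobs_command("US", "signal", date(2024, 1, 1), date(2024, 1, 2))
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the `oonipipeline/` directory, which avoids shell quoting issues.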

Monitor the running workflow in the temporal web UI at http://localhost:8233/
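
Before starting a workflow it can be useful to confirm that both servers from the quick start are actually listening. The sketch below is an illustration under stated assumptions: port 8233 comes from the URL above, while 7233 (temporal frontend) and 9000 (clickhouse native protocol) are assumed defaults and may differ in your setup. `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


# Port 8233 is taken from the readme; the other two are assumed defaults.
SERVICES = {
    "temporal web UI": 8233,
    "temporal frontend (assumed default)": 7233,
    "clickhouse native protocol (assumed default)": 9000,
}

for name, port in SERVICES.items():
    status = "up" if port_is_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {status}")
```

A plain TCP connect only shows that something is listening on the port; it does not verify that the service behind it is healthy.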
12 changes: 0 additions & 12 deletions oonipipeline/debug-temporal.sh

This file was deleted.

1 change: 1 addition & 0 deletions oonipipeline/pyproject.toml
@@ -63,6 +63,7 @@ path = ".venv/"
path = "src/oonipipeline/__about__.py"

[tool.hatch.envs.default.scripts]
oonipipeline = "python -m oonipipeline.main {args}"
test = "pytest {args:tests}"
test-cov = "pytest -s --full-trace --log-level=INFO --log-cli-level=INFO -v --setup-show --cov=./ --cov-report=xml --cov-report=html --cov-report=term {args:tests}"
cov-report = ["coverage report"]
Empty file.