
chore(documentation): add documentation on scaleway and scalingo
hlecuyer authored and vmttn committed Jul 9, 2024
1 parent 5a6a661 commit 7a243d3
Showing 4 changed files with 118 additions and 7 deletions.
7 changes: 5 additions & 2 deletions CONTRIBUTING.md
@@ -36,9 +36,12 @@ After a few seconds, the services should be available as follows:
| airflow UI | [http://localhost:8080](http://localhost:8080) | user: `airflow` pass: `airflow` |
| data.inclusion | [http://localhost:8000](http://localhost:8000/api/v0/docs) | token must be generated |

### `minio` Client

This is optional but allows you to interact with the datalake locally from the command line.

See [DEPLOYMENT.md](DEPLOYMENT.md) if you also wish to interact with the staging and production buckets.

See installation instructions [here](https://min.io/docs/minio/linux/reference/minio-mc.html).
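
For example, on Linux amd64 this boils down to downloading the `mc` binary and putting it on your `PATH` (a sketch; check the linked page for your platform and the current instructions):

```bash
# Download the mc client binary, make it executable and move it onto the PATH
curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
```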

53 changes: 52 additions & 1 deletion DEPLOYMENT.md
@@ -3,4 +3,55 @@
* The project is deployable on the Scalingo platform.
* Each service (pipeline, api, etc.) is deployed in its own application.
* It is made possible using the [`PROJECT_DIR`](https://doc.scalingo.com/platform/getting-started/common-deployment-errors#project-in-a-subdirectory) env variable defined in each app.
* Services are configured through the environment.

### Scaleway

If you need to interact with Scaleway, once you have access with the appropriate IAM configuration:

1. Install [Scaleway CLI](https://www.scaleway.com/en/docs/developer-tools/scaleway-cli/quickstart/#how-to-install-the-scaleway-cli-locally).
2. Generate an [SSH key](https://www.scaleway.com/en/docs/identity-and-access-management/organizations-and-projects/how-to/create-ssh-key/#how-to-upload-the-public-ssh-key-to-the-scaleway-interface) (if you don't already have one).
3. Upload it on [Scaleway](https://www.scaleway.com/en/docs/identity-and-access-management/organizations-and-projects/how-to/create-ssh-key/#how-to-upload-the-public-ssh-key-to-the-scaleway-interface).
4. Generate two API keys, one for the production bucket and one for the staging bucket.
5. You can then create two profiles for the Scaleway CLI with the following command:
```bash
scw init -p staging \
  access-key={youraccesskey} \
  secret-key={yoursecretkey} \
  organization-id={organization} \
  project-id={projectid}
```
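
Step 4 above produces two API keys, so the same `init` can be repeated for a production profile. A sketch, with the same `{...}` placeholders (and `scw config dump` as one way to verify the result, assuming your CLI version ships that subcommand):

```bash
# Create the production profile with the production bucket's API key
scw init -p prod \
  access-key={youraccesskey} \
  secret-key={yoursecretkey} \
  organization-id={organization} \
  project-id={projectid}

# Dump the CLI configuration to check that both profiles are present
scw config dump
```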

### `minio` Client

This is optional but allows you to interact with the datalake (staging and prod) from the command line.
It can be useful for debugging.

See installation instructions [here](https://min.io/docs/minio/linux/reference/minio-mc.html).

You can then create aliases for the Scaleway S3 staging and production buckets, as well as one for your local MinIO server. For the local server, you first need to create an API key: after launching Docker Compose, go to the [console](http://localhost:9001), click on the `Access Keys` tab, and create an access key.

You can add aliases with the following command:
```bash
mc alias set dev http://localhost:9000 {youraccesskey} {yoursecretkey}
```

Do the same for staging and production (replace the access key and the secret key with the API keys you created in Scaleway):
```bash
mc alias set prod https://s3.fr-par.scw.cloud {youraccesskey} {yoursecretkey} --api S3v4
mc alias set staging https://s3.fr-par.scw.cloud {youraccesskey} {yoursecretkey} --api S3v4
```

You can test it out; the results should look like this:
```bash
$ mc ls prod
[2024-04-22 13:33:54 CEST] 0B data-inclusion-datalake-prod-grand-titmouse/
$ mc ls staging
[2024-04-10 19:45:43 CEST] 0B data-inclusion-datalake-staging-sincere-buzzard/
$ mc ls dev
[2024-06-11 10:08:06 CEST] 0B data-inclusion-lake/
```

You can now easily interact with all the buckets.
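
For example, to browse a prefix and pull a single object down for local inspection (the object name here is purely illustrative):

```bash
# List everything under the staging marts prefix
mc ls --recursive staging/data-inclusion-datalake-staging-sincere-buzzard/data/marts/

# Copy one object to the current directory (hypothetical file name)
mc cp staging/data-inclusion-datalake-staging-sincere-buzzard/data/marts/2024-06-12/structures.parquet .
```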
60 changes: 60 additions & 0 deletions api/CONTRIBUTING.md
@@ -30,6 +30,43 @@ alembic upgrade head
uvicorn data_inclusion.api.app:app --reload
```
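
Once the dev server is up, a quick smoke test is to fetch the OpenAPI docs page, whose URL is the one listed in the service table of the top-level CONTRIBUTING.md:

```bash
# The docs endpoint should answer on the local dev server
curl -s http://localhost:8000/api/v0/docs | head
```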

## Initialize the database with data from staging or prod

### Prerequisites

1. Launch Docker Compose.
2. Set up the MinIO aliases.

Check the [Deployment Guide](../DEPLOYMENT.md) for more details.

```bash
# Copy staging (or production) data mart to your local MinIO instance
mc cp --recursive staging/data-inclusion-datalake-staging-sincere-buzzard/data/marts/2024-06-12/ dev/data-inclusion-lake/data/marts/2024-06-12

# Activate the virtual environment
source .venv/bin/activate

# Import the Admin Express database
python src/data_inclusion/api/cli.py import_admin_express

# Import the inclusion data
python src/data_inclusion/api/cli.py load_inclusion_data
```

## Initialize the database with data computed locally by Airflow

You can also run Airflow locally (potentially with fewer sources, or only the sources that interest you).
After running the main DAG:
```bash
# Activate the virtual environment
source .venv/bin/activate

# Import the Admin Express database
python src/data_inclusion/api/cli.py import_admin_express

# Import the inclusion data
python src/data_inclusion/api/cli.py load_inclusion_data
```

## Running the test suite

```bash
@@ -54,3 +91,26 @@ make
```bash
make upgrade all
```

### Infrastructure

The app is deployed on Scalingo. Make sure you have access to the console.

As with Scaleway, it can be useful to install the [Scalingo CLI](https://doc.scalingo.com/platform/cli/start).

You also need to upload your [public key](https://www.scaleway.com/en/docs/dedibox-console/account/how-to/upload-an-ssh-key/) for SSH connections; you can use the same key as for Scaleway.

Here are three useful commands (example for staging):

```bash
# Open psql
scalingo -a data-inclusion-api-staging pgsql-console

# Launch a one-off container
scalingo -a data-inclusion-api-staging run bash

# Open a tunnel
scalingo -a data-inclusion-api-staging db-tunnel SCALINGO_POSTGRESQL_URL
```

Once the tunnel is open, you need a database user to complete the connection. You can create one from the database dashboard, under the Users tab.
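
As a sketch, assuming the tunnel listens on its default local port 10000, with `{dbuser}` and `{dbname}` standing in for the user you just created and the database name shown in the dashboard:

```bash
# In another terminal, connect through the tunnel with the user created above
psql -h 127.0.0.1 -p 10000 -U {dbuser} {dbname}
```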
5 changes: 1 addition & 4 deletions pipeline/CONTRIBUTING.md
@@ -32,14 +32,11 @@ tox
You can run dbt commands from your terminal.

```bash
# install dbt
pipx install --include-deps dbt-postgres==1.7.1

# install extra dbt packages (e.g. dbt_utils)
dbt deps

# load seeds
dbt seed

# create user defined functions
dbt run-operation create_udfs
