diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index a3384964a..ee0a06495 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -36,9 +36,12 @@ After a few seconds, the services should be avaible as follow:
 | airflow UI | [http://localhost:8080](http://localhost:8080) | user: `airflow` pass: `airflow` |
 | data.inclusion | [http://localhost:8000](http://localhost:8000/api/v0/docs) | token must be generated |
 
-### `minio` client
-Optional, but it allows you to interact with the datalake from the commandline.
+### `minio` Client
+
+This is optional but allows you to interact with the datalake locally from the command line.
+
+See [DEPLOYMENT.md](DEPLOYMENT.md) if you also wish to interact with the staging and production buckets.
 
 See installation instructions [here](https://min.io/docs/minio/linux/reference/minio-mc.html).
diff --git a/DEPLOYMENT.md b/DEPLOYMENT.md
index c2b5dff65..5c26b419b 100644
--- a/DEPLOYMENT.md
+++ b/DEPLOYMENT.md
@@ -3,4 +3,55 @@
 * The project is deployable on the Scalingo platform.
 * Each service (pipeline, api, etc.) is deployed in its own application.
 * It is made possible using the [`PROJECT_DIR`](https://doc.scalingo.com/platform/getting-started/common-deployment-errors#project-in-a-subdirectory) env variable defined in each app.
-* Services are configured through the environment.
\ No newline at end of file
+* Services are configured through the environment.
+
+
+
+### Scaleway
+
+If you need to interact with Scaleway, once you have access with the right IAM configuration:
+
+1. Install the [Scaleway CLI](https://www.scaleway.com/en/docs/developer-tools/scaleway-cli/quickstart/#how-to-install-the-scaleway-cli-locally).
+2. Generate an [SSH key](https://www.scaleway.com/en/docs/identity-and-access-management/organizations-and-projects/how-to/create-ssh-key/#how-to-upload-the-public-ssh-key-to-the-scaleway-interface) (if you don't already have one).
+3. Upload it to [Scaleway](https://www.scaleway.com/en/docs/identity-and-access-management/organizations-and-projects/how-to/create-ssh-key/#how-to-upload-the-public-ssh-key-to-the-scaleway-interface).
+4. Generate two API keys, one for the production bucket and one for the staging bucket.
+5. You can then create two profiles for the Scaleway CLI with the following command (run it once per profile):
+   ```bash
+   scw init -p staging \
+     access-key={youraccesskey} \
+     secret-key={yoursecretkey} \
+     organization-id={organization} \
+     project-id={projectid}
+   ```
+
+### `minio` Client
+
+This is optional but allows you to interact with the datalake from the command line (staging and prod).
+It can be useful for debugging purposes.
+
+See installation instructions [here](https://min.io/docs/minio/linux/reference/minio-mc.html).
+
+You can then create aliases for the Scaleway S3 staging and production buckets, as well as one for your local MinIO server. For your local server, you first need to create an API key: after launching Docker Compose, go to the [console](http://localhost:9001), click on the `Access Keys` tab, and create an access key.
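+
+Before adding aliases, you can list the ones already configured (a quick check; depending on your `mc` version, demo aliases such as `play` may be preinstalled):
+
+```bash
+mc alias list
+```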
+
+You can add aliases with the following command:
+
+```bash
+mc alias set dev http://localhost:9000 {youraccesskey} {yoursecretkey}
+```
+
+Do the same for staging and production (replace the access key and the secret key with the API keys you created in Scaleway):
+
+```bash
+mc alias set prod https://s3.fr-par.scw.cloud {youraccesskey} {yoursecretkey} --api S3v4
+mc alias set staging https://s3.fr-par.scw.cloud {youraccesskey} {yoursecretkey} --api S3v4
+```
+
+You can test it out; you should get results that look like this:
+
+```bash
+$ mc ls prod
+[2024-04-22 13:33:54 CEST]     0B data-inclusion-datalake-prod-grand-titmouse/
+$ mc ls staging
+[2024-04-10 19:45:43 CEST]     0B data-inclusion-datalake-staging-sincere-buzzard/
+$ mc ls dev
+[2024-06-11 10:08:06 CEST]     0B data-inclusion-lake/
+```
+
+You can now easily interact with all the buckets.
\ No newline at end of file
diff --git a/api/CONTRIBUTING.md b/api/CONTRIBUTING.md
index a9f6cd301..cc27ed611 100644
--- a/api/CONTRIBUTING.md
+++ b/api/CONTRIBUTING.md
@@ -30,6 +30,43 @@ alembic upgrade head
 uvicorn data_inclusion.api.app:app --reload
 ```
 
+## Initialize the Database with Data from Staging or Prod
+
+### Prerequisites
+
+1. Launch Docker Compose.
+2. Set up the MinIO aliases.
+
+Check the [Deployment Guide](../DEPLOYMENT.md) for more details.
+
+```bash
+# Copy the staging (or production) data mart to your local MinIO instance
+# (adjust the date to an existing export)
+mc cp --recursive staging/data-inclusion-datalake-staging-sincere-buzzard/data/marts/2024-06-12/ dev/data-inclusion-lake/data/marts/2024-06-12
+
+# Activate the virtual environment
+source .venv/bin/activate
+
+# Import the Admin Express database
+python src/data_inclusion/api/cli.py import_admin_express
+
+# Import the inclusion data
+python src/data_inclusion/api/cli.py load_inclusion_data
+```
+
+## Initialize the Database with Data Computed Locally by Airflow
+
+You can also run Airflow locally (potentially with fewer sources, or only the sources that interest you).
+After running the main DAG:
+
+```bash
+# Activate the virtual environment
+source .venv/bin/activate
+
+# Import the Admin Express database
+python src/data_inclusion/api/cli.py import_admin_express
+
+# Import the inclusion data
+python src/data_inclusion/api/cli.py load_inclusion_data
+```
+
 ## Running the test suite
 
 ```bash
@@ -54,3 +91,26 @@ make
 ```bash
 make upgrade all
 ```
+
+### Infrastructure
+
+The app is deployed on Scalingo. Make sure you have access to the console.
+
+Just like for Scaleway, it can be useful to install the [CLI](https://doc.scalingo.com/platform/cli/start).
+
+You also need to upload your [public key](https://www.scaleway.com/en/docs/dedibox-console/account/how-to/upload-an-ssh-key/) for SSH connections. You can use the same key as for Scaleway.
+
+Here are three useful commands (examples for staging):
+
+```bash
+# Open psql
+scalingo -a data-inclusion-api-staging pgsql-console
+
+# Launch a one-off container
+scalingo -a data-inclusion-api-staging run bash
+
+# Open a tunnel
+scalingo -a data-inclusion-api-staging db-tunnel SCALINGO_POSTGRESQL_URL
+```
+
+Once the tunnel is open, you still need a database user to connect through it. You can create one from the database dashboard, in the users tab.
\ No newline at end of file
diff --git a/pipeline/CONTRIBUTING.md b/pipeline/CONTRIBUTING.md
index 53307cb59..ed58fe312 100644
--- a/pipeline/CONTRIBUTING.md
+++ b/pipeline/CONTRIBUTING.md
@@ -32,14 +32,11 @@ tox
 You can run dbt commands from your terminal.
 
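+For instance, you can first check that dbt can reach your database (a quick sanity check: `dbt debug` validates your profile and connection settings):
+
+```bash
+dbt debug
+```
+
+To set up the dbt project itself:
+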
 ```bash
-# install dbt
-pipx install --include-deps dbt-postgres==1.7.1
-
 # install extra dbt packages (e.g. dbt_utils)
 dbt deps
 
 # load seeds
-dbt seeds
+dbt seed
 
 # create user defined functions
 dbt run-operation create_udfs
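+
+# (optional) once the steps above succeed, you can build a single model
+# and everything upstream of it; "my_model" is a hypothetical model name
+dbt build --select +my_model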