-
Notifications
You must be signed in to change notification settings - Fork 8
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Change urls to restrict parameters to uuids + type every view * Update readme * Tweak coverage * Cleanup pipeline templates * Cleanup catalog templates * Cleanup card templates * Go back to optional * Remove pyupgrade
- Loading branch information
1 parent
082475d
commit 0061210
Showing
37 changed files
with
161 additions
and
798 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
*.env | ||
.coverage | ||
htmlcov/ | ||
|
||
k8s/* | ||
!k8s/sample_app.yaml | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,19 +18,19 @@ OpenHexa is an **open-source data integration platform** that allows users to: | |
</div> | ||
|
||
OpenHexa architecture | ||
--------------------- | ||
===================== | ||
|
||
The OpenHexa platform is composed of **three main components**: | ||
|
||
- The **App component**, a Django application that acts as the user-facing part of the OpenHexa platform | ||
- The **Notebooks component** (a [JupyterHub](https://jupyter.org/hub) setup) | ||
- The **Data Pipelines component** (build on top of [Airflow](https://airflow.apache.org/)) | ||
- The **Notebooks component** (a customized [JupyterHub](https://jupyter.org/hub) setup) | ||
- The **Data Pipelines component** (built on top of [Airflow](https://airflow.apache.org/)) | ||
|
||
This repository contains the code for the **App component**, which servers as the user-facing part of the OpenHexa | ||
This repository contains the code for the **App component**, which serves as the user-facing part of the OpenHexa | ||
stack. | ||
|
||
The code related to the Notebooks component can be found in the | ||
[`openhexa-notebooks`](https://github.com/blsq/openhexa-notebooks) repository, while the Data Pipelines component | ||
[`openhexa-notebooks`](https://github.com/blsq/openhexa-notebooks) repository, while the Data Pipelines component | ||
code resides in the [`openhexa-pipelines`](https://github.com/blsq/openhexa-pipelines) repository. | ||
|
||
App component overview | ||
|
@@ -39,269 +39,40 @@ App component overview | |
The **App component** is a [Django](https://www.djangoproject.com/) application connected to a | ||
[PostgreSQL](https://www.postgresql.org/) database. | ||
|
||
This component is meant to be deployed in a [Kubernetes](https://kubernetes.io/) cluster (either in a public cloud or in | ||
your own infrastructure). | ||
|
||
The **App component** is the main point of entry to the OpenHexa platform. It provides: | ||
|
||
- User management capabilities | ||
- A browsable Data Catalog | ||
- An advanced search engine | ||
- A dashboard | ||
|
||
Additionally, it acts as a frontend for the **Notebooks** component (which is embedded in the app component as an | ||
Additionally, it acts as a frontend for the **Notebooks** component (which is embedded in the app component as an | ||
iframe) and for the **Data pipelines** component. | ||
|
||
OpenHexa can connect to a wide range of **data stores**, such as AWS S3 / Google Cloud GCS buckets, | ||
OpenHexa can connect to a wide range of **data stores**, such as AWS S3 / Google Cloud GCS buckets, | ||
DHIS2 instances, PostgreSQL databases... | ||
|
||
**Data stores** in OpenHexa can be categorized under three different categories: | ||
|
||
1. **Primary Data Sources**: those data sources are external to the platform. They are **read-only**: OpenHexa will | ||
1. **Primary Data Sources**: those data sources are external to the platform. They are **read-only**: OpenHexa will | ||
never alter the data residing in primary data sources. Users can schedule data extracts in **data lakes** | ||
or **data warehouses** to work on the extracted data. | ||
1. **Data Lakes**: those data stores are buckets of flat files of various formats (CSV, GPKG, Jupyter | ||
notebooks...). Data residing in data lakes can be read and written to. | ||
1. **Data Warehouses**: those data stores are read/write databases (as of now, only PostgreSQL data warehouses are | ||
implemented). | ||
|
||
Provisioning | ||
------------ | ||
|
||
**Note:** the following instructions are tailored to a Google Cloud Platform setup (using Google Kubernetes Engine and | ||
Google Cloud SQL). OpenHexa can be deployed on other cloud providers or in a private cloud, but you will need to adapt | ||
the instructions below to your infrastructure of choice. | ||
|
||
### Requirements | ||
|
||
In order to run the OpenHexa **App component**, you will need: | ||
|
||
1. A **Kubernetes cluster** | ||
1. A **PostgreSQL server** running PostgreSQL 11 or later | ||
|
||
It is perfectly fine to run the OpenHexa **App component** in an existing Kubernetes cluster. All the Kubernetes | ||
resources created for this component will be attached to a specific Kubernetes namespace named `hexa-app`. | ||
|
||
### Configure gcloud | ||
|
||
We will need the [`gcloud`](https://cloud.google.com/sdk/gcloud) command-line tool for the next steps. Make sure it is | ||
installed and configured properly - among other things, that the appropriate configuration is active. | ||
|
||
The following command will show which configuration you are using: | ||
|
||
```bash | ||
gcloud config configurations list | ||
``` | ||
|
||
### Create a global IP address (and a DNS record) | ||
|
||
The Kubernetes ingress used to access the OpenHexa app component exposes an external IP. This IP might change when | ||
re-deploying or re-provisioning. In order to prevent it, create an address in GCP compute and get back its value: | ||
|
||
```bash | ||
gcloud compute addresses create <HEXA_APP_ADDRESS_NAME> --global | ||
gcloud compute addresses describe <HEXA_APP_ADDRESS_NAME> --global | ||
``` | ||
|
||
Then, you can create a DNS record that points to the ip address returned by the `describe` command above. | ||
|
||
### Create a Cloud SQL instance, database and user | ||
|
||
Unless you already have a ready-to-use Google Cloud SQL instance, you can create one using the following command: | ||
|
||
```bash | ||
gcloud sql instances create hexa-prime \ | ||
--database-version=POSTGRES_12 \ | ||
--tier=db-custom-1-3840 --zone=europe-west1-b --root-password=asecurepassword | ||
``` | ||
|
||
You will then need to create a database for the App component: | ||
|
||
```bash | ||
gcloud sql databases create hexa-app --instance=hexa-prime | ||
``` | ||
|
||
You will need a user as well: | ||
|
||
```bash | ||
gcloud sql users create hexa-app --instance=hexa-prime --password=asecurepassword | ||
``` | ||
|
||
🚨 The created user will have root access on your instance. You should make sure to adapt its permissions accordingly if | ||
needed. | ||
|
||
The last step is to get the connection string of your Cloud SQL instance. Launch the following command and write down | ||
the value next to the `connectionName` key, you will need it later: | ||
|
||
```bash | ||
gcloud sql instances describe hexa-prime | ||
``` | ||
|
||
### Create a service account for the Cloud SQL proxy | ||
|
||
The OpenHexa app component will connect to the Cloud SQL instance using a | ||
[Cloud SQL Proxy](https://cloud.google.com/sql/docs/postgres/sql-proxy). The proxy requires a GCP service account. If | ||
you have not created such a service account yet, create one: | ||
|
||
```bash | ||
gcloud iam service-accounts create hexa-cloud-sql-proxy \ | ||
--display-name=hexa-cloud-sql-proxy \ | ||
--description='Used to allow pods to access Cloud SQL' | ||
``` | ||
|
||
Give it the `roles/cloudsql.client` role: | ||
|
||
```bash | ||
gcloud projects add-iam-policy-binding blsq-dip-test \ | ||
--member=serviceAccount:[email protected] \ | ||
--role=roles/cloudsql.client | ||
``` | ||
|
||
Finally, download a key file for the service account and keep it somewhere safe, we will need it later: | ||
|
||
```bash | ||
mkdir -p ../gcp_keyfiles | ||
gcloud iam service-accounts keys create ../gcp_keyfiles/hexa-cloud-sql-proxy.json \ | ||
--iam-account=hexa-cloud-sql-proxy@blsq-dip-test.iam.gserviceaccount.com | ||
``` | ||
|
||
Note that we deliberately download the key file outside the current repository to avoid it being included | ||
in Git or in the Docker image. | ||
|
||
### Create a GKE cluster: | ||
|
||
Unless you already have a running Kubernetes cluster, you need to create one. The following command | ||
will create a new cluster in Google Kubernetes Engine, along with a default node pool: | ||
|
||
```bash | ||
gcloud container clusters create hexa-prime \ | ||
--machine-type=n2-standard-2 \ | ||
--zone=europe-west1-b \ | ||
--enable-autoscaling \ | ||
--num-nodes=1 \ | ||
--min-nodes=1 \ | ||
--max-nodes=4 \ | ||
--cluster-version=latest | ||
``` | ||
|
||
The `node-labels` and `node-taints` options will allow JupyterHub to spawn the single-user Jupyter server pods in the | ||
user node pool. | ||
|
||
To make sure that the `kubectl` utility can access the newly created cluster, you need to launch another command: | ||
|
||
```bash | ||
gcloud container clusters get-credentials hexa-prime --region=europe-west1-b | ||
``` | ||
|
||
Deploying | ||
--------- | ||
|
||
The OpenHexa **App component** can be deployed with the `kubectl` utility. Almost all the required resources can be | ||
contained in a single file (we provide a sample `k8s/sample_app.yaml` file to serve as a basis). | ||
|
||
As we want all resources to be located in a specific Kubernetes namespace, create it if it does not exist yet: | ||
|
||
```bash | ||
kubectl create namespace hexa-app | ||
``` | ||
|
||
Before we can deploy the app component, we need to create a secret for the Cloud SQL proxy: | ||
|
||
```bash | ||
kubectl create secret generic hexa-cloudsql-oauth-credentials -n hexa-app \ | ||
--from-file=credentials.json=../gcp_keyfiles/hexa-cloud-sql-proxy.json | ||
``` | ||
|
||
We need another secret for the Django environment variables. First, you need to generate a secret key for the | ||
Django application, as well as an encryption key used to encrypt the various credentials stored in the database: | ||
|
||
```bash | ||
docker-compose run app manage generate_key SECRET_KEY | ||
docker-compose run app manage generate_key ENCRYPTION_KEY | ||
``` | ||
|
||
Then, create the secret: | ||
|
||
```bash | ||
kubectl create secret generic app-secret -n hexa-app \ | ||
--from-literal DATABASE_USER=<HEXA_APP_DATABASE_USER> \ | ||
--from-literal DATABASE_PASSWORD=<HEXA_APP_DATABASE_PASSWORD> \ | ||
--from-literal DATABASE_NAME=<HEXA_APP_DATABASE_NAME> \ | ||
--from-literal SECRET_KEY=<HEXA_APP_SECRET_KEY> \ | ||
--from-literal ENCRYPTION_KEY=<HEXA_APP_ENCRYPTION_KEY> | ||
``` | ||
|
||
Then, you can copy the sample file and adapt it to your needs: | ||
|
||
```bash | ||
cp k8s/sample_app.yaml k8s/app.yaml | ||
nano k8s/app.yaml | ||
``` | ||
|
||
A few notes about the sample file: | ||
Running OpenHexa | ||
================ | ||
|
||
1. `HEXA_APP_DOMAIN` should be replaced by the value of the DNS record that points to your OpenHexa app instance | ||
(`openhexa.yourorg.com` for example) | ||
1. `HEXA_APP_NODE_POOL_SELECTOR` should be set to the name of the node pool that will run your OpenHexa app pods | ||
(example: `default-pool`) | ||
1. `HEXA_APP_IMAGE` is the full path of the OpenHexa app image (`blsq/openhexa-app:latest` or `blsq/openhexa-app:0.3.1`, | ||
or a path to a custom image)1 | ||
1. `HEXA_CLOUDSQL_CONNECTION_STRING` corresponds to the `connectionName` value returned by the | ||
`gcloud sql instances describe` command (see above) | ||
1. `HEXA_APP_ADDRESS_NAME` is the named used when creating the address using the `gcloud compute addresses create` command | ||
1. `HEXA_NOTEBOOKS_URL` should be replaced by the URL of the DNS record that points to your OpenHexa notebooks | ||
instance (`https://notebooks.openhexa.yourorg.com` for example) | ||
|
||
You can then deploy the app component using `kubectl apply`: | ||
|
||
```bash | ||
kubectl apply -n hexa-app -f k8s/app.yaml | ||
``` | ||
|
||
Don't forget to run the migrations (with fixtures if needed): | ||
|
||
```bash | ||
# Migrate | ||
kubectl exec deploy/app-deployment -n hexa-app -- python manage.py migrate | ||
# Load fixtures | ||
kubectl exec deploy/app-deployment -n hexa-app -- python manage.py loaddata demo.json | ||
``` | ||
|
||
If you need to run a command in a pod, you can use the following: | ||
|
||
```bash | ||
kubectl exec -it deploy/app-deployment -n hexa-app -- bash | ||
``` | ||
|
||
By default, the datasource refresh (synchronization) is done in the web server. You can activate an asynchronous | ||
refresh with the settings ```DATASOURCE_ASYNC_REFRESH```.If you have slow datasources, it could be used | ||
to decrease the latency of the web server. In that case, a queue will be used to dispatch the refresh request to a | ||
background worker. This worker is called ```sync_datasources_worker```, callable like this: | ||
```bash | ||
python manage.py sync_datasources_worker | ||
``` | ||
|
||
Building the Docker image | ||
------------------------- | ||
Docker image | ||
------------ | ||
|
||
The OpenHexa app Docker image is publicly available on Docker Hub | ||
([blsq/openhexa-app](https://hub.docker.com/r/blsq/openhexa-app)). | ||
|
||
This repository also provides a Github workflow to build the Docker image in the `.github/workflows` directory. | ||
|
||
Provision and deploy the Notebooks component | ||
-------------------------------------------- | ||
|
||
The app component will embed the [Notebooks component](https://github.com/blsq/openhexa-notebooks) as an `iframe` in a | ||
dedicated section. | ||
|
||
Before deploying the App component, you will need to deploy the Notebooks component, following the instructions | ||
provided in the [`README.md`](https://github.com/blsq/openhexa-notebooks/blob/main/README.md) of the Notebooks | ||
component. | ||
|
||
It's important to have the Notebooks and App components running on the same top-level domain, as we use cookies for | ||
cross-component authentication. | ||
|
||
Local development | ||
----------------- | ||
|
||
|
@@ -315,6 +86,11 @@ docker-compose run app fixtures | |
docker-compose up | ||
``` | ||
|
||
This will start all the required services and processes, correctly configure all the environment variables | ||
and fill the database with some initial data. | ||
|
||
You can then log in with the following credentials: `[email protected]`/`root` | ||
|
||
### Running the tests | ||
|
||
Running the tests is as simple as: | ||
|
@@ -336,19 +112,14 @@ Test coverage is evaluated using the [`coverage`](https://github.com/nedbat/cove | |
docker-compose run app coverage | ||
``` | ||
|
||
### Tailwind | ||
|
||
OpenHexa uses [TailwindUI](https://tailwindui.com/) and [TailwindCSS](https://tailwindcss.com/) for the user interface. | ||
No specific step is required to use it, unless you want to perform changes to the TailwindUI/TailwindCSS configuration. | ||
|
||
To be able to do that, you need to install `django-tailwind` and start tailwind in dev mode: | ||
|
||
`docker-compose run app manage tailwind install`. | ||
`docker-compose run app manage tailwind start`. | ||
|
||
### Code style | ||
|
||
Our python code is linted using [`black`](https://github.com/psf/black). | ||
Our python code is linted using [`black`](https://github.com/psf/black), [`isort`](https://github.com/PyCQA/isort) and [`autoflake`](https://github.com/myint/autoflake). | ||
We currently target the Python 3.9 syntax. | ||
|
||
We use a [pre-commit](https://pre-commit.com/) hook to lint the code before committing. Make sure that `pre-commit` is | ||
installed, and run `pre-commit install` the first time you check out the code. | ||
installed, and run `pre-commit install` the first time you check out the code. Linting will again be checked | ||
when submitting a pull request. | ||
|
||
OpenHexa uses [TailwindUI](https://tailwindui.com/), [TailwindCSS](https://tailwindcss.com/) | ||
and [Heroicons](https://heroicons.com/) for the user interface. |
23 changes: 0 additions & 23 deletions
23
hexa/catalog/templates/catalog/partials/datasource_sync_info.html
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.