The objective of our project is to provide analytical datasets from our Northwind database.
The users of our datasets are Data Analysts, who consume them through dashboards built with Power BI and Tableau. The datasets should answer questions such as:
- How many orders are there by region, territory, and ship country?
- What is the time between order date and ship date (shipment delay / lead time)?
- Which supplier has the most popular product?
- Which supplier can cover the most orders?
- How many orders were packed by each employee?
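For example, the first two questions could be answered directly from the orders table with a query along these lines (a sketch assuming the standard Northwind column names order_date, shipped_date, and ship_country, and Snowflake's DATEDIFF syntax):

```sql
-- Orders per ship country plus average lead time in days.
-- Assumes the standard Northwind "orders" table; column names may differ in your copy.
select
    ship_country,
    count(*)                                       as total_orders,
    avg(datediff('day', order_date, shipped_date)) as avg_lead_time_days
from orders
where shipped_date is not null
group by ship_country
order by total_orders desc;
```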
| Source name | Source type | Source documentation |
| --- | --- | --- |
| Northwind database for Postgres | PostgreSQL database | Source |
Architecture:
- Source Database (PostgreSQL on AWS RDS): Holds raw data.
- Airbyte on EC2: Extracts data from RDS.
- DBT: Transforms data.
- Snowflake Data Warehouse: Stores transformed data.
- Preset: Uses data for visualization.
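To make the Snowflake layer of this architecture concrete, provisioning it typically comes down to a few statements like the following (object names such as NORTHWIND_WH, RAW, and ANALYTICS are placeholders, not our actual configuration):

```sql
-- Sketch of Snowflake provisioning; all object names are placeholders.
create warehouse if not exists northwind_wh
    warehouse_size = 'XSMALL'
    auto_suspend = 60      -- suspend after 60 idle seconds to limit cost
    auto_resume = true;

create database if not exists northwind;
create schema if not exists northwind.raw;        -- landing zone for Airbyte
create schema if not exists northwind.analytics;  -- build target for DBT models
```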
Patterns:
- Data Extraction: Batch extraction with Airbyte.
- Data Loading: Batch loading to Snowflake.
- Data Transformation: ELT pattern.
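Under this ELT pattern, Airbyte lands raw tables in Snowflake and DBT builds models on top of them. A minimal staging model could look like this (the source name northwind and the derived lead_time_days column are assumptions for illustration):

```sql
-- models/staging/stg_orders.sql
-- DBT staging model: selects and lightly enriches the raw orders table landed by Airbyte.
with source as (

    select * from {{ source('northwind', 'orders') }}

)

select
    order_id,
    customer_id,
    employee_id,
    order_date,
    shipped_date,
    ship_country,
    datediff('day', order_date, shipped_date) as lead_time_days
from source
```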
Here is a solution architecture diagram:
We used the GitHub Projects Kanban board to manage our project.
Tasks:
- Data Extraction and Transformation:
  - DE: Extract data using Airbyte.
  - DE: Transform data using DBT.
- Infrastructure Setup:
  - CA: Provision EC2, Snowflake, and Databricks.
  - CA: Set up monitoring.
- Data Loading and Quality:
  - DE: Load data into Snowflake.
  - DA and DE: Ensure data quality (see the DBT test sketch below).
- Analytics and Visualization:
  - DA: Create Power BI reports.
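The data quality task above could, for instance, be covered by DBT tests. A singular test is simply a SQL file that returns the rows violating a rule; the model name stg_orders below refers to the hypothetical staging model sketched earlier:

```sql
-- tests/assert_shipped_not_before_ordered.sql
-- DBT singular test: any returned row counts as a failure.
select
    order_id,
    order_date,
    shipped_date
from {{ ref('stg_orders') }}
where shipped_date is not null
  and shipped_date < order_date
```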
This simplified breakdown clarifies who does what in the project and assigns clear responsibility for each task.