GitHub - VenkataNikhil/Redshift-pipeline

Redshift-pipeline

A scalable data pipeline that transfers batch data daily from AWS S3 buckets to a Redshift data warehouse for analytical workloads.

Purpose
Automates the setup of an Amazon Redshift data warehouse and its related infrastructure for analytics workloads.

Tech Stack

Python, boto3, Amazon Redshift, Amazon S3, IAM roles

Key Components

Infrastructure Provisioning
- Creates a Redshift cluster programmatically using boto3
- Handles IAM roles, security groups, and connectivity automatically
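
The provisioning step above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `ClusterSettings` class, its field names, and the default node type are assumptions made for the example. The boto3 client is passed in (for example `boto3.client("redshift")`) so that the argument-building logic can be checked without touching AWS.

```python
from dataclasses import dataclass


@dataclass
class ClusterSettings:
    """Hypothetical settings container; field names and defaults are illustrative."""
    cluster_identifier: str
    db_name: str
    master_username: str
    master_password: str
    iam_role_arn: str
    node_type: str = "dc2.large"
    number_of_nodes: int = 2


def build_create_cluster_kwargs(settings: ClusterSettings) -> dict:
    """Translate settings into keyword arguments for the Redshift create_cluster call."""
    kwargs = {
        "ClusterIdentifier": settings.cluster_identifier,
        "DBName": settings.db_name,
        "MasterUsername": settings.master_username,
        "MasterUserPassword": settings.master_password,
        "NodeType": settings.node_type,
        # The attached role is what later lets Redshift read from S3.
        "IamRoles": [settings.iam_role_arn],
    }
    if settings.number_of_nodes > 1:
        kwargs["ClusterType"] = "multi-node"
        kwargs["NumberOfNodes"] = settings.number_of_nodes
    else:
        # NumberOfNodes is only needed for multi-node clusters.
        kwargs["ClusterType"] = "single-node"
    return kwargs


def create_cluster(redshift_client, settings: ClusterSettings) -> dict:
    """Request the cluster, then block until AWS reports it as available."""
    response = redshift_client.create_cluster(**build_create_cluster_kwargs(settings))
    waiter = redshift_client.get_waiter("cluster_available")
    waiter.wait(ClusterIdentifier=settings.cluster_identifier)
    return response
```

In real use the password would come from a secrets store or an untracked config file rather than being hard-coded, and the IAM role and security group would need to exist before the cluster is requested.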
Configuration Management
- Reads parameters from a config file (cluster IDs, database names, etc.)
- Allows easy management of different environments
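
One way to keep per-environment parameters in a config file is an INI file with one section per environment, read with Python's standard `configparser`. The sketch below assumes that layout. The section names (`dev`, `prod`), the option names, and the sample values are hypothetical and may not match the repository's actual config file.

```python
import configparser

# Hypothetical config contents; in practice this would live in a file on disk.
SAMPLE_CONFIG = """
[dev]
cluster_identifier = analytics-dev
db_name = dev_db
node_type = dc2.large
number_of_nodes = 1

[prod]
cluster_identifier = analytics-prod
db_name = prod_db
node_type = dc2.large
number_of_nodes = 4
"""


def load_environment(config_text: str, environment: str) -> dict:
    """Return the settings for one environment as a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(environment):
        raise KeyError(f"No [{environment}] section in config")
    section = parser[environment]
    return {
        "cluster_identifier": section["cluster_identifier"],
        "db_name": section["db_name"],
        "node_type": section.get("node_type", fallback="dc2.large"),
        # getint converts the stored string into an integer.
        "number_of_nodes": section.getint("number_of_nodes", fallback=1),
    }


settings = load_environment(SAMPLE_CONFIG, "prod")
print(settings["cluster_identifier"], settings["number_of_nodes"])
```

Switching environments then means changing a single argument, while credentials can stay in a separate file that is excluded from version control.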
Data Ingestion
- Fetches data files stored in an S3 bucket
- Sets up a workflow to load the data from S3 into Redshift
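
Loading from S3 into Redshift is typically done with the Redshift `COPY` command, authorized through the IAM role attached to the cluster. The sketch below builds such a statement as a string. The helper name, the CSV format with a header row, and the table, bucket, and role values are all assumptions for illustration; the notebook may load data differently.

```python
import re


def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         region: str = "us-east-1") -> str:
    """Build a Redshift COPY statement for CSV files with one header row."""
    # Allow only simple identifiers (optionally schema-qualified) to avoid SQL injection.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?", table):
        raise ValueError(f"Unsafe table name: {table!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"Expected an s3:// URI, got {s3_uri!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


def load_table(cursor, table: str, s3_uri: str, iam_role_arn: str) -> None:
    """Run the COPY through any DB-API cursor connected to the cluster."""
    cursor.execute(build_copy_statement(table, s3_uri, iam_role_arn))


# Hypothetical table, bucket, and role used only to show the generated SQL.
print(build_copy_statement(
    "staging_events",
    "s3://example-bucket/events/",
    "arn:aws:iam::123456789012:role/example-redshift-role",
))
```

Because `COPY` reads every object under the given S3 prefix, a daily batch can be loaded by pointing the URI at that day's prefix.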

Outcomes

- One-click creation of a production-ready Redshift environment
- Code for loading data from S3 into Redshift for analysis
- File-based configuration for easy management

Repository files (10 commits)
- .gitignore
- CreateRedshiftDataWareHouse_CopyDataFromS3.ipynb
- README.md
