Tejas2512/fitbit

Automate the entire machine learning pipeline and create an end-to-end solution for the given project.

Project: Build a regression model that predicts calories burnt from the indicators given in the training data.

Approach:

The project is divided into the following pipelines:

(1) Data collection (2) Data validation (3) Data insertion in Database (4) Data pre-processing (5) Model training and prediction

Data collection

Data comes from remote sources such as Azure Storage, Google Cloud Storage, or an AWS S3 bucket.

Data validation

In this stage, the file name, column names, number of columns and missing values are validated against the training and prediction schemas (if an entire column is null, that column is dropped). Files that pass every validation step are moved to the good-data container; files that fail are moved to the bad-data container, and an email listing the rejected files is sent to the client.
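A minimal sketch of the validation step, assuming the raw files are local CSVs and the good-data and bad-data containers are local folders rather than Azure containers. The file-name pattern, the column names in `SCHEMA`, and the helpers `validate_file` and `route_files` are hypothetical placeholders; the real schema is stored in MongoDB, and the email to the client is not shown.

```python
import re
import shutil
from pathlib import Path
from typing import List

import pandas as pd

# Hypothetical schema: the pattern and column names are placeholders,
# not the project's actual training/prediction schema.
SCHEMA = {
    "file_pattern": r"^fitbit_\d{8}_\d{6}\.csv$",
    "columns": ["TotalSteps", "TotalDistance", "VeryActiveMinutes", "Calories"],
}


def validate_file(path: Path, schema: dict) -> bool:
    """Check file name, number of columns and column names against the schema."""
    if not re.match(schema["file_pattern"], path.name):
        return False
    try:
        df = pd.read_csv(path)
    except (pd.errors.EmptyDataError, pd.errors.ParserError):
        return False
    if len(df.columns) != len(schema["columns"]):
        return False
    return list(df.columns) == schema["columns"]


def route_files(raw_dir: Path, good_dir: Path, bad_dir: Path, schema: dict) -> List[str]:
    """Move each CSV to good_dir or bad_dir and return the rejected file names."""
    good_dir.mkdir(parents=True, exist_ok=True)
    bad_dir.mkdir(parents=True, exist_ok=True)
    rejected = []
    for path in sorted(raw_dir.glob("*.csv")):
        if validate_file(path, schema):
            # Drop columns whose values are all missing, then store the file as "good".
            df = pd.read_csv(path).dropna(axis=1, how="all")
            df.to_csv(good_dir / path.name, index=False)
            path.unlink()
        else:
            shutil.move(str(path), str(bad_dir / path.name))
            rejected.append(path.name)
    return rejected
```

The returned list of rejected names is what would be passed on to the notification email.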

Data insertion in Database

After data validation, all good files (files that passed every validation step) are stored in a MongoDB Atlas collection, and a final master CSV file is created for prediction. The master CSV file is then exported to an Azure container.
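A sketch of the insertion and export step, assuming `collection` is a pymongo `Collection` (for example `MongoClient(uri)["fitbit"]["training_data"]`, where the database and collection names are hypothetical). The helper names `insert_good_files` and `export_master_csv` are illustrative, not the repository's own functions; the master CSV is written to local disk here instead of Azure.

```python
from pathlib import Path

import pandas as pd


def insert_good_files(good_dir: Path, collection) -> int:
    """Insert every row of every good CSV into a MongoDB collection.

    `collection` is assumed to be a pymongo Collection (or any object
    exposing insert_many). Returns the number of inserted rows.
    """
    total = 0
    for path in sorted(good_dir.glob("*.csv")):
        records = pd.read_csv(path).to_dict("records")
        if records:
            collection.insert_many(records)
            total += len(records)
    return total


def export_master_csv(collection, out_path: Path) -> pd.DataFrame:
    """Read all documents back, excluding MongoDB's _id field, into one master CSV."""
    df = pd.DataFrame(list(collection.find({}, {"_id": 0})))
    df.to_csv(out_path, index=False)
    return df
```

For the upload to Azure, one option with the current azure-storage-blob SDK is `BlobClient.from_connection_string(conn_str, container_name, blob_name).upload_blob(data, overwrite=True)`; the repository lists the older `azure-storage` package, so its actual calls may differ.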

Data pre-processing

In data pre-processing, we perform feature scaling, drop columns with zero standard deviation, drop duplicate rows, separate the dependent and independent features, and perform a train-test split.
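A sketch of these pre-processing steps with pandas and scikit-learn, assuming all feature columns are numeric and that the target column is named `Calories` (an assumption; check the actual schema). The function name `preprocess` is hypothetical. One deliberate design choice differs from the order listed above: the scaler is fitted after the train-test split, on the training rows only, so that test-set statistics do not leak into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, target: str = "Calories",
               test_size: float = 0.2, random_state: int = 42):
    # 1. Drop exact duplicate rows.
    df = df.drop_duplicates()
    # 2. Separate the independent features (X) from the dependent feature (y).
    X = df.drop(columns=[target])
    y = df[target]
    # 3. Drop constant columns (standard deviation of 0); assumes numeric features.
    std = X.std()
    X = X.drop(columns=std[std == 0].index)
    # 4. Train-test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    # 5. Fit the scaler on the training split only, then apply it to both splits.
    scaler = StandardScaler()
    X_train = pd.DataFrame(scaler.fit_transform(X_train),
                           columns=X.columns, index=X_train.index)
    X_test = pd.DataFrame(scaler.transform(X_test),
                          columns=X.columns, index=X_test.index)
    return X_train, X_test, y_train, y_test, scaler
```

The fitted `scaler` is returned so the same transformation can be applied to prediction data later.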

Model training

In model training, rows that follow similar patterns are grouped with KMeans clustering, and each row is labelled with its cluster number. The data in each cluster is then trained on several algorithms (stacking, bagging, boosting, SVM, KNN, etc.), and the algorithm with the highest r2_score is saved as <model_name><cluster_name>.sav in an Azure container. This step is repeated once for each cluster.
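A sketch of cluster-wise model selection with scikit-learn, assuming models are saved to a local folder with `pickle` instead of an Azure container. `CANDIDATES` is a hypothetical, reduced set of regressors standing in for the algorithms listed above, and `train_per_cluster` is an illustrative name. Each cluster needs enough rows for its own train-test split (at least 5 training rows here, because of KNN's `n_neighbors=5`).

```python
import pickle
from pathlib import Path

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Hypothetical candidate set; a representative subset, not the project's exact list.
CANDIDATES = {
    "RandomForest": lambda: RandomForestRegressor(n_estimators=100, random_state=0),
    "GradientBoosting": lambda: GradientBoostingRegressor(random_state=0),
    "SVR": lambda: SVR(),
    "KNN": lambda: KNeighborsRegressor(n_neighbors=5),
}


def train_per_cluster(X, y, n_clusters, model_dir):
    """Cluster X with KMeans, then keep the highest-r2 model for each cluster."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    X, y = np.asarray(X), np.asarray(y)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    with open(model_dir / "KMeans.sav", "wb") as f:
        pickle.dump(kmeans, f)

    best = {}
    for cluster in range(n_clusters):
        X_c, y_c = X[labels == cluster], y[labels == cluster]
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_c, y_c, test_size=0.25, random_state=0)
        scores = {}
        for name, make in CANDIDATES.items():
            model = make().fit(X_tr, y_tr)
            scores[name] = (r2_score(y_te, model.predict(X_te)), model)
        name = max(scores, key=lambda n: scores[n][0])
        score, model = scores[name]
        # File name follows the <model_name><cluster_number>.sav convention.
        with open(model_dir / f"{name}{cluster}.sav", "wb") as f:
            pickle.dump(model, f)
        best[cluster] = (name, score)
    return kmeans, best
```

Saving the fitted KMeans object alongside the per-cluster models lets prediction assign new rows to the same clusters.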

Prediction

All steps except model training are also performed during prediction. For each record, the pretrained model is selected according to its cluster number. At the end of prediction, a prediction.csv file is saved in an Azure container and its path is shown in the application.

Tools & Libraries

Language: Python 3.6

Tools: Docker (to build the Docker image)

Cloud: Azure (stores training files, model files, prediction files and metadata)

Database: MongoDB Atlas (stores logs, schemas, evaluation metrics and metafiles)

Libraries: scikit-learn, pandas, NumPy, Flask, azure-storage, pymongo, etc.

A detailed explanation is provided in the Problem Statement.docx file.

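Commands used to build, run and push the Docker image:

Build the image (run from the directory that contains the Dockerfile):
docker image build -t <REPOSITORY> .

List local images:
docker images

List running containers:
docker ps

Run the container in the background, mapping host port 5000 to container port 5000:
docker run -p 5000:5000 -d <REPOSITORY>

Log in to the Azure Container Registry:
docker login dockerfitbit.azurecr.io

Push the image (the local image must be tagged with this registry name):
docker push dockerfitbit.azurecr.io/mlfitbit:latest

Stop a running container:
docker stop <containerID>

Remove stopped containers, unused networks, dangling images and unused build cache:
docker system prune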

Application URL: https://fit-bitapp.herokuapp.com/

Repository contents

Folders:
.github/workflows
DataTransform_Training
DataTransformation_Prediction
DataTypeValidation_Insertion_Prediction
DataTypeValidation_Insertion_Training
EDA
EncoderPickle
Prediction_Raw_Data_Validation
Training_Raw_data_validation
application_logging
best_model_finder
configfile
data_ingestion
data_preprocessing
file_operations
templates

Files:
Problem Statement.docx
Procfile
README.md
app.py
app.yaml
azure_file.py
docker-machine
load_yaml.py
main.py
predictFromModel.py
prediction_Validation_Insertion.py
requirements.txt
runtime.txt
template.py
trainingModel.py
training_Validation_Insertion.py
