Data Science Final Project

This repository contains the code for our Data Science Final Project: a full data pipeline covering dataset exploration, preprocessing, model training, and hyperparameter tuning. The project compares the performance of several machine learning models, namely Neural Networks (NN), Support Vector Machines (SVM), Random Forests (RF), and K-Nearest Neighbors (KNN), and includes an ensemble method that combines them to improve prediction accuracy.

Getting Started

To run the full data pipeline:

Open and run the main.ipynb file.
This notebook installs the necessary dependencies via pip and then runs each sub-notebook in turn.
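
The orchestration described above can be sketched in plain Python. This is a minimal sketch and not the contents of main.ipynb: the `build_commands` and `run_pipeline` helpers are hypothetical, and running notebooks through `jupyter nbconvert --execute` is an assumption about one workable approach. Only `requirements.txt` and the notebook file names come from this README.

```python
import subprocess
import sys


def build_commands(notebooks, requirements="requirements.txt"):
    """Return the commands for one pipeline run, in execution order."""
    # Install dependencies with the same interpreter that runs this script.
    commands = [[sys.executable, "-m", "pip", "install", "-r", requirements]]
    for notebook in notebooks:
        # `--execute` runs the notebook top to bottom;
        # `--inplace` writes the executed outputs back into the same file.
        commands.append(
            ["jupyter", "nbconvert", "--to", "notebook",
             "--execute", "--inplace", notebook]
        )
    return commands


def run_pipeline(notebooks):
    """Run every command and stop at the first failure (hypothetical helper)."""
    for command in build_commands(notebooks):
        subprocess.run(command, check=True)


# Dry run: print the commands without executing them.
# Two notebooks are shown for brevity; pass the full list in the order you need.
for command in build_commands(["data_preprocessing.ipynb", "model_training.ipynb"]):
    print(" ".join(command))
```

Calling `run_pipeline([...])` with the same list would execute the commands instead of printing them. Building the command list separately keeps the order easy to inspect before anything is installed or run.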

Project Structure

The project is organized into the following components:

main.ipynb: Orchestrates the end-to-end data pipeline.
data_exploration.ipynb: Generates charts and visualizations to provide insights into the dataset distribution.
data_preprocessing.ipynb: Handles data cleaning, transformation, and preparation for model training.
model_training.ipynb: Defines and trains machine learning models, including Neural Networks, SVM, Random Forest, and KNN.
hyperparameter_tuning.ipynb: Performs hyperparameter tuning with Grid Search to optimize model performance.
ensemble_method.ipynb: Combines multiple models through an ensemble method to improve prediction accuracy.
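
The training and ensemble steps listed above might look roughly like the following with Scikit-learn. This is a sketch under stated assumptions, not the code in the notebooks: the data is synthetic (`make_classification`), the hyperparameters are placeholders, and hard voting is only one possible ensemble method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; the project's real dataset is not reproduced here.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling sits inside each pipeline so it is fit on the training split only.
base_models = {
    "NN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Hard voting: the ensemble predicts the class most base models agree on.
ensemble = VotingClassifier(estimators=list(base_models.items()), voting="hard")

scores = {}
for name, model in {**base_models, "Ensemble": ensemble}.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: {scores[name]:.3f}")
```

Because `VotingClassifier` clones its estimators before fitting, the ensemble trains its own copies and does not reuse the fitted base models.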

My Contribution

I contributed to the project in the following areas:

Model Evaluation: Evaluated and compared the performance of different machine learning models (Neural Networks, SVM, Random Forest, KNN), using metrics such as accuracy, precision, and recall.
Dataset Selection: Conducted research to select a suitable dataset for the problem at hand, ensuring it met the project's objectives and provided meaningful insights.
Code Contributions: Drafted and implemented critical sections of the model evaluation and hyperparameter tuning code to improve the performance of our models.
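
The hyperparameter tuning and evaluation work described above can be illustrated with Scikit-learn's `GridSearchCV` and metric functions. This is a minimal sketch rather than the project's code: the synthetic data, the choice of KNN, and the parameter grid are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder grid: every combination is scored with 3-fold cross-validation.
param_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# With the default refit=True, predict() uses the best model refit on all
# training data, so the test split is only touched for the final evaluation.
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}

print("Best parameters:", search.best_params_)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same pattern applies to the other models by swapping the estimator and its grid. Reporting precision and recall next to accuracy shows whether a model trades one kind of error for another.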

Skills & Technologies Utilized

Languages: Python
Tools: Jupyter Notebooks, Pandas, Scikit-learn, Matplotlib, Seaborn
Models: Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RF), K-Nearest Neighbors (KNN)
Machine Learning Techniques: Data Preprocessing, Model Training, Hyperparameter Tuning, Ensemble Methods

How to Run the Project

Clone the repository to your local machine:

```
git clone https://github.com/your-username/project-repo.git
```

Install the dependencies:
```
pip install -r requirements.txt
```
Run the Jupyter notebook: open main.ipynb in Jupyter and run all cells.
