VejayPersaud/ucimlrepo

Python package for dataset imports from UCI ML Repository
Data Science Final Project

This repository contains the code and components for our Data Science Final Project, which builds a full data pipeline for dataset exploration, preprocessing, model training, and hyperparameter tuning. The project compares the performance of several machine learning models, including Neural Networks (NN), Support Vector Machines (SVM), Random Forests (RF), and K-Nearest Neighbors (KNN), and includes an ensemble method for enhanced performance.


Getting Started

To run the full data pipeline:

  1. Open and run the main.ipynb notebook.
  2. This notebook installs the necessary dependencies via pip and runs each sub-notebook in sequence for seamless execution.
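The orchestration step above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the notebook filenames are taken from the Project Structure section, and it assumes `jupyter nbconvert` is available for headless notebook execution.

```python
import subprocess
import sys

# Sub-notebooks from the Project Structure section, in pipeline order.
PIPELINE = [
    "data_exploration.ipynb",
    "data_preprocessing.ipynb",
    "model_training.ipynb",
    "grid_search.ipynb",
    "ensemble_method.ipynb",
]

def nbconvert_cmd(notebook):
    """Build the `jupyter nbconvert` command that executes a notebook in place."""
    return [sys.executable, "-m", "jupyter", "nbconvert",
            "--to", "notebook", "--execute", "--inplace", notebook]

def run_pipeline():
    # Install dependencies, then execute each sub-notebook in order.
    subprocess.check_call([sys.executable, "-m", "pip",
                           "install", "-r", "requirements.txt"])
    for nb in PIPELINE:
        subprocess.check_call(nbconvert_cmd(nb))
```

Running the notebooks through `nbconvert --execute --inplace` keeps each sub-notebook's outputs saved in the file, which is convenient for reviewing intermediate results after a pipeline run.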

Project Structure

The project is organized into the following components:

  • main.ipynb: Orchestrates the end-to-end data pipeline.
  • data_exploration.ipynb: Generates charts and visualizations to provide insights into the dataset distribution.
  • data_preprocessing.ipynb: Handles data cleaning, transformation, and preparation for model training.
  • model_training.ipynb: Defines and trains machine learning models, including Neural Networks, SVM, Random Forest, and KNN.
  • grid_search.ipynb: Performs hyperparameter tuning to optimize model performance using Grid Search.
  • ensemble_method.ipynb: Combines multiple models through an ensemble method to improve prediction accuracy.
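Conceptually, the Grid Search in grid_search.ipynb exhaustively tries every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below illustrates the idea in plain Python; the parameter names and scoring function are hypothetical examples, and Scikit-learn (listed under Tools below) provides GridSearchCV for the same purpose.

```python
from itertools import product

def grid_search(train_and_score, param_grid):
    """Try every parameter combination and return the best one.

    param_grid maps parameter names to lists of candidate values;
    train_and_score fits a model with the given parameters and
    returns a validation score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scorer: pretend the model peaks at n_neighbors=5.
scorer = lambda n_neighbors, weights: -abs(n_neighbors - 5)
best, score = grid_search(scorer, {"n_neighbors": [3, 5, 7],
                                   "weights": ["uniform", "distance"]})
# best → {"n_neighbors": 5, "weights": "uniform"}
```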

My Contribution

As a key member of the project team, I made significant contributions in the following areas:

  • Model Evaluation: Evaluated and compared the performance of different machine learning models (Neural Networks, SVM, Random Forest, KNN), using metrics such as accuracy, precision, and recall.
  • Dataset Selection: Conducted research to select a suitable dataset for the problem at hand, ensuring it met the project's objectives and provided meaningful insights.
  • Code Contributions: Drafted and implemented critical sections of the model evaluation and hyperparameter tuning code to improve the performance of our models.
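The evaluation metrics named above can all be computed from the counts of true/false positives and negatives. A self-contained sketch for the binary case (Scikit-learn provides equivalent `accuracy_score`, `precision_score`, and `recall_score` functions):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of everything predicted positive, the fraction that truly is positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

def recall(y_true, y_pred, positive=1):
    """Of all truly positive samples, the fraction the model found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

# Toy example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 1, 0]
# accuracy → 0.5, precision → 2/3, recall → 2/3
```

Reporting precision and recall alongside accuracy matters when classes are imbalanced, since a model can score high accuracy by always predicting the majority class.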

Skills & Technologies Utilized

  • Languages: Python
  • Tools: Jupyter Notebooks, Pandas, Scikit-learn, Matplotlib, Seaborn
  • Models: Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RF), K-Nearest Neighbors (KNN)
  • Machine Learning Techniques: Data Preprocessing, Model Training, Hyperparameter Tuning, Ensemble Methods
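One common ensemble technique is hard (majority) voting over the individual models' predictions. Whether ensemble_method.ipynb uses voting, averaging, or stacking is not specified here, so treat this as a generic sketch with hypothetical per-model predictions:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label predictions (one list per model)
    into a single prediction per sample by majority vote."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical predictions from SVM, RF, and KNN on four samples:
svm = [1, 0, 1, 1]
rf  = [1, 1, 1, 0]
knn = [0, 0, 1, 1]
ensemble = majority_vote([svm, rf, knn])  # → [1, 0, 1, 1]
```

Voting tends to help most when the base models make uncorrelated errors, which is one motivation for combining model families as different as SVM, RF, and KNN.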

How to Run the Project

  1. Clone the repository to your local machine:
    git clone https://github.com/VejayPersaud/ucimlrepo.git
    
  2. Install the dependencies:
    pip install -r requirements.txt
    
  3. Run the Jupyter notebook:
    jupyter notebook main.ipynb
