Skip to content

Vaccine Usage Analysis and Prediction: Predicted H1N1 flu vaccine acceptance using Python, Pandas, and Scikit-learn. Compared models, achieving 84% with Random Forest. Analyzed individual behaviors, perceptions, and demographics, enhancing data analysis and machine learning skills.

Notifications You must be signed in to change notification settings

thiruppu/H1N1-Vaccine-Prediction-using-Machine-Learning

Repository files navigation

Vaccine Usage Analysis and Prediction

Project Overview

This project aims to predict the likelihood of individuals taking an H1N1 flu vaccine based on their characteristics and attitudes. By analyzing a dataset containing various features related to behaviors, perceptions, and demographics, we build a predictive model that can help healthcare professionals and policymakers target vaccination campaigns more effectively.

Problem Statement

The goal is to predict the probability of individuals taking an H1N1 flu vaccine using various machine learning algorithms. The dataset includes information about individuals' behaviors, perceptions, and demographics. We aim to determine which model provides the best accuracy for this prediction task.

Dataset

The dataset used for this project includes features related to:

  • Demographics (age, sex, income, etc.)
  • Behaviors (frequency of hand washing, doctor visits, etc.)
  • Perceptions (worry about H1N1, awareness of vaccine, etc.)

Algorithms and Performance

We compared several machine learning algorithms to determine which provides the best accuracy for predicting vaccine acceptance. The algorithms and their accuracies are as follows:

  • Logistic Regression: 84.6%
  • Decision Tree: 77%
  • XGBoost: 83%
  • AdaBoost: 85%
  • Random Forest: 84%

Chosen Model

The Random Forest algorithm was selected for its balance of accuracy and interpretability. While AdaBoost provided slightly higher accuracy, the Random Forest model was chosen for its robustness and performance consistency.

Implementation

Dependencies

The project requires the following libraries:

  • Python 3.7+
  • Pandas
  • NumPy
  • Scikit-learn
  • Matplotlib
  • Seaborn

Setup

  1. Clone the repository:

    git clone https://github.com/yourusername/vaccine-usage-analysis.git
    cd vaccine-usage-analysis
  2. Install the required libraries:

    pip install -r requirements.txt

Data Preprocessing

The dataset is preprocessed to handle missing values, encode categorical variables, and normalize numerical features. Key preprocessing steps include:

  • Filling missing values with appropriate imputation techniques.
  • Encoding categorical features using one-hot encoding.
  • Normalizing numerical features to ensure they have a similar scale.

Model Training

The preprocessed data is split into training and testing sets. The Random Forest model is trained using the training set and evaluated on the testing set. Hyperparameter tuning is performed using GridSearchCV to optimize the model's performance.

Evaluation

The model's performance is evaluated using accuracy, precision, recall, and F1-score. The confusion matrix is also analyzed to understand the model's predictions better.

Results

The Random Forest model achieved an accuracy of 84%, making it a reliable choice for predicting vaccine acceptance. Other models' performances are compared to highlight the strengths and weaknesses of each approach.

Visualizations

The following visualizations are included in the analysis:

  • Count of H1N1 worry levels
  • Number of health workers
  • Sex distribution of vaccine consumers
  • H1N1 awareness levels
  • Frequency of hand washing

Conclusion

The project successfully demonstrates the use of machine learning algorithms to predict vaccine acceptance. The insights gained can help healthcare professionals and policymakers design more effective vaccination campaigns.

Future Work

Future improvements can include:

  • Incorporating additional features to enhance model accuracy.
  • Exploring more advanced algorithms and ensemble techniques.
  • Deploying the model as a web application for real-time predictions.

Repository Structure

  • data/: Contains the dataset.
  • notebooks/: Jupyter notebooks for data exploration and model training.
  • scripts/: Python scripts for preprocessing and model training.
  • results/: Results and visualizations.
  • README.md: Project documentation.

About

Vaccine Usage Analysis and Prediction: Predicted H1N1 flu vaccine acceptance using Python, Pandas, and Scikit-learn. Compared models, achieving 84% with Random Forest. Analyzed individual behaviors, perceptions, and demographics, enhancing data analysis and machine learning skills.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published