This project aims to predict the likelihood of individuals taking an H1N1 flu vaccine based on their characteristics and attitudes. By analyzing a dataset containing various features related to behaviors, perceptions, and demographics, we build a predictive model that can help healthcare professionals and policymakers target vaccination campaigns more effectively.
The goal is to predict the probability of individuals taking an H1N1 flu vaccine using various machine learning algorithms. The dataset includes information about individuals' behaviors, perceptions, and demographics. We aim to determine which model provides the best accuracy for this prediction task.
The dataset used for this project includes features related to:
- Demographics (age, sex, income, etc.)
- Behaviors (frequency of hand washing, doctor visits, etc.)
- Perceptions (worry about H1N1, awareness of vaccine, etc.)
We compared several machine learning algorithms to determine which provides the best accuracy for predicting vaccine acceptance. The algorithms and their accuracies are as follows:
- Logistic Regression: 84.6%
- Decision Tree: 77%
- XGBoost: 83%
- AdaBoost: 85%
- Random Forest: 84%
The Random Forest algorithm was selected for its balance of accuracy and interpretability. While AdaBoost provided slightly higher accuracy, the Random Forest model was chosen for its robustness and performance consistency.
The project requires the following libraries:
- Python 3.7+
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
-
Clone the repository:
git clone https://github.com/yourusername/vaccine-usage-analysis.git cd vaccine-usage-analysis
-
Install the required libraries:
pip install -r requirements.txt
The dataset is preprocessed to handle missing values, encode categorical variables, and normalize numerical features. Key preprocessing steps include:
- Filling missing values with appropriate imputation techniques.
- Encoding categorical features using one-hot encoding.
- Normalizing numerical features to ensure they have a similar scale.
The preprocessed data is split into training and testing sets. The Random Forest model is trained using the training set and evaluated on the testing set. Hyperparameter tuning is performed using GridSearchCV to optimize the model's performance.
The model's performance is evaluated using accuracy, precision, recall, and F1-score. The confusion matrix is also analyzed to understand the model's predictions better.
The Random Forest model achieved an accuracy of 84%, making it a reliable choice for predicting vaccine acceptance. Other models' performances are compared to highlight the strengths and weaknesses of each approach.
The following visualizations are included in the analysis:
- Count of H1N1 worry levels
- Number of health workers
- Sex distribution of vaccine consumers
- H1N1 awareness levels
- Frequency of hand washing
The project successfully demonstrates the use of machine learning algorithms to predict vaccine acceptance. The insights gained can help healthcare professionals and policymakers design more effective vaccination campaigns.
Future improvements can include:
- Incorporating additional features to enhance model accuracy.
- Exploring more advanced algorithms and ensemble techniques.
- Deploying the model as a web application for real-time predictions.
data/
: Contains the dataset.notebooks/
: Jupyter notebooks for data exploration and model training.scripts/
: Python scripts for preprocessing and model training.results/
: Results and visualizations.README.md
: Project documentation.