Pima_Indians_Diabetes

Kaggle problem: Predict the onset of diabetes based on diagnostic measures.

Create andactivate the virtual environment

First, create an environment with conda where we will pre-install python 3.9:

conda create --name PimaIndianDiabetes python=3.9

Then activate the environment and change directory:

conda activate PimaIndianDiabetes

cd anaconda3/envs/PimaIndianDiabetes

Instal the prerequisites

Use pip install to install the packages needed to run the code. These are the following: pandas, numpy, matplotlib, seaborn, scikit-learn and xgboost.

Download the dataset

Download the dataset from Kaggle: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

Run the code

Then you can download main.py and run it in the terminal:

python main.py

Dataset info

The dataset has many different variables: Pregnancies - Number of times pregnant

Glucose - Plasma glucose concentration a 2 hours in an oral glucose tolerance test

Blood Pressure - Diastolic blood pressure

SkinThickness - Triceps skin fold thickness

Insulin - 2-Hour serum insulin

BMI - Body Mass Index

DiabetesPedigreeFunction - Diabetes pedigree function

Age - Age

Outcome - Whether or not the person is diabetic

The variable 'Outcome' is the target and all the other variables are the predictors. Therefore, we use the predictors in four different ML models to find out the target value.

ML models

First, we standardize the variables to follow a normal distribution with a mean of 0 and standard deviation of 1. We do that by substracting the mean value and dividing with the standard deviation of each variable.

Then, we split the data into two sets, training and testing.

Then we apply these four ML models: Logistic Regression Classifier

Decision Tree Classifier

Random Forest Classifier

XGBoost Classifier

Evaluation

Finally, we plot the accuracy of each model.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
LICENSE		LICENSE
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pima_Indians_Diabetes

Create andactivate the virtual environment

Instal the prerequisites

Download the dataset

Run the code

Dataset info

ML models

Evaluation

About

Releases

Packages

Languages

License

theostath/Pima_Indians_Diabetes

Folders and files

Latest commit

History

Repository files navigation

Pima_Indians_Diabetes

Create andactivate the virtual environment

Instal the prerequisites

Download the dataset

Run the code

Dataset info

ML models

Evaluation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages