Our models are contained in the NHANES.ipynb notebook. In order to run the notebook, create a virtual environment and install the required modules.
# create a virtual environment, "nhanes"
$ mkvirtualenv --python=/usr/local/bin/python3 nhanes
$ workon nhanes
# install required modules
$ pip install -r requirements.txt
# download/merge data
$ python ./bootstrap.py
# start ipython notebook
$ ipython notebook
You can find our report here.
Predicting disease onset from patient survey and lifestyle data is quickly becoming an important tool for diagnosing a disease before it progresses. In this study, data from the National Health and Nutrition Examination Survey (NHANES) questionnaire are used to predict the onset of diabetes. An ensemble model that combines the output of several classification algorithms was developed to predict the onset of diabetes from 16 features. The ensemble model achieved an AUC of 0.834, indicating good discriminative performance.
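To make the ensemble idea and the AUC metric concrete, here is a minimal standard-library sketch. It assumes the ensemble simply averages each classifier's predicted probability, which may differ from how the notebook actually combines its models. The function names `average_ensemble` and `roc_auc` and all scores are hypothetical, invented for illustration, and are not NHANES results.

```python
from statistics import mean


def average_ensemble(prob_lists):
    """Average per-sample positive-class probabilities across models.

    prob_lists holds one list of probabilities per model, all the same length.
    Plain averaging is an assumption; the notebook may combine models differently.
    """
    return [mean(sample_probs) for sample_probs in zip(*prob_lists)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels (1 = diabetes onset) and made-up model outputs.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.3, 0.2, 0.8, 0.6]
model_b = [0.6, 0.3, 0.7, 0.65, 0.7, 0.2]
model_c = [0.8, 0.6, 0.5, 0.1, 0.4, 0.3]

ensemble = average_ensemble([model_a, model_b, model_c])

for name, scores in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, round(roc_auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.834 means that a randomly chosen positive case is scored above a randomly chosen negative case about 83% of the time.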
ALQ120Q: How often drink alcohol over past 12 mos
BMXBMI: Body Mass Index (kg/m**2)
BMXHT: Standing Height (cm)
BMXLEG: Upper Leg Length (cm)
BMXWAIST: Waist Circumference (cm)
BMXWT: Weight (kg)
BPQ020: Ever told you had high blood pressure
DMDEDUC2: Education Level - Adults 20+
INDHHINC: Annual Household Income
LBXTC: Total cholesterol (mg/dL)
MCQ250A: Blood relatives have diabetes
PAQ180: Avg level of physical activity each day
RIAGENDR: Gender
RIDAGEYR: Age at Screening Adjudicated - Recode
RIDRETH1: Race/Ethnicity - Recode
SMD030: Age started smoking cigarets regularly