Are Google searches for COVID-19 symptoms useful in predicting increases in COVID-19 patients?
Exploratory data analysis and hypothesis testing using Google's COVID-19 Open Data database
COVID-19 has been a rapidly evolving pandemic. Hospitals and governments need to predict the local spread of COVID-19, and of future pandemics, in order to prepare and allocate resources. This project provides tools for exploring data from Google's COVID-19 Open Data repository, along with analysis and hypothesis testing.
Python 3.9:
Data modules
- pandas
- numpy
Plotting and statistics modules
- matplotlib
- seaborn
- scipy.stats
Misc. modules
- math
- warnings
- datetime
- itertools
Custom functions can be found here.
- img: Figure image files
- src: Custom Python functions and Jupyter Notebooks
A PowerPoint presentation summarizing the data analysis and results can be found here.
This project explores 3 tables from the COVID-19 Open Data repo:
- Search Trends symptoms dataset
- Dimensions: (1 425 194, ~450)
- Hospitalization records
- Dimensions: (643 715, 11)
- Index: keys, codes, and names for countries and regions

The search trends and hospitalization tables are joined using location keys and dates.
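The join can be illustrated with a minimal pandas sketch. The column names (`location_key`, `date`, `search_trends_fever`, `new_hospitalized_patients`) and the toy values are assumptions made for illustration. They are not taken from this project's code, and the actual column names depend on the version of the dataset.

```python
import pandas as pd

# Toy stand-ins for the two tables; names and values are illustrative assumptions.
searches = pd.DataFrame({
    "location_key": ["US_CA", "US_CA", "US_NY"],
    "date": ["2020-04-01", "2020-04-02", "2020-04-01"],
    "search_trends_fever": [5.1, 5.6, 7.2],
})
hospitalizations = pd.DataFrame({
    "location_key": ["US_CA", "US_CA", "US_NY", "US_TX"],
    "date": ["2020-04-01", "2020-04-02", "2020-04-01", "2020-04-01"],
    "new_hospitalized_patients": [120, 135, 900, 60],
})

# Parse dates so both tables use the same dtype for the join key.
for table in (searches, hospitalizations):
    table["date"] = pd.to_datetime(table["date"])

# Inner join: keep only (location, date) pairs present in both tables.
joined = searches.merge(hospitalizations, on=["location_key", "date"], how="inner")
print(joined)
```

An inner join drops locations that report hospitalizations but have no search data (`US_TX` in this toy example). Passing `how="left"` or `how="outer"` instead would keep unmatched rows, with missing values filled in as NaN.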
- Load data with control over query size
- Generate plots for EDA to explore variables
Example of hospitalization data versus time:
- Hypothesis testing of correlation between Google symptom searches and new hospitalizations
- By location (country/region)
- By symptom names
- By date range
- By time shift between searches and new hospitalizations
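The time-shift test can be sketched as follows for a single location and a single symptom. This is a hedged illustration, not the project's implementation. The helper `shifted_correlation` is hypothetical, the data is synthetic, and Pearson correlation via `scipy.stats.pearsonr` is an assumed choice of test. The sketch also assumes one row per day with no gaps, because the shift is applied by rows.

```python
import numpy as np
import pandas as pd
from scipy import stats


def shifted_correlation(searches, hospitalizations, shift_days):
    """Pearson r and p-value between searches on day t and
    hospitalizations on day t + shift_days (hypothetical helper)."""
    # shift(-k) moves values k rows earlier, so row t holds the value from t + k.
    future = hospitalizations.shift(-shift_days)
    paired = pd.concat([searches, future], axis=1).dropna()
    r, p = stats.pearsonr(paired.iloc[:, 0], paired.iloc[:, 1])
    return float(r), float(p)


# Synthetic daily series: hospitalizations follow searches by 7 days, plus noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-04-01", periods=60, freq="D")
search = pd.Series(rng.normal(size=60), index=dates)
hosp = search.shift(7) * 10 + rng.normal(scale=0.5, size=60)

# Scan candidate shifts and keep the one with the strongest positive correlation.
results = {k: shifted_correlation(search, hosp, k) for k in range(15)}
best_shift = max(results, key=lambda k: results[k][0])
r7, p7 = results[7]
print(best_shift, r7, p7)
```

Scanning many locations, symptoms, and shifts produces a large number of p-values, so a multiple-comparison correction may be worth considering before any single small p-value is read as significant.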
Example of p-value heatmap:
Author: Christopher Shaffer