A collection of R scripts supporting a final-year Mathematics project that applies statistical models to English Premier League (EPL) football data. The analysis is split into the following two sections.
This section constructs a logistic regression model to predict the probability of a home win. The model uses the following predictor variables: ability of the home team, ability of the away team, geographical distance between the two teams, and the impact of Covid-19. The raw data, R script, and outputs can all be found in the folder titled "logistic_regression".
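As a rough illustration of this kind of model, the sketch below fits a logistic regression with `glm()` on simulated data. It is not the project's script: the column names (`home_ability`, `away_ability`, `distance_km`, `covid`, `home_win`), the simulated values, and the coefficients used to generate them are all assumptions made for the example.

```r
# Minimal sketch on SIMULATED data; all column names and values are hypothetical.
set.seed(1)
n <- 500

matches <- data.frame(
  home_ability = rnorm(n),            # assumed ability score of the home team
  away_ability = rnorm(n),            # assumed ability score of the away team
  distance_km  = runif(n, 0, 500),    # assumed distance between the two teams
  covid        = rbinom(n, 1, 0.2)    # 1 if the match was Covid-affected (assumed coding)
)

# Simulate the outcome from an arbitrary linear predictor on the logit scale.
lin_pred <- 0.3 + 0.8 * matches$home_ability - 0.8 * matches$away_ability +
  0.001 * matches$distance_km - 0.3 * matches$covid
matches$home_win <- rbinom(n, 1, plogis(lin_pred))

# Fit the logistic regression: binomial family with a logit link.
logit_fit <- glm(
  home_win ~ home_ability + away_ability + distance_km + covid,
  data   = matches,
  family = binomial(link = "logit")
)

# Predicted probability of a home win for one hypothetical fixture.
new_match <- data.frame(
  home_ability = 0.5, away_ability = -0.2, distance_km = 150, covid = 0
)
p_home_win <- predict(logit_fit, newdata = new_match, type = "response")
print(p_home_win)
```

Calling `summary(logit_fit)` would show the estimated coefficients on the log-odds scale; `exp(coef(logit_fit))` converts them to odds ratios.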
This section applies survival analysis techniques to investigate the time taken to score the first goal in a football match. The analysis considers the effects of the following independent variables: ability of the reference team, ability of the opposing team, location of the match, and the impact of Covid-19. The raw data, R script, and outputs can all be found in the folder titled "survival_analysis".
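One common way to model time-to-event data of this kind is a Cox proportional hazards model, sketched below with the `survival` package on simulated data. The description above does not say which survival techniques the project uses, so the choice of model, the column names (`ref_ability`, `opp_ability`, `home`, `covid`, `minutes`, `scored`), and the censoring of goalless observations at 90 minutes are all assumptions made for the example.

```r
# Minimal sketch on SIMULATED data; model choice and all names are hypothetical.
library(survival)

set.seed(1)
n <- 400

goals <- data.frame(
  ref_ability = rnorm(n),             # assumed ability score of the reference team
  opp_ability = rnorm(n),             # assumed ability score of the opposing team
  home        = rbinom(n, 1, 0.5),    # 1 if the reference team plays at home (assumed coding)
  covid       = rbinom(n, 1, 0.2)     # 1 if the match was Covid-affected (assumed coding)
)

# Simulate exponential times to the first goal with covariate-dependent rates.
rate <- 0.02 * exp(0.4 * goals$ref_ability - 0.3 * goals$opp_ability + 0.2 * goals$home)
time_to_goal <- rexp(n, rate)

# Assumption: observations with no goal by 90 minutes are right-censored.
goals$scored  <- as.integer(time_to_goal <= 90)
goals$minutes <- pmin(time_to_goal, 90)

# Fit a Cox proportional hazards model for the time to the first goal.
cox_fit <- coxph(
  Surv(minutes, scored) ~ ref_ability + opp_ability + home + covid,
  data = goals
)

# Hazard ratios: values above 1 indicate a first goal tends to come sooner.
hazard_ratios <- exp(coef(cox_fit))
print(hazard_ratios)
```

A Kaplan-Meier curve from `survfit(Surv(minutes, scored) ~ home, data = goals)` would give a non-parametric view of the same simulated data.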