Code for predicting pitcher performance in upcoming games. Deployed as a Google Cloud Function.

TimCSheehan/pitcher_model_deploy

Pitcher Model

This repository contains a script that is deployed as a Google Cloud Function and scheduled to run daily at 7 AM. The entry point, main.py, fetches the current day's schedule and runs a predictive model that estimates the number of strikeouts and walks the starting pitcher will record in each game. The predictions are then written to a database and displayed on my personal website.
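The daily sequence described above can be sketched roughly as follows. This is only an illustration of the flow, not the actual contents of main.py: the function `run_daily`, its callable parameters (`fetch_schedule`, `predict_game`, `write_rows`), and the row field names are all hypothetical.

```python
from datetime import date


def run_daily(today, fetch_schedule, predict_game, write_rows):
    """Hypothetical outline of one scheduled run.

    fetch_schedule(today) -> list of game dicts, each with a "pitcher" key
    predict_game(game)    -> (predicted_strikeouts, predicted_walks)
    write_rows(rows)      -> persists the prediction rows (e.g. to a database)
    """
    rows = []
    for game in fetch_schedule(today):
        strikeouts, walks = predict_game(game)
        rows.append(
            {
                "date": today.isoformat(),
                "pitcher": game["pitcher"],
                "pred_strikeouts": strikeouts,
                "pred_walks": walks,
            }
        )
    write_rows(rows)
    return rows
```

Passing the three steps in as callables keeps the outline testable without a live schedule source or database; the real script may wire these together differently.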

Files

  • exports.py handles writing to the database and can also send a summary email when the model run completes (currently deactivated).
  • download_game_level_data.py handles the ETL pipeline that extracts game-level variables, including the season and historical stats of the pitcher, the pitcher's team, the batting team, and the specific batters in the lineup. Relevant data is stored in a highly compressed format in the /data directory.
  • main.py calls the ETL pipeline and trains the model on the most up-to-date data to make predictions for today's games. The current best-performing model is an implementation of gradient-boosted trees (xgboost). To get an accurate estimate of model error, the model is first trained with the previous two weeks of data held out, and the MSE on that held-out data is saved to a table; the model is then retrained on all past data to generate predictions.
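
The hold-out-then-refit procedure described for main.py could look something like the sketch below. It is a minimal illustration under stated assumptions, not the repository's code: `holdout_then_refit`, the `(game_date, features, target)` row layout, and the `MeanBaseline` stand-in model are all hypothetical. The stand-in is used so the example runs with the standard library alone; any regressor exposing `fit` and `predict`, such as an xgboost regressor, could be supplied through `make_model` instead.

```python
from datetime import date, timedelta
from statistics import mean


class MeanBaseline:
    """Stand-in regressor (hypothetical): always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def holdout_then_refit(rows, today, make_model, holdout_days=14):
    """Estimate error on the most recent games, then refit on all past games.

    rows: list of (game_date, features, target) tuples.
    Returns (holdout_mse, model_trained_on_all_past_rows).
    """
    cutoff = today - timedelta(days=holdout_days)
    train = [r for r in rows if r[0] < cutoff]
    held_out = [r for r in rows if cutoff <= r[0] < today]
    if not train or not held_out:
        raise ValueError("need rows both before and inside the holdout window")

    # Step 1: fit without the last two weeks and score on them.
    model = make_model().fit([f for _, f, _ in train], [t for _, _, t in train])
    preds = model.predict([f for _, f, _ in held_out])
    mse = mean((p - t) ** 2 for p, (_, _, t) in zip(preds, held_out))

    # Step 2: refit on every past game to produce today's predictions.
    past = train + held_out
    final_model = make_model().fit([f for _, f, _ in past], [t for _, _, t in past])
    return mse, final_model


rows = [
    (date(2024, 6, 1), [1.0], 4.0),
    (date(2024, 6, 10), [2.0], 6.0),
    (date(2024, 6, 20), [3.0], 7.0),
    (date(2024, 6, 25), [4.0], 5.0),
]
mse, final_model = holdout_then_refit(rows, date(2024, 6, 30), MeanBaseline)
```

Splitting by date rather than at random keeps future games out of the training set, so the saved MSE reflects how the model would have performed on games it had not yet seen.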

Notebooks

Visualizations

Example Strike Zones

example strike zone

P(Strike | count, call_history)

examples of history bias
