This is a machine learning project focused on creating and serving a machine learning model to predict future waiting times in the Phantasialand amusement park.
To this end, waiting time data from wartezeiten.app, weather data from the Deutscher Wetterdienst, and data about German public and school holidays were analyzed and used to train several machine learning models (Linear Regression, XGBoost, and LightGBM). I tuned the hyperparameters of the best model (LightGBM) and serve it as a web app using Streamlit.
You can learn more about this project on our blog: CI Insights (German).
You can try the WebApp here: http://predict-phantasialand.herokuapp.com/
This is what the interface looks like:
This repository comes without the data and the trained models. In order to reproduce the results, you will have to download the data and train the models by yourself. (If you are a member of Cologne Intelligence, you can also find them in the CIDD sharepoint, look for "Einstiegsprojekt Phantasialand".)
Set up a virtual environment and install all project dependencies. Python 3.8 or higher is required.
> python3 -m venv .venv/
> source .venv/bin/activate
> pip install -r requirements_dev.txt
Unfortunately, most of the data retrieval must be done by hand.
- Download `tageswerte_KL_01327_19370101_20201231_hist.zip` and `tageswerte_KL_02667_19570701_20201231_hist.zip` from OpenData.DWD - Historical
- Download `tageswerte_KL_01327_akt.zip` and `tageswerte_KL_02667_akt.zip` from OpenData.DWD - Recent
- Place all four files in `data/raw/dwd_weather`
- Download the iCal calendar file containing all German public holidays from https://www.feiertage-deutschland.de/kalender-download/ and save it as `Feiertage Deutschland.ics`
- Copy and paste the tables with school holiday information for 2019-2024 from https://www.schulferien.org/deutschland/ferien/ and save them as `schulferien.txt`. Look at `data/raw/schulferien_template.txt` for the file structure.
- Place both files in `data/raw/`
Activate the virtual environment and run
> python src/data/download_waiting_times.py data/raw/wartezeiten_app.csv
This will download all waiting time data from https://www.wartezeiten.app/phantasialand/ and may take a moment.
Make sure that `data/raw` looks like this:
data/raw
├── Feiertage Deutschland.ics
├── dwd_weather
│ ├── tageswerte_KL_01327_19370101_20201231_hist.zip
│ ├── tageswerte_KL_01327_akt.zip
│ ├── tageswerte_KL_02667_19570701_20201231_hist.zip
│ └── tageswerte_KL_02667_akt.zip
├── schulferien.txt
├── sources.md
└── wartezeiten_app.csv
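To check the layout above programmatically, a small helper like the following can report which expected files are still missing (the file names are taken from the tree above; the helper itself is not part of this repository):

```python
from pathlib import Path

# Expected files relative to data/raw, as listed in the directory tree above.
EXPECTED = [
    "Feiertage Deutschland.ics",
    "dwd_weather/tageswerte_KL_01327_19370101_20201231_hist.zip",
    "dwd_weather/tageswerte_KL_01327_akt.zip",
    "dwd_weather/tageswerte_KL_02667_19570701_20201231_hist.zip",
    "dwd_weather/tageswerte_KL_02667_akt.zip",
    "schulferien.txt",
    "wartezeiten_app.csv",
]

def missing_raw_files(raw_dir):
    """Return the expected raw-data files that are not present in raw_dir."""
    raw = Path(raw_dir)
    return [name for name in EXPECTED if not (raw / name).exists()]

if __name__ == "__main__":
    missing = missing_raw_files("data/raw")
    if missing:
        print("Missing files:", *missing, sep="\n  ")
    else:
        print("data/raw looks complete.")
```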
Run
> make data
to process the raw data and
> python src/training/train_lightgbm.py
to train the LightGBM model.
You may also have a look at the other training scripts in `src/training` or play with the parameters. All models trained with these scripts are saved using the MLflow model registry.
If you want to use the web app, you need to copy the desired model from the MLflow model registry to `models/best/`. The models trained and saved with MLflow are placed at `mlruns/0/<some hash>/artifacts/model`. Make sure to copy all files in this folder (especially `MLmodel` and `model.pkl`).

If you trained only one model, it should be easy to see which model you want to copy. Otherwise, use the MLflow UI (`mlflow ui --backend-store-uri sqlite:///mlflow.db`) to find the path to your favorite model.
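The copy step can also be done with a short script. This is a sketch based on the paths stated above; it assumes a flat `artifacts/model` directory and is not part of the repository:

```python
import shutil
from pathlib import Path

def copy_model(run_dir, dest="models/best"):
    """Copy all MLflow model artifacts (MLmodel, model.pkl, ...) from a run
    directory (mlruns/0/<some hash>) into dest, and return the copied names."""
    src = Path(run_dir) / "artifacts" / "model"
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for item in src.iterdir():
        # Assumes the artifacts are plain files, as for the models here.
        if item.is_file():
            shutil.copy2(item, dest / item.name)
    return sorted(p.name for p in dest.iterdir())
```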
Afterwards you can open the WebApp with
> streamlit run src/app/app.py
You can deploy the web app, including the model used for prediction, as a Docker container. Follow these steps:
- Ensure that you have Docker installed and `dockerd` is running
- Ensure that you ran `make data` and placed your favorite model in `models/best/`
- If you want to deploy the container via Heroku, follow this guide and build the container using `heroku container:push -R`. This will use the `Dockerfile.web` file, which is optimized for Heroku.
- Otherwise, run `docker build -t phantasialand:latest .` to build the container. This will use the `Dockerfile` file, which exposes the web app on a fixed port (8501).
If you want to perform some data analyses or model evaluations, you may want to have a look at the notebooks in `notebooks/`.
End-to-end analysis: Note that the models are trained on exact weather data, whereas users can only see weather bins (like sunny, overcast, light rain, heavy rain), as we do not have future weather data. This causes a bias in evaluation. To get realistic evaluation results, you can use `src/evaluation/test_e2e.py`. It transforms the exact weather data into weather bins before querying the model. The script generates a CSV file containing predictions and actual values for all samples in the test set, which can then be analyzed, e.g. by using one of the `notebooks/evaluation/fm_e2e_xxx` notebooks.
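The binning idea can be sketched as follows. The thresholds and function name here are illustrative assumptions, not the actual logic in `src/evaluation/test_e2e.py`:

```python
def bin_weather(precipitation_mm, sunshine_hours):
    """Map exact daily weather values to the coarse bins shown to users.

    The cut-off values below are hypothetical examples chosen for
    illustration; the real evaluation script defines its own binning.
    """
    if precipitation_mm >= 10.0:
        return "heavy rain"
    if precipitation_mm > 0.5:
        return "light rain"
    if sunshine_hours >= 6.0:
        return "sunny"
    return "overcast"
```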