Welcome to this repository, which provides a script to scrape data from Študentska prehrana, a website that lists restaurants that offer discounted meals to students in Slovenia. The script retrieves data from the Internet Archive to compare the prices of meals in June 2022 with the current prices listed on the website. The data is cleaned and merged into a single dataframe for further analysis.
In addition to the script, this repository includes two Jupyter notebooks: overview.ipynb, which calculates various statistics on the data, and histograms.ipynb, which plots histograms of the price changes (the resulting figures are in the plots folder).
I use the data as the basis for the Boni 23 website (repo here).
The repository contains the following data files:
- restavracije.csv - A CSV file with restaurant data (can be opened in Excel)
- restavracije.json - A JSON file with restaurant data
You can import the functions from the scraper module and use them in your own scripts as follows:
from scraper import load_data, merge_data
df_new, df_old = load_data()
df = merge_data(df_new, df_old)
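Once the old and new prices sit in one dataframe, a price change can be computed per restaurant. The sketch below is only an illustration: the column names `price_old` and `price_new`, the helper `add_price_change`, and the toy rows are assumptions and are not taken from the scraper's actual output, so adjust them to the columns that `merge_data` really returns.

```python
import pandas as pd


def add_price_change(df, old_col="price_old", new_col="price_new"):
    """Return a copy of df with absolute and percentage price changes.

    The column names are hypothetical; pass the real ones from the merged dataframe.
    """
    out = df.copy()
    # Absolute difference between the current and the archived price.
    out["change"] = out[new_col] - out[old_col]
    # Relative difference in percent, rounded to one decimal place.
    out["change_pct"] = (out["change"] / out[old_col] * 100).round(1)
    return out


# Made-up placeholder rows standing in for the merged dataframe.
toy = pd.DataFrame(
    {
        "name": ["Restaurant A", "Restaurant B"],
        "price_old": [4.0, 5.0],
        "price_new": [4.5, 5.0],
    }
)
result = add_price_change(toy)
print(result)
```

With the real data, the same call would be `add_price_change(df, old_col=..., new_col=...)` using whichever price columns the merged dataframe contains.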
Alternatively, you can run the script directly:
python scraper.py
This will save the data to both a CSV and a JSON file in the /data directory.
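The saved files can be read back with pandas for further analysis. This is a minimal sketch under a few assumptions: the files are named restavracije.csv and restavracije.json as listed above, they live in a `data` folder relative to where the code runs, and the JSON file has a layout that `pandas.read_json` can parse with its defaults. The helper name `load_saved` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_saved(data_dir="data"):
    """Load the CSV and JSON exports from data_dir and return both dataframes.

    Assumes the file names restavracije.csv and restavracije.json.
    """
    data_dir = Path(data_dir)
    df_csv = pd.read_csv(data_dir / "restavracije.csv")
    # Assumes a JSON layout that read_json handles by default (e.g. a list of records).
    df_json = pd.read_json(data_dir / "restavracije.json")
    return df_csv, df_json
```

Calling `load_saved()` from the repository root should then return two dataframes with the same restaurant data, which is a quick way to check that both exports agree.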