World Holidays

This project is designed to fetch holiday data from the Calendarific API and send weekly emails with upcoming holidays based on the specified countries and holiday types.
It uses Google Cloud Storage (GCS) for intermediate storage and a PostgreSQL database for permanent storage of the transformed and normalized data. Airflow DAGs handle orchestration. The project can be run either locally or on the Google Cloud Platform (GCP).

Scripts:

  • extract.py - extracts data from the Calendarific API and stores them in GCS
  • transform_load.py - transforms the extracted data into a normalized tabular form and uploads them into the database
  • send_email.py - fetches data from the database, formats them as text and sends them via email
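
As a rough illustration of the extract and transform steps listed above, the sketch below builds a Calendarific request URL and flattens a nested payload into tabular rows. It is not the code from extract.py or transform_load.py. The endpoint, the payload field names and the helper names build_request_url and flatten_holidays are assumptions made for this example, and the sample payload is hand-written rather than real API output.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Calendarific v2 holidays API; verify against the official docs.
BASE_URL = "https://calendarific.com/api/v2/holidays"


def build_request_url(api_key, country, year):
    """Build the request URL for one country and one year."""
    query = urlencode({"api_key": api_key, "country": country, "year": year})
    return f"{BASE_URL}?{query}"


def flatten_holidays(payload):
    """Turn a nested payload into flat rows, one per (holiday, type) pair.

    The field names ("response", "holidays", "date", "iso", "type") are
    assumptions about the payload shape, not taken from this project.
    """
    rows = []
    for holiday in payload.get("response", {}).get("holidays", []):
        for holiday_type in holiday.get("type", []):
            rows.append(
                {
                    "name": holiday["name"],
                    "country": holiday["country"]["id"],
                    "date": holiday["date"]["iso"],
                    "type": holiday_type,
                }
            )
    return rows


# Hand-written sample payload for illustration only.
sample_payload = {
    "response": {
        "holidays": [
            {
                "name": "New Year's Day",
                "country": {"id": "us", "name": "United States"},
                "date": {"iso": "2025-01-01"},
                "type": ["National holiday"],
            },
            {
                "name": "Good Friday",
                "country": {"id": "us", "name": "United States"},
                "date": {"iso": "2025-04-18"},
                "type": ["Local holiday", "Christian"],
            },
        ]
    }
}

rows = flatten_holidays(sample_payload)
```

Splitting a holiday with several types into several rows is one possible way to normalize the data. The actual database schema is defined in create_database.sql.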

DAGs:

  • dag_extract_load.py - runs extract.py and transform_load.py sequentially every December 27 at noon in order to fetch holiday data for the upcoming year.
  • dag_send_email.py - runs send_email.py every Friday at noon.
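
The two schedules above can be written as standard five-field cron expressions. The sketch below states them and uses only the standard library to compute the next trigger time for each. Whether the DAG files use cron strings is an assumption, and the constant and function names are made up for this example.

```python
from datetime import datetime, timedelta

# Cron expressions matching the schedules described above (assumed form).
EXTRACT_LOAD_CRON = "0 12 27 12 *"  # 12:00 on December 27
SEND_EMAIL_CRON = "0 12 * * 5"      # 12:00 every Friday


def next_friday_noon(now):
    """Return the first Friday 12:00 strictly after `now`."""
    candidate = now.replace(hour=12, minute=0, second=0, microsecond=0)
    # datetime.weekday() counts Monday as 0, so Friday is 4.
    candidate += timedelta(days=(4 - candidate.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def next_december_27_noon(now):
    """Return the first December 27 12:00 strictly after `now`."""
    candidate = datetime(now.year, 12, 27, 12, 0)
    if candidate <= now:
        candidate = datetime(now.year + 1, 12, 27, 12, 0)
    return candidate
```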

Module:

  • helpers.py - aggregates some basic reusable functions

Configuration:

  • pipeline_empty.conf is provided as a template for creating the configuration file.
  • requirements.txt contains Python dependencies.

Requirements

  • Python 3.6+
  • pip
  • PostgreSQL
  • Airflow
  • Active GCP project
  • Email client that provides access to its SMTP server (e.g. Gmail or Yahoo)

Local Setup:

  1. Clone the repository:
git clone https://github.com/Sal2040/world_holidays.git
  2. Install the required packages:
pip install -r <path_to_your_directory>/world_holidays/requirements.txt
  3. Start psql as a user of your choice and run the create_database.sql script:
psql -U <username>
\i <path_to_your_directory>/world_holidays/create_database.sql
  4. Set up GCS:
  5. Sign up with https://calendarific.com/ in order to get an API key.

  6. Acquire SMTP credentials from your email service. If you use Gmail, use the following:

    • SSL Port: 465
    • Server Name: smtp.gmail.com
    • Generate application password as shown here.
  7. Set up Airflow following the instructions here. Move the DAGs to your Airflow DAGs folder (by default $AIRFLOW_HOME/dags).

  8. Configure:

    • Create a pipeline.conf file in the project directory to store your configuration. Use the provided pipeline_empty.conf as a template.
    • Set the WH_HOME environment variable to the path of your project directory:
export WH_HOME=<path_to_your_directory>/world_holidays
  9. Run Airflow standalone:
airflow standalone
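
To check the SMTP settings from the setup steps above before running the DAGs, a minimal sketch like the following may help. It is not the code from send_email.py. The function names, the subject line and the message layout are assumptions, and only the server name and SSL port come from this README.

```python
import smtplib
import ssl
from email.message import EmailMessage

SMTP_HOST = "smtp.gmail.com"  # server name from the setup steps
SMTP_PORT = 465               # SSL port from the setup steps


def build_message(sender, recipient, holidays):
    """Format (date, name, country) tuples as a plain-text email."""
    lines = [f"{date}  {name} ({country})" for date, name, country in holidays]
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Upcoming holidays"  # hypothetical subject line
    message.set_content("\n".join(lines) or "No upcoming holidays.")
    return message


def send_message(message, username, app_password):
    """Send the message over an implicit-SSL SMTP connection."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT, context=context) as server:
        server.login(username, app_password)
        server.send_message(message)
```

Calling build_message alone needs no network access, so the text layout can be checked before send_message is tried with the generated application password.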

Cloud Setup:

For cloud-only deployment on the GCP, the local instances of PostgreSQL and Airflow need to be replaced with their cloud equivalents.

  1. PostgreSQL

  2. Airflow

    There are two options for setting up a cloud instance of Airflow:

    • Google Cloud Composer - this solution has the advantage of being fully preinstalled and production-ready. However, dependency conflicts are very likely to appear, and resolving them is not trivial. It is also not possible to simply turn the instance off and on to save costs when it is not needed. It has to be deleted completely and set up again later.
    • Manually installing Airflow on a Virtual Machine - although this solution is not suitable for serious production applications, it is good enough for a dummy project. In this case, the steps outlined in the Local Setup above can simply be repeated on the Virtual Machine.

Notes on Configuration:

The pipeline.conf file has two separate sections to define the countries:

  • [extract_config] defines the countries whose holidays are downloaded from the Calendarific API into the database. After changing it, the extract_load DAG has to be re-run manually to update the database. Only additions take effect. Removing a country from the configuration does not remove its holidays from the database.
  • [email_config] defines the countries whose holidays are included in the information email. Countries that are not in the database cannot be included in the emails.
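
The relationship between the two sections can be checked with a short script. The sketch below reads pipeline.conf from WH_HOME and reports countries that are requested for emails but never extracted. The section names come from this README. The option name countries, its JSON-style list format and the function names are assumptions, so compare them with pipeline_empty.conf before use.

```python
import configparser
import json
import os


def load_config(path=None):
    """Read pipeline.conf from WH_HOME unless an explicit path is given."""
    if path is None:
        path = os.path.join(os.environ["WH_HOME"], "pipeline.conf")
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(path)
    return parser


def get_countries(parser, section):
    """Parse the hypothetical `countries` option, assumed to be a JSON list."""
    return json.loads(parser.get(section, "countries"))


def missing_from_database(parser):
    """Countries requested for emails that the extract step never downloads."""
    extracted = set(get_countries(parser, "extract_config"))
    emailed = set(get_countries(parser, "email_config"))
    return sorted(emailed - extracted)
```

An empty result from missing_from_database means every country in [email_config] is also covered by [extract_config].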

There are the following holiday types to choose from:

  • National holiday = bank holiday
  • Common local holiday = bank holiday
  • Half-day holiday
  • Local holiday
  • Muslim
  • Hebrew
  • Clock change/Daylight Saving Time
  • Season
  • Christian
  • Observance

The years config in the [extract_config] section is meant for initial population of the database. If left empty [], it defaults to the next year.
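
The default described above could be resolved along these lines. This is a sketch only, assuming the years value is stored as a JSON-style list. The function name resolve_years is made up for this example and is not part of helpers.py.

```python
import json
from datetime import date


def resolve_years(raw_value, today=None):
    """Return the years to extract; an empty list falls back to next year."""
    today = today or date.today()
    years = json.loads(raw_value) if raw_value.strip() else []
    return years or [today.year + 1]
```

For example, resolve_years("[]", date(2024, 12, 27)) gives [2025], which matches the December 27 run fetching holidays for the upcoming year.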
