Data pipeline

Group 3: Spe C., Núria A., Xavier B.

The code makes available the daily results of a statistical prediction model that ingests scraped currency exchange rates and the COVID-19 dataset.

Pipeline flow diagram


The application runs the following tasks in Python:

  • Directing the pipeline
  • Scraping the currency data
  • Handling the AWS S3 interface for storing data
  • The notification system
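
As an illustration of the S3 interface task, here is a minimal sketch rather than the project's actual code. It assumes a boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method is part of the real boto3 API. The function name `upload_outputs`, the bucket name and the shape of the returned status dictionary are hypothetical. The client is passed in as an argument so the helper can be tried without AWS credentials.

```python
from pathlib import Path


def upload_outputs(s3_client, bucket, paths):
    """Upload each local file to S3 and return a per-file status dict.

    `s3_client` is expected to behave like a boto3 S3 client, i.e. expose
    upload_file(Filename, Bucket, Key). All names here are illustrative.
    """
    status = {}
    for path in map(Path, paths):
        if not path.is_file():
            status[path.name] = "missing"
            continue
        try:
            # Assumption: the object key is simply the file name,
            # e.g. currency_output.csv
            s3_client.upload_file(str(path), bucket, path.name)
            status[path.name] = "uploaded"
        except Exception as exc:
            # Record the failure instead of raising, so it can be reported
            status[path.name] = f"failed: {exc}"
    return status
```

With boto3 installed, the call might look like `upload_outputs(boto3.client("s3"), "my-bucket", ["currency_output.csv"])`, where `my-bucket` is a placeholder.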

The data processing is written in R and includes:

  • The predictive model (dataprep4model.R)
  • Data preparation (dataprep.R)

The prediction model ingests COVID-19 data directly from the source dataset; the output of the data preparation script is used only by the notification system.

The code is deployed on a free Heroku dyno configured to run both Python and R code.


  1. Start the process and direct it until the end.
  2. Scrape the currency data.
  3. Upload the currency data to S3 (currency_output.csv).
  4. Call the system to run the .R files.
  5. Capture the output of the R files and upload it to S3:
     • Updated model results (usdtwd_prediction.csv)
     • Recent COVID-19 values for the report (dailystats.csv)
  6. Check the S3 links and summarize their content for the report.
  7. Each step appends its status to the to_report object, which is gathered into the report (notifypy).
  8. Set the schedule to run daily.
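
The steps above can be sketched as a small orchestrator. This is only an illustration under stated assumptions, not the repository's code: the function names (`run_step`, `run_r_script`, `run_pipeline`), the representation of `to_report` as a list of dictionaries, and the choice to stop at the first failing step are all hypothetical. The R scripts are assumed to be run through the `Rscript` command-line tool.

```python
import subprocess


def run_step(name, func, to_report):
    """Run one pipeline step and append its outcome to `to_report`."""
    try:
        func()
        to_report.append({"step": name, "status": "ok"})
        return True
    except Exception as exc:
        to_report.append({"step": name, "status": f"failed: {exc}"})
        return False


def run_r_script(script):
    """Run an R script with Rscript and return what it printed to stdout."""
    result = subprocess.run(
        ["Rscript", script], capture_output=True, text=True, check=True
    )
    return result.stdout


def run_pipeline(steps):
    """Run (name, callable) steps in order.

    Assumption: later steps depend on earlier ones, so the run stops at
    the first failure; the statuses gathered so far are still returned.
    """
    to_report = []
    for name, func in steps:
        if not run_step(name, func, to_report):
            break
    return to_report
```

A step that runs the model could then be declared as `("model", lambda: run_r_script("dataprep4model.R"))`, and the returned list handed to the notification step.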


The file runtime.txt tells Heroku which Python version to use, so that the deployed environment matches the development environment.
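
To check locally that the interpreter matches the pinned version, a small helper along these lines could be used. It is a sketch that assumes runtime.txt holds a single line in the `python-X.Y.Z` form; the version `3.9.7` in the comments is only an example, and the function names are hypothetical.

```python
import sys


def parse_runtime(text):
    """Parse a runtime.txt line such as 'python-3.9.7' into (3, 9, 7)."""
    name, _, version = text.strip().partition("-")
    if name != "python" or not version:
        raise ValueError(f"unexpected runtime.txt content: {text!r}")
    return tuple(int(part) for part in version.split("."))


def matches_runtime(text, version_info=sys.version_info):
    """Return True if the interpreter version starts with the pinned one."""
    pinned = parse_runtime(text)
    return tuple(version_info[: len(pinned)]) == pinned
```

For example, `matches_runtime(open("runtime.txt").read())` compares the running interpreter with the pinned version.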

The Python libraries are listed in the file requirements.txt.

Heroku provides a default buildpack for Python but none for R, so the setup uses a third-party R buildpack for Heroku.

Use the Heroku CLI to set the remote repository and push to it.

$ heroku git:remote -a mvtec-pipeline
$ git add .
$ git commit -am "deploy"
$ git push heroku main

Installing R runtime with buildpacks.

$ heroku buildpacks:add

Installing R packages

When the R buildpack is deployed, the init.R file is executed, so we use it to install the libraries.

### Example R code to install packages if not already installed

my_packages = c("tidyverse", "readxl", "countrycode", "scales")

# Install a package only when it is not already in the library
install_if_missing = function(p) {
  if (p %in% rownames(installed.packages()) == FALSE) {
    install.packages(p)
  }
}

invisible(sapply(my_packages, install_if_missing))

