Early code for Airflow v2.8.0 Docker image

rafidka committed Jan 17, 2024
1 parent b664b61 commit 4841c6c
Showing 18 changed files with 448 additions and 14 deletions.
31 changes: 31 additions & 0 deletions CODING_GUIDELINES.md
@@ -0,0 +1,31 @@
# Coding Guidelines

_This is still a work in progress and is likely to be updated during the early phases of this repository's development._

This document contains the coding guidelines we follow in this repository. We follow the guidelines here strictly, so make sure your Pull Requests abide by them.

To make it easier for developers to find the guidelines for what they are contributing, this document has multiple sections. Use the list below to jump to the section related to the code in your Pull Request.

## Table of Contents

- [Bash Scripts](#bash-scripts)
- [Python Scripts](#python-scripts)
- [Docker](#docker)

## Bash Scripts

For Bash scripts, we use [ShellCheck](https://www.shellcheck.net/) to help developers catch common bugs and bad practices in Bash scripts. We have GitHub workflows that execute ShellCheck on every Bash script in the repository and fail if the code breaks any of its rules. To make it easier to test your code before publishing a PR, we have pre-commit hooks that run these checks automatically. However, you need to set up `pre-commit` for the hooks to run. Check the README files for instructions.
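For instance, one of the most common issues ShellCheck flags (SC2086) is an unquoted variable expansion. A minimal illustration of why it matters:

```shell
#!/bin/bash
set -e
# SC2086: an unquoted expansion is split on whitespace (and glob-expanded),
# which silently turns one argument into several.
path="my file.txt"
# shellcheck disable=SC2086  # deliberately unquoted to show the bug
unquoted_count=$(set -- $path; echo $#)
quoted_count=$(set -- "$path"; echo $#)
echo "unquoted words: $unquoted_count, quoted words: $quoted_count"
```

Here the unquoted form sees two words (`my` and `file.txt`) while the quoted form sees one, which is exactly the kind of bug that only surfaces once a path happens to contain a space.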

## Python Scripts

For Python scripts, we follow [PEP 8](https://peps.python.org/pep-0008/). Additionally, we enforce [Flake8](https://flake8.pycqa.org/) rules. Failure to comply with these will result in your PR failing our GitHub workflows. As with Bash scripts, we have pre-commit hooks that automatically test your code before you publish a PR. However, you need to set up `pre-commit` for the hooks to run. Check the README files for instructions.

## Docker

For Dockerfile bootstrapping, don't add your code to the Dockerfile directly. Instead, create a Bash script under the bootstrap/ folder. Follow these rules when creating a new bootstrapping file:

1. Make sure the file name starts with a 3-digit number that indicates its order of execution.
2. Keep your files as small as possible (but not smaller!). This way, you make better use of Docker layer caching and reduce the number of unnecessary rebuilds.
3. If you need a system package in your bootstrap file, install it at the beginning and remove it at the end. For example, if you need to download a file using `wget`, then do a `dnf install` at the beginning and a `dnf remove` at the end. This keeps the bootstrap files self-contained and avoids leaving unnecessary system packages in the final Docker image.
   - Don't worry about removing a package that is actually needed in the final image. There is a step at the end of the bootstrapping process that installs all required packages.
   - Don't worry about a certain DNF package being installed and removed multiple times during the bootstrapping process. Keeping bootstrapping files self-contained and avoiding leaving unnecessary packages is more important than the couple of seconds you might save by optimizing the installation of system packages, especially since Docker caching means these steps are rarely repeated (assuming a well-written Dockerfile).
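Rule 1 works because the scripts are processed in lexicographic order, and zero-padded 3-digit prefixes sort the same way they count. A quick sanity check, using file names from this repository:

```shell
#!/bin/bash
set -e
# Bootstrap file names from this repository: the zero-padded 3-digit prefix
# makes lexicographic order match the intended execution order.
scripts=("003-install-mariadb.sh" "001-init.sh" "999-install-needed-dnf-packages.sh" "002-install-python.sh")
ordered=$(printf '%s\n' "${scripts[@]}" | sort | paste -sd' ' -)
echo "$ordered"
```

This is also why the catch-all `999-` prefix is a convenient way to force a script to run last within its pass.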
11 changes: 0 additions & 11 deletions dummy.py

This file was deleted.

3 changes: 0 additions & 3 deletions dummy.sh

This file was deleted.

97 changes: 97 additions & 0 deletions images/airflow/2.8.0/Dockerfile
@@ -0,0 +1,97 @@
#
# WARNING: Don't change this file manually. This file is auto-generated from the
# Jinja2-templated Dockerfile.j2 file, so you need to change that file instead.
#
# This file was generated on 2024-01-17 21:57:12.451896
#

FROM public.ecr.aws/amazonlinux/amazonlinux:2023

# Versions
ENV AIRFLOW_VERSION=2.8.0
ENV AIRFLOW_AMAZON_PROVIDERS_VERSION=8.13.0
ENV PYTHON_VERSION=3.11
ENV AIRFLOW_USER_HOME=/usr/local/airflow
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}

ENV PATH_DEFAULT=${PATH}
ENV PATH_AIRFLOW_USER=${AIRFLOW_USER_HOME}/.local/bin:${PATH_DEFAULT}

# Bootstrapping steps (root user - first pass)

COPY ./bootstrap/01-root-firstpass/001-init.sh /001-init.sh
RUN chmod +x /001-init.sh && /001-init.sh
RUN rm /001-init.sh

COPY ./bootstrap/01-root-firstpass/002-install-python.sh /002-install-python.sh
RUN chmod +x /002-install-python.sh && /002-install-python.sh
RUN rm /002-install-python.sh

COPY ./bootstrap/01-root-firstpass/003-install-mariadb.sh /003-install-mariadb.sh
RUN chmod +x /003-install-mariadb.sh && /003-install-mariadb.sh
RUN rm /003-install-mariadb.sh

COPY ./bootstrap/01-root-firstpass/004-create-airflow-user.sh /004-create-airflow-user.sh
RUN chmod +x /004-create-airflow-user.sh && /004-create-airflow-user.sh
RUN rm /004-create-airflow-user.sh

COPY ./bootstrap/01-root-firstpass/005-install-aws-cli.sh /005-install-aws-cli.sh
RUN chmod +x /005-install-aws-cli.sh && /005-install-aws-cli.sh
RUN rm /005-install-aws-cli.sh

COPY ./bootstrap/01-root-firstpass/999-install-needed-dnf-packages.sh /999-install-needed-dnf-packages.sh
RUN chmod +x /999-install-needed-dnf-packages.sh && /999-install-needed-dnf-packages.sh
RUN rm /999-install-needed-dnf-packages.sh


# Bootstrapping steps (airflow user)

USER root
COPY ./bootstrap/02-airflow/001-install-airflow.sh /001-install-airflow.sh
RUN chmod +x /001-install-airflow.sh
ENV PATH=${PATH_AIRFLOW_USER}
USER airflow
RUN /001-install-airflow.sh
ENV PATH=${PATH_DEFAULT}
USER root
RUN rm /001-install-airflow.sh


# Bootstrapping steps (root user - second pass)
# Put here the steps that must run as the root user and that also rely on
# the successful execution of the 'airflow' user's bootstrapping steps.
# For example, giving ownership of the Airflow home directory to the
# 'airflow' user requires all of its files to already be in place.

COPY ./bootstrap/03-root-secondpass/001-create-mwaa-dir.sh /001-create-mwaa-dir.sh
RUN chmod +x /001-create-mwaa-dir.sh && /001-create-mwaa-dir.sh
RUN rm /001-create-mwaa-dir.sh

COPY ./bootstrap/03-root-secondpass/999-chown-airflow-folder.sh /999-chown-airflow-folder.sh
RUN chmod +x /999-chown-airflow-folder.sh && /999-chown-airflow-folder.sh
RUN rm /999-chown-airflow-folder.sh


# Create a volume for syncing files with the sidecar. The actual folder
# is created by the `001-create-mwaa-dir.sh` script.
VOLUME ["/usr/local/mwaa"]

# TODO We should only expose this port if the command is 'webserver'.
EXPOSE 8080

ENV PATH=${PATH_AIRFLOW_USER}
RUN unset PATH_DEFAULT
RUN unset PATH_AIRFLOW_USER

WORKDIR ${AIRFLOW_USER_HOME}

COPY entrypoint.py /entrypoint.py
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

USER airflow

ENTRYPOINT ["/entrypoint.sh"]

CMD /bin/bash
66 changes: 66 additions & 0 deletions images/airflow/2.8.0/Dockerfile.j2
@@ -0,0 +1,66 @@
FROM public.ecr.aws/amazonlinux/amazonlinux:2023

# Versions
ENV AIRFLOW_VERSION=2.8.0
ENV AIRFLOW_AMAZON_PROVIDERS_VERSION=8.13.0
ENV PYTHON_VERSION=3.11
ENV AIRFLOW_USER_HOME=/usr/local/airflow
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}

ENV PATH_DEFAULT=${PATH}
ENV PATH_AIRFLOW_USER=${AIRFLOW_USER_HOME}/.local/bin:${PATH_DEFAULT}

# Bootstrapping steps (root user - first pass)
{% for filename, filepath in bootstrapping_scripts_root_firstpass %}
COPY {{ filepath }} /{{ filename }}
RUN chmod +x /{{ filename }} && /{{ filename }}
RUN rm /{{ filename }}
{% endfor %}

# Bootstrapping steps (airflow user)
{% for filename, filepath in bootstrapping_scripts_airflow %}
USER root
COPY {{ filepath }} /{{ filename }}
RUN chmod +x /{{ filename }}
ENV PATH=${PATH_AIRFLOW_USER}
USER airflow
RUN /{{ filename }}
ENV PATH=${PATH_DEFAULT}
USER root
RUN rm /{{ filename }}
{% endfor %}

# Bootstrapping steps (root user - second pass)
# Put here the steps that must run as the root user and that also rely on
# the successful execution of the 'airflow' user's bootstrapping steps.
# For example, giving ownership of the Airflow home directory to the
# 'airflow' user requires all of its files to already be in place.
{% for filename, filepath in bootstrapping_scripts_root_secondpass %}
COPY {{ filepath }} /{{ filename }}
RUN chmod +x /{{ filename }} && /{{ filename }}
RUN rm /{{ filename }}
{% endfor %}

# Create a volume for syncing files with the sidecar. The actual folder
# is created by the `001-create-mwaa-dir.sh` script.
VOLUME ["/usr/local/mwaa"]

# TODO We should only expose this port if the command is 'webserver'.
EXPOSE 8080

ENV PATH=${PATH_AIRFLOW_USER}
RUN unset PATH_DEFAULT
RUN unset PATH_AIRFLOW_USER

WORKDIR ${AIRFLOW_USER_HOME}

COPY entrypoint.py /entrypoint.py
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

USER airflow

ENTRYPOINT ["/entrypoint.sh"]

CMD /bin/bash
4 changes: 4 additions & 0 deletions images/airflow/2.8.0/bootstrap/01-root-firstpass/001-init.sh
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

dnf update -y
@@ -0,0 +1,38 @@
#!/bin/bash
set -e

dnf install -y wget xz tar

PYTHON_VERSION=3.11.7
PYTHON_MD5_CHECKSUM=d96c7e134c35a8c46236f8a0e566b69c

mkdir python_install
python_file=Python-$PYTHON_VERSION
python_tar=$python_file.tar
python_tar_xz=$python_tar.xz

# Download Python's source code archive.
mkdir python_source
wget "https://www.python.org/ftp/python/$PYTHON_VERSION/$python_tar_xz" -P /python_source

# Verify the checksum
echo "$PYTHON_MD5_CHECKSUM /python_source/$python_tar_xz" | md5sum --check - | grep --basic-regex "^/python_source/${python_tar_xz}: OK$"

cp /python_source/$python_tar_xz /python_install/$python_tar_xz
unxz ./python_install/$python_tar_xz
tar -xf ./python_install/$python_tar -C ./python_install

dnf install -y dnf-plugins-core
dnf builddep -y python3

pushd /python_install/$python_file
./configure
make install -s -j "$(nproc)" # use -j to set the cores for the build
popd

# Upgrade pip
pip3 install --upgrade pip

rm -rf /python_source /python_install

dnf remove -y wget xz tar
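The checksum-verification pipeline used above (`md5sum --check` piped into an anchored `grep`) can be exercised on its own. A minimal standalone sketch with a throwaway file:

```shell
#!/bin/bash
set -e
set -o pipefail
# Demonstrates the verification idiom used above: `md5sum --check` prints
# "<file>: OK" on a match, and the anchored grep makes the pipeline exit
# non-zero on any mismatch, aborting the script under `set -e`.
tmpfile=$(mktemp)
printf 'hello\n' > "$tmpfile"
checksum=$(md5sum "$tmpfile" | cut -d' ' -f1)
result=$(echo "$checksum $tmpfile" | md5sum --check - | grep "^$tmpfile: OK$")
echo "$result"
rm -f "$tmpfile"
```

Because the grep is anchored to the exact `<file>: OK` line, a corrupted download fails the build immediately instead of being silently installed.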
@@ -0,0 +1,31 @@
#!/bin/bash
set -e

dnf install -y wget

MARIADB_RPM_COMMON_CHECKSUM=e87371d558efa97724f3728fb214cf19
MARIADB_RPM_SHARED_CHECKSUM=ed82ad5bc5b35cb2719a9471a71c6cdb
MARIADB_RPM_DEVEL_CHECKSUM=cfce6e9b53f4e4fb1cb14f1ed720c92c

# Install mariadb-devel, a dependency of apache-airflow-providers-mysql.
MARIADB_RPM_COMMON=MariaDB-common-11.1.2-1.fc38.x86_64.rpm
MARIADB_RPM_SHARED=MariaDB-shared-11.1.2-1.fc38.x86_64.rpm
MARIADB_RPM_DEVEL=MariaDB-devel-11.1.2-1.fc38.x86_64.rpm

# Download the necessary RPMs.
mkdir /mariadb_rpm
wget https://mirror.mariadb.org/yum/11.1/fedora38-amd64/rpms/$MARIADB_RPM_COMMON -P /mariadb_rpm
wget https://mirror.mariadb.org/yum/11.1/fedora38-amd64/rpms/$MARIADB_RPM_SHARED -P /mariadb_rpm
wget https://mirror.mariadb.org/yum/11.1/fedora38-amd64/rpms/$MARIADB_RPM_DEVEL -P /mariadb_rpm

# Verify their checksums
echo "$MARIADB_RPM_COMMON_CHECKSUM /mariadb_rpm/$MARIADB_RPM_COMMON" | md5sum --check - | grep --basic-regex "^/mariadb_rpm/$MARIADB_RPM_COMMON: OK$"
echo "$MARIADB_RPM_SHARED_CHECKSUM /mariadb_rpm/$MARIADB_RPM_SHARED" | md5sum --check - | grep --basic-regex "^/mariadb_rpm/$MARIADB_RPM_SHARED: OK$"
echo "$MARIADB_RPM_DEVEL_CHECKSUM /mariadb_rpm/$MARIADB_RPM_DEVEL" | md5sum --check - | grep --basic-regex "^/mariadb_rpm/$MARIADB_RPM_DEVEL: OK$"

# Install the RPMs.
rpm -ivh /mariadb_rpm/*

rm -rf /mariadb_rpm

dnf remove -y wget
@@ -0,0 +1,9 @@
#!/bin/bash
set -e

dnf install -y shadow-utils

# AIRFLOW_USER_HOME is defined in the Dockerfile.
adduser -s /bin/bash -d "${AIRFLOW_USER_HOME}" airflow

dnf remove -y shadow-utils
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

dnf install -y awscli-2
@@ -0,0 +1,7 @@
#!/bin/bash
set -e

dnf install -y java-17-amazon-corretto # For Java lovers.
dnf install -y libcurl-devel # For pycurl
dnf install -y postgresql-devel # For psycopg2
dnf install -y procps # For 'ps' command, which is used for monitoring.
27 changes: 27 additions & 0 deletions images/airflow/2.8.0/bootstrap/02-airflow/001-install-airflow.sh
@@ -0,0 +1,27 @@
#!/bin/bash
set -e

# List of required environment variables
required_vars=("AIRFLOW_VERSION" "AIRFLOW_AMAZON_PROVIDERS_VERSION" "PYTHON_VERSION")

# Function to check if environment variables are set
check_env_vars() {
for var in "${required_vars[@]}"; do
if [[ -z ${!var} ]]; then
echo "Error: Environment variable ${var} is not set."
exit 1
fi
done
}

# Check required environment variables
check_env_vars

CONSTRAINT_FILE="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip3 install --constraint "${CONSTRAINT_FILE}" \
pycurl \
psycopg2 \
"celery[sqs]" \
"apache-airflow[celery,statsd]==${AIRFLOW_VERSION}" \
"apache-airflow-providers-amazon[aiobotocore]==${AIRFLOW_AMAZON_PROVIDERS_VERSION}" \
watchtower
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

mkdir -p /usr/local/mwaa
chown -R airflow: /usr/local/mwaa
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

chown -R airflow: "${AIRFLOW_USER_HOME}"
6 changes: 6 additions & 0 deletions images/airflow/2.8.0/build.sh
@@ -0,0 +1,6 @@
#!/bin/bash
set -e

python3 generate-dockerfile.py

docker build ./
23 changes: 23 additions & 0 deletions images/airflow/2.8.0/entrypoint.py
@@ -0,0 +1,23 @@
"""
This is the entrypoint of the Docker image when running Airflow components.
The script gets called with the Airflow component name, e.g. scheduler, as the
first and only argument. It accordingly runs the requested Airflow component
after setting up the necessary configurations.
"""

import sys


def main() -> None:
"""Entrypoint of the script."""
print("Warming the Docker container.")
print(sys.argv)
# TODO Not yet implemented


if __name__ == '__main__':
main()
else:
print('This module cannot be imported.')
sys.exit(1)
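A rough sketch of the dispatch this entrypoint implies, mapping the single component argument to the command to run. The component names and the fallback behaviour are assumptions; the actual logic is marked TODO above and is not yet implemented:

```shell
#!/bin/bash
set -e
# Hypothetical dispatch for the container entrypoint. The component names
# below are assumptions, not the repository's final list.
dispatch() {
    case "${1:-shell}" in
        webserver|scheduler|worker|triggerer)
            echo "airflow $1" ;;    # run the requested Airflow component
        shell)
            echo "/bin/bash" ;;     # no argument: fall back to a shell
        *)
            echo "Unknown component: $1" >&2
            return 1 ;;
    esac
}

dispatch scheduler  # prints "airflow scheduler"
```

In this sketch an unknown component fails fast with a non-zero exit, which is generally preferable in a container entrypoint to silently starting the wrong process.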