Setup Apache Spark

Set up Apache Spark and add the command-line tools to the PATH
v1.2.0

✨ setup-spark

This action sets up Apache Spark in your environment for use in GitHub Actions by:

  • installing and adding spark-submit and spark-shell to the PATH
  • setting required environment variables, such as SPARK_HOME and PYSPARK_PYTHON, in the workflow

This lets you test applications using a local Spark context in GitHub Actions.
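
As a minimal sketch of such a test, the step below could follow the Python, Java and Spark setup steps in a job. It first prints the tools and variables listed above, then runs a tiny PySpark job on a local master through spark-submit. The step name and the smoke_test.py file are hypothetical, and the shell syntax assumes a Linux or macOS runner using bash.

```yaml
# Hypothetical step; assumes the Python, Java and setup-spark steps ran earlier in the job
- name: Smoke-test a local Spark context
  run: |
    # Show what the action is expected to have exported
    echo "SPARK_HOME=$SPARK_HOME"
    echo "PYSPARK_PYTHON=$PYSPARK_PYTHON"
    command -v spark-submit
    command -v spark-shell

    # Write a throwaway PySpark script (hypothetical file name)
    cat > smoke_test.py <<'EOF'
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("smoke-test").getOrCreate()
    assert spark.range(10).count() == 10
    spark.stop()
    EOF

    spark-submit smoke_test.py
```

Because the script is launched through spark-submit, it should pick up the PySpark bundled with the downloaded Spark distribution, so this sketch does not assume a separate pip install of pyspark.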

🪄 Usage

You will need to set up Python and Java in the job before setting up Spark.

Check for the latest Spark versions at https://spark.apache.org/downloads.html

Basic workflow:

steps:
- uses: actions/setup-python@v5
  with:
    python-version: '3.10'

- uses: actions/setup-java@v4
  with:
    java-version: '21'
    distribution: temurin

- uses: vemonet/setup-spark@v1
  with:
    spark-version: '3.5.3'
    hadoop-version: '3'

- run: spark-submit --version
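
For context, the steps above belong inside a job in a workflow file. The following is a sketch of one possible complete file; the file name, workflow name, job id, triggers and runner are assumptions chosen for illustration, while the setup steps are the same as in the basic workflow.

```yaml
# .github/workflows/spark.yml (hypothetical file name)
name: Spark tests            # hypothetical workflow name

on: [push, pull_request]     # assumed triggers

jobs:
  spark:                     # hypothetical job id
    runs-on: ubuntu-latest   # assumed runner
    steps:
      # Check out the repository so later steps can reach your code
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: temurin

      - uses: vemonet/setup-spark@v1
        with:
          spark-version: '3.5.3'
          hadoop-version: '3'

      - run: spark-submit --version
```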

See the action.yml file for a complete rundown of the available parameters.

You can also define various options, such as providing a specific URL to download the Spark .tgz from, or using a specific Scala version:

- uses: vemonet/setup-spark@v1
  with:
    spark-version: '3.5.3'
    hadoop-version: '3'
    scala-version: '2.13'
    spark-url: 'https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz'
    xms: '1024M'
    xmx: '2048M'
    log-level: 'debug'
    install-folder: '/home/runner/work'

🏷️ Available versions

Check for the latest Spark versions at https://spark.apache.org/downloads.html

The Hadoop version changes far less often than the Spark version, so hadoop-version rarely needs updating when you bump spark-version.

The setup-spark action is tested against various versions of Spark and Hadoop in .github/workflows/test.yml.
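
If you want to check several versions in your own repository, a build matrix is one way to do it. The sketch below is illustrative only: the job id, the version list and the Java release are assumptions, not the combinations used in the action's own test.yml.

```yaml
# Illustrative matrix; the versions are assumptions, not the list used in test.yml
jobs:
  test:                        # hypothetical job id
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false         # let every combination finish even if one fails
      matrix:
        spark-version: ['3.4.4', '3.5.3']
        hadoop-version: ['3']
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - uses: actions/setup-java@v4
        with:
          java-version: '17'   # assumption: a release supported by both Spark versions
          distribution: temurin

      - uses: vemonet/setup-spark@v1
        with:
          spark-version: ${{ matrix.spark-version }}
          hadoop-version: ${{ matrix.hadoop-version }}

      - run: spark-submit --version
```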

📝 Contributions

Contributions are welcome! Feel free to test other Spark versions and to submit issues or pull requests.

See the contributor's guide for more details.

Setup Apache Spark is not certified by GitHub. It is provided by a third-party and is governed by separate terms of service, privacy policy, and support documentation.
