This action sets up Apache Spark in your environment for use in GitHub Actions by:
- installing and adding `spark-submit` and `spark-shell` to the `PATH`
- setting required environment variables such as `SPARK_HOME` and `PYSPARK_PYTHON` in the workflow
This enables you to test applications using a local Spark context in GitHub Actions.
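For example, once the action has run, a test along these lines can exercise a local Spark context. This is a minimal sketch: `test_spark_local.py` is a hypothetical file name, and it assumes `pyspark` is importable in the job's Python environment (e.g. installed with `pip install pyspark`).

```python
# test_spark_local.py - hypothetical pytest example; assumes `pyspark` is
# importable (e.g. installed with `pip install pyspark`).
import os

from pyspark.sql import SparkSession


def test_local_spark_context():
    # SPARK_HOME is exported by the setup-spark action
    assert os.environ.get("SPARK_HOME")
    # local[*] runs Spark inside the GitHub Actions runner, no cluster needed
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("setup-spark-test")
        .getOrCreate()
    )
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
    assert df.count() == 2
    spark.stop()
```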
You will need to set up Python and Java in the job before setting up Spark.
Check for the latest Spark versions at https://spark.apache.org/downloads.html
Basic workflow:
```yaml
steps:
  - uses: actions/setup-python@v5
    with:
      python-version: '3.10'

  - uses: actions/setup-java@v4
    with:
      java-version: '21'
      distribution: 'temurin'

  - uses: vemonet/setup-spark@v1
    with:
      spark-version: '3.5.3'
      hadoop-version: '3'

  - run: spark-submit --version
```
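The final step above only prints the version; a later step such as `- run: spark-submit ci_job.py` could submit an actual job. As a sketch, such a job could look like the following (`ci_job.py` is a hypothetical name):

```python
# ci_job.py - hypothetical smoke-test job, run with `spark-submit ci_job.py`
from pyspark.sql import SparkSession

# No master URL is passed: spark-submit defaults to local execution,
# which is what we want on a GitHub Actions runner.
spark = SparkSession.builder.appName("ci-smoke-test").getOrCreate()

df = spark.range(100)  # DataFrame of ids 0..99
assert df.count() == 100

spark.stop()
```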
See the `action.yml` file for a complete rundown of the available parameters.
You can also define various options, such as providing a specific URL to download the Spark `.tgz`, or using a specific Scala version:
```yaml
- uses: vemonet/setup-spark@v1
  with:
    spark-version: '3.5.3'
    hadoop-version: '3'
    scala-version: '2.13'
    spark-url: 'https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz'
    xms: '1024M'
    xmx: '2048M'
    log-level: 'debug'
    install-folder: '/home/runner/work'
```
The Hadoop version stays quite stable; check the Apache Spark downloads page linked above for the latest Spark releases.
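If you pass a custom `spark-url` or `install-folder`, it can be worth verifying what the action exported before running real jobs. A minimal sketch (`check_env.py` is a hypothetical name; it only inspects the variables and executables described above):

```python
# check_env.py - hypothetical sanity check for the setup-spark environment
import os
import shutil

# Environment variables set by the action
for var in ("SPARK_HOME", "PYSPARK_PYTHON"):
    print(f"{var} = {os.environ.get(var)}")

# Executables the action adds to the PATH
for exe in ("spark-submit", "spark-shell"):
    print(f"{exe} -> {shutil.which(exe)}")
```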
The `setup-spark` action is tested against various versions of Spark and Hadoop in `.github/workflows/test.yml`.
Contributions are welcome! Feel free to test other Spark versions and submit issues or pull requests.
See the contributor's guide for more details.
Setup Apache Spark is not certified by GitHub. It is provided by a third-party and is governed by separate terms of service, privacy policy, and support documentation.