This code should help you jump-start PySpark with Anaconda on AWS using Terraform.
- Install Terraform on Ubuntu/Debian Linux:
Ensure that your system is up to date and that the gnupg, software-properties-common, and curl packages are installed. You will use these packages to verify HashiCorp's GPG signature and to add HashiCorp's Debian package repository.
a. sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl
Add the HashiCorp GPG key.
b. curl -fsSL | sudo apt-key add -
Add the official HashiCorp Linux repository.
c. sudo apt-add-repository "deb [arch=amd64] $(lsb_release -cs) main"
Update to add the repository, and install the Terraform CLI.
d. sudo apt-get update && sudo apt-get install terraform
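A quick way to confirm the installation worked is to check that the required command-line tools are on your PATH. The sketch below uses a hypothetical helper, check_prereqs, which is not part of this repository; it looks for terraform and for the aws CLI used by the "aws configure" step.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): report which CLIs are missing from PATH.
check_prereqs() {
  local missing=""
  local tool
  for tool in "$@"; do
    # command -v succeeds only if the tool can be found on PATH
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -z "$missing" ]; then
    echo "all prerequisites found"
    return 0
  fi
  echo "missing:$missing"
  return 1
}

# Check the two CLIs this README relies on; print a hint instead of failing hard.
check_prereqs terraform aws || echo "install the missing tools before continuing"
```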
- Adjust the scripts in the scripts directory if necessary.
- Set parameters in
- Start cluster:
terraform init
terraform apply
- Destroy cluster:
terraform destroy
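The start and destroy steps can be wrapped in two small shell functions, for example to run them from another script. This is only a sketch: cluster_up, cluster_down and the TERRAFORM override variable are hypothetical names, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this repo) around the start/destroy steps.
# TERRAFORM defaults to the real CLI; override it (e.g. TERRAFORM=echo) for a dry run.
TERRAFORM="${TERRAFORM:-terraform}"

cluster_up() {
  # Initialise the working directory first; stop if that fails.
  "$TERRAFORM" init || return 1
  # Create the cluster; extra arguments are passed through to apply.
  "$TERRAFORM" apply "$@"
}

cluster_down() {
  # Destroy everything this configuration created; extra arguments pass through.
  "$TERRAFORM" destroy "$@"
}
```

For example, cluster_up -auto-approve passes -auto-approve to terraform apply, which skips the interactive confirmation prompt. Use it only when you are sure of the plan, since the cluster incurs AWS charges until you run cluster_down.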
- Configure AWS on your local machine:
aws configure
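Terraform needs AWS credentials before terraform apply can create anything. The hypothetical helper below (not part of this repository) reports where credentials would be found: the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, or the shared credentials file that aws configure writes (~/.aws/credentials by default, overridable with AWS_SHARED_CREDENTIALS_FILE). It is a rough check only and ignores other sources such as profiles in ~/.aws/config, SSO, or instance roles.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this repo): where would AWS credentials come from?
aws_credentials_source() {
  # 1. Standard environment variables take precedence.
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "environment"
    return 0
  fi
  # 2. Shared credentials file written by `aws configure` (must exist and be non-empty).
  local creds_file="${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}"
  if [ -s "$creds_file" ]; then
    echo "file: $creds_file"
    return 0
  fi
  echo "none"
  return 1
}

aws_credentials_source || echo "run 'aws configure' first"
```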
- AWS instance cost for
- Dat Tran, GitHub: datitran; forked by Willian A Santos, GitHub: wasantos
See LICENSE for details.