[WIP, Depends #24] Build spark jar #25

Closed · wants to merge 3 commits

Changes from all commits
2 changes: 1 addition & 1 deletion Demos/EstimatorApp/src/main/resources/application.conf
@@ -1,4 +1,4 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
@@ -1,4 +1,4 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
@@ -1,4 +1,4 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
@@ -1,4 +1,4 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
@@ -1,4 +1,4 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
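Every `application.conf` hunk above makes the same change: the optional-substitution line `dataDir=${?ld4pData}` is replaced by `dataDir=${?LD4P_DATA}`, so the override now reads the `LD4P_DATA` environment variable. In HOCON (Typesafe Config), `${?VAR}` only takes effect when `VAR` is defined; otherwise the assignment is dropped and the earlier `dataDir=${HOME}/...` default stands. A minimal Ruby sketch of that resolution order (a hypothetical helper for illustration, not the HOCON library itself):

```ruby
# Mimics how the two dataDir lines resolve under Typesafe Config.
def resolve_data_dir(env)
  # dataDir=${HOME}/Dev/data/ld4pData  -- the default
  data_dir = "#{env['HOME']}/Dev/data/ld4pData"
  # dataDir=${?LD4P_DATA}  -- only overrides when the variable is set
  data_dir = env['LD4P_DATA'] if env['LD4P_DATA']
  data_dir
end

resolve_data_dir('HOME' => '/home/me')
# => "/home/me/Dev/data/ld4pData"
resolve_data_dir('HOME' => '/home/me', 'LD4P_DATA' => '/srv/ld4pData')
# => "/srv/ld4pData"
```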
30 changes: 15 additions & 15 deletions README.md
@@ -171,29 +171,29 @@ Provision AWS systems, e.g. use
 
 Once the AWS systems are available, setup `~/.ssh/config` and `/etc/hosts`, e.g.
 
+```
+# /etc/hosts
+{aws_public_ip} ld4p_dev_spark_master
+{aws_public_ip} ld4p_dev_spark_worker1
+# plus any additional worker nodes
+```
+
 ```
 # ~/.ssh/config
 
 Host ld4p_dev_spark_master
-User root
-Hostname {aws_public_dns}
-IdentityFile ~/.ssh/{key}.pem
+User {aws_user}
+Hostname {use /etc/hosts name}
+IdentityFile ~/.ssh/{key-pair}.pem
 Port 22
 
-Host ld4p_dev_spark_slave1
-User root
-Hostname {aws_public_dns}
-IdentityFile ~/.ssh/{key}.pem
+Host ld4p_dev_spark_worker1
+User {aws_user}
+Hostname {use /etc/hosts name}
+IdentityFile ~/.ssh/{key-pair}.pem
 Port 22
 
-# plus any additional slave nodes
+# plus any additional worker nodes
 ```
-
-```
-# /etc/hosts
-{aws_public_ip} ld4p_dev_spark_master
-{aws_public_ip} ld4p_dev_spark_slave1
-# plus any additional slave nodes
-```
 
 Then the usual capistrano workflow can be used, i.e.
@@ -1,7 +1,7 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
 
 
 stardogBatchSize="100"
@@ -1,7 +1,7 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
 
 bootstrapServers="localhost:9092, 192.168.0.101:9092, 127.0.0.1:9092"
 bootstrapServers=${?BOOTSTRAP_SERVERS}
@@ -1,4 +1,4 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
@@ -1,7 +1,7 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
 
 
 bootstrapServers="localhost:9092, 192.168.0.101:9092, 127.0.0.1:9092"
@@ -1,4 +1,4 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${?ld4pData}
+dataDir=${?LD4P_DATA}
2 changes: 1 addition & 1 deletion config/deploy.rb
@@ -2,7 +2,7 @@
 set :repo_url, "https://github.com/sul-dlss/ld4p-data-pipeline.git"
 
 # Default branch is :master
-# ask :branch, `git rev-parse --abbrev-ref HEAD`.chomp
+ask :branch, `git rev-parse --abbrev-ref HEAD`.chomp
 
 # Default deploy_to directory is /var/www/my_app_name
 set :deploy_to, "/opt/ld4p-data-pipeline"
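The `config/deploy.rb` change uncomments `ask :branch, ...`, so each deploy prompts for a branch and defaults to whatever is currently checked out. The `.chomp` matters because backtick capture in Ruby includes the command's trailing newline; a minimal sketch (using `echo` as a stand-in for the git command):

```ruby
# `echo main` stands in for `git rev-parse --abbrev-ref HEAD`
raw = `echo main`   # command output keeps its trailing newline: "main\n"
branch = raw.chomp  # "main" -- clean value to offer as the default branch
```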
75 changes: 13 additions & 62 deletions config/deploy/ld4p_dev.rb
@@ -1,69 +1,20 @@
-# server-based syntax
-# ======================
-# Defines a single server with a list of roles and multiple properties.
-# You can define all roles on a single server, or split them:
-
-# server "example.com", user: "deploy", roles: %w{app db web}, my_property: :my_value
-# server "example.com", user: "deploy", roles: %w{app web}, other_property: :other_value
-# server "db.example.com", user: "deploy", roles: %w{db}
-
-# spark masters
-server "ld4p_dev_spark_master", user: "root", roles: %w{redhat spark master}
-
-# spark slaves
-server "ld4p_dev_spark_slave1", user: "root", roles: %w{redhat spark slave}
-server "ld4p_dev_spark_slave2", user: "root", roles: %w{redhat spark slave}
-server "ld4p_dev_spark_slave3", user: "root", roles: %w{redhat spark slave}
-
-
-# role-based syntax
-# ==================
-
-# Defines a role with one or multiple servers. The primary server in each
-# group is considered to be the first unless any hosts have the primary
-# property set. Specify the username and a domain or IP for the server.
-# Don't use `:all`, it's a meta role.
-
-# role :app, %w{[email protected]}, my_property: :my_value
-# role :web, %w{[email protected] [email protected]}, other_property: :other_value
-# role :db, %w{[email protected]}
+server 'ld4p_dev_spark_master', user: 'root', roles: %w{redhat spark master}
 
-
-# Configuration
-# =============
-# You can set any configuration variable like in config/deploy.rb
-# These variables are then only loaded and set in this stage.
-# For available Capistrano configuration variables see the documentation page.
-# http://capistranorb.com/documentation/getting-started/configuration/
-# Feel free to add new variables to customise your setup.
+# spark workers
+server 'ld4p_dev_spark_worker1', user: 'root', roles: %w{redhat spark worker}
+server 'ld4p_dev_spark_worker2', user: 'root', roles: %w{redhat spark worker}
+server 'ld4p_dev_spark_worker3', user: 'root', roles: %w{redhat spark worker}
 
+# ----
+# Setup the environment variables for the spark app
 
+set :ld4p_data, File.join(fetch(:deploy_to), 'current', 'src', 'main', 'resources', 'xsl')
+set :bootstrap_servers, "ec2-34-213-81-65.us-west-2.compute.amazonaws.com:9092,ec2-34-214-42-7.us-west-2.compute.amazonaws.com:9092,ec2-52-36-184-167.us-west-2.compute.amazonaws.com:9092"
 
-# Custom SSH Options
-# ==================
-# You may pass any option but keep in mind that net/ssh understands a
-# limited set of options, consult the Net::SSH documentation.
-# http://net-ssh.github.io/net-ssh/classes/Net/SSH.html#method-c-start
-#
-# Global options
-# --------------
-# set :ssh_options, {
-#   keys: %w(/home/rlisowski/.ssh/id_rsa),
-#   forward_agent: false,
-#   auth_methods: %w(password)
-# }
-#
-# The server-based syntax can be used to override options:
-# ------------------------------------
-# server "example.com",
-#   user: "user_name",
-#   roles: %w{web app},
-#   ssh_options: {
-#     user: "user_name", # overrides user setting above
-#     keys: %w(/home/user_name/.ssh/id_rsa),
-#     forward_agent: false,
-#     auth_methods: %w(publickey password)
-#     # password: "please use keys"
-#   }
+set :default_env, {
+  LD4P_DATA: fetch(:ld4p_data),
+  BOOTSTRAP_SERVERS: fetch(:bootstrap_servers)
+}
43 changes: 43 additions & 0 deletions lib/capistrano/tasks/spark.rake
@@ -0,0 +1,43 @@
+namespace :spark do
+
+  after :deploy, 'spark:update_env'
+  after :deploy, 'spark:assembly'
+  after :deploy, 'spark:upload_assembly'
+
+  # Set the /etc/environment
+  desc 'Update the spark environment variables'
+  task :update_env do
+    on roles(:spark) do
+      # remove any existing entries
+      sudo("sed -i -e '/BEGIN_LD4P_ENV/,/END_LD4P_ENV/{ d; }' /etc/environment")
+      # append new entries
+      sudo("echo '### BEGIN_LD4P_ENV' | sudo tee -a /etc/environment > /dev/null")
+      sudo("echo 'export LD4P_DATA=#{fetch(:ld4p_data)}' | sudo tee -a /etc/environment > /dev/null")
+      sudo("echo 'export BOOTSTRAP_SERVERS=#{fetch(:bootstrap_servers)}' | sudo tee -a /etc/environment > /dev/null")
+      sudo("echo '### END_LD4P_ENV' | sudo tee -a /etc/environment > /dev/null")
+    end
+  end
+
+  # ----
+  # Build and deploy the spark projects
+
+  desc 'sbt SparkStreamingConvertors/assembly'
+  task :assembly do
+    system('sbt SparkStreamingConvertors/assembly')
+  end
+
+  desc 'Upload the spark package'
+  task :upload_assembly do
+    jars = Dir.glob('./SparkStreamingConvertors/**/*assembly*.jar')
+    on roles(:spark) do
+      lib_path = File.join(current_path, 'lib')
+      execute("mkdir -p #{lib_path}")
+      jars.each do |jar|
+        filename = File.basename(jar)
+        lib = File.join(lib_path, filename)
+        # puts "#{jar} -> #{lib}"
+        upload!(jar, lib)
+      end
+    end
+  end
+end
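The `update_env` task stays idempotent by deleting any existing marker-delimited block (the sed range `/BEGIN_LD4P_ENV/,/END_LD4P_ENV/`) before appending fresh entries. A pure-Ruby sketch of that delete-then-append pattern against a throwaway temp file (illustration of the behavior, not the task itself, which edits `/etc/environment` via sed and `sudo tee -a`):

```ruby
require 'tempfile'

BEGIN_MARK = '### BEGIN_LD4P_ENV'
END_MARK   = '### END_LD4P_ENV'

# Same effect as the task's range delete:
#   sed -i -e '/BEGIN_LD4P_ENV/,/END_LD4P_ENV/{ d; }' /etc/environment
def strip_block(lines)
  inside = false
  lines.reject do |line|
    inside = true if line.include?('BEGIN_LD4P_ENV')
    dropping = inside
    inside = false if line.include?('END_LD4P_ENV')
    dropping
  end
end

# seed a temp file that already carries a stale block
file = Tempfile.new('environment')
file.puts 'PATH=/usr/bin'
file.puts BEGIN_MARK
file.puts 'export LD4P_DATA=/old/path'
file.puts END_MARK
file.close

kept = strip_block(File.readlines(file.path))
# re-append fresh entries (the task does this via `echo ... | sudo tee -a`)
kept += ["#{BEGIN_MARK}\n", "export LD4P_DATA=/new/path\n", "#{END_MARK}\n"]
File.write(file.path, kept.join)
```

Running the task twice therefore leaves exactly one `BEGIN/END` block in the file, with the latest values.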
2 changes: 1 addition & 1 deletion src/main/resources/application.conf
@@ -1,4 +1,4 @@
 
 
 dataDir=${HOME}/Dev/data/ld4pData
-dataDir=${ld4pData}
+dataDir=${LD4P_DATA}