Ambari stack for easily installing and managing the HDP IoT Demo, which shows real-time monitoring/alerts and predictions of driving violations generated by a fleet of trucks. Also includes optional steps to set up Ranger audits to Solr and a SILK dashboard to visualize the audits.
Videos of the demo itself are available here:
- Arun's Hadoop Summit 2015 keynote
- Shaun/George's Hadoop Summit 2014 Keynote
- Nauman's Demo at Phoenix Data conference 2014
Pre-reqs:
- The service currently requires that it be installed on the Ambari server node and that Kafka and ZooKeeper are also running on the same node.
- HBase and Storm must be available on the cluster and started.
- Falcon must be stopped before installing this service.
Previous versions:
- For the HDP 2.2 version of these steps, see here
- Download the HDP 2.3 sandbox VM image (Sandbox_HDP_2.3_VMWare.ova) from the Hortonworks website
- Import Sandbox_HDP_2.3_VMWare.ova into VMware, set the VM memory to at least 8GB (preferably 10GB) of RAM, and allocate at least 4 CPUs.
- Now start the VM
- After it boots up, find the IP address of the VM and add an entry to your machine's hosts file, e.g.
192.168.191.241 sandbox.hortonworks.com sandbox
- Connect to the VM via SSH (password: hadoop) and restart the Ambari server
- Make sure Storm, HBase, Kafka, and Hive are up, Falcon is down, and all of these services are out of maintenance mode. You can SSH into the Ambari server node and run the below as a shortcut.
- Before proceeding, you may want to wait a few minutes to ensure the services stay up reliably, or the demo setup may fail. If they do not, you may need to increase the memory/CPUs allocated to the VM.
#Ambari password
export PASSWORD=admin
#Ambari host
export AMBARI_HOST=localhost
#detect name of cluster
output=`curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters`
CLUSTER=`echo $output | sed -n 's/.*"cluster_name" : "\([^\"]*\)".*/\1/p'`
#make sure kafka, storm, falcon are out of maintenance mode
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Remove Falcon from maintenance mode"}, "Body": {"ServiceInfo": {"maintenance_state": "OFF"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/FALCON
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Remove Kafka from maintenance mode"}, "Body": {"ServiceInfo": {"maintenance_state": "OFF"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/KAFKA
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Remove Storm from maintenance mode"}, "Body": {"ServiceInfo": {"maintenance_state": "OFF"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/STORM
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Remove Hbase from maintenance mode"}, "Body": {"ServiceInfo": {"maintenance_state": "OFF"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/HBASE
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Remove Hive from maintenance mode"}, "Body": {"ServiceInfo": {"maintenance_state": "OFF"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/HIVE
#Start Kafka, Storm, HBase, Hive
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start KAFKA via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/KAFKA
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start STORM via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/STORM
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start HBASE via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/HBASE
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start HIVE via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/HIVE
#stop Falcon
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop FALCON via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/FALCON
- (Optional): Setup YARN queue for Spark using the steps here
- (Optional): Setup Solr and Banana and the 'Ranger Audits' dashboard using HDP search (Solr 5.2)
cd
wget https://github.com/abajwa-hw/security-workshops/raw/master/scripts/setup_solr_banana.sh
chmod +x setup_solr_banana.sh
#change <arguments> below
./setup_solr_banana.sh <arguments>
- argument options:
- if no arguments are passed, the FQDN will be used as the hostname to set up the dashboard/view (use this if you have created a local hosts entry for the host where Solr will run, e.g. sandbox.hortonworks.com)
- if "publicip" is passed, the public IP address will be used as the hostname to set up the dashboard/view (use this on cloud environments)
- otherwise, the passed-in value will be assumed to be the hostname to set up the dashboard/view
- Solr UI should be available at http://(your hostname):6083/solr/#/ranger_audits e.g. http://sandbox.hortonworks.com:6083/solr/#/ranger_audits
- An empty Banana dashboard should be available at http://(your hostname):6083/banana e.g. http://sandbox.hortonworks.com:6083/banana.
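To sanity-check that Solr is up and the ranger_audits core was created before wiring up the Ranger plugins, you can hit Solr's core admin API (a sketch, assuming the default host/port above; drop the core parameter to list all cores if your setup names them differently):
#check that the ranger_audits core exists (host/port/core name assumed from the URLs above)
curl "http://sandbox.hortonworks.com:6083/solr/admin/cores?action=STATUS&core=ranger_audits&wt=json"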
- (Optional): Setup Ranger using the steps here:
- (Optional): Setup the HBase Ranger plugin to audit to Solr
- On 2.3 Sandbox:
cd /usr/hdp/2.*/ranger-hbase-plugin/
vi /usr/hdp/2.*/ranger-hbase-plugin/install.properties
XAAUDIT.SOLR.IS_ENABLED=true
XAAUDIT.SOLR.SOLR_URL=http://sandbox.hortonworks.com:6083/solr/ranger_audits
./enable-hbase-plugin.sh
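If you prefer to script these property changes rather than editing install.properties in vi, a sed sketch with the same values as above (it writes a .bak backup); then run ./enable-hbase-plugin.sh as shown above. The same pattern applies to the Hive plugin directory below.
#non-interactive alternative to the vi edit above (creates install.properties.bak as a backup)
cd /usr/hdp/2.*/ranger-hbase-plugin/
sed -i.bak \
  -e 's|^XAAUDIT.SOLR.IS_ENABLED=.*|XAAUDIT.SOLR.IS_ENABLED=true|' \
  -e 's|^XAAUDIT.SOLR.SOLR_URL=.*|XAAUDIT.SOLR.SOLR_URL=http://sandbox.hortonworks.com:6083/solr/ranger_audits|' \
  install.properties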
- (Optional): Setup the Hive Ranger plugin to audit to Solr
- On 2.3 Sandbox:
cd /usr/hdp/2.*/ranger-hive-plugin/
vi /usr/hdp/2.*/ranger-hive-plugin/install.properties
XAAUDIT.SOLR.IS_ENABLED=true
XAAUDIT.SOLR.SOLR_URL=http://sandbox.hortonworks.com:6083/solr/ranger_audits
./enable-hive-plugin.sh
- Now restart HBase and Hive to register the plugins.
- Deploy the IoTDemo service as well as the Apache Zeppelin service to visualize/analyze the violation events generated, via a prebuilt notebook
VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/abajwa-hw/iotdemo-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/IOTDEMO
sudo git clone https://github.com/hortonworks-gallery/ambari-zeppelin-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/ZEPPELIN
#on sandbox
sudo service ambari restart
#on non sandbox
sudo service ambari-server restart
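After the restart, you can optionally confirm that Ambari picked up the new service definitions (a sketch, assuming default admin/admin credentials and the $VERSION variable from above):
#confirm the custom service definitions are registered in the stack (assumes admin/admin and $VERSION from above)
curl -u admin:admin -H 'X-Requested-By: ambari' http://localhost:8080/api/v1/stacks/HDP/versions/$VERSION/services/IOTDEMO
curl -u admin:admin -H 'X-Requested-By: ambari' http://localhost:8080/api/v1/stacks/HDP/versions/$VERSION/services/ZEPPELIN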
- Then you can click on 'Add Service' from the 'Actions' dropdown menu in the bottom left of the Ambari dashboard:
On bottom left -> Actions -> Add service -> check both 'IoT Demo' and 'Zeppelin' -> Next -> Next -> Configure service -> Next -> Deploy
Things to remember while configuring the service:
- The service currently requires that it be installed on the Ambari server node and that Kafka and ZooKeeper are also running on the same node.
- If Kafka is on a different node, the demo could still work (not tested) if you manually create the topics ahead of time:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper $ZK_HOST:2181 --replication-factor 1 --partitions 2 --topic truck_events
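You can verify the topic was created with the describe option (same $ZK_HOST assumption as above):
#verify the topic exists (assumes the same $ZK_HOST as above)
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe --zookeeper $ZK_HOST:2181 --topic truck_events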
- Under "Advanced demo-config":
- enter your GitHub credentials to allow the service to access the IoT demo artifacts
- enter the public name/IP of the IoTDemo node: this is used to set up the Ambari view. Set this to the public host/IP of the IoTDemo node (which must be reachable from your local machine). If installing on the sandbox (or a local VM), change this to the IP address of the VM. If installing on the cloud, set this to the public name/IP of the IoTDemo node. Alternatively, if you already have a local hosts file entry for the internal hostname of the IoTDemo node (e.g. sandbox.hortonworks.com), you can leave this empty - it will default to the internal hostname
- you should use the same value for the publicname property in the Zeppelin config as well
- Under "Advanced user-config":
- enter your Ambari user/password/port configuration (if not using the defaults). These will be used to check that the required services are up
- The IoT demo configs are available under "Advanced demo-env" but do not require updating, as all required configs will be auto-populated (see the REST sketch after this list):
- Ambari host
- Name node host/port
- Nimbus host
- Hive metastore host/port
- Supervisor host
- HBase master host
- Kafka host/port (also where ActiveMQ will be installed)
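If you want to double-check what was auto-populated without opening the UI, the desired config versions can be read back over Ambari's REST API (a minimal sketch, reusing the admin credentials and variables from the earlier snippets; demo-env appears once the service has been added):
#list the cluster's desired config versions (demo-env shows up after the IOTDEMO service is added)
curl -u admin:$PASSWORD -H 'X-Requested-By: ambari' "http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER?fields=Clusters/desired_configs"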
- On successful deployment, you will see the IOTDEMO service as part of the Ambari stack and will be able to start/stop it from here:
- You can see the parameters you configured under the 'Configs' tab
- One benefit of wrapping the component in an Ambari service is that you can now monitor/manage it remotely via the REST API:
export SERVICE=IOTDEMO
export PASSWORD=admin
export AMBARI_HOST=localhost
#detect name of cluster
output=`curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters`
CLUSTER=`echo $output | sed -n 's/.*"cluster_name" : "\([^\"]*\)".*/\1/p'`
#get service status
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#start service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#stop service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
- If the IoTDemo service was not deployed on the same node as the Ambari server:
- Transfer the compiled view jar from /root/iotdemo-view/target/*.jar on the IoTDemo node to /var/lib/ambari-server/resources/views on the Ambari server node
- Follow the steps below to stop the IoTDemo service, restart Ambari, and start the IoTDemo service:
- Otherwise, if the IoTDemo service was deployed on the same node as the Ambari server:
- Stop the IoTDemo service from Ambari
- Restart Ambari
#sandbox
service ambari restart
#non sandbox
service ambari-server restart
- Start the IoTDemo service
- Open the webapp via the Ambari view or at http://sandbox.hortonworks.com:8081/storm-demo-web-app/
- Login
- Generate <50 events
- Navigate to the monitoring and prediction webapps
- If installed, open Zeppelin via the Ambari view or at http://sandbox.hortonworks.com:9995
- Open the "IoT Data Analysis" notebook and execute the cells one by one
- If you set up the 'spark' queue earlier, verify that Zeppelin submitted the application to this queue. See here for screenshots
- If Ranger is installed, you can also use it to secure Spark by setting authorization policies and getting audit reports. See sample steps/screenshots to [setup Ranger's YARN plugin](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-23.md#setup-yarn-plugin-for-ranger) and [setup YARN queue and Ranger policy on an Ambari installed HDP 2.3 cluster](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-23.md#yarn-audit-exercises-in-ranger).
- If installed, open HDP Search (Solr 5.2) at http://sandbox.hortonworks.com:6083/solr
- Select the ranger_audits core and use the Query option to search through the HBase audit events
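The same query can also be issued from the command line if you prefer (a minimal sketch against the default host/port above):
#return a handful of audit documents as JSON (filter further in the Solr/Banana UI)
curl "http://sandbox.hortonworks.com:6083/solr/ranger_audits/select?q=*:*&rows=5&wt=json&indent=true"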
- If installed, open the SILK 'Ranger Audits' dashboard via the Ambari view or at http://sandbox.hortonworks.com:6083/banana
- By default you will see a visualization of HBase reads/gets:
- Now open the Hive view and query the truck_events_text_partition table:
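For example, a simple query to confirm the table is readable (run it from the Hive view, or from the command line as sketched below):
#quick check that the demo table is queryable (the Hive view works equally well)
hive -e "SELECT * FROM truck_events_text_partition LIMIT 10;"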
- Now disable the global allow policies on HBase and Hive and wait 30s:
- Try running the same query in the Hive view
- At this point, you should see some HBase audit records with result=0
- Confirm the same by opening the Audit tab in Ranger: http://sandbox.hortonworks.com:6080
- Re-enable the global allow policies.
- Components going down? Increase the VM memory/cores and restart
- Storm Nimbus is not coming up? Or you are getting the error:
java.lang.RuntimeException: Could not find leader nimbus from seed hosts [sandbox.hortonworks.com]. Did you specify a valid list of nimbus hosts for config nimbus.seeds
- Solution: Stop Storm and run the commands below to clean up old data before starting Storm back up:
setup/bin/cleanupstormdirs.sh
/usr/hdp/current/zookeeper-client/bin/zkCli.sh
rmr /storm
quit
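The ZooKeeper cleanup can also be done non-interactively (a sketch, assuming ZooKeeper is listening on localhost:2181):
#non-interactive equivalent of the zkCli steps above (assumes ZooKeeper at localhost:2181)
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181 rmr /storm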
- Other issues? Try resetting the demo and restarting:
setup/bin/cleanup.sh
- To remove the IOTDEMO service:
- Stop the service via Ambari
- Delete the service:
export SERVICE=IOTDEMO
export PASSWORD=admin
export AMBARI_HOST=localhost
#detect name of cluster
output=`curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' http://$AMBARI_HOST:8080/api/v1/clusters`
CLUSTER=`echo $output | sed -n 's/.*"cluster_name" : "\([^\"]*\)".*/\1/p'`
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
#if the above errors out, run the below first to fully stop the service
#curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE
- Remove artifacts:
rm -rf /root/sedev
rm -rf /root/iot*
rm -rf /root/scala
rm -rf /root/maven
rm -rf /var/log/iotdemo.log
rm -rf /var/lib/ambari-server/resources/views/iot*
VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
rm -rf /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/IOTDEMO
service ambari-server restart