Adding instructions for SparkML
William Markito committed Aug 23, 2015
1 parent 843fca6 commit 9bd2536
Showing 2 changed files with 152 additions and 0 deletions.
96 changes: 96 additions & 0 deletions Geode.md
@@ -0,0 +1,96 @@
# Apache Geode Lab

Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures.

## Build

* Obtaining the source code

```
$ git clone http://github.com/apache/incubator-geode
```

* Building

```
$ cd incubator-geode
$ git checkout develop
$ ./gradlew build -Dskip.tests=true
```

## Starting a Geode cluster

* Starting a `locator` and a `server`

```
$ gfsh start locator --name=locator1
...
$ gfsh start server --name=server1 --locators=localhost[10334]
...
```

## Using `gfsh` (Geode CLI)

```
$ gfsh
    _________________________     __
   / _____/ ______/ ______/ /____/ /
  / /  __/ /___  /_____  / _____  /
 / /__/ / ____/  _____/ / /    / /
/______/_/      /______/_/    /_/    v1.0.0-incubating-SNAPSHOT
Monitor and Manage GemFire
gfsh>connect
Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=192.168.1.94, port=1099] ..
gfsh>list members
Name | Id
-------- | ---------------------------------------
locator1 | anakin(locator1:70957:locator)<v0>:9773
server1 | anakin(server1:71106)<v1>:34411
```

* Creating a `region` and basic operations

```
gfsh>create region --name=myRegion --type=PARTITION
Member | Status
------- | ---------------------------------------
server1 | Region "/myRegion" created on "server1"
gfsh>put --key=1 --value="value1" --region=/myRegion
Result : true
Key Class : java.lang.String
Key : 1
Value Class : java.lang.String
Old Value : <NULL>
gfsh>get --key=1 --region=/myRegion
Result : true
Key Class : java.lang.String
Key : 1
Value Class : java.lang.String
Value : value1
```

* Try another `put` using a different value or an existing key.
* Try to remove an entry using its key; a possible session for both exercises is sketched below.
* While still connected to `gfsh`, stop the *locator* and the *server* (last snippet in this section):
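A minimal sketch of the first two exercises (output omitted; the value is arbitrary, and `remove` deletes the entry for the given key):

```
gfsh>put --key=1 --value="value2" --region=/myRegion
...
gfsh>remove --key=1 --region=/myRegion
...
```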

```
gfsh> stop server --name=server1
gfsh> stop locator --name=locator1
```

## Final step

* Before moving to the next lab, run the script `startGeode.sh` under `$PROJECT/data`.
* Access the GemFire REST API documentation at http://192.168.56.10:8888/gemfire-api/docs/index.html
* Execute the `deployFunctionVM.sh` script to deploy the functions required for the Spark Connector.
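To quickly check that the REST service is responding, the developer REST endpoints under `/gemfire-api/v1` can be queried from the command line (a sketch; the second request assumes an entry with key `1` exists in a region named `myRegion`):

```
$ curl http://192.168.56.10:8888/gemfire-api/v1
$ curl http://192.168.56.10:8888/gemfire-api/v1/myRegion/1
```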

## References

* [Web site](http://geode.incubator.apache.org)
* [Documentation](http://geode-docs.cfapps.io/)
* [Wiki](http://cwiki.apache.org/confluence/display/GEODE)
* [JIRA](https://issues.apache.org/jira/browse/GEODE)
56 changes: 56 additions & 0 deletions SparkML.md
@@ -0,0 +1,56 @@
# Spark ML

## Building the StocksSpark project

StocksSpark is an sbt project. Run `sbt package` from the project folder and the jar will be generated under `target`.
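For example (a sketch; `$PROJECT` refers to the lab's project directory, and the resulting jar matches the one used by the submit scripts below):

```
$ cd $PROJECT/StocksSpark
$ sbt package
...
[info] Packaging .../StocksSpark/target/scala-2.10/stocksspark_2.10-1.0.jar ...
```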

## Submitting Spark jobs

Two convenience scripts submit the Spark jobs for **training** and **evaluating** the model: `train.sh` and `evaluate.sh`, respectively.

![Spark Cluster Overview](http://spark.apache.org/docs/latest/img/cluster-overview.png)

### Training the model
```
spark-submit --class io.pivotal.demo.StockInferenceDemo --driver-memory 1G \
--executor-memory 1G \
--jars ~/.m2/repository/io/pivotal/gemfire/spark/gemfire-spark-connector_2.10/0.5.0/gemfire-spark-connector_2.10-0.5.0.jar,$GEODE_HOME/lib/gemfire-core-dependencies.jar \
--master local[*] $PROJECT/StocksSpark/target/scala-2.10/stocksspark_2.10-1.0.jar train
```

### Evaluating

```
spark-submit --class io.pivotal.demo.StockInferenceDemo --driver-memory 1G \
--executor-memory 1G \
--jars ~/.m2/repository/io/pivotal/gemfire/spark/gemfire-spark-connector_2.10/0.5.0/gemfire-spark-connector_2.10-0.5.0.jar,$GEODE_HOME/lib/gemfire-core-dependencies.jar \
--master local[*] $PROJECT/StocksSpark/target/scala-2.10/stocksspark_2.10-1.0.jar evaluate
```

## Automation through Spring XD

The following stream definition re-runs `train.sh` every 300 seconds through the Spring XD `shell` module:

```
stream create --name training --definition "trigger --fixedDelay=300 | shell --command='./train.sh'" --deploy
```
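A similar stream could re-run the evaluation on its own schedule (a sketch; the stream name and delay are arbitrary):

```
stream create --name evaluation --definition "trigger --fixedDelay=60 | shell --command='./evaluate.sh'" --deploy
```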
## Querying results through Zeppelin

### Using Geode Interpreter

In the Zeppelin UI:
```
%geode.oql
select * from /Predictions order by entryTimestamp
```
### Using Spark SQL Interpreter

```
%sql
PENDING
```

## References

* [SBT - Scala build tool](http://www.scala-sbt.org/)
* [Apache Spark ML Programming Guide](http://spark.apache.org/docs/latest/ml-guide.html)
* [Apache Spark Cluster Overview](http://spark.apache.org/docs/latest/cluster-overview.html)
* [Apache Geode Spark Connector](https://github.com/apache/incubator-geode/tree/develop/gemfire-spark-connector)
