Adding instructions for SparkML
William Markito committed Aug 23, 2015
1 parent 843fca6 commit 9bd2536
Showing 2 changed files with 152 additions and 0 deletions.
96 changes: 96 additions & 0 deletions Geode.md
@@ -0,0 +1,96 @@
# Apache Geode Lab

Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures.

## Build

* Obtaining the source code

```
$ git clone http://github.com/apache/incubator-geode
```

* Building

```
$ cd incubator-geode
$ git checkout develop
$ ./gradlew build -Dskip.tests=true
```

## Starting a Geode cluster

* Starting a `locator` and a `server`

```
$ gfsh start locator --name=locator1
...
$ gfsh start server --name=server1 --locators=localhost[10334]
...
```

## Using `gfsh` (Geode CLI)

```
$ gfsh
    _________________________     __
   / _____/ ______/ ______/ /____/ /
  / /  __/ /___  /_____  / _____  /
 / /__/ / ____/  _____/ / /    / /
/______/_/      /______/_/    /_/    v1.0.0-incubating-SNAPSHOT
Monitor and Manage GemFire
gfsh>connect
Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=192.168.1.94, port=1099] ..
gfsh>list members
Name | Id
-------- | ---------------------------------------
locator1 | anakin(locator1:70957:locator)<v0>:9773
server1 | anakin(server1:71106)<v1>:34411
```

* Creating a `region` and basic operations

```
gfsh>create region --name=myRegion --type=PARTITION
Member | Status
------- | ---------------------------------------
server1 | Region "/myRegion" created on "server1"
gfsh>put --key=1 --value="value1" --region=/myRegion
Result : true
Key Class : java.lang.String
Key : 1
Value Class : java.lang.String
Old Value : <NULL>
gfsh>get --key=1 --region=/myRegion
Result : true
Key Class : java.lang.String
Key : 1
Value Class : java.lang.String
Value : value1
```

* Try another `put` using a different value or an existing key.
* Try to remove an entry using its key; a possible session for both exercises is sketched below.
* While still connected to `gfsh`, stop the *locator* and the *server* (last snippet in this section):
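A minimal sketch of the first two exercises (output omitted; the value is arbitrary, and `remove` deletes the entry for the given key):

```
gfsh>put --key=1 --value="value2" --region=/myRegion
...
gfsh>remove --key=1 --region=/myRegion
...
```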

```
gfsh> stop server --name=server1
gfsh> stop locator --name=locator1
```

## Final step

* Before moving to the next lab, run the script `startGeode.sh` under `$PROJECT/data`.
* Access the GemFire REST API documentation at http://192.168.56.10:8888/gemfire-api/docs/index.html
* Execute the `deployFunctionVM.sh` script to deploy the functions required for the Spark Connector.
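To quickly check that the REST service is responding, the developer REST endpoints under `/gemfire-api/v1` can be queried from the command line (a sketch; the second request assumes an entry with key `1` exists in a region named `myRegion`):

```
$ curl http://192.168.56.10:8888/gemfire-api/v1
$ curl http://192.168.56.10:8888/gemfire-api/v1/myRegion/1
```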

## References

* [Web site](http://geode.incubator.apache.org)
* [Documentation](http://geode-docs.cfapps.io/)
* [Wiki](http://cwiki.apache.org/confluence/display/GEODE)
* [JIRA](https://issues.apache.org/jira/browse/GEODE)
56 changes: 56 additions & 0 deletions SparkML.md
@@ -0,0 +1,56 @@
# Spark ML

## Building the StocksSpark project

StocksSpark is an sbt project. Run `sbt package` from the project folder and the jar will be generated under `target`.
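For example (a sketch; `$PROJECT` refers to the lab's project directory, and the resulting jar matches the one used by the submit scripts below):

```
$ cd $PROJECT/StocksSpark
$ sbt package
...
[info] Packaging .../StocksSpark/target/scala-2.10/stocksspark_2.10-1.0.jar ...
```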

## Submitting Spark jobs

Two convenience scripts submit the Spark jobs for **training** and **evaluating** the model: `train.sh` and `evaluate.sh`, respectively.

![Spark Cluster Overview](http://spark.apache.org/docs/latest/img/cluster-overview.png)

### Training the model
```
spark-submit --class io.pivotal.demo.StockInferenceDemo --driver-memory 1G \
--executor-memory 1G \
--jars ~/.m2/repository/io/pivotal/gemfire/spark/gemfire-spark-connector_2.10/0.5.0/gemfire-spark-connector_2.10-0.5.0.jar,$GEODE_HOME/lib/gemfire-core-dependencies.jar \
--master local[*] $PROJECT/StocksSpark/target/scala-2.10/stocksspark_2.10-1.0.jar train
```

### Evaluating

```
spark-submit --class io.pivotal.demo.StockInferenceDemo --driver-memory 1G \
--executor-memory 1G \
--jars ~/.m2/repository/io/pivotal/gemfire/spark/gemfire-spark-connector_2.10/0.5.0/gemfire-spark-connector_2.10-0.5.0.jar,$GEODE_HOME/lib/gemfire-core-dependencies.jar \
--master local[*] $PROJECT/StocksSpark/target/scala-2.10/stocksspark_2.10-1.0.jar evaluate
```

## Automation through Spring XD

The following stream definition re-runs `train.sh` every 300 seconds through the Spring XD `shell` module:

```
stream create --name training --definition "trigger --fixedDelay=300 | shell --command='./train.sh'" --deploy
```
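A similar stream could re-run the evaluation on its own schedule (a sketch; the stream name and delay are arbitrary):

```
stream create --name evaluation --definition "trigger --fixedDelay=60 | shell --command='./evaluate.sh'" --deploy
```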
## Querying results through Zeppelin

### Using Geode Interpreter

In the Zeppelin UI:
```
%geode.oql
select * from /Predictions order by entryTimestamp
```
### Using Spark SQL Interpreter

```
%sql
PENDING
```

## References

* [SBT - Scala build tool](http://www.scala-sbt.org/)
* [Apache Spark ML Programming Guide](http://spark.apache.org/docs/latest/ml-guide.html)
* [Apache Spark Cluster Overview](http://spark.apache.org/docs/latest/cluster-overview.html)
* [Apache Geode Spark Connector](https://github.com/apache/incubator-geode/tree/develop/gemfire-spark-connector)
