More documentation updates for 1.3.0 release
- changed zeppelin docs to use the '%jdbc' interpreter since the '%snappydata' interpreter is
  disabled by design in secure clusters
- tested and documented the steps to install and configure '%jdbc' interpreter from an
  upstream Zeppelin release
- removed inclusion of snappydata-zeppelin in the product jars
- scanned and updated more docs like building_from_source, isight/quick_start_steps
sumwale committed Oct 17, 2021
1 parent 755c88b commit 436612f
Showing 8 changed files with 72 additions and 123 deletions.
3 changes: 3 additions & 0 deletions build.gradle
@@ -1167,9 +1167,11 @@ task product(type: Zip) {
into "${snappyProductDir}/benchmark"
}

/* (preferred one is the standard %jdbc interpreter in Zeppelin)
if (rootProject.hasProperty('enablePublish')) {
packageZeppelinInterpreter()
}
*/

if (rootProject.hasProperty('R.enable')) {
def targetRDir = "${snappyProductDir}/R"
@@ -1479,6 +1481,7 @@ task sparkPackage {
dependsOn ":snappy-core_${scalaBinaryVersion}:sparkPackage"
}

product.mustRunAfter clean, cleanAll
distTar.mustRunAfter clean, cleanAll, product
distZip.mustRunAfter clean, cleanAll, product
distRpm.mustRunAfter clean, cleanAll, product
@@ -53,7 +53,7 @@ The following core properties must be set in the **conf/leads** file:
| heap-size | Sets the maximum heap size for the Java VM, using SnappyData default resource manager settings. </br>For example, `-heap-size=8g`. </br>It is recommended to allocate a minimum of **6-8 GB** of heap size per lead node. If you use the `-heap-size` option, by default SnappyData sets the `critical-heap-percentage` to 95% of the heap size, and the `eviction-heap-percentage` to 85.5% of the `critical-heap-percentage`. </br>SnappyData also sets resource management properties for eviction and garbage collection if the JVM supports them. | |
| dir | Working directory of the member that contains the SnappyData Server status file and the default location for the log file, persistent files, data dictionary, and so forth. | Current directory |
| classpath | Location of user classes required by the SnappyData Server. This path is appended to the current classpath | Appended to the current classpath |
| -zeppelin.interpreter.enable=true |Enable the SnappyData Zeppelin interpreter. Refer [How to use Apache Zeppelin with SnappyData](/howto/use_apache_zeppelin_with_snappydata.md) | |
| -zeppelin.interpreter.enable=true |Enables the SnappyData Zeppelin interpreter. This is no longer useful. Refer [How to use Apache Zeppelin with SnappyData](/howto/use_apache_zeppelin_with_snappydata.md) | |
| spark.executor.cores | The number of cores to use on each server. | |
| spark.jars | | |

@@ -69,7 +69,7 @@ You can add a line for each of the Lead members that you want to launch. Typical
In the following configuration, you specify the Spark UI port and the number of cores to use on each server:

```
localhost -spark.ui.port=3333 -spark.executor.cores=16 -zeppelin.interpreter.enable=true
localhost -spark.ui.port=3333 -spark.executor.cores=16
```

!!!Tip
@@ -5,61 +5,43 @@ Multiple users can concurrently access a secure SnappyData cluster by configuring

!!! Note

* Currently, only the `%snappydata` and `%jdbc` interpreters are supported with a secure SnappyData cluster.
* Currently, only the `%jdbc` interpreter is supported with a secure SnappyData cluster.

* Each user accessing the secure SnappyData cluster should configure the `%snappydata` and `%jdbc` interpreters in Apache Zeppelin as described in this section.
* Each user accessing the secure SnappyData cluster should configure the `%jdbc` interpreter in Apache Zeppelin as described here.

## Step 1: Download, Install and Configure SnappyData
1. [Download and install SnappyData Enterprise Edition](../install.md) </br>

1. [Download and install SnappyData](../install.md).

2. [Configure the SnappyData cluster with security enabled](../security/security.md).

3. [Start the SnappyData cluster](start_snappy_cluster.md).

    - Create a table and load data.

    - Grant the required permissions for the users accessing the table.

For example:

snappy> GRANT SELECT ON Table airline TO user2;
snappy> GRANT INSERT ON Table airline TO user3;
snappy> GRANT UPDATE ON Table airline TO user4;

!!! Note
    Users requiring INSERT, UPDATE, or DELETE permissions also require explicit SELECT permission on a table.

5. Extract the contents of the Zeppelin binary package. </br>

6. Start the Zeppelin daemon using the command: </br> `./bin/zeppelin-daemon.sh start`

## Configure the JDBC Interpreter
Log on to Zeppelin from your web browser and configure the [JDBC Interpreter](https://zeppelin.apache.org/docs/0.8.2/interpreter/jdbc.html).

The Zeppelin web server is started on port 8080: `http://<IP address>:8080/#/`
To enable running `EXEC SCALA`, also grant the privilege:

<a id="configinterpreter"></a>
## Configure the Interpreter
snappy> GRANT PRIVILEGE EXEC SCALA TO user2;
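
For example, `user2` can then submit Scala code through a `%jdbc` paragraph in Zeppelin. The following is a minimal sketch; it assumes the implicit `snappysession` handle that `exec scala` provides and the `airline` table from the examples above:

    %jdbc
    exec scala val rows = snappysession.sql("SELECT count(*) FROM airline").collect()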

1. Log on to Zeppelin from your web browser and select **Interpreter** from the **Settings** option.

2. Edit the existing `%snappydata` and `%jdbc` interpreters and configure the interpreter properties.
The table lists the properties required for SnappyData:

| Property | Value |Description|
|--------|--------|--------|
|default.url|jdbc:snappydata://localhost:1527/|Specify the JDBC URL for SnappyData cluster in the format `jdbc:snappydata://<locator_hostname>:1527`|
|default.driver|io.snappydata.jdbc.ClientDriver|Specify the JDBC driver for SnappyData|
|default.password|<password>|The JDBC user password|
|default.user|<username>|The JDBC username|
!!! Note
    Users requiring INSERT, UPDATE, or DELETE permissions also require explicit SELECT permission on a table.

3. **Dependency settings**</br> Since Zeppelin includes only the PostgreSQL driver jar by default, you need to add the SnappyData Client (JDBC) JAR file path as a dependency of the `%jdbc` interpreter. The SnappyData Client (JDBC) JAR file (snappydata-jdbc\_2.11-1.3.0.jar) is available on [the release page](https://github.com/TIBCOSoftware/snappydata/releases/tag/v1.3.0). </br>
The SnappyData Client (JDBC) JAR file (snappydata-jdbc\_2.11-1.3.0.jar) can also be placed under **<ZEPPELIN\_HOME>/interpreter/jdbc** before starting Zeppelin instead of providing it in the dependency setting.
!!! IMPORTANT
    Beware that granting the EXEC SCALA privilege is overarching by design and essentially makes the user
    equivalent to the database administrator, since Scala code can be used to modify anything using internal APIs.

4. If required, edit other properties, and then click **Save** to apply your changes.
4. Follow the remaining steps as given in [How to Use Apache Zeppelin with SnappyData](use_apache_zeppelin_with_snappydata.md).

**See also**

* [How to Use Apache Zeppelin with SnappyData](use_apache_zeppelin_with_snappydata.md)
* [How to connect using JDBC driver](/howto/connect_using_jdbc_driver.md)
* [How to connect using JDBC driver](../howto/connect_using_jdbc_driver.md)
104 changes: 28 additions & 76 deletions docs/howto/use_apache_zeppelin_with_snappydata.md
@@ -3,42 +3,32 @@


## Step 1: Download, Install and Configure SnappyData
1. [Download and Install SnappyData](../install/install_on_premise.md) </br>
The product jars directory already includes the snappydata-zeppelin jar used by SnappyData and Zeppelin installations.
The table below lists the versions of the SnappyData Zeppelin Interpreter and Apache Zeppelin installer for the supported SnappyData releases.

| SnappyData Zeppelin Interpreter | Apache Zeppelin Binary Package | SnappyData Release|
|---------------------------------|--------------------------------|-------------------|
|[Version 0.8.2.1](https://github.com/TIBCOSoftware/snappy-zeppelin-interpreter/releases/tag/v0.8.2.1) |[Version 0.8.2](http://archive.apache.org/dist/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-netinst.tgz) |[Release 1.3.0](https://github.com/TIBCOSoftware/snappydata/releases/tag/v1.3.0)|
|[Version 0.7.3.6](https://github.com/TIBCOSoftware/snappy-zeppelin-interpreter/releases/tag/v0.7.3.6) |[Version 0.7.3](http://archive.apache.org/dist/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-netinst.tgz) |[Release 1.2.0](https://github.com/TIBCOSoftware/snappydata/releases/tag/v1.2.0)|
1. [Download and install SnappyData](../install.md).

2. [Configure the SnappyData Cluster](../configuring_cluster/configuring_cluster.md).

3. In [lead node configuration](../configuring_cluster/configuring_cluster.md#configuring-leads) set the following properties:
3. [Start the SnappyData cluster](start_snappy_cluster.md).

- Enable the SnappyData Zeppelin interpreter by adding `-zeppelin.interpreter.enable=true`
4. Extract the contents of the [Zeppelin 0.8.2 binary package](http://archive.apache.org/dist/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-netinst.tgz).
Then `cd` into the extracted `zeppelin-0.8.2-bin-netinst` directory.</br>
Note that while these instructions work with any version of Zeppelin, the demo notebooks installed later
have been created and tested only on Zeppelin 0.8.2 and may not work correctly on other versions.
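
    For example, from a shell (a sketch using the archive URL above):

        wget http://archive.apache.org/dist/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-netinst.tgz
        tar xzf zeppelin-0.8.2-bin-netinst.tgz
        cd zeppelin-0.8.2-bin-netinst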

- In the **conf/spark-env.sh** file, set the `SPARK_PUBLIC_DNS` property to the public DNS name of the lead node. This enables the Member Logs to be displayed correctly to users accessing the [SnappyData Monitoring Console](../monitoring/monitoring.md) from outside the network.
In an AWS environment, this property is set automatically to the public address of the lead node, so it can be skipped.

4. [Start the SnappyData cluster](start_snappy_cluster.md).

5. Extract the contents of the Zeppelin binary package. </br>

6. The SnappyData Zeppelin interpreter is included in the product jars directory. Install it in Apache Zeppelin by executing the following command from Zeppelin's installation directory: </br>

./bin/install-interpreter.sh --name snappydata --artifact <product_install_directory>/jars/snappydata-zeppelin_2.11-<version_number>.jar

The Zeppelin interpreter framework allows the SnappyData interpreter to be plugged into Zeppelin, through which you can run queries.
Install additional interpreters as shown below (angular is used by the display panels of the sample notebooks installed later): </br>
5. Install a couple of additional interpreters (angular is used by the display panels of the sample notebooks installed later): </br>

ZEPPELIN_INTERPRETER_DEP_MVNREPO=https://repo1.maven.org/maven2 ./bin/install-interpreter.sh --name angular,jdbc

These additional interpreters may need to be configured similarly to the snappydata interpreter as described in the next section.
If you are using the `all` binary package from Zeppelin instead of the `netinst` package linked in the previous step,
then you can skip this step.

6. Copy the [SnappyData JDBC client jar](https://github.com/TIBCOSoftware/snappydata/releases/download/v1.3.0/snappydata-jdbc_2.11-1.3.0.jar)
inside the `interpreter/jdbc` directory.
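
    For example, from the Zeppelin directory (a sketch; `wget -P` downloads directly into the target directory):

        wget -P interpreter/jdbc https://github.com/TIBCOSoftware/snappydata/releases/download/v1.3.0/snappydata-jdbc_2.11-1.3.0.jar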

7. Download the predefined SnappyData notebooks with configuration [notebooks\_embedded\_zeppelin.tar.gz](https://github.com/TIBCOSoftware/snappy-zeppelin-interpreter/blob/master/examples/notebook/notebooks_embedded_zeppelin.tar.gz). </br> Extract and copy the contents of the compressed tar file (tar xzf) to the **notebook** folder in the Zeppelin installation on your local machine.
7. Download the predefined SnappyData notebooks with configuration [notebooks\_embedded\_zeppelin.tar.gz](https://github.com/TIBCOSoftware/snappy-zeppelin-interpreter/blob/master/examples/notebook/notebooks_embedded_zeppelin.tar.gz). </br>
Extract the contents of the compressed tar file (tar xzf) in the Zeppelin installation on your local machine.
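
    For example (a sketch; this assumes the tarball was downloaded into the Zeppelin installation directory):

        tar xzf notebooks_embedded_zeppelin.tar.gz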

8. Start the Zeppelin daemon using the command: </br> `bin/zeppelin-daemon.sh start`
8. Start the Zeppelin daemon using the command: </br> `./bin/zeppelin-daemon.sh start`

9. To ensure that the installation is successful, log into the Zeppelin UI (**http://localhost:8080** or <AWS-AMI\_PublicIP>:8080) from your web browser.

@@ -50,64 +40,26 @@ Refer [here](concurrent_apache_zeppelin_access_to_secure_snappydata.md) for instructions
## Step 2: Configure Interpreter Settings

1. Log on to Zeppelin from your web browser and select **Interpreter** from the **Settings** option.
This will require administrator privileges, which has user name as `admin` by default.
This will require a user having administrator privileges, which is set to `admin` by default.
See the **zeppelin-dir/conf/shiro.ini** file for the default admin password and other users, and
update the file to use your preferred authentication scheme as required.
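
    For reference, a minimal sketch of the `[users]` section in **conf/shiro.ini** (the values are illustrative defaults; change them for any real deployment):

        [users]
        # username = password, role1, role2
        admin = password1, admin
        user1 = password2, role1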

2. Click **Create** to add an interpreter. If the list of interpreters already has snappydata,
then skip this step and instead configure the existing interpreter as shown in the next step.</br> ![Create](../Images/create_interpreter.png)

3. From the **Interpreter group** drop-down select **SnappyData**.
![Configure Interpreter](../Images/snappydata_interpreter_properties.png)

!!! Note
If **SnappyData** is not displayed in the **Interpreter group** drop-down list, try the following options, and then restart the Zeppelin daemon:

* Delete the **interpreter.json** file located in the **conf** directory (in the Zeppelin home directory).

* Delete the **zeppelin-spark_<_version_number_>.jar** file located in the **interpreter/SnappyData** directory (in the Zeppelin home directory).


4. Click the **Connect to existing process** option. The fields **Host** and **Port** are displayed.

5. Specify the host on which the SnappyData lead node is executing, and the SnappyData Zeppelin Port (Default is 3768).

| Property | Default Values | Description |
|----------|----------------|-------------|
|Host |localhost |Specify host on which the SnappyData lead node is executing |
|Port |3768 |Specify the Zeppelin server port |

6. Configure the interpreter properties. </br>The table lists the properties required for SnappyData.

| Property | Value | Description |
|----------|-------|-------------|
|default.url|jdbc:snappydata://localhost:1527/ | Specify the JDBC URL for SnappyData cluster in the format `jdbc:snappydata://<locator_hostname>:1527` |
|default.driver|io.snappydata.jdbc.ClientDriver| Specify the JDBC driver for SnappyData|
|snappydata.connection|localhost:1527| Specify the `host:clientPort` combination of the locator for the JDBC connection (only required if running smart connector) |
|master|local[*]| Specify the URI of the spark master (only local/split mode) |
|zeppelin.jdbc.concurrent.use|true| Specify the Zeppelin scheduler to be used. </br>Select **True** for Fair and **False** for FIFO |

7. If required, edit other properties, and then click **Save** to apply your changes.</br>


!!! Note
You can modify the default port number of the Zeppelin interpreter by setting the property:</br>
`-zeppelin.interpreter.port=<port_number>` in [lead node configuration](../configuring_cluster/configuring_cluster.md#configuring-leads).

## Additional Settings

1. Create a note and bind the interpreter by setting SnappyData as the default interpreter.</br> The SnappyData Zeppelin Interpreter group consists of two interpreters. Click and drag *<_Interpreter_Name_>* to the top of the list to set it as the default interpreter.
2. Click on **edit** in the `jdbc` interpreter section.

| Interpreter Name | Description |
|------------------|-------------|
|%snappydata.snappydata or </br> %snappydata.spark | This interpreter is used to write Scala code in the paragraph. SnappyContext is injected in this interpreter and can be accessed using the variable **snc** |
|%snappydata.sql | This interpreter is used to execute SQL queries on the SnappyData cluster. It also has features of executing approximate queries on the SnappyData cluster.|
3. Configure the interpreter properties. </br>The table below lists the properties required for SnappyData.

2. Click **Save** to apply your changes.
| Property | Value | Description |
|-------------|-------|-------------|
|default.driver |io.snappydata.jdbc.ClientDriver |Specify the JDBC driver for SnappyData |
|default.url |jdbc:snappydata://localhost:1527 |Specify the JDBC URL for SnappyData cluster in the format `jdbc:snappydata://<locator_hostname>:1527` |
|default.user |SQL user name or `app` |The configured user name if security is enabled in the SnappyData cluster, else `app` |
|default.password |SQL user password or `app` |The password of the configured user if security is enabled in the SnappyData cluster, else it can be anything |
|zeppelin.splitQueries |true |Each query in a paragraph is executed separately and returns its own result |
|zeppelin.jdbc.concurrent.use |true |Specify the Zeppelin scheduler to be used. </br>Select **True** for Fair and **False** for FIFO |
|zeppelin.jdbc.interpolation |true |Whether interpolation of `ZeppelinContext` objects into the paragraph text is allowed |

### Known Issue
4. If required, edit other properties, and then click **Save** to apply your changes.</br>
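
    Once saved, a quick way to verify the settings is a paragraph with two queries (a sketch; `airline` is an assumed table name), which `zeppelin.splitQueries=true` executes separately:

        %jdbc
        SELECT count(*) FROM airline;
        SELECT * FROM airline LIMIT 10;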

If you are using SnappyData Zeppelin Interpreter 0.7.1 and Zeppelin Installer 0.7 with SnappyData or future releases, the approximate result does not work on the sample table when you execute a paragraph with the `%sql show-instant-results-first` directive.

## FAQs

14 changes: 6 additions & 8 deletions docs/install/building_from_source.md
@@ -21,29 +21,27 @@ To build product artifacts in all supported formats (tarball, zip, rpm, deb):
```pre
> git clone https://github.com/TIBCOSoftware/snappydata.git --recursive
> cd snappydata
> ./gradlew cleanAll
> ./gradlew distProduct
> ./gradlew cleanAll distProduct
```

The artifacts are in **build-artifacts/scala-2.11/distributions**

You can also add the flags `-PenablePublish -PR.enable` to get them in the same form as an official
SnappyData distribution, but that also requires zeppelin-interpreter and R as noted below.
SnappyData distribution, but that also requires an installation of R as noted below.

To build all product artifacts that are in the official SnappyData distributions:

```pre
> git clone https://github.com/TIBCOSoftware/snappydata.git --recursive
> git clone https://github.com/TIBCOSoftware/snappy-zeppelin-interpreter.git
> cd snappydata
> ./gradlew cleanAll
> ./gradlew product copyShadowJars distTar -PenablePublish -PR.enable
> ./gradlew cleanAll product copyShadowJars distTar -PenablePublish -PR.enable
```

The artifacts are in **build-artifacts/scala-2.11/distributions**

Building SparkR (with the `R.enable` flag) requires R to be installed locally and at least the following
R packages along with their dependencies: knitr, markdown, rmarkdown, testthat
Building SparkR with the `-PR.enable` flag requires R 3.x or 4.x to be installed locally.
At least the following R packages along with their dependencies also need to be installed:
`knitr`, `markdown`, `rmarkdown`, `testthat`
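
One way to install these from a shell (a sketch; it assumes `Rscript` is on the PATH and uses the public CRAN mirror):

```pre
> Rscript -e 'install.packages(c("knitr", "markdown", "rmarkdown", "testthat"), repos = "https://cloud.r-project.org")'
```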


## Repository Layout
12 changes: 12 additions & 0 deletions docs/isight/quick_start_steps.md
@@ -221,6 +221,18 @@ Connecting the SnappyData Interpreter to the SnappyData cluster is represented in

![Example](../Images/isightconnect.png)

## Important Note

The `%snappydata.*` interpreters described in the sections below are no longer preferred because they are
not supported on secure clusters. The standard `%jdbc` interpreter with support for `EXEC SCALA` provides
equivalent functionality for both secure and insecure clusters.

Refer to [How to Use Apache Zeppelin with SnappyData](../howto/use_apache_zeppelin_with_snappydata.md) for more details.

The previous approach noted below can still be useful for AQP queries with the `show-instant-results-first` directive
as described in the sections below, but it works only for insecure clusters; for all other cases,
use of the `%jdbc` interpreter should be preferred.
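
For instance, a `%jdbc` paragraph can run plain SQL as well as Scala code (a sketch; it assumes the `EXEC SCALA` privilege has been granted and that an `airline` table exists):

```pre
%jdbc
exec scala snappysession.table("airline").count()
```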

## Using the Interpreter
SnappyData Interpreter group consists of the interpreters `%snappydata.spark` and `%snappydata.sql`.
To use an interpreter, add the associated interpreter directive in the format `%<Interpreter_name>` at the beginning of a paragraph in your note. In a paragraph, use one of the interpreters, and then enter the required commands.