Data Virtualization on OpenShift

This guide provides documentation and code recipes for deploying and configuring Data Virtualization on OpenShift. A typical deployment of Data Virtualization involves defining a Virtual Database (VDB) using SQL/DDL and then configuring the data sources that the VDB uses to read and write data. In some scenarios you may also need to configure authentication against RH-SSO.

There are a couple of different ways to develop and deploy Data Virtualization:

  • Defining a YAML file with the VDB contents, then using the Teiid Operator to deploy it on OpenShift. An example is shown below.

  • Defining a Maven-based Java project with a VDB file, using Teiid Spring Boot, then using the Teiid Operator to deploy the generated fat JAR.

The user is expected to have a working knowledge of OpenShift and the Operator model. If you have not already installed the Teiid Operator in your OpenShift instance, please install it using the directions here.

Note
Each option below implements the same example using a different development approach.

Virtual Database defined in DDL

dv-customer.yaml
apiVersion: teiid.io/v1alpha1
kind: VirtualDatabase
metadata:
  name: dv-customer
spec:
  replicas: 1 (1)
  datasources: (2)
    - name: accountdb
      type: postgresql
      properties:
        - name: username
          value: user
        - name: password
          value: changeit
        - name: jdbc-url
          value: jdbc:postgresql://accounts/accounts
  build:
    source:
      dependencies: (3)
        - org.postgresql:postgresql:42.1.4
      ddl: | (4)
        CREATE DATABASE customer OPTIONS (ANNOTATION 'Customer VDB');
        USE DATABASE customer;
        ...
      mavenRepositories: (5)
        central: https://repo.maven.apache.org/maven2
  expose: (6)
    - Route
  1. Defines the number of instances that you want to deploy. By default this is set to 1.

  2. Defines the connection configuration properties for the data sources. The example above shows the properties used to connect to a PostgreSQL database. See the "Data Source Support" section for the supported data sources and the properties required to configure them.

  3. Defines a list of Maven dependency JAR files in GAV format (groupId:artifactId:version), used in particular for JDBC driver files or any custom dependencies the data sources require. Check the details for a specific data source in the "Data Source Support" section. Most libraries that are available in public Maven repositories are automatically added to the Operator build.

  4. Defines the VDB in DDL form. Consult the Teiid documentation on how to build a VDB using DDL; a complete treatment of DDL is beyond the scope of this document.

  5. If any dependencies or the VDB itself are defined in a private or non-public repository, use this property to define that repository's location. More than one repository can be configured, as shown in the sketch after this list. See here for the full Maven configuration.

  6. Defines which services to expose:

    • A Route exposes the OData API through an OpenShift route.

    • A LoadBalancer exposes an ingress host/port that external application clients can use to access the OpenShift/Kubernetes cluster.

    • A NodePort exposes the port on the node, for testing purposes.

    • ExposeVia3scale skips creating a Route, but generates annotations on the service so that it can be discovered by 3scale.
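As a minimal sketch of callout (5) with more than one repository configured, the fragment below adds a second entry; the internal repository name and URL are hypothetical placeholders:

  build:
    source:
      mavenRepositories:
        central: https://repo.maven.apache.org/maven2
        internal: https://repo.example.com/maven2 # hypothetical private repository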

A full example is provided here: dv-customer.yaml
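Once the YAML file is ready, you can deploy it with the standard OpenShift CLI, for example oc create -f dv-customer.yaml; the Operator then builds and deploys the virtual database.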

Virtual Database defined as Maven Artifact

This example shows how to deploy a VDB that is defined as a Maven artifact using the Teiid Operator.

After your VDB is available in a Maven repository, you can use a YAML-based custom resource similar to the one below to deploy the VDB in OpenShift.

Note
The YAML file below is very similar to the example above; most portions are omitted for space and clarity. A link to a full example is provided below.
dv-customer.yaml
apiVersion: teiid.io/v1alpha1
kind: VirtualDatabase
metadata:
  name: dv-customer
spec:
   ...
  build:
    source:
      maven: com.example:customer-vdb:1.0.0:vdb (1)
    mavenRepositories: (2)
      central: https://repo.maven.apache.org/maven2
  1. Defines the Maven coordinates, in GAV format, used to locate the VDB. An example VDB that is defined as a Maven artifact is available here: dv-customer. If you are sharing your VDB, or collaborating and would like to keep a versioned history, you should develop the VDB in this fashion. When using the VDB-Import feature, the VDB must be defined in this fashion.

  2. If your Maven repository is in a private location, provide the link here. See here for the full Maven configuration.

Note
The options for providing the settings.xml file from the example above, Virtual Database defined in DDL, also apply here.

A full example is provided here: dv-customer.yaml

VirtualDatabase Deployment

For deployment of a Virtual Database, see VDB Deployment.

VirtualDatabase DDL Limitations

The sections above have shown different ways of configuring and deploying a VDB; however, the VDB is always built using DDL. To develop a VDB using DDL, please refer to Teiid’s Reference Guide, which provides details about the statements and syntax that Teiid supports. When a VDB is deployed in OpenShift as described above, the generated images need to be in an "immutable" state; that is, no matter how many times the image is stopped and started, the same behavior must persist. However, when a VDB is defined using statements like

IMPORT FOREIGN SCHEMA public FROM SERVER sampledb INTO accounts;

the metadata (schema) of the data source is imported at deployment time of the VDB, which happens when the image is started. That also means that the image contents themselves are modified, which goes against the "immutable" principle of this architecture. As long as the underlying data source always returns the same metadata, this is not an issue; if the data source returns different metadata each time the image is started, it is.

It is strongly recommended that, instead of using the IMPORT FOREIGN SCHEMA statement above, you explicitly define all the metadata of the underlying source, such as all the tables, procedures, and functions that the data source represents; the image contents will then always remain constant. A sketch of this approach follows.
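For example, a minimal sketch of explicitly defined metadata for the sampledb server used above; the table name and columns are hypothetical and must match your actual source schema:

CREATE SERVER sampledb FOREIGN DATA WRAPPER postgresql;
CREATE SCHEMA accounts SERVER sampledb;
SET SCHEMA accounts;

-- Hypothetical table; declare every table, procedure, and function the source exposes
CREATE FOREIGN TABLE CUSTOMER (
    id integer PRIMARY KEY,
    name varchar(64),
    ssn varchar(25)
) OPTIONS (UPDATABLE 'TRUE');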

Also, the IMPORT FOREIGN SCHEMA statement is an expensive operation, as it needs to query the underlying physical data source every time a pod restarts. Not only does this place strain on the underlying data source, but the time it takes varies by data source and contributes to pod startup time. Imagine starting 100 pods at the same time, all of them accessing the physical data source at once and bringing it down.

For these reasons, try to provide the full metadata rather than using the statement above. Further work to improve this process is planned for upcoming releases.