Skip to content

Course: Analysis and Design of Information Systems NTUA 2021-2022

Notifications You must be signed in to change notification settings

el2kaps/Information-Systems

Repository files navigation

Information-Systems

This project is part of the course "Analysis and Design of Information Systems"
School of ECE, National Technical University of Athens 2021-2022
Team:
Eleftherios Lymperopoulos
Eleni-Elpida Kapsali
Theodoti Stoikou

A study of the Trino (Presto) polystore for executing SQL analytic queries

Trino download Trino: The Definitive Guide book download


The purpose of this project is to connect the distributed SQL query engine Trino with the databases MongoDB, Apache HBase, Apache Cassandra and study the properties "Query optimization", "Scalability" and "Performance" of the system.

The architecture of the implemented system:

✔️utility-scripts directory contains scripts we used to set up the VMs we used.
For example to create a passwordless connection between the VMs and NAT since only one VM has a public IPv4.

Set up the system

Step 1: Install the databases using the official guides
✔️MongoDB on Machine 1
Start MongoDB: sudo systemctl start mongod.service
Stop MongoDB: sudo systemctl start mongod.service
✔️Apache Cassandra on Machine 2
✔️Apache HBase on Machine 3

Step 2: Set up the databases
Now we must set up the databases so as to LISTEN to the private network's addresses 192.168.0.1, 192.168.0.2, 192.168.0.3 rather than localhost (127.0.0.1).
✔️MongoDB
Change network interfaces in mongod.conf to:

# network interfaces
net:
  port: 27017
  bindIp: 192.168.0.1
  and restart the database. <br>

Use the command sudo lsof -iTCP -sTCP:LISTEN | grep mongo to check that the database listens to the desired IP.
mongod.conf in our system is located in /etc/ directory.
✔️Cassandra
Replace cassandra.yaml with the cassandra.yaml in the Cassandra directory.
Use the command sudo lsof -iTCP -sTCP:LISTEN | grep cassandrato check that the database listens to the desired IP (192.168.02).
cassandra.yaml in our system is located in /etc/cassandra directory.

✔️HBase
HBase's set up is a little bit more tricky because Trino doesn't provides an HBase connenctor but a Phoenix connector. As Trino documentation states to query HBase data through Phoenix, you need:

  • Network access from the Trino coordinator and workers to the ZooKeeper servers. The default port is 2181.
  • A compatible version of Phoenix. We use version 5.
  • Install Azul Zulu to each macine.
    Azul Zulu is an open source implementation of the Java Standard Edition ("SE") specification. It is a binary build of the OpenJDK open source project. Zulu provides a Java Runtime Environment needed for Java applications to run.
    We used version zulu11.54.25-ca-jdk11.0.14.1-linux_x64 (Java 11).
    Add Azul Zulu to your PATH environment variable, so that you can execute java from any directory without specifying the full path.
export PATH=/home/user/azul-zulu/zulu11.54.25-ca-jdk11.0.14.1-linux_x64/bin:$PATH
  • Install Phoenix 5 using the official guide. In our system we use phoenix-hbase-2.4-5.1.2-bin.
  • Replace hbase-site.xml in both phoenix-hbase-2.4-5.1.2-bin/bin and hbase-2.4.11/conf with the hbase-site.xml in Hbase-conf directory.
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>phoenix.schema.isNamespaceMappingEnabled</name>
    <value>true</value>
  </property>
</configuration>

Step 3: Set up Trino

  • Copy repository's Trino directories to the respective Machine.
  • Start Trino servers (one for each machine)
    cd Trino/trino/server-373
    bin/launcher run
    
  • To stop a Trino server run
    cd Trino/trino/server-373
    bin/launcher stop
    

Step 5: Run Trino CLI

  • At the machine where the coordinator Trino node runs
    cd Trino
    ./trino
    

✔️Now you can query the databases by writing SQL queries to Trino CLI.

Examples

  • Checks that all workers are connected

  • Check that we have access to all dbs

image

image

  • To work on a specific schema ex. mongodb.tpcds use mongodb.tpcds Output: USE Now store_returns refers to the store_returns table stored in MongoDB. image

  • Use DESCRIBE <table name> to view sql columns of the table. image

  • Use SHOW STATS for approximated statistics for the named table.

About

Course: Analysis and Design of Information Systems NTUA 2021-2022

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages