This project is part of the course "Analysis and Design of Information Systems"
School of ECE, National Technical University of Athens 2021-2022
Team:
Eleftherios Lymperopoulos
Eleni-Elpida Kapsali
Theodoti Stoikou
The purpose of this project is to connect the distributed SQL query engine Trino with the databases MongoDB, Apache HBase, Apache Cassandra and study the properties "Query optimization", "Scalability" and "Performance" of the system.
The architecture of the implemented system:
✔️utility-scripts
directory contains scripts we used to set up the VMs we used.
For example to create a passwordless connection between the VMs and NAT since only one VM has a public IPv4.
Step 1: Install the databases using the official guides
✔️MongoDB on Machine 1
Start MongoDB: sudo systemctl start mongod.service
Stop MongoDB: sudo systemctl start mongod.service
✔️Apache Cassandra on Machine 2
✔️Apache HBase on Machine 3
Step 2: Set up the databases
Now we must set up the databases so as to LISTEN to the private network's addresses 192.168.0.1, 192.168.0.2, 192.168.0.3 rather than localhost (127.0.0.1).
✔️MongoDB
Change network interfaces in mongod.conf
to:
# network interfaces
net:
port: 27017
bindIp: 192.168.0.1
and restart the database. <br>
Use the command sudo lsof -iTCP -sTCP:LISTEN | grep mongo
to check that the database listens to the desired IP.
mongod.conf
in our system is located in /etc/ directory.
✔️Cassandra
Replace cassandra.yaml
with the cassandra.yaml
in the Cassandra directory.
Use the command sudo lsof -iTCP -sTCP:LISTEN | grep cassandra
to check that the database listens to the desired IP (192.168.02).
cassandra.yaml
in our system is located in /etc/cassandra directory.
✔️HBase
HBase's set up is a little bit more tricky because Trino doesn't provides an HBase connenctor but a Phoenix connector.
As Trino documentation states to query HBase data through Phoenix, you need:
- Network access from the Trino coordinator and workers to the ZooKeeper servers. The default port is 2181.
- A compatible version of Phoenix. We use version 5.
- Install Azul Zulu to each macine.
Azul Zulu is an open source implementation of the Java Standard Edition ("SE") specification. It is a binary build of the OpenJDK open source project. Zulu provides a Java Runtime Environment needed for Java applications to run.
We used version zulu11.54.25-ca-jdk11.0.14.1-linux_x64 (Java 11).
Add Azul Zulu to your PATH environment variable, so that you can execute java from any directory without specifying the full path.
export PATH=/home/user/azul-zulu/zulu11.54.25-ca-jdk11.0.14.1-linux_x64/bin:$PATH
- Install Phoenix 5 using the official guide. In our system we use phoenix-hbase-2.4-5.1.2-bin.
- Replace
hbase-site.xml
in bothphoenix-hbase-2.4-5.1.2-bin/bin
andhbase-2.4.11/conf
with thehbase-site.xml
in Hbase-conf directory.
<property>
<name>hbase.cluster.distributed</name>
<value>false</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>
</configuration>
Step 3: Set up Trino
- Copy repository's
Trino
directories to the respective Machine. - Start Trino servers (one for each machine)
cd Trino/trino/server-373 bin/launcher run
- To stop a Trino server run
cd Trino/trino/server-373 bin/launcher stop
Step 5: Run Trino CLI
- At the machine where the coordinator Trino node runs
cd Trino ./trino
✔️Now you can query the databases by writing SQL queries to Trino CLI.
- Checks that all workers are connected
- Check that we have access to all dbs