Commit 278f6ef: Merge branch 'release/0.4.0'

Aklakan committed Jun 26, 2018 (2 parents: 88a207f + 585ba33)

Note: this repository was archived by the owner on Oct 8, 2020 and is now read-only.
Showing 5,355 changed files with 6,456 additions and 508,097 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -51,3 +51,6 @@ hs_err_pid*
 stat*.txt
 
 .idea
+
+scalastyle-output.xml
+
10 changes: 10 additions & 0 deletions .travis.yml
@@ -0,0 +1,10 @@
+language: scala
+sudo: false
+cache:
+  directories:
+  - $HOME/.m2
+scala:
+- 2.11.11
+script:
+- mvn scalastyle:check
+- mvn test
20 changes: 13 additions & 7 deletions README.md
@@ -10,26 +10,30 @@ SANSA Query is a library to perform queries directly into [Spark](https://spark.
 SANSA uses a vertical partitioning (VP) approach and is designed to support extensible partitioning of RDF data. Instead of dealing with a single three-column table (s, p, o), the data is partitioned into multiple tables based on the RDF predicates, RDF term types and literal datatypes used. The first column of these tables is always a string representing the subject. The second column always represents the literal value as a Scala/Java datatype. Tables for storing literals with language tags have an additional third string column for the language tag. SANSA uses [Sparqlify](https://github.com/AKSW/Sparqlify) as a scalable SPARQL-SQL rewriter.
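
For illustration, here is a hypothetical sketch of how VP could lay out two triples. All names below (`VpSketch`, `IntRow`, `LangRow`, the `ex:` predicates) are invented for this example and are not part of the SANSA API:

```scala
// Hypothetical sketch of vertical partitioning; not the SANSA API.
object VpSketch extends App {
  // Two input triples:
  //   :alice  ex:age    "29"^^xsd:int .
  //   :alice  ex:label  "Alice"@en .

  // VP places them in separate tables keyed by predicate, term type and datatype.
  // The subject column is always a string; literal values use matching Scala datatypes.
  case class IntRow(s: String, o: Int)                   // table for ex:age (xsd:int objects)
  case class LangRow(s: String, o: String, lang: String) // table for ex:label (language-tagged)

  val ageTable   = Seq(IntRow(":alice", 29))
  val labelTable = Seq(LangRow(":alice", "Alice", "en"))

  println(ageTable)
  println(labelTable)
}
```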

 ### SANSA Query Spark
-In SANSA Query Spark, the method for partitioning an `RDD[Triple]` is located in [RdfPartitionUtilsSpark](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-spark-parent/sansa-rdf-spark-core/src/main/scala/net/sansa_stack/rdf/spark/partition/core/RdfPartitionUtilsSpark.scala). It uses an [RdfPartitioner](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/core/RdfPartitioner.scala) which maps a Triple to a single [RdfPartition](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/core/RdfPartition.scala) instance.
+In SANSA Query Spark, the method for partitioning an `RDD[Triple]` is located in [RdfPartitionUtilsSpark](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-spark/src/main/scala/net/sansa_stack/rdf/spark/partition/core/RdfPartitionUtilsSpark.scala). It uses an [RdfPartitioner](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/core/RdfPartitioner.scala) which maps a Triple to a single [RdfPartition](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/core/RdfPartition.scala) instance.
 
-* [RdfPartition](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/core/RdfPartition.scala) - as the name suggests, represents a partition of the RDF data and defines two methods:
+* [RdfPartition](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/core/RdfPartition.scala) - as the name suggests, represents a partition of the RDF data and defines two methods:
   * `matches(Triple): Boolean`: tests whether a triple fits into the partition.
-  * `layout: TripleLayout`: returns the [TripleLayout](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/layout/TripleLayout.scala) associated with the partition, as explained below.
+  * `layout: TripleLayout`: returns the [TripleLayout](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/layout/TripleLayout.scala) associated with the partition, as explained below.
   * Furthermore, RdfPartitions are expected to be serializable and to define equals and hashCode.
 * TripleLayout instances are used to obtain framework-agnostic compact tabular representations of triples according to a partition. For this purpose a layout defines two methods (see the sketch after this list):
   * `fromTriple(triple: Triple): Product`: for a given triple, returns its representation as a [Product](https://www.scala-lang.org/files/archive/api/2.11.8/index.html#scala.Product) (the superclass of all Scala tuples).
   * `schema: Type`: returns the exact Scala type of the objects returned by `fromTriple`, such as `typeOf[Tuple2[String,Double]]`. Hence, layouts are expected to only yield instances of one specific type.
 
-See the [available layouts](https://github.com/SANSA-Stack/SANSA-RDF/tree/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/layout) for details.
+See the [available layouts](https://github.com/SANSA-Stack/SANSA-RDF/tree/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/layout) for details.
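
For a rough idea of the shape of such a layout, here is a minimal, hypothetical sketch for triples whose object is a plain string literal; the linked `TripleLayout` file is the authoritative definition, and `TripleLayoutStringSketch` is an invented name:

```scala
import scala.reflect.runtime.universe._
import org.apache.jena.graph.Triple

// Hypothetical, simplified layout sketch; not the actual SANSA implementation.
object TripleLayoutStringSketch {
  // Compact tabular form: (subject, lexical value) as a Scala tuple (a Product).
  def fromTriple(t: Triple): Product =
    (t.getSubject.toString, t.getObject.getLiteralLexicalForm)

  // The exact Scala type of every value returned by fromTriple.
  def schema: Type = typeOf[(String, String)]
}
```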

 ## Usage
 
 The following Scala code shows how to query an RDF file with SPARQL (be it a local file or a file residing in HDFS):
 ```scala
 
-val graphRdd = NTripleReader.load(spark, new File("path/to/rdf.nt"))
+val spark: SparkSession = ...
 
-val partitions = RdfPartitionUtilsSpark.partitionGraph(graphRdd)
+val lang = Lang.NTRIPLES
+val triples = spark.rdf(lang)("path/to/rdf.nt")
+
+
+val partitions = RdfPartitionUtilsSpark.partitionGraph(triples)
 val rewriter = SparqlifyUtils3.createSparqlSqlRewriter(spark, partitions)
 
 val qef = new QueryExecutionFactorySparqlifySpark(spark, rewriter)
@@ -38,6 +42,8 @@ val port = 7531
 val server = FactoryBeanSparqlServer.newInstance.setSparqlServiceFactory(qef).setPort(port).create()
 server.join()
 
+
+
 ```
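
As a follow-up to the snippet above, a query could then be issued through `qef` instead of (or before) starting the server. This is a sketch under the assumption that `qef` follows the jena-sparql-api `QueryExecutionFactory` contract, i.e. `createQueryExecution` returns a Jena-style `QueryExecution`:

```scala
// Sketch: issuing a SPARQL query against the factory created above.
// Assumes the jena-sparql-api QueryExecutionFactory contract; adapt as needed.
val qe = qef.createQueryExecution("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
val rs = qe.execSelect()      // Jena ResultSet over the rewritten Spark SQL query
while (rs.hasNext()) {
  println(rs.next())          // each row is a Jena QuerySolution
}
qe.close()
```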
 An overview is given in the [FAQ section of the SANSA project page](http://sansa-stack.net/faq/#sparql-queries). Further documentation about the builder objects can also be found on the [ScalaDoc page](http://sansa-stack.net/scaladocs/).
 
 ## How to Contribute
 We always welcome new contributors to the project! Please see [our contribution guide](http://sansa-stack.net/contributing-to-sansa/) for more details on how to get started contributing to SANSA.
16 changes: 16 additions & 0 deletions bundle-scaladocs.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+
+targetFolder="target/scaladocs-bundle"
+mkdir -p "$targetFolder"
+
+for srcFolder in `find . -type d -name scaladocs`; do
+  moduleName=$(basename $(dirname $(dirname $(dirname "$srcFolder"))))
+
+  if [[ "$moduleName" == "." ]]; then
+    moduleName="sansa-parent"
+  fi
+
+  cp -rf "$srcFolder" "$targetFolder/$moduleName"
+  # echo "$ --- $moduleName";
+done
+
170 changes: 138 additions & 32 deletions pom.xml
@@ -4,7 +4,7 @@
 
   <groupId>net.sansa-stack</groupId>
   <artifactId>sansa-query-parent_2.11</artifactId>
-  <version>0.3.0</version>
+  <version>0.4.0</version>
   <packaging>pom</packaging>
 
   <name>SANSA Stack - Query Layer - Parent</name>
@@ -17,32 +17,28 @@
     <url>http://sda.tech</url>
   </organization>
 
-  <modules>
-    <module>sansa-query-spark-parent</module>
-    <module>sansa-query-flink-parent</module>
-  </modules>
-
   <properties>
     <maven.compiler.source>1.8</maven.compiler.source>
     <maven.compiler.target>1.8</maven.compiler.target>
     <encoding>UTF-8</encoding>
 
-    <sansa.version>0.3.0</sansa.version>
+    <sansa.version>0.4.0</sansa.version>
 
     <scala.version>2.11.11</scala.version>
     <scala.binary.version>2.11</scala.binary.version>
     <scala.classifier>${scala.binary.version}</scala.classifier>
 
     <scala.version.suffix>_${scala.binary.version}</scala.version.suffix>
 
-    <spark.version>2.2.1</spark.version>
-    <flink.version>1.4.0</flink.version>
+    <spark.version>2.3.1</spark.version>
+    <flink.version>1.5.0</flink.version>
 
-    <jena.version>3.5.0</jena.version>
-    <jsa.subversion>2</jsa.subversion>
+    <jena.version>3.7.0</jena.version>
+    <jsa.subversion>3</jsa.subversion>
 
     <jsa.version>${jena.version}-${jsa.subversion}</jsa.version>
 
+    <scalastyle.config.path>${project.basedir}/scalastyle-config.xml</scalastyle.config.path>
+
     <httpcomponents.version>4.5.3</httpcomponents.version>
   </properties>
@@ -91,6 +87,7 @@
     </developer>
   </developers>
 
+
   <profiles>
     <profile>
       <id>doclint-java8-disable</id>
@@ -146,6 +143,20 @@
         </plugins>
       </build>
     </profile>
+
+    <!-- profile necessary for Scalastyle plugin to find the conf file -->
+    <profile>
+      <id>root-dir</id>
+      <activation>
+        <file>
+          <exists>${project.basedir}/../../scalastyle-config.xml</exists>
+        </file>
+      </activation>
+      <properties>
+        <scalastyle.config.path>${project.basedir}/../scalastyle-config.xml</scalastyle.config.path>
+      </properties>
+    </profile>
+
   </profiles>
 
   <repositories>
@@ -196,6 +207,39 @@
   <dependencyManagement>
     <dependencies>
 
+
+      <dependency>
+        <groupId>${project.groupId}</groupId>
+        <artifactId>sansa-rdf-common${scala.version.suffix}</artifactId>
+        <version>${sansa.version}</version>
+      </dependency>
+
+      <dependency>
+        <groupId>${project.groupId}</groupId>
+        <artifactId>sansa-rdf-spark${scala.version.suffix}</artifactId>
+        <version>${sansa.version}</version>
+      </dependency>
+
+      <dependency>
+        <groupId>${project.groupId}</groupId>
+        <artifactId>sansa-rdf-flink${scala.version.suffix}</artifactId>
+        <version>${sansa.version}</version>
+      </dependency>
+
+
+      <dependency>
+        <groupId>${project.groupId}</groupId>
+        <artifactId>sansa-query-spark${scala.version.suffix}</artifactId>
+        <version>${project.version}</version>
+      </dependency>
+
+      <dependency>
+        <groupId>org.apache.spark</groupId>
+        <artifactId>spark-graphx_${scala.binary.version}</artifactId>
+        <version>${spark.version}</version>
+      </dependency>
+
+
       <!-- http components -->
       <dependency>
         <groupId>org.apache.httpcomponents</groupId>
@@ -235,12 +279,21 @@
         <version>${scala.version}</version>
       </dependency>
 
+
+      <!-- Benchmarking bsbm and visualization of the results -->
       <dependency>
         <groupId>org.aksw.bsbm</groupId>
         <artifactId>bsbm-jsa</artifactId>
-        <version>3.1.1</version>
+        <version>3.1.2</version>
       </dependency>
 
+      <dependency>
+        <groupId>org.aksw.beast</groupId>
+        <artifactId>beast-bundle</artifactId>
+        <version>1.0.0</version>
+      </dependency>
+
+
       <dependency>
         <groupId>com.google.guava</groupId>
         <artifactId>guava</artifactId>
@@ -258,7 +311,7 @@
       <dependency>
         <groupId>org.aksw.sparqlify</groupId>
         <artifactId>sparqlify-core</artifactId>
-        <version>0.8.3</version>
+        <version>0.8.5</version>
         <exclusions>
           <exclusion>
             <groupId>org.aksw.sparqlify</groupId>
@@ -275,25 +328,6 @@
         </exclusions>
       </dependency>
 
-      <dependency>
-        <groupId>net.sansa-stack</groupId>
-        <artifactId>sansa-rdf-common-partition${scala.version.suffix}</artifactId>
-        <version>${sansa.version}</version>
-      </dependency>
-
-      <dependency>
-        <groupId>net.sansa-stack</groupId>
-        <artifactId>sansa-rdf-test-resources${scala.version.suffix}</artifactId>
-        <version>${sansa.version}</version>
-      </dependency>
-
-      <dependency>
-        <groupId>${project.groupId}</groupId>
-        <artifactId>sansa-rdf-partition-sparqlify${scala.version.suffix}</artifactId>
-        <version>${sansa.version}</version>
-      </dependency>
-
-
       <dependency>
         <groupId>org.aksw.jena-sparql-api</groupId>
         <artifactId>jena-sparql-api-server-standalone</artifactId>
@@ -313,6 +347,13 @@
         <version>3.0.3</version>
       </dependency>
 
+      <dependency>
+        <groupId>com.holdenkarau</groupId>
+        <artifactId>spark-testing-base_${scala.binary.version}</artifactId>
+        <version>2.1.0_0.6.0</version>
+        <scope>test</scope>
+      </dependency>
+
       <dependency>
         <groupId>junit</groupId>
         <artifactId>junit</artifactId>
@@ -569,6 +610,66 @@
           </configuration>
         </plugin>
 
+        <!--This plugin's configuration is used to store Eclipse m2e settings
+          only. It has no influence on the Maven build itself. -->
+        <plugin>
+          <groupId>org.eclipse.m2e</groupId>
+          <artifactId>lifecycle-mapping</artifactId>
+          <version>1.0.0</version>
+          <configuration>
+            <lifecycleMappingMetadata>
+              <pluginExecutions>
+                <pluginExecution>
+                  <pluginExecutionFilter>
+                    <groupId>
+                      net.alchim31.maven
+                    </groupId>
+                    <artifactId>
+                      scala-maven-plugin
+                    </artifactId>
+                    <versionRange>
+                      [3.3.1,)
+                    </versionRange>
+                    <goals>
+                      <goal>testCompile</goal>
+                      <goal>compile</goal>
+                      <goal>add-source</goal>
+                    </goals>
+                  </pluginExecutionFilter>
+                  <action>
+                    <ignore></ignore>
+                  </action>
+                </pluginExecution>
+              </pluginExecutions>
+            </lifecycleMappingMetadata>
+          </configuration>
+        </plugin>
+
+        <!-- Scalastyle -->
+        <plugin>
+          <groupId>org.scalastyle</groupId>
+          <artifactId>scalastyle-maven-plugin</artifactId>
+          <version>1.0.0</version>
+          <configuration>
+            <verbose>false</verbose>
+            <failOnViolation>true</failOnViolation>
+            <includeTestSourceDirectory>true</includeTestSourceDirectory>
+            <failOnWarning>false</failOnWarning>
+            <sourceDirectory>${project.basedir}/src/main/scala</sourceDirectory>
+            <testSourceDirectory>${project.basedir}/src/test/scala</testSourceDirectory>
+            <!-- we use a central config located in the root directory -->
+            <configLocation>${scalastyle.config.path}</configLocation>
+            <outputFile>${project.basedir}/scalastyle-output.xml</outputFile>
+            <outputEncoding>UTF-8</outputEncoding>
+          </configuration>
+          <executions>
+            <execution>
+              <goals>
+                <goal>check</goal>
+              </goals>
+            </execution>
+          </executions>
+        </plugin>
       </plugins>
     </pluginManagement>
   </build>
@@ -592,4 +693,9 @@
     </snapshotRepository>
   </distributionManagement>
 
+  <modules>
+    <module>sansa-query-common</module>
+    <module>sansa-query-flink</module>
+    <module>sansa-query-spark</module>
+  </modules>
 </project>