Commit 278f6ef: Merge branch 'release/0.4.0'

Aklakan committed Jun 26, 2018 (2 parents: 88a207f + 585ba33)

Note: this repository was archived by the owner on Oct 8, 2020 and is now read-only.
Showing 5,355 changed files with 6,456 additions and 508,097 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -51,3 +51,6 @@ hs_err_pid*
 stat*.txt
 
 .idea
+
+scalastyle-output.xml
+
10 changes: 10 additions & 0 deletions .travis.yml
@@ -0,0 +1,10 @@
+language: scala
+sudo: false
+cache:
+  directories:
+  - $HOME/.m2
+scala:
+- 2.11.11
+script:
+- mvn scalastyle:check
+- mvn test
20 changes: 13 additions & 7 deletions README.md
@@ -10,26 +10,30 @@ SANSA Query is a library to perform queries directly into [Spark](https://spark.
 SANSA uses a vertical partitioning (VP) approach and is designed to support extensible partitioning of RDF data. Instead of dealing with a single three-column table (s, p, o), the data is partitioned into multiple tables based on the RDF predicates, RDF term types and literal datatypes used. The first column of these tables is always a string representing the subject. The second column always represents the literal value as a Scala/Java datatype. Tables for storing literals with language tags have an additional third string column for the language tag. SANSA uses [Sparqlify](https://github.com/AKSW/Sparqlify) as a scalable SPARQL-SQL rewriter.
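
For illustration, here is a hypothetical sketch of how VP could lay out two triples. All names below (`VpSketch`, `IntRow`, `LangRow`, the `ex:` predicates) are invented for this example and are not part of the SANSA API:

```scala
// Hypothetical sketch of vertical partitioning; not the SANSA API.
object VpSketch extends App {
  // Two input triples:
  //   :alice  ex:age    "29"^^xsd:int .
  //   :alice  ex:label  "Alice"@en .

  // VP places them in separate tables keyed by predicate, term type and datatype.
  // The subject column is always a string; literal values use matching Scala datatypes.
  case class IntRow(s: String, o: Int)                   // table for ex:age (xsd:int objects)
  case class LangRow(s: String, o: String, lang: String) // table for ex:label (language-tagged)

  val ageTable   = Seq(IntRow(":alice", 29))
  val labelTable = Seq(LangRow(":alice", "Alice", "en"))

  println(ageTable)
  println(labelTable)
}
```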

 ### SANSA Query Spark
-In SANSA Query Spark, the method for partitioning an `RDD[Triple]` is located in [RdfPartitionUtilsSpark](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-spark-parent/sansa-rdf-spark-core/src/main/scala/net/sansa_stack/rdf/spark/partition/core/RdfPartitionUtilsSpark.scala). It uses an [RdfPartitioner](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/core/RdfPartitioner.scala) which maps a Triple to a single [RdfPartition](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/core/RdfPartition.scala) instance.
+In SANSA Query Spark, the method for partitioning an `RDD[Triple]` is located in [RdfPartitionUtilsSpark](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-spark/src/main/scala/net/sansa_stack/rdf/spark/partition/core/RdfPartitionUtilsSpark.scala). It uses an [RdfPartitioner](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/core/RdfPartitioner.scala) which maps a Triple to a single [RdfPartition](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/core/RdfPartition.scala) instance.
 
-* [RdfPartition](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/core/RdfPartition.scala) - as the name suggests, represents a partition of the RDF data and defines two methods:
+* [RdfPartition](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/core/RdfPartition.scala) - as the name suggests, represents a partition of the RDF data and defines two methods:
   * `matches(Triple): Boolean`: tests whether a triple fits into the partition.
-  * `layout: TripleLayout`: returns the [TripleLayout](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/layout/TripleLayout.scala) associated with the partition, as explained below.
+  * `layout: TripleLayout`: returns the [TripleLayout](https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/layout/TripleLayout.scala) associated with the partition, as explained below.
   * Furthermore, RdfPartitions are expected to be serializable and to define equals and hashCode.
 * TripleLayout instances are used to obtain framework-agnostic compact tabular representations of triples according to a partition. For this purpose a layout defines two methods (see the sketch after this list):
   * `fromTriple(triple: Triple): Product`: for a given triple, returns its representation as a [Product](https://www.scala-lang.org/files/archive/api/2.11.8/index.html#scala.Product) (the superclass of all Scala tuples).
   * `schema: Type`: returns the exact Scala type of the objects returned by `fromTriple`, such as `typeOf[Tuple2[String,Double]]`. Hence, layouts are expected to only yield instances of one specific type.
 
-See the [available layouts](https://github.com/SANSA-Stack/SANSA-RDF/tree/develop/sansa-rdf-partition-parent/sansa-rdf-partition-core/src/main/scala/net/sansa_stack/rdf/partition/layout) for details.
+See the [available layouts](https://github.com/SANSA-Stack/SANSA-RDF/tree/develop/sansa-rdf-common/src/main/scala/net/sansa_stack/rdf/common/partition/layout) for details.
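
For a rough idea of the shape of such a layout, here is a minimal, hypothetical sketch for triples whose object is a plain string literal; the linked `TripleLayout` file is the authoritative definition, and `TripleLayoutStringSketch` is an invented name:

```scala
import scala.reflect.runtime.universe._
import org.apache.jena.graph.Triple

// Hypothetical, simplified layout sketch; not the actual SANSA implementation.
object TripleLayoutStringSketch {
  // Compact tabular form: (subject, lexical value) as a Scala tuple (a Product).
  def fromTriple(t: Triple): Product =
    (t.getSubject.toString, t.getObject.getLiteralLexicalForm)

  // The exact Scala type of every value returned by fromTriple.
  def schema: Type = typeOf[(String, String)]
}
```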

 ## Usage
 
 The following Scala code shows how to query an RDF file with SPARQL (be it a local file or a file residing in HDFS):
 ```scala
 
-val graphRdd = NTripleReader.load(spark, new File("path/to/rdf.nt"))
+val spark: SparkSession = ...
 
-val partitions = RdfPartitionUtilsSpark.partitionGraph(graphRdd)
+val lang = Lang.NTRIPLES
+val triples = spark.rdf(lang)("path/to/rdf.nt")
+
+
+val partitions = RdfPartitionUtilsSpark.partitionGraph(triples)
 val rewriter = SparqlifyUtils3.createSparqlSqlRewriter(spark, partitions)
 
 val qef = new QueryExecutionFactorySparqlifySpark(spark, rewriter)
@@ -38,6 +42,8 @@ val port = 7531
 val server = FactoryBeanSparqlServer.newInstance.setSparqlServiceFactory(qef).setPort(port).create()
 server.join()
 
+
+
 ```
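
As a follow-up to the snippet above, a query could then be issued through `qef` instead of (or before) starting the server. This is a sketch under the assumption that `qef` follows the jena-sparql-api `QueryExecutionFactory` contract, i.e. `createQueryExecution` returns a Jena-style `QueryExecution`:

```scala
// Sketch: issuing a SPARQL query against the factory created above.
// Assumes the jena-sparql-api QueryExecutionFactory contract; adapt as needed.
val qe = qef.createQueryExecution("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
val rs = qe.execSelect()      // Jena ResultSet over the rewritten Spark SQL query
while (rs.hasNext()) {
  println(rs.next())          // each row is a Jena QuerySolution
}
qe.close()
```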
 An overview is given in the [FAQ section of the SANSA project page](http://sansa-stack.net/faq/#sparql-queries). Further documentation about the builder objects can also be found on the [ScalaDoc page](http://sansa-stack.net/scaladocs/).
 
 ## How to Contribute
 We always welcome new contributors to the project! Please see [our contribution guide](http://sansa-stack.net/contributing-to-sansa/) for more details on how to get started contributing to SANSA.
16 changes: 16 additions & 0 deletions bundle-scaladocs.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+
+targetFolder="target/scaladocs-bundle"
+mkdir -p "$targetFolder"
+
+for srcFolder in `find . -type d -name scaladocs`; do
+  moduleName=$(basename $(dirname $(dirname $(dirname "$srcFolder"))))
+
+  if [[ "$moduleName" == "." ]]; then
+    moduleName="sansa-parent"
+  fi
+
+  cp -rf "$srcFolder" "$targetFolder/$moduleName"
+  # echo "$ --- $moduleName";
+done
+
170 changes: 138 additions & 32 deletions pom.xml
@@ -4,7 +4,7 @@
 
   <groupId>net.sansa-stack</groupId>
   <artifactId>sansa-query-parent_2.11</artifactId>
-  <version>0.3.0</version>
+  <version>0.4.0</version>
   <packaging>pom</packaging>
 
   <name>SANSA Stack - Query Layer - Parent</name>
@@ -17,32 +17,28 @@
     <url>http://sda.tech</url>
   </organization>
 
-  <modules>
-    <module>sansa-query-spark-parent</module>
-    <module>sansa-query-flink-parent</module>
-  </modules>
-
   <properties>
     <maven.compiler.source>1.8</maven.compiler.source>
     <maven.compiler.target>1.8</maven.compiler.target>
     <encoding>UTF-8</encoding>
 
-    <sansa.version>0.3.0</sansa.version>
+    <sansa.version>0.4.0</sansa.version>
 
     <scala.version>2.11.11</scala.version>
     <scala.binary.version>2.11</scala.binary.version>
     <scala.classifier>${scala.binary.version}</scala.classifier>
 
     <scala.version.suffix>_${scala.binary.version}</scala.version.suffix>
 
-    <spark.version>2.2.1</spark.version>
-    <flink.version>1.4.0</flink.version>
+    <spark.version>2.3.1</spark.version>
+    <flink.version>1.5.0</flink.version>
 
-    <jena.version>3.5.0</jena.version>
-    <jsa.subversion>2</jsa.subversion>
+    <jena.version>3.7.0</jena.version>
+    <jsa.subversion>3</jsa.subversion>
 
     <jsa.version>${jena.version}-${jsa.subversion}</jsa.version>
 
+    <scalastyle.config.path>${project.basedir}/scalastyle-config.xml</scalastyle.config.path>
+
     <httpcomponents.version>4.5.3</httpcomponents.version>
   </properties>
@@ -91,6 +87,7 @@
     </developer>
   </developers>
 
+
   <profiles>
     <profile>
       <id>doclint-java8-disable</id>
@@ -146,6 +143,20 @@
         </plugins>
       </build>
     </profile>
+
+    <!-- profile necessary for Scalastyle plugin to find the conf file -->
+    <profile>
+      <id>root-dir</id>
+      <activation>
+        <file>
+          <exists>${project.basedir}/../../scalastyle-config.xml</exists>
+        </file>
+      </activation>
+      <properties>
+        <scalastyle.config.path>${project.basedir}/../scalastyle-config.xml</scalastyle.config.path>
+      </properties>
+    </profile>
+
   </profiles>
 
   <repositories>
@@ -196,6 +207,39 @@
   <dependencyManagement>
     <dependencies>
 
+
+      <dependency>
+        <groupId>${project.groupId}</groupId>
+        <artifactId>sansa-rdf-common${scala.version.suffix}</artifactId>
+        <version>${sansa.version}</version>
+      </dependency>
+
+      <dependency>
+        <groupId>${project.groupId}</groupId>
+        <artifactId>sansa-rdf-spark${scala.version.suffix}</artifactId>
+        <version>${sansa.version}</version>
+      </dependency>
+
+      <dependency>
+        <groupId>${project.groupId}</groupId>
+        <artifactId>sansa-rdf-flink${scala.version.suffix}</artifactId>
+        <version>${sansa.version}</version>
+      </dependency>
+
+
+      <dependency>
+        <groupId>${project.groupId}</groupId>
+        <artifactId>sansa-query-spark${scala.version.suffix}</artifactId>
+        <version>${project.version}</version>
+      </dependency>
+
+      <dependency>
+        <groupId>org.apache.spark</groupId>
+        <artifactId>spark-graphx_${scala.binary.version}</artifactId>
+        <version>${spark.version}</version>
+      </dependency>
+
+
       <!-- http components -->
       <dependency>
         <groupId>org.apache.httpcomponents</groupId>
@@ -235,12 +279,21 @@
         <version>${scala.version}</version>
       </dependency>
 
+
+      <!-- Benchmarking bsbm and visualization of the results -->
       <dependency>
         <groupId>org.aksw.bsbm</groupId>
         <artifactId>bsbm-jsa</artifactId>
-        <version>3.1.1</version>
+        <version>3.1.2</version>
       </dependency>
 
+      <dependency>
+        <groupId>org.aksw.beast</groupId>
+        <artifactId>beast-bundle</artifactId>
+        <version>1.0.0</version>
+      </dependency>
+
+
       <dependency>
         <groupId>com.google.guava</groupId>
         <artifactId>guava</artifactId>
@@ -258,7 +311,7 @@
       <dependency>
         <groupId>org.aksw.sparqlify</groupId>
         <artifactId>sparqlify-core</artifactId>
-        <version>0.8.3</version>
+        <version>0.8.5</version>
         <exclusions>
           <exclusion>
             <groupId>org.aksw.sparqlify</groupId>
@@ -275,25 +328,6 @@
         </exclusions>
       </dependency>
 
-      <dependency>
-        <groupId>net.sansa-stack</groupId>
-        <artifactId>sansa-rdf-common-partition${scala.version.suffix}</artifactId>
-        <version>${sansa.version}</version>
-      </dependency>
-
-      <dependency>
-        <groupId>net.sansa-stack</groupId>
-        <artifactId>sansa-rdf-test-resources${scala.version.suffix}</artifactId>
-        <version>${sansa.version}</version>
-      </dependency>
-
-      <dependency>
-        <groupId>${project.groupId}</groupId>
-        <artifactId>sansa-rdf-partition-sparqlify${scala.version.suffix}</artifactId>
-        <version>${sansa.version}</version>
-      </dependency>
-
-
       <dependency>
         <groupId>org.aksw.jena-sparql-api</groupId>
         <artifactId>jena-sparql-api-server-standalone</artifactId>
@@ -313,6 +347,13 @@
         <version>3.0.3</version>
       </dependency>
 
+      <dependency>
+        <groupId>com.holdenkarau</groupId>
+        <artifactId>spark-testing-base_${scala.binary.version}</artifactId>
+        <version>2.1.0_0.6.0</version>
+        <scope>test</scope>
+      </dependency>
+
       <dependency>
         <groupId>junit</groupId>
         <artifactId>junit</artifactId>
@@ -569,6 +610,66 @@
           </configuration>
         </plugin>
 
+        <!--This plugin's configuration is used to store Eclipse m2e settings
+          only. It has no influence on the Maven build itself. -->
+        <plugin>
+          <groupId>org.eclipse.m2e</groupId>
+          <artifactId>lifecycle-mapping</artifactId>
+          <version>1.0.0</version>
+          <configuration>
+            <lifecycleMappingMetadata>
+              <pluginExecutions>
+                <pluginExecution>
+                  <pluginExecutionFilter>
+                    <groupId>
+                      net.alchim31.maven
+                    </groupId>
+                    <artifactId>
+                      scala-maven-plugin
+                    </artifactId>
+                    <versionRange>
+                      [3.3.1,)
+                    </versionRange>
+                    <goals>
+                      <goal>testCompile</goal>
+                      <goal>compile</goal>
+                      <goal>add-source</goal>
+                    </goals>
+                  </pluginExecutionFilter>
+                  <action>
+                    <ignore></ignore>
+                  </action>
+                </pluginExecution>
+              </pluginExecutions>
+            </lifecycleMappingMetadata>
+          </configuration>
+        </plugin>
+
+        <!-- Scalastyle -->
+        <plugin>
+          <groupId>org.scalastyle</groupId>
+          <artifactId>scalastyle-maven-plugin</artifactId>
+          <version>1.0.0</version>
+          <configuration>
+            <verbose>false</verbose>
+            <failOnViolation>true</failOnViolation>
+            <includeTestSourceDirectory>true</includeTestSourceDirectory>
+            <failOnWarning>false</failOnWarning>
+            <sourceDirectory>${project.basedir}/src/main/scala</sourceDirectory>
+            <testSourceDirectory>${project.basedir}/src/test/scala</testSourceDirectory>
+            <!-- we use a central config located in the root directory -->
+            <configLocation>${scalastyle.config.path}</configLocation>
+            <outputFile>${project.basedir}/scalastyle-output.xml</outputFile>
+            <outputEncoding>UTF-8</outputEncoding>
+          </configuration>
+          <executions>
+            <execution>
+              <goals>
+                <goal>check</goal>
+              </goals>
+            </execution>
+          </executions>
+        </plugin>
       </plugins>
     </pluginManagement>
   </build>
@@ -592,4 +693,9 @@
     </snapshotRepository>
   </distributionManagement>
 
+  <modules>
+    <module>sansa-query-common</module>
+    <module>sansa-query-flink</module>
+    <module>sansa-query-spark</module>
+  </modules>
 </project>