Spark SQL Avro Library

A library for querying Avro data with Spark SQL.

Version

This is version 0.1 of https://github.com/databricks/spark-avro with a fix for a bug, seen with Spark 1.2.0 and Scala 2.10.4, where Avro LONG fields were rejected as an unsupported type:

java.lang.RuntimeException: Unsupported type LONG
    at scala.sys.package$.error(package.scala:27)
    at com.databricks.spark.avro.AvroRelation.com$databricks$spark$avro$AvroRelation$$toSqlType(AvroRelation.scala:116)
    at com.databricks.spark.avro.AvroRelation$$anonfun$5.apply(AvroRelation.scala:97)
    at com.databricks.spark.avro.AvroRelation$$anonfun$5.apply(AvroRelation.scala:96)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
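The trace shows the failure inside AvroRelation's toSqlType, the step that converts each Avro field type into a Spark SQL type. LONG appears to have fallen through to a catch-all that calls sys.error. The following is a minimal, self-contained sketch of that kind of recursive type mapping, not this library's actual code. AvroType, SqlType, their cases and AvroTypeMapping are hypothetical stand-ins, not the real org.apache.avro.Schema.Type or Spark SQL DataType classes.

```scala
// Hypothetical, simplified stand-ins for Avro schema types
// (NOT org.apache.avro.Schema.Type).
sealed trait AvroType
case object AvroNull extends AvroType
case object AvroBoolean extends AvroType
case object AvroInt extends AvroType
case object AvroLong extends AvroType
case object AvroFloat extends AvroType
case object AvroDouble extends AvroType
case object AvroString extends AvroType
final case class AvroArray(element: AvroType) extends AvroType
final case class AvroRecord(fields: Seq[(String, AvroType)]) extends AvroType

// Hypothetical stand-ins for Spark SQL column types
// (NOT the org.apache.spark.sql data types).
sealed trait SqlType
case object SqlBoolean extends SqlType
case object SqlInteger extends SqlType
case object SqlLong extends SqlType
case object SqlFloat extends SqlType
case object SqlDouble extends SqlType
case object SqlString extends SqlType
final case class SqlArray(element: SqlType) extends SqlType
final case class SqlStruct(fields: Seq[(String, SqlType)]) extends SqlType

object AvroTypeMapping {
  // Recursively maps an Avro type to the corresponding SQL type.
  def toSqlType(t: AvroType): SqlType = t match {
    case AvroBoolean => SqlBoolean
    case AvroInt     => SqlInteger
    // Without this case, LONG reaches the catch-all below and fails with
    // "Unsupported type ...", much like the stack trace above.
    case AvroLong    => SqlLong
    case AvroFloat   => SqlFloat
    case AvroDouble  => SqlDouble
    case AvroString  => SqlString
    case AvroArray(element) => SqlArray(toSqlType(element))
    case AvroRecord(fields) =>
      SqlStruct(fields.map { case (name, fieldType) => (name, toSqlType(fieldType)) })
    // Anything not handled above (in this sketch, only AvroNull) is rejected;
    // sys.error throws a RuntimeException.
    case other => sys.error(s"Unsupported type $other")
  }
}
```

For example, toSqlType(AvroRecord(Seq("id" -> AvroLong))) yields SqlStruct(Seq("id" -> SqlLong)), while toSqlType(AvroNull) throws a RuntimeException, the same kind of error as in the trace.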

Building

This library is built with SBT, which is downloaded automatically by the included shell script. To build a JAR file, run sbt/sbt package from the project root; the JAR is written under target/scala-2.10/.

Alternatively, you can copy the classes under src/main/scala/com/databricks/spark/avro into your own project and build it with Maven. Read this article to see how to create a Maven project that builds a Spark Scala JAR.

Using with Spark

This library requires Spark 1.2+.

The JAR file produced above can be added to a Spark application with the --jars command-line option. For example, to include it when starting the Spark shell:

$ bin/spark-shell --jars ../sql-avro/target/scala-2.10/sql-avro_2.10-0.1.jar
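Once the shell is running, the data can be loaded from Scala. The snippet below is a sketch for pasting into that shell, assuming this fork keeps the upstream 0.1 avroFile helper that the com.databricks.spark.avro._ import adds to SQLContext. The file episodes.avro and its title column are hypothetical placeholders. Spark 1.2's shell predefines sc but not a SQLContext, so one is created first.

```scala
// Paste into bin/spark-shell started with --jars as shown above (Spark 1.2).
import org.apache.spark.sql.SQLContext

// The Spark 1.2 shell only predefines sc, so build a SQLContext from it.
val sqlContext = new SQLContext(sc)

// Assumed: brings the upstream avroFile helper into scope on SQLContext.
import com.databricks.spark.avro._

// Hypothetical path; in Spark 1.2 the result is a SchemaRDD.
val episodes = sqlContext.avroFile("episodes.avro")

// Inspect the inferred schema; with this fix, Avro LONG fields should be
// listed instead of failing with "Unsupported type LONG".
episodes.printSchema()

// Register the data as a temporary table and query it with SQL.
episodes.registerTempTable("episodes")
sqlContext.sql("SELECT title FROM episodes").collect().foreach(println)
```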