Releases · databricks/spark-xml
Version 0.8.0
New Features
- Support for validating XML rows against an XSD
- from_xml for parsing an existing column or string to a struct
- schema_of_xml for inferring the schema of XML in a string column
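A minimal Scala sketch of how these 0.8.0 additions can be exercised, assuming the XSD validation option is named rowValidationXSDPath and that from_xml / schema_of_xml are imported from com.databricks.spark.xml as in the project README; the paths, tag names, and sample payload are made up.

```scala
import org.apache.spark.sql.SparkSession
import com.databricks.spark.xml.functions.from_xml
import com.databricks.spark.xml.schema_of_xml

val spark = SparkSession.builder().appName("spark-xml-0.8.0-sketch").getOrCreate()
import spark.implicits._

// Validate each row against an XSD while reading; rows failing validation
// are treated as malformed. File paths and tag names are illustrative.
val books = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .option("rowValidationXSDPath", "books.xsd")
  .load("books.xml")

// Parse XML held in a string column: infer the payload schema with
// schema_of_xml, then turn the column into a struct with from_xml.
val withPayload = Seq("""<note><to>Tove</to><from>Jani</from></note>""").toDF("payload")
val payloadSchema = schema_of_xml(withPayload.select("payload").as[String])
val parsed = withPayload.withColumn("parsed", from_xml($"payload", payloadSchema))
parsed.printSchema()
```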
Changes: https://github.com/databricks/spark-xml/milestone/5?closed=1
Version 0.7.0
Fixes
- Important fix to XML writing, which could cause newlines to be inserted in the wrong place in output (#417)
- Ignore XML processing instructions, which otherwise fail parsing (#412)
- Ignore text children in mixed text/element nodes, instead of parsing element incorrectly (#416)
Changes: https://github.com/databricks/spark-xml/milestone/4?closed=1
Version 0.6.0
Fixes:
- Fixed an error that could cause records to be dropped when uncompressed files are read and XML tags happen to span an input split boundary, but fit within the stream read buffer (#400)
- Fixed issue with nested tag names in attributes (#374)
Improvements:
- inferSchema can now be set to false during parsing to leave all values as string type (#393)
- Also treat empty values as null if nullValue is "" (#381)
- Log malformed records for debugging (#372)
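A short sketch of the two reader options mentioned above; the file path and rowTag value are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-xml-0.6.0-sketch").getOrCreate()

// inferSchema=false leaves every parsed field as a string instead of casting
// to an inferred type; nullValue="" makes empty values come back as null.
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "record")
  .option("inferSchema", "false")
  .option("nullValue", "")
  .load("records.xml")

df.printSchema() // all fields reported as string
```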
Changes: https://github.com/databricks/spark-xml/issues?utf8=%E2%9C%93&q=milestone%3A0.6.0+is%3Aclosed+
Version 0.5.0
Spark-xml 0.5.0 includes many bug fixes as well as the following
Improvements:
- Partial results support #358, #368 and #370
- XML self-closing tag support #352
- Scala 2.12 support #343
- Hadoop 2.9+ support #282
- Add an ignoreSurroundingSpaces option to trim spaces between values #237
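A sketch of the new ignoreSurroundingSpaces option; the path and rowTag are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-xml-0.5.0-sketch").getOrCreate()

// With ignoreSurroundingSpaces enabled, "<name>  Ada  </name>" is read as "Ada".
val people = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "person")
  .option("ignoreSurroundingSpaces", "true")
  .load("people.xml")
```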
Removals, Behavior Changes and Deprecations
- Drop Scala 2.10 support #343
Issues Closed
https://github.com/databricks/spark-xml/milestone/1?closed=1
Version 0.4.1
Version 0.3.5
Version 0.4.0
Spark-xml 0.4.0 adds the following
Features:
- Support for PERMISSIVE/DROPMALFORMED mode and corrupt record option - #107
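A sketch of the parse modes, assuming the option names mirror the built-in JSON/CSV sources (mode and columnNameOfCorruptRecord); the path and rowTag are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-xml-0.4.0-sketch").getOrCreate()

// PERMISSIVE keeps malformed rows and places the raw record in the configured
// corrupt-record column; DROPMALFORMED would drop such rows instead.
val orders = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "order")
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .load("orders.xml")
```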
Removals, Behavior Changes and Deprecations
- Deprecate saveAsXmlFile and promote the usage of write() - #150
- Deprecate xmlFile and promote the usage of read() - #150
- Drop 1.x compatibility from 0.4.0 - #150
- Drop support for UserDefinedType as it became private - #150
- Change default values for attributePrefix and valueTag to _ and _VALUE - #142
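A sketch of the promoted read()/write() style that replaces the deprecated xmlFile and saveAsXmlFile calls; paths and tag names are illustrative. With the new defaults, attributes appear as fields prefixed with _ and element text lands in the _VALUE field.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-xml-0.4.0-rw-sketch").getOrCreate()

// Reading via the standard DataFrameReader instead of the deprecated xmlFile.
val books = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("books.xml")

// Writing via the standard DataFrameWriter instead of the deprecated saveAsXmlFile.
books.write
  .format("com.databricks.spark.xml")
  .option("rootTag", "books")
  .option("rowTag", "book")
  .save("books-out")
```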
Version 0.3.4
XML Data Source 0.3.4 adds the following
Improvements:
- Produces correct order of columns for nested rows when user specifies a schema - #125
- Fix arrayIndexOutOfBounds when there is no value in a nested struct - #121 by @lokm01
- Add compression as an alias for the codec option - #145 (see the sketch after this list)
- Remove dead code - #144
- Fix nested element with name of parent bug - #161 by @mattroberts297
- Do not allow empty strings for attributePrefix, valueTag and rowTag - #170
- Add a missing default case when parsing/inferring XML documents - #166
- Minor documentation changes - #159 by @mattroberts297 and #143 by @anastasia
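A sketch of the compression alias added in #145, assuming it accepts the same codec names as the existing codec option (for example gzip); paths and tags are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-xml-0.3.4-sketch").getOrCreate()

val items = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "item")
  .load("items.xml")

// "compression" is now accepted as an alias for the "codec" write option.
items.write
  .format("com.databricks.spark.xml")
  .option("rowTag", "item")
  .option("compression", "gzip")
  .save("items-gzip")
```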
Version 0.3.3
XML Data Source 0.3.3 adds the following
Improvements:
- Parse elements in arrays that have attributes correctly
- Parse duplicated valueTag fields correctly in a few special cases
- Parse non-existing elements in an array as null
- Support parsing XML documents in which the same element holds both structural and non-structural data types
- Ignore comments
- Documentation improvements
Version 0.3.2
Spark-xml 0.3.2 adds the following
Improvements:
- Fix a bug in type inference for empty values in structural types
- Performance improvement
- Support for parsing correctly when structural data types are specified
- Parse long character sequences within tags
- Added some more tests
- Parse correctly even if some attributes exist sparsely
- Ignore namespaces
- Documentation improvements