PARQUET-758: Add Float16/Half-float logical type

This type trades reduced precision for more efficient storage.
apache · Aug 26, 2022 · bc91163
1 parent 54e53e5
commit bc91163

Showing 2 changed files with 15 additions and 0 deletions.
diff --git a/LogicalTypes.md b/LogicalTypes.md
@@ -245,6 +245,18 @@ comparison.
 To support compatibility with older readers, implementations of parquet-format should
 write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.
 
+### Half-precision floating-point
+
+Also known as `float16`. Used in contexts where precision is traded off for performance and efficient storage.
+
+It is stored as a two-byte fixed-length binary, following the IEEE 754 binary16 layout:
+
+* 1 sign bit
+* 5 bits exponent
+* 10 bits mantissa/significand
+
+The value of a normal number is `(-1)^sign * 2^(exponent - 15) * (1 + mantissa / 2^10)`.
+
 ## Temporal Types
 
 ### DATE

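As a minimal sketch of the bit layout described in the half-precision section above, the following Python decodes one stored value by hand. The function name `decode_float16` is hypothetical and not part of this commit. Little-endian byte order is taken from the `parquet.thrift` comment in this change, and the standard library's `struct` format `e` is used only as a cross-check.

```python
import math
import struct


def decode_float16(raw: bytes) -> float:
    """Decode two little-endian bytes as an IEEE 754 binary16 value.

    Hypothetical helper for illustration; not part of parquet-format.
    """
    if len(raw) != 2:
        raise ValueError("expected exactly 2 bytes")
    bits = int.from_bytes(raw, "little")
    sign = -1.0 if bits >> 15 else 1.0
    exponent = (bits >> 10) & 0x1F  # 5 exponent bits, bias 15
    fraction = bits & 0x3FF         # 10 mantissa/significand bits
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction == 0) or NaN.
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:
        # Zero and subnormals: no implicit leading 1, fixed exponent of -14.
        return sign * (fraction / 1024) * 2.0 ** -14
    # Normal numbers: implicit leading 1 and a base-2 exponent.
    return sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)


# Cross-check one value against the standard library's half-precision codec.
assert decode_float16(struct.pack("<e", 1.5)) == 1.5
```

The scale factor is a power of two, not a power of ten, which is why the exponent bias of 15 appears in the normal-number branch.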
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
@@ -34,6 +34,7 @@ enum Type {
   INT32 = 1;
   INT64 = 2;
   INT96 = 3;  // deprecated, only used by legacy implementations.
+  FLOAT16 = 8;
   FLOAT = 4;
   DOUBLE = 5;
   BYTE_ARRAY = 6;
@@ -416,6 +417,7 @@ enum Encoding {
    * BOOLEAN - 1 bit per value. 0 is false; 1 is true.
    * INT32 - 4 bytes per value.  Stored as little-endian.
    * INT64 - 8 bytes per value.  Stored as little-endian.
+   * FLOAT16 - 2 bytes per value. IEEE. Stored as little-endian.
    * FLOAT - 4 bytes per value.  IEEE. Stored as little-endian.
    * DOUBLE - 8 bytes per value.  IEEE. Stored as little-endian.
    * BYTE_ARRAY - 4 byte length stored as little endian, followed by bytes.
@@ -889,6 +891,7 @@ union ColumnOrder {
    *   INT32 - signed comparison
    *   INT64 - signed comparison
    *   INT96 (only used for legacy timestamps) - undefined
+   *   FLOAT16 - signed comparison of the represented value (*)
    *   FLOAT - signed comparison of the represented value (*)
    *   DOUBLE - signed comparison of the represented value (*)
    *   BYTE_ARRAY - unsigned byte-wise comparison
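
The `FLOAT16` column order above is defined on the represented value, not on the stored bytes. A small sketch of why that distinction matters, using Python's `struct` format `e` to produce the two-byte little-endian encoding; the helper names are hypothetical, and the special cases behind the `(*)` footnote are outside this excerpt and are not handled here:

```python
import struct


def encode_float16(value: float) -> bytes:
    """Encode a float as two little-endian IEEE 754 binary16 bytes."""
    return struct.pack("<e", value)


def decode(raw: bytes) -> float:
    """Decode two little-endian binary16 bytes back to a Python float."""
    return struct.unpack("<e", raw)[0]


values = [1.0, -2.0, 0.5, -0.25, 256.0]
encoded = [encode_float16(v) for v in values]

# Ordering by the represented value, as the ColumnOrder comment specifies.
by_value = [decode(raw) for raw in sorted(encoded, key=decode)]

# Ordering by unsigned byte-wise comparison of the stored bytes.
by_bytes = [decode(raw) for raw in sorted(encoded)]

print(by_value)
print(by_bytes)
```

For these inputs the byte-wise ordering places the negative values after the positive ones, because the sign bit is the highest bit of the second stored byte. Min/max statistics computed byte-wise would therefore be wrong for columns that mix signs.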