PARQUET-758: Add Float16/Half-float logical type
Type involves a trade-off of reduced precision,
in exchange for more efficient storage.
anjakefala committed Aug 26, 2022
1 parent 54e53e5 commit bc91163
Showing 2 changed files with 15 additions and 0 deletions.
12 changes: 12 additions & 0 deletions LogicalTypes.md
@@ -245,6 +245,18 @@ comparison.
To support compatibility with older readers, implementations of parquet-format should
write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.

### Half-precision floating-point

Also known as `float16`. Used in contexts where precision is traded off for performance and efficient storage.

It is stored as a two-byte fixed-length binary, following the IEEE 754 binary16 (half-precision) layout:

* 1 sign bit
* 5 bits exponent
* 10 bits mantissa/significand

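For normal values (exponent field between 1 and 30), the represented value is:

(-1)^sign * 2^(exponent - 15) * (1 + mantissa / 2^10)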
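
As an illustration of the bit layout listed above, the fields can be extracted and combined by hand. This is a minimal sketch, assuming IEEE 754 binary16 semantics (exponent bias of 15, subnormals and infinities included) and little-endian byte order; the function name `decode_float16` is hypothetical and not part of the Parquet format.

```python
import struct


def decode_float16(raw: bytes) -> float:
    """Decode two little-endian bytes as an IEEE 754 binary16 value.

    Hypothetical helper for illustration only.
    """
    if len(raw) != 2:
        raise ValueError("expected exactly 2 bytes")
    (bits,) = struct.unpack("<H", raw)   # 16-bit unsigned, little-endian
    sign = -1.0 if bits >> 15 else 1.0   # 1 sign bit
    exponent = (bits >> 10) & 0x1F       # 5 exponent bits
    mantissa = bits & 0x3FF              # 10 mantissa bits

    if exponent == 0x1F:                 # all ones: infinity or NaN
        return sign * float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:                    # zero or subnormal: no implicit leading 1
        return sign * mantissa * 2.0 ** -24
    # Normal value: implicit leading 1, exponent bias of 15
    return sign * (1 + mantissa / 1024) * 2.0 ** (exponent - 15)
```

Python's `struct` module also supports the `"e"` format code for binary16, so `struct.unpack("<e", raw)` can serve as a cross-check for the manual decoding.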

## Temporal Types

### DATE
3 changes: 3 additions & 0 deletions src/main/thrift/parquet.thrift
@@ -34,6 +34,7 @@ enum Type {
INT32 = 1;
INT64 = 2;
INT96 = 3; // deprecated, only used by legacy implementations.
FLOAT16 = 8;
FLOAT = 4;
DOUBLE = 5;
BYTE_ARRAY = 6;
@@ -416,6 +417,7 @@ enum Encoding {
* BOOLEAN - 1 bit per value. 0 is false; 1 is true.
* INT32 - 4 bytes per value. Stored as little-endian.
* INT64 - 8 bytes per value. Stored as little-endian.
* FLOAT16 - 2 bytes per value. IEEE. Stored as little-endian.
* FLOAT - 4 bytes per value. IEEE. Stored as little-endian.
* DOUBLE - 8 bytes per value. IEEE. Stored as little-endian.
* BYTE_ARRAY - 4 byte length stored as little endian, followed by bytes.
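
The two-byte little-endian layout described for FLOAT16 can be sketched with Python's `struct` module, whose `"e"` format code packs IEEE 754 binary16 values. The helper names `plain_encode_float16` and `plain_decode_float16` are hypothetical, and this is only an assumption-laden illustration of back-to-back fixed-width values, not a Parquet writer.

```python
import struct


def plain_encode_float16(values):
    """Pack each value as 2 bytes, IEEE 754 binary16, little-endian."""
    return b"".join(struct.pack("<e", v) for v in values)


def plain_decode_float16(buf: bytes):
    """Inverse of plain_encode_float16: read consecutive 2-byte values."""
    if len(buf) % 2 != 0:
        raise ValueError("buffer length must be a multiple of 2 bytes")
    return [v for (v,) in struct.iter_unpack("<e", buf)]


encoded = plain_encode_float16([1.0, -2.0, 0.5])
decoded = plain_decode_float16(encoded)
```

Each value occupies exactly two bytes, so the buffer length is always twice the value count.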
@@ -889,6 +891,7 @@ union ColumnOrder {
* INT32 - signed comparison
* INT64 - signed comparison
* INT96 (only used for legacy timestamps) - undefined
* FLOAT16 - signed comparison of the represented value (*)
* FLOAT - signed comparison of the represented value (*)
* DOUBLE - signed comparison of the represented value (*)
* BYTE_ARRAY - unsigned byte-wise comparison
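
To illustrate why FLOAT16 is ordered by its represented value rather than by its raw bytes, the sketch below sorts the same encoded values both ways. It assumes little-endian IEEE 754 binary16 encoding, uses a hypothetical helper `float16_sort_key`, and deliberately leaves out NaN and signed-zero handling, which the `(*)` footnote in the Thrift file covers.

```python
import struct


def float16_sort_key(raw: bytes) -> float:
    """Order 2-byte binary16 values by the number they represent."""
    (value,) = struct.unpack("<e", raw)
    return value


raw_values = [struct.pack("<e", v) for v in (1.5, -2.0, 0.25, -0.5)]

# Signed comparison of the represented value
by_value = sorted(raw_values, key=float16_sort_key)

# Unsigned byte-wise comparison, shown only for contrast
by_bytes = sorted(raw_values)
```

With byte-wise ordering the negative values sort after the positive ones, because the sign bit is the most significant bit of the second byte.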