[Java] support for Arrow's float16 #2362

Closed
asfimport opened this issue Sep 3, 2019 · 6 comments

asfimport commented Sep 3, 2019

DESCRIPTION

I'm wondering if there's any interest in supporting Arrow's float16 type in Parquet.

There seem to be one or two float16 / halffloat tickets here (e.g., PARQUET-1403), but nothing that speaks to adding half-float support to Parquet in general.

PLANS

I'm able to spend some time on this if someone points me in the right direction.

  1. Add the HALFFLOAT or FLOAT16 enum (any preferred naming convention?) to https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L32

  2. Add HALFFLOAT to org.apache.parquet.schema.PrimitiveType

  3. Add HALFFLOAT support to org.apache.parquet.arrow.schema.SchemaConverter

  4. Add encoding for new type at org.apache.parquet.column.Encoding

  5. ??

Any interest, pointers, or comments would be greatly appreciated!

Reporter: The Alchemist
Assignee: Jiashen Zhang / @zhangjiashen

Note: This issue was originally created as PARQUET-1647. Please see the migration documentation for further details.

@asfimport

JAVIER ANDRES RECASENS SANCHEZ:
Any updates regarding this issue? We are very interested in float16 support. Thanks!

@asfimport

Orestis:
[~the_alchemist] Thank you for the initiative. Is there any update for this issue? 

@asfimport

The Alchemist:
[~jrecasens] , [~orecoupa] :

Unfortunately, I have moved on and don't have the time to work on float16 Parquet support.

@asfimport

JAVIER ANDRES RECASENS SANCHEZ:
[~the_alchemist] thanks for the update.

Is there anyone who could help with this?

@asfimport

Ben Harkins / @benibus:
I'm currently working on this, so feel free to assign me. It's probably worth mentioning, though, that the current plan is to implement this as a logical type in accordance with the proposal PR for PARQUET-758, which deviates from some of the plan in this issue's description.
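
For readers unfamiliar with the logical-type approach mentioned above, here is a minimal sketch of what it could look like on the Java side. It assumes (this is not stated in the thread) that each float16 value is stored as its raw IEEE 754 half-precision bits in a 2-byte, little-endian fixed-length value, and that readers widen it to a `float` when needed. The class `Float16Sketch` and its method names are hypothetical and are not part of parquet-java or Arrow.

```java
/** Hypothetical helper for illustration only; not part of parquet-java or Arrow. */
final class Float16Sketch {

    /** Packs the 16 raw half-precision bits into 2 bytes, least significant byte first. */
    static byte[] toLittleEndianBytes(short halfBits) {
        return new byte[] {(byte) (halfBits & 0xFF), (byte) ((halfBits >>> 8) & 0xFF)};
    }

    /** Rebuilds the 16 raw bits from a 2-byte little-endian value (assumed storage layout). */
    static short fromLittleEndianBytes(byte[] bytes) {
        return (short) ((bytes[0] & 0xFF) | ((bytes[1] & 0xFF) << 8));
    }

    /** Widens IEEE 754 half-precision bits (1 sign, 5 exponent, 10 mantissa bits) to a float. */
    static float halfBitsToFloat(short halfBits) {
        int bits = halfBits & 0xFFFF;
        int sign = (bits & 0x8000) << 16;   // move the sign to bit 31
        int exp = (bits >>> 10) & 0x1F;     // biased exponent (bias 15)
        int mant = bits & 0x3FF;            // 10-bit mantissa

        if (exp == 0) {
            if (mant == 0) {
                return Float.intBitsToFloat(sign); // signed zero
            }
            // Subnormal half: value is mant * 2^-24, which a float represents exactly.
            float magnitude = mant * (1.0f / (1 << 24));
            return sign != 0 ? -magnitude : magnitude;
        }
        if (exp == 0x1F) {
            // Infinity (mant == 0) or NaN (mant != 0): use the all-ones float exponent.
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        // Normal number: rebias the exponent from 15 to 127 and left-align the mantissa.
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    public static void main(String[] args) {
        short one = (short) 0x3C00; // bit pattern of 1.0 in half precision
        byte[] stored = toLittleEndianBytes(one);
        System.out.println(halfBitsToFloat(fromLittleEndianBytes(stored)));
    }
}
```

The sketch only covers the widening direction; narrowing a `float` to half precision also needs a rounding policy, which is left out here.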

@asfimport

Freddy Fostvedt:
Thanks for putting effort into this, @benibus. This is very valuable at my place of work: reducing memory usage would significantly cut our data processing and training costs.
