[C++] Support float16 in writing/reading parquet #32728

Closed
asfimport opened this issue Aug 18, 2022 · 12 comments

Comments

@asfimport
Collaborator

Half-float values are not supported in Parquet. Here is a previous issue that talks about that: https://issues.apache.org/jira/browse/PARQUET-1647

So, this will not work:

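import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

# Build an Arrow float16 (halffloat) array from NumPy half-precision values.
arr = pa.array(np.float16([0.1, 2.2, 3]))
table = pa.table({'a': arr})
# Fails: there is no Parquet type to map float16 to.
pq.write_table(table, "test_halffloat.parquet")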

This is a proposal to store float16 values in Parquet as FixedSizeBinary, and then restore them to float16 when reading them back in.

Reporter: Anja Boskovic / @anjakefala
Watchers: Rok Mihevc / @rok

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-17464. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
@anjakefala Should we first have the corresponding logical type standardized in the Parquet spec? Inventing our own conventions will not make these files very portable.

@asfimport
Collaborator Author

Anja Boskovic / @anjakefala:
That does seem like a reasonable suggestion! For context, this proposal was directly inspired by this PR: #12449

I am new to this community; how do you suggest I propose standardising the logical type in the Parquet spec? Do I open an issue in the Parquet Jira?

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
There's already an old JIRA open apparently: https://issues.apache.org/jira/browse/PARQUET-758

Also, AFAIK format additions are discussed on the Parquet dev mailing list (see e.g. an unrelated proposal I made in https://lists.apache.org/thread/l15qq12v38w9jnkd6p9mdd11kr0nq3gr).

@asfimport
Collaborator Author

Anja Boskovic / @anjakefala:
(y)

Thanks!

@asfimport
Collaborator Author

Anja Boskovic / @anjakefala:
The ML thread with the conversation on adding float16 to the Parquet spec is here: https://lists.apache.org/thread/03vmcj7ygwvsbno764vd1hr954p62zr5

@asfimport
Collaborator Author

Anja Boskovic / @anjakefala:
And the proposal to the Parquet repo is here: apache/parquet-format#184

@asfimport
Collaborator Author

Rok Mihevc / @rok:
I'm not sure if this is the best forum for this question, but should we consider bfloat16 (and perhaps other non-IEEE defined types)? See https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
I don't think so, or at least not for now.

@asfimport
Collaborator Author

Apache Arrow JIRA Bot:
This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

@asfimport
Collaborator Author

Anja Boskovic / @anjakefala:
An update! PARQUET-1222 (https://issues.apache.org/jira/browse/PARQUET-1222), which was a blocker for adding float16 support to Parquet, has been merged.

@anjakefala
Collaborator

The Parquet spec update is ready for discussion. The next step is to have Java and C++ implementations for float16 ready.

@assignUser
Member

assignUser commented Jan 25, 2024

Closed with #37582; the above example works with pyarrow 15.0.0.
