This repository has been archived by the owner on Nov 1, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 80
Back out "Delegate IColumn.fill_null/drop_null
to Arrow"
#109
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Summary: Original commit changeset: 14ad956d406f Original Phabricator Diff: D32770009 The commit fails GitHub CI: https://github.com/facebookresearch/torcharrow/runs/4448410206?check_suite_focus=true In PyArrow 6.0, it's legit for an array to have non-null NullBuffer while null_count is zero: ``` import pyarrow as pa >>> pa.__version__ '6.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() [<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>] ``` So this triggers https://github.com/facebookincubator/velox/blob/674562b94780b8a895fd291c310778e4de73e7e9/velox/vector/arrow/Bridge.cpp#L499-L502 In contrast, PyArrow 2.0 will make NullBuffer to be null: ``` >>> pa.__version__ '2.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() >>> a.buffers() [None, <pyarrow.lib.Buffer object at 0x7fc4b8031870>] ``` Differential Revision: D32940474 fbshipit-source-id: 2558dbf181b6adf75a0e3f5a9cfbadac1b3d03a4
This pull request was exported from Phabricator. Differential Revision: D32940474 |
PR being reverted: #96 |
facebook-github-bot
pushed a commit
that referenced
this pull request
Dec 8, 2021
Summary: Pull Request resolved: #109 Original commit changeset: 14ad956d406f Original Phabricator Diff: D32770009 The commit fails GitHub CI: https://github.com/facebookresearch/torcharrow/runs/4448410206?check_suite_focus=true In PyArrow 6.0, it's legit for an array to have non-null NullBuffer while null_count is zero: ``` import pyarrow as pa >>> pa.__version__ '6.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() [<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>] ``` So this triggers https://github.com/facebookincubator/velox/blob/674562b94780b8a895fd291c310778e4de73e7e9/velox/vector/arrow/Bridge.cpp#L499-L502 In contrast, PyArrow 2.0 will make NullBuffer to be null: ``` >>> pa.__version__ '2.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() >>> a.buffers() [None, <pyarrow.lib.Buffer object at 0x7fc4b8031870>] ``` Reviewed By: OswinC Differential Revision: D32940474 fbshipit-source-id: 91da656cf20f0c0c3be022e764877d0854d4d87e
Related issue: #64 |
wenleix
added a commit
to wenleix/velox
that referenced
this pull request
Jan 28, 2022
Summary: In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) : ``` import pyarrow as pa >>> pa.__version__ '6.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() [<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>] ``` User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment) Differential Revision: D33836988 fbshipit-source-id: bdb7277671eaf4fc25dfad46ed5bbe3272569ace
wenleix
added a commit
to wenleix/velox
that referenced
this pull request
Jan 29, 2022
…r#941) Summary: Pull Request resolved: facebookincubator#941 In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) : ``` import pyarrow as pa >>> pa.__version__ '6.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() [<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>] ``` User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment) Differential Revision: D33836988 fbshipit-source-id: c16fa29d9850f4e5fb73b4421a2d6de55ec4702c
wenleix
added a commit
to wenleix/velox
that referenced
this pull request
Jan 31, 2022
…r#941) Summary: Pull Request resolved: facebookincubator#941 In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) : ``` import pyarrow as pa >>> pa.__version__ '6.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() [<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>] ``` User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment) Reviewed By: pedroerp Differential Revision: D33836988 fbshipit-source-id: 6e87a703e9ad6125682ac6241777c492aeb9ca92
wenleix
added a commit
to wenleix/velox
that referenced
this pull request
Jan 31, 2022
…r#941) Summary: Pull Request resolved: facebookincubator#941 In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) : ``` import pyarrow as pa >>> pa.__version__ '6.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() [<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>] ``` User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment) Reviewed By: pedroerp Differential Revision: D33836988 fbshipit-source-id: 584ab47801b6d5be5b2461f6a61de7e42a850280
wenleix
added a commit
to wenleix/velox
that referenced
this pull request
Jan 31, 2022
…r#941) Summary: Pull Request resolved: facebookincubator#941 In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) : ``` import pyarrow as pa >>> pa.__version__ '6.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() [<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>] ``` User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment) Reviewed By: pedroerp Differential Revision: D33836988 fbshipit-source-id: 5e53ee3249061dcc0205c1a9696e5b1ea974617d
facebook-github-bot
pushed a commit
to facebookincubator/velox
that referenced
this pull request
Feb 1, 2022
Summary: Pull Request resolved: #941 In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) : ``` import pyarrow as pa >>> pa.__version__ '6.0.0' >>> a = pa.array([1, 2, None, 3]) >>> a = a.fill_null(12) >>> a.buffers() [<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>] ``` User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment) Reviewed By: pedroerp Differential Revision: D33836988 fbshipit-source-id: 0336f683627960aecf6e6bba29e895994fa8055e
YLGH
pushed a commit
to YLGH/torcharrow
that referenced
this pull request
May 7, 2022
…ded module (pytorch#109) Summary: X-link: pytorch/torchrec#109 Pull Request resolved: pytorch/torchrec#25 X-link: pytorch/torchrec#95 Reviewed By: xing-liu Differential Revision: D34334373 fbshipit-source-id: 3ead26af5ec3019fde72f391b62329472e0e4b27
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
CLA Signed
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
fb-exported
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary:
Original commit changeset: 14ad956d406f
Original Phabricator Diff: D32770009
The commit fails GitHub CI: https://github.com/facebookresearch/torcharrow/runs/4448410206?check_suite_focus=true
In PyArrow 6.0, it's legit for an array to have non-null NullBuffer while null_count is zero:
So this triggers https://github.com/facebookincubator/velox/blob/674562b94780b8a895fd291c310778e4de73e7e9/velox/vector/arrow/Bridge.cpp#L499-L502
In contrast, PyArrow 2.0 will make NullBuffer to be null:
Differential Revision: D32940474