Skip to content
This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Back out "Delegate IColumn.fill_null/drop_null to Arrow" #109

Closed
wants to merge 1 commit into from

Conversation

wenleix
Copy link
Contributor

@wenleix wenleix commented Dec 8, 2021

Summary:
Original commit changeset: 14ad956d406f

Original Phabricator Diff: D32770009

The commit fails GitHub CI: https://github.com/facebookresearch/torcharrow/runs/4448410206?check_suite_focus=true

In PyArrow 6.0, it's legit for an array to have non-null NullBuffer while null_count is zero:

import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]

So this triggers https://github.com/facebookincubator/velox/blob/674562b94780b8a895fd291c310778e4de73e7e9/velox/vector/arrow/Bridge.cpp#L499-L502

In contrast, PyArrow 2.0 will make NullBuffer to be null:

>>> pa.__version__
'2.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[None, <pyarrow.lib.Buffer object at 0x7fc4b8031870>]

Differential Revision: D32940474

Summary:
Original commit changeset: 14ad956d406f

Original Phabricator Diff: D32770009

The commit fails GitHub CI: https://github.com/facebookresearch/torcharrow/runs/4448410206?check_suite_focus=true

In PyArrow 6.0, it's legit for an array to have non-null NullBuffer while null_count is zero:
```
import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]
```
So this triggers https://github.com/facebookincubator/velox/blob/674562b94780b8a895fd291c310778e4de73e7e9/velox/vector/arrow/Bridge.cpp#L499-L502

In contrast, PyArrow 2.0 will make NullBuffer to be null:
```
>>> pa.__version__
'2.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
>>> a.buffers()
[None, <pyarrow.lib.Buffer object at 0x7fc4b8031870>]
```

Differential Revision: D32940474

fbshipit-source-id: 2558dbf181b6adf75a0e3f5a9cfbadac1b3d03a4
@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported labels Dec 8, 2021
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D32940474

@wenleix
Copy link
Contributor Author

wenleix commented Dec 8, 2021

PR being reverted: #96

facebook-github-bot pushed a commit that referenced this pull request Dec 8, 2021
Summary:
Pull Request resolved: #109

Original commit changeset: 14ad956d406f

Original Phabricator Diff: D32770009

The commit fails GitHub CI: https://github.com/facebookresearch/torcharrow/runs/4448410206?check_suite_focus=true

In PyArrow 6.0, it's legit for an array to have non-null NullBuffer while null_count is zero:
```
import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]
```
So this triggers https://github.com/facebookincubator/velox/blob/674562b94780b8a895fd291c310778e4de73e7e9/velox/vector/arrow/Bridge.cpp#L499-L502

In contrast, PyArrow 2.0 will make NullBuffer to be null:
```
>>> pa.__version__
'2.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
>>> a.buffers()
[None, <pyarrow.lib.Buffer object at 0x7fc4b8031870>]
```

Reviewed By: OswinC

Differential Revision: D32940474

fbshipit-source-id: 91da656cf20f0c0c3be022e764877d0854d4d87e
@wenleix wenleix closed this in a7a1ed6 Dec 13, 2021
@wenleix
Copy link
Contributor Author

wenleix commented Jan 18, 2022

Related issue: #64

@wenleix wenleix deleted the export-D32940474-to-fbsync branch January 20, 2022 21:18
wenleix added a commit to wenleix/velox that referenced this pull request Jan 28, 2022
Summary:
In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) :
```
import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]
```

User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment)

Differential Revision: D33836988

fbshipit-source-id: bdb7277671eaf4fc25dfad46ed5bbe3272569ace
wenleix added a commit to wenleix/velox that referenced this pull request Jan 29, 2022
…r#941)

Summary:
Pull Request resolved: facebookincubator#941

In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) :
```
import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]
```

User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment)

Differential Revision: D33836988

fbshipit-source-id: c16fa29d9850f4e5fb73b4421a2d6de55ec4702c
wenleix added a commit to wenleix/velox that referenced this pull request Jan 31, 2022
…r#941)

Summary:
Pull Request resolved: facebookincubator#941

In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) :
```
import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]
```

User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment)

Reviewed By: pedroerp

Differential Revision: D33836988

fbshipit-source-id: 6e87a703e9ad6125682ac6241777c492aeb9ca92
wenleix added a commit to wenleix/velox that referenced this pull request Jan 31, 2022
…r#941)

Summary:
Pull Request resolved: facebookincubator#941

In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) :
```
import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]
```

User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment)

Reviewed By: pedroerp

Differential Revision: D33836988

fbshipit-source-id: 584ab47801b6d5be5b2461f6a61de7e42a850280
wenleix added a commit to wenleix/velox that referenced this pull request Jan 31, 2022
…r#941)

Summary:
Pull Request resolved: facebookincubator#941

In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) :
```
import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]
```

User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment)

Reviewed By: pedroerp

Differential Revision: D33836988

fbshipit-source-id: 5e53ee3249061dcc0205c1a9696e5b1ea974617d
facebook-github-bot pushed a commit to facebookincubator/velox that referenced this pull request Feb 1, 2022
Summary:
Pull Request resolved: #941

In PyArrow 6.0, it's legit for an arrow array to have non-null NullBuffer while null_count is zero (see also pytorch/torcharrow#109) :
```
import pyarrow as pa
>>> pa.__version__
'6.0.0'
>>> a = pa.array([1, 2, None, 3])
>>> a = a.fill_null(12)
>>> a.buffers()
[<pyarrow.lib.Buffer object at 0x7fe688222330>, <pyarrow.lib.Buffer object at 0x7fe6680c2ef0>]
```

User also reports similiar issue when converting arrow array reading from Parquet, see pytorch/torcharrow#146 (comment)

Reviewed By: pedroerp

Differential Revision: D33836988

fbshipit-source-id: 0336f683627960aecf6e6bba29e895994fa8055e
YLGH pushed a commit to YLGH/torcharrow that referenced this pull request May 7, 2022
…ded module (pytorch#109)

Summary:
X-link: pytorch/torchrec#109

Pull Request resolved: pytorch/torchrec#25

X-link: pytorch/torchrec#95

Reviewed By: xing-liu

Differential Revision: D34334373

fbshipit-source-id: 3ead26af5ec3019fde72f391b62329472e0e4b27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants