DOC: PyArrow documentation for round trip conversions with pandas #60638

Open
1 task done
johnasiano opened this issue Jan 1, 2025 · 0 comments

Labels
Docs, Needs Triage (issue that has not been reviewed by a pandas team member)

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/arrays.html#pyarrow

Documentation problem

This issue was first raised in #50074.

The documentation mentions two different approaches.

The first uses StringDtype:

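import pandas as pd
import pyarrow as pa

# Build a frame whose column uses the pyarrow-backed StringDtype.
df = pd.DataFrame({"x": ["foo", "bar", "baz"]}, dtype=pd.StringDtype("pyarrow"))
# Round trip: pandas -> pyarrow.Table -> pandas.
df_pa = pa.Table.from_pandas(df).to_pandas()
# Check that the round trip preserved the frame, including dtypes.
pd.testing.assert_frame_equal(df, df_pa)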

The second uses ArrowDtype:

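import pandas as pd
import pyarrow as pa

# Build a frame whose column uses ArrowDtype wrapping pa.string().
df = pd.DataFrame({"x": ["foo", "bar", "baz"]}, dtype=pd.ArrowDtype(pa.string()))
# Round trip: pandas -> pyarrow.Table -> pandas.
df_pa = pa.Table.from_pandas(df).to_pandas()
# Check that the round trip preserved the frame, including dtypes.
pd.testing.assert_frame_equal(df, df_pa)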

However, both of these fail with an AssertionError.

Casting the result back with astype, as shown below, avoids the assertion error.

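import pandas as pd
import pyarrow as pa

# Build a frame using the "string[pyarrow]" dtype alias.
df = pd.DataFrame({"x": ["foo", "bar", "baz"]}, dtype="string[pyarrow]")
# Round trip through Arrow, then cast the result back to string[pyarrow].
df_pa = pa.Table.from_pandas(df).to_pandas().astype("string[pyarrow]")
# With the explicit cast, the comparison passes.
pd.testing.assert_frame_equal(df, df_pa)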

Both approaches in the documentation are also described in the issue from 2022 as working fixes. However, I think they may no longer work with the current version of pandas.

Suggested fix for documentation

The documentation should be updated to present .astype("string[pyarrow]") as the likely best-practice approach for this situation.

johnasiano added the Docs and Needs Triage labels on Jan 1, 2025