[SPARK-50657][PYTHON] Upgrade the minimum version of pyarrow to 11.0.0
### What changes were proposed in this pull request?
Upgrade the minimum version of `pyarrow` to 11.0.0

### Why are the changes needed?
According to my test in apache#49267, PySpark with `pyarrow=10.0.0` is already broken:

- pyspark-sql failed
- pyspark-connect failed
- pyspark-pandas failed

See https://github.com/zhengruifeng/spark/actions/runs/12464102622/job/34787749014

### Does this PR introduce _any_ user-facing change?
doc changes

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#49282 from zhengruifeng/mini_arrow_11.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
zhengruifeng authored and dongjoon-hyun committed Dec 25, 2024
1 parent ef4be07 commit 9c9bdab
Showing 6 changed files with 8 additions and 8 deletions.
2 changes: 1 addition & 1 deletion dev/requirements.txt
@@ -3,7 +3,7 @@ py4j>=0.10.9.7

# PySpark dependencies (optional)
numpy>=1.21
-pyarrow>=10.0.0
+pyarrow>=11.0.0
six==1.16.0
pandas>=2.0.0
scipy
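For readers updating an existing environment against the new floor in `dev/requirements.txt`, `importlib.metadata` can check the installed pyarrow distribution without importing it. A minimal stdlib-only sketch (the helper names are illustrative, not part of PySpark; real tooling should prefer `packaging.version` for full PEP 440 handling):

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Split a version string like '11.0.0' into a comparable tuple of ints.

    Naive parsing for illustration; it ignores pre-release suffixes,
    which packaging.version.Version would handle correctly.
    """
    return tuple(int(part) for part in v.split(".")[:3])


def meets_minimum(dist: str, minimum: str) -> bool:
    """Return True if distribution `dist` is installed at or above `minimum`."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)


# Example: meets_minimum("pyarrow", "11.0.0")
```

Under this sketch, an environment still pinned to pyarrow 10.0.0 would report `False` for the new 11.0.0 minimum.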
6 changes: 3 additions & 3 deletions python/docs/source/getting_started/install.rst
@@ -207,7 +207,7 @@ Installable with ``pip install "pyspark[connect]"``.
Package Supported version Note
========================== ================= ==========================
`pandas` >=2.0.0 Required for Spark Connect
-`pyarrow` >=10.0.0 Required for Spark Connect
+`pyarrow` >=11.0.0 Required for Spark Connect
`grpcio` >=1.67.0 Required for Spark Connect
`grpcio-status` >=1.67.0 Required for Spark Connect
`googleapis-common-protos` >=1.65.0 Required for Spark Connect
@@ -223,7 +223,7 @@ Installable with ``pip install "pyspark[sql]"``.
Package Supported version Note
========= ================= ======================
`pandas` >=2.0.0 Required for Spark SQL
-`pyarrow` >=10.0.0 Required for Spark SQL
+`pyarrow` >=11.0.0 Required for Spark SQL
========= ================= ======================

Additional libraries that enhance functionality but are not included in the installation packages:
@@ -240,7 +240,7 @@ Installable with ``pip install "pyspark[pandas_on_spark]"``.
Package Supported version Note
========= ================= ================================
`pandas` >=2.0.0 Required for Pandas API on Spark
-`pyarrow` >=10.0.0 Required for Pandas API on Spark
+`pyarrow` >=11.0.0 Required for Pandas API on Spark
========= ================= ================================

Additional libraries that enhance functionality but are not included in the installation packages:
2 changes: 1 addition & 1 deletion python/docs/source/migration_guide/pyspark_upgrade.rst
@@ -25,7 +25,7 @@ Upgrading from PySpark 3.5 to 4.0
* In Spark 4.0, Python 3.8 support was dropped in PySpark.
* In Spark 4.0, the minimum supported version for Pandas has been raised from 1.0.5 to 2.0.0 in PySpark.
* In Spark 4.0, the minimum supported version for Numpy has been raised from 1.15 to 1.21 in PySpark.
-* In Spark 4.0, the minimum supported version for PyArrow has been raised from 4.0.0 to 10.0.0 in PySpark.
+* In Spark 4.0, the minimum supported version for PyArrow has been raised from 4.0.0 to 11.0.0 in PySpark.
* In Spark 4.0, ``Int64Index`` and ``Float64Index`` have been removed from pandas API on Spark, ``Index`` should be used directly.
* In Spark 4.0, ``DataFrame.iteritems`` has been removed from pandas API on Spark, use ``DataFrame.items`` instead.
* In Spark 4.0, ``Series.iteritems`` has been removed from pandas API on Spark, use ``Series.items`` instead.
2 changes: 1 addition & 1 deletion python/packaging/classic/setup.py
@@ -152,7 +152,7 @@ def _supports_symlinks():
# python/packaging/connect/setup.py
_minimum_pandas_version = "2.0.0"
_minimum_numpy_version = "1.21"
-_minimum_pyarrow_version = "10.0.0"
+_minimum_pyarrow_version = "11.0.0"
_minimum_grpc_version = "1.67.0"
_minimum_googleapis_common_protos_version = "1.65.0"

2 changes: 1 addition & 1 deletion python/packaging/connect/setup.py
@@ -132,7 +132,7 @@
# python/packaging/classic/setup.py
_minimum_pandas_version = "2.0.0"
_minimum_numpy_version = "1.21"
-_minimum_pyarrow_version = "10.0.0"
+_minimum_pyarrow_version = "11.0.0"
_minimum_grpc_version = "1.59.3"
_minimum_googleapis_common_protos_version = "1.56.4"

2 changes: 1 addition & 1 deletion python/pyspark/sql/pandas/utils.py
@@ -61,7 +61,7 @@ def require_minimum_pandas_version() -> None:
def require_minimum_pyarrow_version() -> None:
"""Raise ImportError if minimum version of pyarrow is not installed"""
# TODO(HyukjinKwon): Relocate and deduplicate the version specification.
-minimum_pyarrow_version = "10.0.0"
+minimum_pyarrow_version = "11.0.0"

import os

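The diff above truncates the body of `require_minimum_pyarrow_version`. As a rough sketch of how such an import-time guard typically works (simplified and generalized; this is not the actual PySpark implementation, which carries extra environment checks and its own error classes):

```python
import importlib


def require_minimum_version(module_name: str, minimum: str) -> None:
    """Raise ImportError if `module_name` is missing or older than `minimum`.

    Illustrative sketch of a guard in the spirit of
    require_minimum_pyarrow_version; not PySpark's real code.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} >= {minimum} must be installed; it was not found."
        ) from e

    # Naive numeric comparison of dotted version strings.
    installed = tuple(int(p) for p in module.__version__.split(".")[:3])
    required = tuple(int(p) for p in minimum.split(".")[:3])
    if installed < required:
        raise ImportError(
            f"{module_name} >= {minimum} must be installed; "
            f"found version {module.__version__}."
        )
```

With this shape, a call like `require_minimum_version("pyarrow", "11.0.0")` would raise on the pyarrow 10.0.0 environments that the CI run linked above showed failing.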
