refactor!: rename python package to google-spark-connect #25

Merged 4 commits on Jan 29, 2025

18 changes: 9 additions & 9 deletions README.md
@@ -1,4 +1,4 @@
# Dataproc Spark Connect Client
# Google Spark Connect Client

A wrapper of the Apache [Spark Connect](https://spark.apache.org/spark-connect/) client with
additional functionalities that allow applications to communicate with a remote Dataproc
@@ -8,13 +8,13 @@ Spark cluster using the Spark Connect protocol without requiring additional step

.. code-block:: console

pip install dataproc_spark_connect
pip install google_spark_connect

## Uninstall

.. code-block:: console

pip uninstall dataproc_spark_connect
pip uninstall google_spark_connect


## Setup
@@ -28,12 +28,12 @@ If you are running the client outside of Google Cloud, you must set following en

## Usage

1. Install the latest version of Dataproc Python client and Dataproc Spark Connect modules:
1. Install the latest version of Dataproc Python client and Google Spark Connect modules:

.. code-block:: console

pip install google_cloud_dataproc --force-reinstall
pip install dataproc_spark_connect --force-reinstall
pip install google_spark_connect --force-reinstall

2. Add the required import into your PySpark application or notebook:
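
   The code block for this step is collapsed in the diff. As a hedged sketch only — the import path comes from the `google/cloud/spark_connect/__init__.py` change below, and the builder call is assumed from the standard `SparkSession` API that `GoogleSparkSession` subclasses — it might look like:

   .. code-block:: python

      # Hedged sketch, not the collapsed README content: the import path is taken from
      # google/cloud/spark_connect/__init__.py in this PR; getOrCreate() is assumed from
      # the standard pyspark.sql.SparkSession builder that GoogleSparkSession extends.
      from google.cloud.spark_connect import GoogleSparkSession

      # Any Google/Dataproc-specific session configuration (project, region, session
      # template) is omitted here; see the environment variables in the Setup section.
      spark = GoogleSparkSession.builder.getOrCreate()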

@@ -85,14 +85,14 @@ This will happen even if you are running the client from a non-GCE instance.

.. code-block:: console

VERSION=<version> gsutil cp dist/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl gs://<your_bucket_name>
VERSION=<version> gsutil cp dist/google_spark_connect-${VERSION}-py2.py3-none-any.whl gs://<your_bucket_name>

4. Download the new SDK on Vertex, then uninstall the old version and install the new one.

.. code-block:: console

%%bash
export VERSION=<version>
gsutil cp gs://<your_bucket_name>/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl .
yes | pip uninstall dataproc_spark_connect
pip install dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl
gsutil cp gs://<your_bucket_name>/google_spark_connect-${VERSION}-py2.py3-none-any.whl .
yes | pip uninstall google_spark_connect
pip install google_spark_connect-${VERSION}-py2.py3-none-any.whl
15 changes: 15 additions & 0 deletions google/cloud/spark_connect/__init__.py
@@ -11,4 +11,19 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib.metadata
import warnings

from .session import GoogleSparkSession

old_package_name = "dataproc-spark-connect"
current_package_name = "google-spark-connect"
# Warn if the legacy distribution is still installed alongside the renamed package.
try:
    importlib.metadata.distribution(old_package_name)
    warnings.warn(
        f"Package '{old_package_name}' is already installed in your environment. "
        f"This might cause conflicts with '{current_package_name}'. "
        f"Consider uninstalling '{old_package_name}' and installing only '{current_package_name}'."
    )
except importlib.metadata.PackageNotFoundError:
    # The legacy package is not installed, so there is nothing to warn about.
    pass
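
One note on the check above: the warning is raised at module import time, so it appears only on the first import of `google.cloud.spark_connect` in a process while the old distribution is still installed. A hedged, standard-library-only sketch of how to observe it (nothing here is part of the package's API):

.. code-block:: python

   # Hedged illustration: capture the compatibility warning emitted when the renamed
   # package is first imported while the legacy distribution is still installed.
   import warnings

   with warnings.catch_warnings(record=True) as caught:
       warnings.simplefilter("always")
       import google.cloud.spark_connect  # noqa: F401  # fires only on first import

   for w in caught:
       print(w.category.__name__, w.message)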
2 changes: 1 addition & 1 deletion google/cloud/spark_connect/session.py
@@ -57,7 +57,7 @@ class GoogleSparkSession(SparkSession):
Examples
--------

Create a Spark session with Dataproc Spark Connect.
Create a Spark session with Google Spark Connect.

>>> spark = (
... GoogleSparkSession.builder
4 changes: 2 additions & 2 deletions setup.py
@@ -19,9 +19,9 @@


setup(
    name="dataproc-spark-connect",
    name="google-spark-connect",
    version="0.2.0",
    description="Dataproc client library for Spark Connect",
    description="Google client library for Spark Connect",
    long_description=long_description,
    author="Google LLC",
    url="https://github.com/GoogleCloudDataproc/dataproc-spark-connect-python",