Merge tag 'tags/2.0.4' into v2-integration
Kubeflow Pipelines 2.0.4 release
Tomcli committed Dec 4, 2023
2 parents 94ca63d + a226de2 commit 6f0dabd
Showing 139 changed files with 3,394 additions and 1,297 deletions.
5 changes: 4 additions & 1 deletion .readthedocs.yml
@@ -3,6 +3,9 @@ version: 2
sphinx:
  configuration: docs/conf.py
python:
-  version: 3.7
  install:
    - requirements: docs/requirements.txt
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.7"
38 changes: 38 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,43 @@
# Changelog

### [2.0.4](https://github.com/kubeflow/pipelines/compare/2.0.3...2.0.4) (2023-12-01)


### Features

* **components:** [endpoint_batch_predict] add retry function for retryable errors ([a54ac75](https://github.com/kubeflow/pipelines/commit/a54ac75c8ce94919969e939ec6d93d7959f2bd35))
* **components:** [third party model inference] Initialize component ([1e089e6](https://github.com/kubeflow/pipelines/commit/1e089e6f6d2529bd396f4cdd27a0cad3bf7824a6))
* **components:** Add ability to tune chat model with `preview.llm.rlhf_pipeline` ([f67cbfa](https://github.com/kubeflow/pipelines/commit/f67cbfa81f7d4caf63879e5e544ad55c0f3d7940))
* **components:** Add chat dataset preprocessor to `preview.llm.infer_pipeline` ([d8f2c14](https://github.com/kubeflow/pipelines/commit/d8f2c140cecf0e1b9a5437e8eb46731e89cc6b41))
* **components:** add environment variable support to GCPC's `create_custom_training_job_from_component` ([91f50da](https://github.com/kubeflow/pipelines/commit/91f50da23505b680e77201c14a5ffaf1824bc919))
* **components:** add LLM Eval pipeline parameter for customizing eval dataset reference ground truth field ([f6aad5e](https://github.com/kubeflow/pipelines/commit/f6aad5e4e0448583490ef688ecb23af74caf4dab))
* **components:** Add tuning dataset support to the LLM Eval Text Generation and Text Classification Pipelines, and include the new LLM Eval Preprocessor component in both pipelines ([b55ed6e](https://github.com/kubeflow/pipelines/commit/b55ed6e4b4ffced24a7fbd020406c4909df84381))
* **components:** Create new eval dataset preprocessor for formatting eval dataset ([6cfad2b](https://github.com/kubeflow/pipelines/commit/6cfad2b348649341211a8a9cad7ea9329c130fae))
* **components:** Edit embedding pipeline to use generic batch predict component to support multilingual embeddings model ([397b1c9](https://github.com/kubeflow/pipelines/commit/397b1c97be1ef3baf37b664ca56ee119f9bbe7d3))
* **components:** Enable the endpoint_batch_predict component to accept a publisher model via either model_address + model_id or model_id alone ([1f69834](https://github.com/kubeflow/pipelines/commit/1f698349f14871b85c4e68fcc1ed475220eaf9e6))
* **components:** Fix batch prediction model parameters payload sanitization error (batch prediction job) ([fb4512d](https://github.com/kubeflow/pipelines/commit/fb4512dc0ac0201b900932fd1598f4d382f039ee))
* **components:** Group `preview.llm.rlhf_pipeline` components for better readability ([c23b720](https://github.com/kubeflow/pipelines/commit/c23b720f1058bc44ea41a3e4bcdfdc4e3505c47f))
* **components:** Group `preview.llm.rlhf_pipeline` components for better readability ([bcd5922](https://github.com/kubeflow/pipelines/commit/bcd59220f4cee29b317cc209e0e006dee0a258a8))
* **components:** Group `preview.llm.rlhf_pipeline` components for better readability ([a927984](https://github.com/kubeflow/pipelines/commit/a9279843946183429f6572516acee6523de36e53))
* **components:** Update image tag used by RLHF components ([4a5cbbf](https://github.com/kubeflow/pipelines/commit/4a5cbbfb8d5ccf721fc29c61605deb8df7926750))
* **sdk:** add executor output path and executor input message placeholders ([\#10240](https://github.com/kubeflow/pipelines/issues/10240)) ([d3323c0](https://github.com/kubeflow/pipelines/commit/d3323c06f3d5a66323dd8fb2eb06eb4a0924476b))
* **sdk:** add local execution config #localexecution ([\#10234](https://github.com/kubeflow/pipelines/issues/10234)) ([0d7913c](https://github.com/kubeflow/pipelines/commit/0d7913ce4ed35fe762ba5021dee2d4b09b5efca9)) (see the first sketch after this list)
* **sdk:** support `.after()` referencing a task in a `ParallelFor` group ([\#10257](https://github.com/kubeflow/pipelines/issues/10257)) ([11f60d8](https://github.com/kubeflow/pipelines/commit/11f60d813a3bbf5549c993a8384771be37d337e5)) (see the second sketch after this list)
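
The local execution entry above (\#10234) is the start of the SDK's #localexecution work. A minimal usage sketch, not part of this commit: the `kfp.local` module and `SubprocessRunner` names are taken from the SDK as the feature later shipped, so treat them as assumptions here.

    from kfp import dsl, local

    # Route component execution to local subprocesses instead of a backend.
    local.init(runner=local.SubprocessRunner(use_venv=True))


    @dsl.component
    def add(a: int, b: int) -> int:
        return a + b


    # With local execution initialized, calling the component runs it
    # immediately and exposes its outputs on the returned task object.
    task = add(a=1, b=2)
    assert task.output == 3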
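
And a minimal sketch of the `.after()` change (\#10257): a task declared outside a `ParallelFor` group can now depend on a task defined inside it, running once all iterations finish. The `echo` component here is hypothetical.

    from kfp import dsl


    @dsl.component
    def echo(msg: str) -> str:
        print(msg)
        return msg


    @dsl.pipeline(name='after-parallelfor')
    def my_pipeline():
        with dsl.ParallelFor(items=['a', 'b', 'c']) as item:
            fan_out = echo(msg=item)
        # Previously rejected by the compiler; now runs after every
        # iteration of the loop above has completed.
        echo(msg='all done').after(fan_out)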


### Bug Fixes

* **backend:** Pipeline and PipelineVersion Description column value should be optional. ([\#10205](https://github.com/kubeflow/pipelines/issues/10205)) ([0948561](https://github.com/kubeflow/pipelines/commit/0948561fdabc6dff29c3171ed69614e4b9b9061a))
* **components:** fix GCPC AutoMLImageTrainingJobRunOp ModuleNotFoundError ([9f278f3](https://github.com/kubeflow/pipelines/commit/9f278f3682662b24b46be2d9ef4a783bcc1f9b0c))
* **frontend:** Support running old v1 pipeline. Fix [\#10153](https://github.com/kubeflow/pipelines/issues/10153) ([\#10276](https://github.com/kubeflow/pipelines/issues/10276)) ([f5cb2d7](https://github.com/kubeflow/pipelines/commit/f5cb2d7d6f9dd60be047d77486cd861ad9b0293c))


### Other Pull Requests

* feat(components):[text2sql] Integration with first party LLM model inference pipeline ([71e5a93](https://github.com/kubeflow/pipelines/commit/71e5a938efeebebeb88c3077e1b9d88097e60c9b))
* No public description ([a8dd311](https://github.com/kubeflow/pipelines/commit/a8dd3117d5e07656b19bccdd4a0cac8860b9c2fd))
* feat(components):[text2sql] Generate SQL queries by model batch prediction ([2910d0b](https://github.com/kubeflow/pipelines/commit/2910d0bb5276daf5aeb79d6ba7c09f7856b899e4))

### [2.0.3](https://github.com/kubeflow/pipelines/compare/2.0.2...2.0.3) (2023-10-27)


2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-2.0.3
+2.0.4
2 changes: 1 addition & 1 deletion api/v2alpha1/python/setup.py
@@ -25,7 +25,7 @@
    author_email='[email protected]',
    url='https://github.com/kubeflow/pipelines',
    packages=setuptools.find_namespace_packages(include=['kfp.*']),
-   python_requires='>=3.7.0',
+   python_requires='>=3.7.0,<3.13.0',
    install_requires=['protobuf>=3.13.0,<4'],
    include_package_data=True,
    license='Apache 2.0',
4 changes: 2 additions & 2 deletions backend/api/v1beta1/python_http_client/README.md
@@ -3,8 +3,8 @@ This file contains REST API specification for Kubeflow Pipelines. The file is au

This Python package is automatically generated by the [OpenAPI Generator](https://openapi-generator.tech) project:

-- API version: 2.0.3
-- Package version: 2.0.3
+- API version: 2.0.4
+- Package version: 2.0.4
- Build package: org.openapitools.codegen.languages.PythonClientCodegen
For more information, please visit [https://www.google.com](https://www.google.com)

@@ -14,7 +14,7 @@

from __future__ import absolute_import

-__version__ = "2.0.3"
+__version__ = "2.0.4"

# import apis into sdk package
from kfp_server_api.api.experiment_service_api import ExperimentServiceApi
@@ -78,7 +78,7 @@ def __init__(self, configuration=None, header_name=None, header_value=None,
            self.default_headers[header_name] = header_value
        self.cookie = cookie
        # Set default User-Agent.
-       self.user_agent = 'OpenAPI-Generator/2.0.3/python'
+       self.user_agent = 'OpenAPI-Generator/2.0.4/python'
        self.client_side_validation = configuration.client_side_validation

    def __enter__(self):
@@ -351,8 +351,8 @@ def to_debug_report(self):
        return "Python SDK Debug Report:\n"\
               "OS: {env}\n"\
               "Python Version: {pyversion}\n"\
-              "Version of the API: 2.0.3\n"\
-              "SDK Package Version: 2.0.3".\
+              "Version of the API: 2.0.4\n"\
+              "SDK Package Version: 2.0.4".\
               format(env=sys.platform, pyversion=sys.version)

    def get_host_settings(self):
2 changes: 1 addition & 1 deletion backend/api/v1beta1/python_http_client/setup.py
@@ -13,7 +13,7 @@
from setuptools import setup, find_packages  # noqa: H301

NAME = "kfp-server-api"
-VERSION = "2.0.3"
+VERSION = "2.0.4"
# To install the library, run the following
#
# python setup.py install
@@ -2,7 +2,7 @@
  "swagger": "2.0",
  "info": {
    "title": "Kubeflow Pipelines API",
-   "version": "2.0.3",
+   "version": "2.0.4",
    "description": "This file contains REST API specification for Kubeflow Pipelines. The file is autogenerated from the swagger definition.",
    "contact": {
      "name": "google",
4 changes: 2 additions & 2 deletions backend/api/v2beta1/python_http_client/README.md
@@ -3,8 +3,8 @@ This file contains REST API specification for Kubeflow Pipelines. The file is au

This Python package is automatically generated by the [OpenAPI Generator](https://openapi-generator.tech) project:

-- API version: 2.0.3
-- Package version: 2.0.3
+- API version: 2.0.4
+- Package version: 2.0.4
- Build package: org.openapitools.codegen.languages.PythonClientCodegen
For more information, please visit [https://www.google.com](https://www.google.com)

@@ -14,7 +14,7 @@

from __future__ import absolute_import

-__version__ = "2.0.3"
+__version__ = "2.0.4"

# import apis into sdk package
from kfp_server_api.api.auth_service_api import AuthServiceApi
@@ -78,7 +78,7 @@ def __init__(self, configuration=None, header_name=None, header_value=None,
            self.default_headers[header_name] = header_value
        self.cookie = cookie
        # Set default User-Agent.
-       self.user_agent = 'OpenAPI-Generator/2.0.3/python'
+       self.user_agent = 'OpenAPI-Generator/2.0.4/python'
        self.client_side_validation = configuration.client_side_validation

    def __enter__(self):
@@ -351,8 +351,8 @@ def to_debug_report(self):
        return "Python SDK Debug Report:\n"\
               "OS: {env}\n"\
               "Python Version: {pyversion}\n"\
-              "Version of the API: 2.0.3\n"\
-              "SDK Package Version: 2.0.3".\
+              "Version of the API: 2.0.4\n"\
+              "SDK Package Version: 2.0.4".\
               format(env=sys.platform, pyversion=sys.version)

    def get_host_settings(self):
2 changes: 1 addition & 1 deletion backend/api/v2beta1/python_http_client/setup.py
@@ -13,7 +13,7 @@
from setuptools import setup, find_packages  # noqa: H301

NAME = "kfp-server-api"
-VERSION = "2.0.3"
+VERSION = "2.0.4"
# To install the library, run the following
#
# python setup.py install
@@ -2,7 +2,7 @@
  "swagger": "2.0",
  "info": {
    "title": "Kubeflow Pipelines API",
-   "version": "2.0.3",
+   "version": "2.0.4",
    "description": "This file contains REST API specification for Kubeflow Pipelines. The file is autogenerated from the swagger definition.",
    "contact": {
      "name": "google",
2 changes: 1 addition & 1 deletion backend/src/apiserver/model/pipeline.go
@@ -34,7 +34,7 @@ type Pipeline struct {
	UUID           string         `gorm:"column:UUID; not null; primary_key;"`
	CreatedAtInSec int64          `gorm:"column:CreatedAtInSec; not null;"`
	Name           string         `gorm:"column:Name; not null; unique_index:namespace_name;"` // Index improves performance of the List and Get queries
-	Description    string         `gorm:"column:Description; not null; size:65535;"` // Same as below, set size to large number so it will be stored as longtext
+	Description    string         `gorm:"column:Description; size:65535;"` // Same as below, set size to large number so it will be stored as longtext
	// TODO(gkcalat): this is deprecated. Consider removing and adding data migration logic at the server startup.
	Parameters     string         `gorm:"column:Parameters; size:65535;"`
	Status         PipelineStatus `gorm:"column:Status; not null;"`
2 changes: 1 addition & 1 deletion backend/src/apiserver/model/pipeline_version.go
@@ -42,7 +42,7 @@ type PipelineVersion struct {
	Status PipelineVersionStatus `gorm:"column:Status; not null;"`
	// Code source url links to the pipeline version's definition in repo.
	CodeSourceUrl   string `gorm:"column:CodeSourceUrl;"`
-	Description     string `gorm:"column:Description; not null; size:65535;"` // Set size to large number so it will be stored as longtext
+	Description     string `gorm:"column:Description; size:65535;"` // Set size to large number so it will be stored as longtext
	PipelineSpec    string `gorm:"column:PipelineSpec; not null; size:33554432;"` // Same as common.MaxFileLength (32MB in server). Argo imposes 700kB limit
	PipelineSpecURI string `gorm:"column:PipelineSpecURI; not null; size:65535;"` // Can store references to ObjectStore files
}
2 changes: 1 addition & 1 deletion components/google-cloud/Dockerfile
@@ -44,7 +44,7 @@ RUN pip3 install -U "fsspec>=0.7.4" "gcsfs>=0.6.0" "pandas<=1.3.5" "scikit-learn
RUN pip3 install -U google-cloud-notebooks

# Install main package
-RUN pip3 install "git+https://github.com/kubeflow/pipelines.git@google-cloud-pipeline-components-2.5.0#egg=google-cloud-pipeline-components&subdirectory=components/google-cloud"
+RUN pip3 install "git+https://github.com/kubeflow/pipelines.git@google-cloud-pipeline-components-2.6.0#egg=google-cloud-pipeline-components&subdirectory=components/google-cloud"

# Note that components can override the container entry point.
ENTRYPOINT ["python3","-m","google_cloud_pipeline_components.container.v1.aiplatform.remote_runner"]
15 changes: 15 additions & 0 deletions components/google-cloud/RELEASE.md
@@ -1,4 +1,19 @@
## Upcoming release
* Fix `v1.automl.training_job.AutoMLImageTrainingJobRunOp` `ModuleNotFoundError`


## Release 2.6.0
* Bump supported KFP versions to kfp>=2.0.0b10,<=2.4.0
* Add LLM Eval pipeline parameter for customizing eval dataset reference ground truth field
* Create new eval dataset preprocessor for formatting the eval dataset in the tuning dataset format.
* Support customizing eval dataset format in Eval LLM Text Generation Pipeline (`preview.model_evaluation.evaluation_llm_text_generation_pipeline`) and LLM Text Classification Pipeline (`preview.model_evaluation.evaluation_llm_classification_pipeline`). Include new LLM Eval Preprocessor component in both pipelines.
* Fix the output parameter `output_dir` of `preview.automl.vision.DataConverterJobOp`.
* Fix batch prediction model parameters payload sanitization error.
* Add ability to perform inference with chat datasets to `preview.llm.infer_pipeline`.
* Add ability to tune chat models with `preview.llm.rlhf_pipeline` (see the first sketch after this list).
* Group `preview.llm.rlhf_pipeline` components for better readability.
* Add environment variable support to GCPC's `create_custom_training_job_from_component` (both `v1` and `preview` namespaces; see the second sketch after this list)
* Apply latest GCPC image vulnerability resolutions (base OS and software updates).
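
To ground the chat-tuning entry above: `preview.llm.rlhf_pipeline` compiles like any other KFP pipeline. The compile call below is standard KFP; the runtime arguments named in the comment are abbreviated from GCPC's documentation and should be verified against the 2.6.0 reference.

    from google_cloud_pipeline_components.preview.llm import rlhf_pipeline
    from kfp import compiler

    # Compile the pipeline to a reusable pipeline-spec YAML.
    compiler.Compiler().compile(
        pipeline_func=rlhf_pipeline,
        package_path='rlhf_pipeline.yaml',
    )
    # The compiled job is then submitted with arguments such as
    # prompt_dataset=..., preference_dataset=..., and (per this release) a
    # chat base model selected via large_model_reference (names assumed).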
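
And a rough sketch of the environment-variable support: the `env` parameter and its list-of-dicts shape follow Vertex AI's EnvVar convention but are assumptions to check against the released GCPC signature, and `EXPERIMENT_NAME` is a made-up variable.

    from google_cloud_pipeline_components.v1.custom_job import (
        create_custom_training_job_from_component,
    )
    from kfp import dsl


    @dsl.component
    def train() -> None:
        import os
        # Injected into the CustomJob container via `env` below.
        print(os.environ.get('EXPERIMENT_NAME'))


    custom_train = create_custom_training_job_from_component(
        train,
        display_name='custom-train',
        machine_type='n1-standard-4',
        env=[{'name': 'EXPERIMENT_NAME', 'value': 'rlhf-demo'}],
    )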

## Release 2.5.0
* Upload tensorboard metrics from `preview.llm.rlhf_pipeline` if a `tensorboard_resource_id` is provided at runtime.
5 changes: 5 additions & 0 deletions components/google-cloud/docs/source/versions.json
@@ -1,4 +1,9 @@
[
+  {
+    "version": "https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.6.0",
+    "title": "2.6.0",
+    "aliases": []
+  },
  {
    "version": "https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.5.0",
    "title": "2.5.0",
@@ -15,6 +15,8 @@
import sys
import warnings

+from google_cloud_pipeline_components.version import __version__
+
if sys.version_info < (3, 8):
  warnings.warn(
      (
@@ -0,0 +1,111 @@
# Copyright 2023 The Kubeflow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Graph component for uploading and deploying a tuned model adapter."""

import json
from typing import NamedTuple, Optional

from google_cloud_pipeline_components import _placeholders
from google_cloud_pipeline_components._implementation.llm import deploy_llm_model
from google_cloud_pipeline_components._implementation.llm import function_based
from google_cloud_pipeline_components._implementation.llm import upload_llm_model
import kfp

PipelineOutput = NamedTuple(
    'Outputs', model_resource_name=str, endpoint_resource_name=str
)


@kfp.dsl.pipeline(
    name='llm-deployment-graph',
    description='Uploads a tuned model and deploys it to an endpoint.',
)
def pipeline(
    output_adapter_path: str,
    large_model_reference: str,
    model_display_name: Optional[str] = None,
    deploy_model: bool = True,
) -> PipelineOutput:
  # fmt: off
  """Uploads a tuned language model and (optionally) deploys it to an endpoint.

  Args:
    output_adapter_path: Path to the trained model adapter if LoRA tuning was used.
    large_model_reference: Name of the base model. Supported values are `text-bison@001`, `t5-small`, `t5-large`, `t5-xl` and `t5-xxl`. `text-bison@001` and `t5-small` are supported in `us-central1` and `europe-west4`. `t5-large`, `t5-xl` and `t5-xxl` are only supported in `europe-west4`.
    model_display_name: Name of the fine-tuned model shown in the Model Registry. If not provided, a default name will be created.
    deploy_model: Whether to deploy the model to an endpoint in `us-central1`. Default is True.

  Returns:
    model_resource_name: Path to the model uploaded to the Model Registry. This will be an empty string if the model was not deployed.
    endpoint_resource_name: Path to the Online Prediction Endpoint. This will be an empty string if the model was not deployed.
  """
  # fmt: on
  upload_location = 'us-central1'
  adapter_artifact = kfp.dsl.importer(
      artifact_uri=output_adapter_path,
      artifact_class=kfp.dsl.Artifact,
  ).set_display_name('Import Tuned Adapter')

  regional_endpoint = function_based.resolve_regional_endpoint(
      upload_location=upload_location
  ).set_display_name('Resolve Regional Endpoint')

  display_name = function_based.resolve_model_display_name(
      large_model_reference=large_model_reference,
      model_display_name=model_display_name,
  ).set_display_name('Resolve Model Display Name')

  reference_model_metadata = function_based.resolve_reference_model_metadata(
      large_model_reference=large_model_reference,
  ).set_display_name('Resolve Model Metadata')

  upload_model = function_based.resolve_upload_model(
      large_model_reference=reference_model_metadata.outputs[
          'large_model_reference'
      ]
  ).set_display_name('Resolve Upload Model')
  upload_task = (
      upload_llm_model.upload_llm_model(
          project=_placeholders.PROJECT_ID_PLACEHOLDER,
          location=upload_location,
          regional_endpoint=regional_endpoint.output,
          artifact_uri=adapter_artifact.output,
          model_display_name=display_name.output,
          model_reference_name='text-bison@001',
          upload_model=upload_model.output,
      )
      .set_env_variable(
          name='VERTEX_AI_PIPELINES_RUN_LABELS',
          value=json.dumps({'tune-type': 'rlhf'}),
      )
      .set_display_name('Upload Model')
  )
  deploy_model = function_based.resolve_deploy_model(
      deploy_model=deploy_model,
      large_model_reference=reference_model_metadata.outputs[
          'large_model_reference'
      ],
  ).set_display_name('Resolve Deploy Model')
  deploy_task = deploy_llm_model.create_endpoint_and_deploy_model(
      project=_placeholders.PROJECT_ID_PLACEHOLDER,
      location=upload_location,
      model_resource_name=upload_task.outputs['model_resource_name'],
      display_name=display_name.output,
      regional_endpoint=regional_endpoint.output,
      deploy_model=deploy_model.output,
  ).set_display_name('Deploy Model')
  return PipelineOutput(
      model_resource_name=upload_task.outputs['model_resource_name'],
      endpoint_resource_name=deploy_task.outputs['endpoint_resource_name'],
  )
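
Not part of the diff, but for context: a graph component like the one added above compiles with the standard KFP compiler. A minimal sketch, assuming the `pipeline` function from the new file:

from kfp import compiler

# Emit the deployment graph as pipeline-spec YAML for later submission.
compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path='llm_deployment_graph.yaml',
)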
@@ -16,7 +16,7 @@


def get_private_image_tag() -> str:
-  return os.getenv('PRIVATE_IMAGE_TAG', '20231010_1107_RC00')
+  return os.getenv('PRIVATE_IMAGE_TAG', '20231031_0507_RC00')


def get_use_test_machine_spec() -> bool:
