Commit: merge kfp 2.0.3
Tomcli committed Oct 27, 2023
2 parents e735b67 + 58ce09e commit c697471
Showing 124 changed files with 6,666 additions and 1,037 deletions.
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,47 @@
# Changelog

### [2.0.3](https://github.com/kubeflow/pipelines/compare/2.0.2...2.0.3) (2023-10-27)


### Features

* **backend:** Support consuming parent DAG input artifact ([\#10162](https://github.com/kubeflow/pipelines/issues/10162)) ([52f5cf5](https://github.com/kubeflow/pipelines/commit/52f5cf51c4a6c233aae57125561c0fc95c4fd20f))
* **backend:** Update driver and launcher images ([\#10164](https://github.com/kubeflow/pipelines/issues/10164)) ([c0093ec](https://github.com/kubeflow/pipelines/commit/c0093ecef6bc5f056efa135d019267327115d79d))
* **components:** [endpoint_batch_predict] Initialize component ([0d75611](https://github.com/kubeflow/pipelines/commit/0d7561199751e83b4d7e1603c3d32d4088a7e208))
* **components:** [text2sql] Generate column names by model batch predict ([1bee8be](https://github.com/kubeflow/pipelines/commit/1bee8be071a91f44c0129837c381863327cb337d))
* **components:** [text2sql] Generate table names by model batch prediction ([ebb4245](https://github.com/kubeflow/pipelines/commit/ebb42450d0b07eaa8de35a3f6b70eacb5f26f0d8))
* **components:** [text2sql] Implement preprocess component logic ([21079b5](https://github.com/kubeflow/pipelines/commit/21079b5910e597a38b67853f3ecfb3929344371e))
* **components:** [text2sql] Initialize preprocess component and integrate with text2sql pipeline ([9aa750e](https://github.com/kubeflow/pipelines/commit/9aa750e62f6e225d037ecdda9bf7cab95f05675d))
* **components:** [text2sql] Initialize evaluation component ([ea93979](https://github.com/kubeflow/pipelines/commit/ea93979eed02e131bd20180da149b9465670dfe1))
* **components:** [text2sql] Initialize validate and process component ([633ddeb](https://github.com/kubeflow/pipelines/commit/633ddeb07e9212d2e373dba8d20a0f6d67ab037d))
* **components:** Add ability to preprocess chat llama datasets to `_implementation.llm.chat_dataset_preprocessor` ([99fd201](https://github.com/kubeflow/pipelines/commit/99fd2017a76660f30d0a04b71542cbef45783633))
* **components:** Add question_answer support for AutoSxS default instructions ([412216f](https://github.com/kubeflow/pipelines/commit/412216f832a848bfc61ce289aed819d7f2860fdd))
* **components:** Add sliced evaluation metrics support for custom and unstructured AutoML models in evaluation feature attribution pipeline ([d8a0660](https://github.com/kubeflow/pipelines/commit/d8a0660df525f5695015e507e981bceff836dd3d))
* **components:** Add sliced evaluation metrics support for custom and unstructured AutoML models in evaluation pipeline ([0487f9a](https://github.com/kubeflow/pipelines/commit/0487f9a8b1d8ab0d96d757bd4b598ffd353ecc81))
* **components:** add support for customizing model_parameters in LLM eval text generation and LLM eval text classification pipelines ([d53ddda](https://github.com/kubeflow/pipelines/commit/d53dddab1c8a042e58e06ff6eb38be82fefddb0a))
* **components:** Make `model_checkpoint` optional for `preview.llm.infer_pipeline` ([e8fb699](https://github.com/kubeflow/pipelines/commit/e8fb6990dfdf036c941c522f9b384ff679b38ca6))
* **components:** migrate `DataflowFlexTemplateJobOp` to GA namespace (now `v1.dataflow.DataflowFlexTemplateJobOp`) ([faba922](https://github.com/kubeflow/pipelines/commit/faba9223ee846d459f7bb497a6faa3c153dcf430))
* **components:** Set display names for SFT, RLHF and LLM inference pipelines ([1386a82](https://github.com/kubeflow/pipelines/commit/1386a826ba2bcdbc19eb2007ca43f6acd1031e4d))
* **components:** Support service account in kubeflow model_batch_predict component ([1682ce8](https://github.com/kubeflow/pipelines/commit/1682ce8adeb2c55a155588eae7492b2f0a8b783a))
* **components:** Update image tag used by llm pipelines ([4d71fda](https://github.com/kubeflow/pipelines/commit/4d71fdac3fc92dd4d54c6be3a28725667b8f3c5e))
* **sdk:** support a Pythonic artifact authoring style ([\#9932](https://github.com/kubeflow/pipelines/issues/9932)) ([8d00d0e](https://github.com/kubeflow/pipelines/commit/8d00d0eb9a1442ed994b6a90acea88604efc6423)); sketched after this list
* **sdk:** support collecting outputs from conditional branches using `dsl.OneOf` ([\#10067](https://github.com/kubeflow/pipelines/issues/10067)) ([2d3171c](https://github.com/kubeflow/pipelines/commit/2d3171cbfec626055e59b8a58ce83fb54ecad113)); sketched after this list
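
For illustration, a minimal sketch of the Pythonic artifact authoring style from #9932: components accept and return artifact instances directly instead of using `Input[...]`/`Output[...]` annotations. The component names and metadata keys are invented for the example:

```python
from kfp import dsl
from kfp.dsl import Dataset


@dsl.component
def make_dataset(text: str) -> Dataset:
    # dsl.get_uri() allocates a URI under the pipeline root for this
    # task's output; the artifact is constructed and returned directly.
    dataset = Dataset(uri=dsl.get_uri(), metadata={'num_chars': len(text)})
    with open(dataset.path, 'w') as f:
        f.write(text)
    return dataset


@dsl.component
def read_dataset(dataset: Dataset) -> str:
    # In the Pythonic style, input artifacts use the bare artifact type.
    with open(dataset.path) as f:
        return f.read()


@dsl.pipeline
def artifact_pipeline(text: str = 'hello') -> str:
    dataset_task = make_dataset(text=text)
    return read_dataset(dataset=dataset_task.output).output
```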
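
And a hedged sketch of `dsl.OneOf` from #10067, which resolves to the output of whichever mutually exclusive branch actually ran; the components here are stand-ins:

```python
from kfp import dsl


@dsl.component
def flip_coin() -> str:
    import random
    return random.choice(['heads', 'tails'])


@dsl.component
def announce(text: str) -> str:
    print(text)
    return text


@dsl.pipeline
def flip_coin_pipeline() -> str:
    flip_task = flip_coin()
    with dsl.If(flip_task.output == 'heads'):
        heads_task = announce(text='Got heads!')
    with dsl.Else():
        tails_task = announce(text='Got tails!')
    # Exactly one branch executes; OneOf collects that branch's output.
    return dsl.OneOf(heads_task.output, tails_task.output)
```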


### Bug Fixes

* **components:** [text2sql] Rename `model_inference_results_path` to `model_inference_results_directory` and remove a duplicate comment ([570e56d](https://github.com/kubeflow/pipelines/commit/570e56dd09af32e173cf041eed7497e4533ec186))
* **frontend:** Replace twitter artifactory endpoint with npm endpoint. ([\#10099](https://github.com/kubeflow/pipelines/issues/10099)) ([da6a360](https://github.com/kubeflow/pipelines/commit/da6a3601468282c0592eae8e89a3d97b982e2d43))
* **sdk:** fix bug when `dsl.importer` argument is provided by loop variable ([\#10116](https://github.com/kubeflow/pipelines/issues/10116)) ([73d51c8](https://github.com/kubeflow/pipelines/commit/73d51c8a23afad97efb6d7e7436c081fa22ce24d)); pattern sketched after this list
* **sdk:** Fix OOB for IPython and refactor. Closes [\#10075](https://github.com/kubeflow/pipelines/issues/10075). ([\#10094](https://github.com/kubeflow/pipelines/issues/10094)) ([c903271](https://github.com/kubeflow/pipelines/commit/c9032716ab2013df56cb1078a703d48ed8e36fb4))
* **sdk:** type annotation for client credentials ([\#10158](https://github.com/kubeflow/pipelines/issues/10158)) ([02e00e8](https://github.com/kubeflow/pipelines/commit/02e00e8439e9753dbf82856ac9c5a7cec8ce3243))
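
The `dsl.importer` fix in #10116 concerns the pattern below (a sketch with hypothetical URIs), where `artifact_uri` comes from a `dsl.ParallelFor` loop variable:

```python
from kfp import dsl
from kfp.dsl import Dataset


@dsl.pipeline
def import_many(uris: list = ['gs://bucket/a.csv', 'gs://bucket/b.csv']):
    with dsl.ParallelFor(items=uris) as uri:
        # Passing the loop variable here previously triggered the bug.
        dsl.importer(artifact_uri=uri, artifact_class=Dataset, reimport=False)
```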


### Other Pull Requests

* feat(components): Extend kserve component ([\#10136](https://github.com/kubeflow/pipelines/issues/10136)) ([2054b7c](https://github.com/kubeflow/pipelines/commit/2054b7c45d4831c787115563c8be0048abcb9be1))
* No public description ([0e240db](https://github.com/kubeflow/pipelines/commit/0e240db39799cb0afbd8c7f982ffdd4f9eb58121))

### [2.0.2](https://github.com/kubeflow/pipelines/compare/2.0.0...2.0.2) (2023-10-11)


2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-2.0.2
+2.0.3
4 changes: 2 additions & 2 deletions backend/api/v1beta1/python_http_client/README.md
@@ -3,8 +3,8 @@ This file contains REST API specification for Kubeflow Pipelines. The file is autogenerated

This Python package is automatically generated by the [OpenAPI Generator](https://openapi-generator.tech) project:

-- API version: 2.0.2
-- Package version: 2.0.2
+- API version: 2.0.3
+- Package version: 2.0.3
- Build package: org.openapitools.codegen.languages.PythonClientCodegen
For more information, please visit [https://www.google.com](https://www.google.com)

@@ -14,7 +14,7 @@

from __future__ import absolute_import

__version__ = "2.0.2"
__version__ = "2.0.3"

# import apis into sdk package
from kfp_server_api.api.experiment_service_api import ExperimentServiceApi
@@ -78,7 +78,7 @@ def __init__(self, configuration=None, header_name=None, header_value=None,
self.default_headers[header_name] = header_value
self.cookie = cookie
# Set default User-Agent.
-self.user_agent = 'OpenAPI-Generator/2.0.2/python'
+self.user_agent = 'OpenAPI-Generator/2.0.3/python'
self.client_side_validation = configuration.client_side_validation

def __enter__(self):
@@ -351,8 +351,8 @@ def to_debug_report(self):
return "Python SDK Debug Report:\n"\
"OS: {env}\n"\
"Python Version: {pyversion}\n"\
"Version of the API: 2.0.2\n"\
"SDK Package Version: 2.0.2".\
"Version of the API: 2.0.3\n"\
"SDK Package Version: 2.0.3".\
format(env=sys.platform, pyversion=sys.version)

def get_host_settings(self):
2 changes: 1 addition & 1 deletion backend/api/v1beta1/python_http_client/setup.py
@@ -13,7 +13,7 @@
from setuptools import setup, find_packages # noqa: H301

NAME = "kfp-server-api"
VERSION = "2.0.2"
VERSION = "2.0.3"
# To install the library, run the following
#
# python setup.py install
@@ -2,7 +2,7 @@
"swagger": "2.0",
"info": {
"title": "Kubeflow Pipelines API",
"version": "2.0.2",
"version": "2.0.3",
"description": "This file contains REST API specification for Kubeflow Pipelines. The file is autogenerated from the swagger definition.",
"contact": {
"name": "google",
4 changes: 2 additions & 2 deletions backend/api/v2beta1/python_http_client/README.md
@@ -3,8 +3,8 @@ This file contains REST API specification for Kubeflow Pipelines. The file is autogenerated

This Python package is automatically generated by the [OpenAPI Generator](https://openapi-generator.tech) project:

-- API version: 2.0.2
-- Package version: 2.0.2
+- API version: 2.0.3
+- Package version: 2.0.3
- Build package: org.openapitools.codegen.languages.PythonClientCodegen
For more information, please visit [https://www.google.com](https://www.google.com)

@@ -14,7 +14,7 @@

from __future__ import absolute_import

__version__ = "2.0.2"
__version__ = "2.0.3"

# import apis into sdk package
from kfp_server_api.api.auth_service_api import AuthServiceApi
@@ -78,7 +78,7 @@ def __init__(self, configuration=None, header_name=None, header_value=None,
self.default_headers[header_name] = header_value
self.cookie = cookie
# Set default User-Agent.
-self.user_agent = 'OpenAPI-Generator/2.0.2/python'
+self.user_agent = 'OpenAPI-Generator/2.0.3/python'
self.client_side_validation = configuration.client_side_validation

def __enter__(self):
@@ -351,8 +351,8 @@ def to_debug_report(self):
return "Python SDK Debug Report:\n"\
"OS: {env}\n"\
"Python Version: {pyversion}\n"\
"Version of the API: 2.0.2\n"\
"SDK Package Version: 2.0.2".\
"Version of the API: 2.0.3\n"\
"SDK Package Version: 2.0.3".\
format(env=sys.platform, pyversion=sys.version)

def get_host_settings(self):
2 changes: 1 addition & 1 deletion backend/api/v2beta1/python_http_client/setup.py
@@ -13,7 +13,7 @@
from setuptools import setup, find_packages # noqa: H301

NAME = "kfp-server-api"
VERSION = "2.0.2"
VERSION = "2.0.3"
# To install the library, run the following
#
# python setup.py install
@@ -2,7 +2,7 @@
"swagger": "2.0",
"info": {
"title": "Kubeflow Pipelines API",
"version": "2.0.2",
"version": "2.0.3",
"description": "This file contains REST API specification for Kubeflow Pipelines. The file is autogenerated from the swagger definition.",
"contact": {
"name": "google",
4 changes: 2 additions & 2 deletions backend/src/v2/compiler/argocompiler/argo.go
@@ -116,8 +116,8 @@ func Compile(jobArg *pipelinespec.PipelineJob, kubernetesSpecArg *pipelinespec.S
wf: wf,
templates: make(map[string]*wfapi.Template),
// TODO(chensun): release process and update the images.
-driverImage: "gcr.io/ml-pipeline/kfp-driver@sha256:fa68f52639b4f4683c9f8f468502867c9663823af0fbcff1cbe7847d5374bf5c",
-launcherImage: "gcr.io/ml-pipeline/kfp-launcher@sha256:6641bf94acaeec03ee7e231241800fce2f0ad92eee25371bd5248ca800a086d7",
+driverImage: "gcr.io/ml-pipeline/kfp-driver@sha256:8e60086b04d92b657898a310ca9757631d58547e76bbbb8bfc376d654bef1707",
+launcherImage: "gcr.io/ml-pipeline/kfp-launcher@sha256:50151a8615c8d6907aa627902dce50a2619fd231f25d1e5c2a72737a2ea4001e",
job: job,
spec: spec,
executors: deploy.GetExecutors(),
16 changes: 14 additions & 2 deletions backend/src/v2/driver/driver.go
@@ -768,7 +768,11 @@ func resolveInputs(ctx context.Context, dag *metadata.DAG, iterationIndex *int,
if err != nil {
return nil, err
}
glog.Infof("parent DAG input parameters %+v", inputParams)
inputArtifacts, err := mlmd.GetInputArtifactsByExecutionID(ctx, dag.Execution.GetID())
if err != nil {
return nil, err
}
glog.Infof("parent DAG input parameters: %+v, artifacts: %+v", inputParams, inputArtifacts)
inputs = &pipelinespec.ExecutorInput_Inputs{
ParameterValues: make(map[string]*structpb.Value),
Artifacts: make(map[string]*pipelinespec.ArtifactList),
@@ -998,7 +1002,15 @@ func resolveInputs(ctx context.Context, dag *metadata.DAG, iterationIndex *int,
}
switch t := artifactSpec.Kind.(type) {
case *pipelinespec.TaskInputsSpec_InputArtifactSpec_ComponentInputArtifact:
return nil, artifactError(fmt.Errorf("component input artifact not implemented yet"))
inputArtifactName := artifactSpec.GetComponentInputArtifact()
if inputArtifactName == "" {
return nil, artifactError(fmt.Errorf("component input artifact key is empty"))
}
v, ok := inputArtifacts[inputArtifactName]
if !ok {
return nil, artifactError(fmt.Errorf("parent DAG does not have input artifact %s", inputArtifactName))
}
inputs.Artifacts[name] = v

case *pipelinespec.TaskInputsSpec_InputArtifactSpec_TaskOutputArtifact:
taskOutput := artifactSpec.GetTaskOutputArtifact()
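At the SDK level, the driver change above roughly corresponds to pipelines like the following hedged sketch (URIs hypothetical): a component inside a nested pipeline consumes an artifact that arrives as an input of its parent DAG.

```python
from kfp import dsl
from kfp.dsl import Dataset, Input


@dsl.component
def consume(dataset: Input[Dataset]):
    print(dataset.uri)


@dsl.pipeline
def inner_pipeline(dataset: Input[Dataset]):
    # `dataset` is an input artifact of this inner DAG; the driver now
    # resolves it from the parent DAG's inputs instead of returning
    # "component input artifact not implemented yet".
    consume(dataset=dataset)


@dsl.pipeline
def outer_pipeline():
    importer = dsl.importer(
        artifact_uri='gs://bucket/data.csv',  # hypothetical
        artifact_class=Dataset,
    )
    inner_pipeline(dataset=importer.output)
```
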
11 changes: 9 additions & 2 deletions components/google-cloud/Dockerfile
@@ -28,7 +28,14 @@ RUN pip3 install -U google-cloud-storage
RUN pip3 install -U google-api-python-client

# Required by dataflow_launcher
RUN pip3 install -U "apache_beam[gcp]"
# Pin to `2.50.0` for compatibility with `google-cloud-aiplatform`, which
# depends on `shapely<3.0.0dev`.
# Prefer an exact pin, since GCPC's apache_beam version must match the
# version the in custom Dataflow worker images for the Dataflow job to succeed.
# Inexact pins risk that the apache_beam in GCPC drifts away from a
# user-specified version in the image.
# From docs: """When running your pipeline, launch the pipeline using the Apache Beam SDK with the same version and language version as the SDK on your custom container image. This step avoids unexpected errors from incompatible dependencies or SDKs.""" https://cloud.google.com/dataflow/docs/guides/using-custom-containers#before_you_begin_2
RUN pip3 install -U "apache_beam[gcp]==2.50.0"

# Required for sklearn/train_test_split_jsonl
RUN pip3 install -U "fsspec>=0.7.4" "gcsfs>=0.6.0" "pandas<=1.3.5" "scikit-learn<=1.0.2"
@@ -37,7 +44,7 @@ RUN pip3 install -U "fsspec>=0.7.4" "gcsfs>=0.6.0" "pandas<=1.3.5" "scikit-learn<=1.0.2"
RUN pip3 install -U google-cloud-notebooks

# Install main package
RUN pip3 install "git+https://github.com/kubeflow/pipelines.git@google-cloud-pipeline-components-2.4.1#egg=google-cloud-pipeline-components&subdirectory=components/google-cloud"
RUN pip3 install "git+https://github.com/kubeflow/pipelines.git@google-cloud-pipeline-components-2.5.0#egg=google-cloud-pipeline-components&subdirectory=components/google-cloud"

# Note that components can override the container entry point.
ENTRYPOINT ["python3","-m","google_cloud_pipeline_components.container.v1.aiplatform.remote_runner"]
12 changes: 12 additions & 0 deletions components/google-cloud/RELEASE.md
@@ -1,7 +1,19 @@
## Upcoming release

## Release 2.5.0
* Upload tensorboard metrics from `preview.llm.rlhf_pipeline` if a `tensorboard_resource_id` is provided at runtime.
* Support `incremental_train_base_model`, `parent_model`, `is_default_version`, `model_version_aliases`, `model_version_description` in `AutoMLImageTrainingJobRunOp`.
* Add `preview.automl.vision` and `DataConverterJobOp`.
* Set display names for `preview.llm` pipelines.
* Add sliced evaluation metrics support for custom and unstructured AutoML models in evaluation pipeline and evaluation pipeline with feature attribution.
* Support `service_account` in `ModelBatchPredictOp` (see the sketch after this diff).
* Release `DataflowFlexTemplateJobOp` to GA namespace (`v1.dataflow.DataflowFlexTemplateJobOp`).
* Make `model_checkpoint` optional for `preview.llm.infer_pipeline`. If not provided, the base model associated with the `large_model_reference` will be used.
* Bump `apache_beam[gcp]` version in GCPC container image from `<2.34.0` to `==2.50.0` for compatibility with `google-cloud-aiplatform`, which depends on `shapely<3.0.0dev`. Note: upgrading to `google-cloud-pipeline-components>=2.5.0` may require using a Dataflow worker image with `apache_beam==2.50.0`.
* Apply the latest GCPC image vulnerability resolutions (base OS and software updates).
* Add support for customizing `model_parameters` (`maxOutputTokens`, `topK`, `topP`, and `temperature`) in LLM eval text generation and LLM eval text classification pipelines.

## Release 2.4.1
* Disable caching for LLM pipeline tasks that store temporary artifacts.
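The `service_account` support in `ModelBatchPredictOp` noted above might be used as in this sketch; the project, bucket, and account values are hypothetical, and the remaining parameters follow the component's existing v1 interface:

```python
from google_cloud_pipeline_components.types import artifact_types
from google_cloud_pipeline_components.v1.batch_predict_job import ModelBatchPredictOp
from kfp import dsl


@dsl.pipeline
def batch_predict_pipeline():
    # Import an existing Vertex AI model by resource name (hypothetical).
    model = dsl.importer(
        artifact_uri='https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/models/123',
        artifact_class=artifact_types.VertexModel,
        metadata={'resourceName': 'projects/my-project/locations/us-central1/models/123'},
    )
    ModelBatchPredictOp(
        project='my-project',
        location='us-central1',
        job_display_name='batch-predict',
        model=model.output,
        gcs_source_uris=['gs://my-bucket/instances.jsonl'],
        instances_format='jsonl',
        gcs_destination_output_uri_prefix='gs://my-bucket/predictions',
        predictions_format='jsonl',
        # New in GCPC 2.5.0: run the batch prediction job as this account.
        service_account='runner@my-project.iam.gserviceaccount.com',
    )
```
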
5 changes: 5 additions & 0 deletions components/google-cloud/docs/source/versions.json
@@ -1,4 +1,9 @@
[
{
"version": "https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.5.0",
"title": "2.5.0",
"aliases": []
},
{
"version": "https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.4.1",
"title": "2.4.1",
Expand Down
@@ -12,4 +12,18 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""Google Cloud Pipeline Components."""
-from google_cloud_pipeline_components.version import __version__
+import sys
+import warnings
+
+if sys.version_info < (3, 8):
+  warnings.warn(
+      (
+          'Python 3.7 has reached end-of-life. Google Cloud Pipeline Components'
+          ' will drop support for Python 3.7 on April 23, 2024. To use new'
+          ' versions of the KFP SDK after that date, you will need to upgrade'
+          ' to Python >= 3.8. See https://devguide.python.org/versions/ for'
+          ' more details.'
+      ),
+      FutureWarning,
+      stacklevel=2,
+  )
@@ -16,7 +16,7 @@


def get_private_image_tag() -> str:
-  return os.getenv('PRIVATE_IMAGE_TAG', '20230918_1327_RC00')
+  return os.getenv('PRIVATE_IMAGE_TAG', '20231010_1107_RC00')


def get_use_test_machine_spec() -> bool:
@@ -268,6 +268,15 @@ def resolve_reference_model_metadata(
        reward_model_path='gs://vertex-rlhf-restricted/pretrained_models/palm/t5x_otter_pretrain/',
        is_supported=True,
    ),
+    'chat-bison@001': reference_model_metadata(
+        large_model_reference='BISON',
+        reference_model_path=(
+            'gs://vertex-rlhf-restricted/pretrained_models/palm/t5x_bison/'
+        ),
+        reward_model_reference='OTTER',
+        reward_model_path='gs://vertex-rlhf-restricted/pretrained_models/palm/t5x_otter_pretrain/',
+        is_supported=True,
+    ),
    'elephant': reference_model_metadata(
        large_model_reference='ELEPHANT',
        reference_model_path=(
@@ -356,9 +365,14 @@ def generate_default_instruction(
  task = task.lower()
  if task == 'summarization':
    return f'Summarize in less than {target_sequence_length} words.'
+
+  elif task == 'question_answer':
+    return f'Answer the question in less than {target_sequence_length} words.'
+
  else:
    raise ValueError(
-        f'Task not recognized: {task}. Supported tasks are: summarization.'
+        f'Task not recognized: {task}. Supported tasks are: "summarization",'
+        ' "question_answer".'
    )


@@ -456,3 +470,22 @@ def resolve_upload_model(large_model_reference: str) -> bool:
  if large_model_reference in supported_models:
    return True
  return False


@dsl.component(base_image=_image.GCPC_IMAGE_TAG, install_kfp_package=False)
def resolve_instruction(
    large_model_reference: str, instruction: Optional[str] = None
) -> str:
  """Resolves the instruction to use for a given reference model.

  Args:
    large_model_reference: Base model tuned by the pipeline.
    instruction: Instruction provided at runtime.

  Returns:
    Instruction to use during tokenization based on model type. Returns an
    empty string for chat models because the instruction is prepended as the
    default context. Otherwise the original instruction is returned.
  """
  instruction = instruction or ''
  return instruction if 'chat' not in large_model_reference.lower() else ''
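
Presumably the new component is wired into the tuning pipelines roughly as follows (a hedged fragment reusing `resolve_instruction` from above):

```python
# Inside a pipeline definition (illustrative only):
instruction_task = resolve_instruction(
    large_model_reference='chat-bison@001',
    instruction='Answer politely.',
)
# For chat models the task resolves to '', because the instruction is
# instead prepended as default context during dataset preprocessing.
```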