Migrate prs entry script (#2991)
# Description

Please add an informative description that covers the changes made by
the pull request and link all relevant issues.

# All Promptflow Contribution checklist:
- [x] **The pull request does not introduce [breaking changes].**
- [x] **CHANGELOG is updated for new features, bug fixes or other
significant changes.**
- [x] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [x] **Create an issue and link to the pull request to get dedicated
review from promptflow team. Learn more: [suggested
workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [x] Title of the pull request is clear and informative.
- [ ] There are a small number of commits, each of which has an
informative message. This means that previously merged commits do not
appear in the history of the PR. For more information on cleaning up the
commits in your PR, [see this
page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [x] Pull request includes test coverage for the included changes.

---------

Co-authored-by: junanchen <[email protected]>
jac86 and junanchen authored May 22, 2024
1 parent 04c0eb8 commit f17b01e
Showing 54 changed files with 1,951 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .cspell.json
@@ -232,6 +232,10 @@
"piezo",
"Piezo",
"cmpop",
"finalizer",
"finalizers",
"amlbi",
"cmpop",
"omap",
"Machinal",
"azureopenaimodelconfiguration",
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -12,6 +12,7 @@
/src/promptflow-azure/ @microsoft/promptflow-sdk
/src/promptflow-recording/ @microsoft/promptflow-sdk
/src/promptflow-evals/ @microsoft/pf-eval-team-contributor
/src/promptflow-parallel/ @microsoft/promptflow-sdk
/src/promptflow-rag/ @microsoft/promptflow-rag

/scripts/docs/ @microsoft/promptflow-sdk
2 changes: 2 additions & 0 deletions .github/labeler.yml
@@ -12,6 +12,8 @@ promptflow-azure:
- src/promptflow-azure/**
promptflow-evals:
- src/promptflow-evals/**
promptflow-parallel:
- src/promptflow-parallel/**
promptflow:
- src/promptflow/**
promptflow-tools:
1 change: 1 addition & 0 deletions .github/workflows/promptflow-executor-e2e-test.yml
@@ -21,6 +21,7 @@ on:
- src/promptflow-tracing/promptflow/**
- src/promptflow-core/promptflow/**
- src/promptflow-devkit/promptflow/**
- src/promptflow-parallel/promptflow/**
- scripts/building/**
- src/promptflow-recording/recordings/local/executor_node_cache.*
- .github/workflows/promptflow-executor-e2e-test.yml
1 change: 1 addition & 0 deletions .github/workflows/promptflow-executor-unit-test.yml
@@ -21,6 +21,7 @@ on:
- src/promptflow-tracing/promptflow/**
- src/promptflow-core/promptflow/**
- src/promptflow-devkit/promptflow/**
- src/promptflow-parallel/promptflow/**
- scripts/building/**
- src/promptflow-recording/recordings/local/executor_node_cache.*
- .github/workflows/promptflow-executor-unit-test.yml
87 changes: 87 additions & 0 deletions .github/workflows/promptflow-parallel-e2e-test.yml
@@ -0,0 +1,87 @@
name: promptflow-parallel-e2e-test

on:
  schedule:
    - cron: "40 10 * * *"  # 2:40 PST every day
  pull_request:
    paths:
      - src/promptflow/**
      - src/promptflow-core/**
      - src/promptflow-tracing/**
      - src/promptflow-parallel/**
      - .github/workflows/promptflow-parallel-e2e-test.yml
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

env:
  IS_IN_CI_PIPELINE: "true"
  WORKING_DIRECTORY: ${{ github.workspace }}/src/promptflow-parallel

jobs:
  parallel-e2e-test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-13]
        python-version: ['3.8', '3.9', '3.10', '3.11']
      fail-fast: false
    # snok/install-poetry needs this to support Windows
    defaults:
      run:
        shell: bash
    environment:
      internal
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: set test mode
        # Always run in replay mode for now until we figure out the test resource to run live mode
        run: echo "PROMPT_FLOW_TEST_MODE=replay" >> $GITHUB_ENV
        #run: echo "PROMPT_FLOW_TEST_MODE=$(if [[ "${{ github.event_name }}" == "pull_request" ]]; then echo replay; else echo live; fi)" >> $GITHUB_ENV
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - uses: snok/install-poetry@v1
      - name: install promptflow packages in editable mode
        run: |
          set -xe
          poetry install --with ci,test
          poetry run pip show promptflow-tracing
          poetry run pip show promptflow-core
        working-directory: ${{ env.WORKING_DIRECTORY }}
      - name: run e2e tests
        run: poetry run pytest -m e2etest --cov=promptflow --cov-config=pyproject.toml --cov-report=term --cov-report=html --cov-report=xml
        working-directory: ${{ env.WORKING_DIRECTORY }}
      - name: Upload Test Results
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: Test Results (Python ${{ matrix.python-version }}) (OS ${{ matrix.os }})
          path: |
            ${{ env.WORKING_DIRECTORY }}/*.xml
            ${{ env.WORKING_DIRECTORY }}/htmlcov/
  parallel-e2e-test-report:
    needs: parallel-e2e-test
    runs-on: ubuntu-latest
    permissions:
      checks: write
      pull-requests: write
      contents: read
      issues: read
    if: always()
    steps:
      - name: checkout
        uses: actions/checkout@v4
      - name: Publish Test Results
        uses: "./.github/actions/step_publish_test_results"
        with:
          testActionFileName: promptflow-parallel-e2e-test.yml
          testResultTitle: Parallel E2E Test Result
          osVersion: ubuntu-latest
          pythonVersion: 3.9
          coverageThreshold: 40
          context: test/parallel-e2e
87 changes: 87 additions & 0 deletions .github/workflows/promptflow-parallel-unit-test.yml
@@ -0,0 +1,87 @@
name: promptflow-parallel-unit-test

on:
  schedule:
    - cron: "40 10 * * *"  # 2:40 PST every day
  pull_request:
    paths:
      - src/promptflow/**
      - src/promptflow-core/**
      - src/promptflow-tracing/**
      - src/promptflow-parallel/**
      - .github/workflows/promptflow-parallel-unit-test.yml
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

env:
  IS_IN_CI_PIPELINE: "true"
  WORKING_DIRECTORY: ${{ github.workspace }}/src/promptflow-parallel

jobs:
  parallel-unit-test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-13]
        python-version: ['3.8', '3.9', '3.10', '3.11']
      fail-fast: false
    # snok/install-poetry needs this to support Windows
    defaults:
      run:
        shell: bash
    environment:
      internal
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: set test mode
        # Always run in replay mode for now until we figure out the test resource to run live mode
        run: echo "PROMPT_FLOW_TEST_MODE=replay" >> $GITHUB_ENV
        #run: echo "PROMPT_FLOW_TEST_MODE=$(if [[ "${{ github.event_name }}" == "pull_request" ]]; then echo replay; else echo live; fi)" >> $GITHUB_ENV
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - uses: snok/install-poetry@v1
      - name: install promptflow packages in editable mode
        run: |
          set -xe
          poetry install --with ci,test
          poetry run pip show promptflow-tracing
          poetry run pip show promptflow-core
        working-directory: ${{ env.WORKING_DIRECTORY }}
      - name: run unit tests
        run: poetry run pytest -m unittest --cov=promptflow --cov-config=pyproject.toml --cov-report=term --cov-report=html --cov-report=xml
        working-directory: ${{ env.WORKING_DIRECTORY }}
      - name: Upload Test Results
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: Test Results (Python ${{ matrix.python-version }}) (OS ${{ matrix.os }})
          path: |
            ${{ env.WORKING_DIRECTORY }}/*.xml
            ${{ env.WORKING_DIRECTORY }}/htmlcov/
  parallel-unit-test-report:
    needs: parallel-unit-test
    runs-on: ubuntu-latest
    permissions:
      checks: write
      pull-requests: write
      contents: read
      issues: read
    if: always()
    steps:
      - name: checkout
        uses: actions/checkout@v4
      - name: Publish Test Results
        uses: "./.github/actions/step_publish_test_results"
        with:
          testActionFileName: promptflow-parallel-unit-test.yml
          testResultTitle: Parallel Unit Test Result
          osVersion: ubuntu-latest
          pythonVersion: 3.9
          coverageThreshold: 40
          context: test/parallel-unit
Empty file.
8 changes: 8 additions & 0 deletions src/promptflow-parallel/README.md
@@ -0,0 +1,8 @@
# Prompt flow parallel computing

[![Python package](https://img.shields.io/pypi/v/promptflow-parallel)](https://pypi.org/project/promptflow-parallel/)
[![Python](https://img.shields.io/pypi/pyversions/promptflow.svg?maxAge=2592000)](https://pypi.python.org/pypi/promptflow-core/)
[![License: MIT](https://img.shields.io/github/license/microsoft/promptflow)](https://github.com/microsoft/promptflow/blob/main/LICENSE)

# Introduction
Promptflow parallel leverages [AzureML Parallel Computing](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-parallel-job-in-pipeline?view=azureml-api-2&tabs=cliv2) to run flows at large scale.
6 changes: 6 additions & 0 deletions src/promptflow-parallel/promptflow/parallel/__init__.py
@@ -0,0 +1,6 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
from .processor import ParallelRunProcessor, create_processor

__all__ = ["ParallelRunProcessor", "create_processor"]
@@ -0,0 +1,3 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
27 changes: 27 additions & 0 deletions src/promptflow-parallel/promptflow/parallel/_config/model.py
@@ -0,0 +1,27 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional


def output_file_pattern(suffix: str) -> str:
    return f"temp_*_*_{suffix}"


@dataclass
class ParallelRunConfig:
    pf_model_dir: Optional[Path] = None
    input_dir: Optional[Path] = None
    output_dir: Optional[Path] = None
    output_file_pattern: str = output_file_pattern("parallel_run_step.jsonl")
    input_mapping: Dict[str, str] = field(default_factory=dict)
    side_input_dir: Optional[Path] = None  # side input to apply input mapping with
    connections_override: Optional[Dict[str, str]] = None
    debug_output_dir: Optional[Path] = None
    logging_level: str = "INFO"

    @property
    def is_debug_enabled(self):
        return self.logging_level.upper() == "DEBUG" and self.debug_output_dir is not None
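
The `output_file_pattern` helper in this file produces a glob such as `temp_*_*_parallel_run_step.jsonl`. This diff does not show which component consumes the pattern, so the following is only a sketch, assuming it is matched against temporary per-worker output files. The file names in `candidates` are hypothetical and chosen purely to illustrate which shapes the glob accepts.

```python
from fnmatch import fnmatch


def output_file_pattern(suffix: str) -> str:
    # Same shape as the helper in _config/model.py.
    return f"temp_*_*_{suffix}"


pattern = output_file_pattern("parallel_run_step.jsonl")

# Hypothetical file names, used only to illustrate the glob.
candidates = [
    "temp_0_1_parallel_run_step.jsonl",
    "temp_node3_17_parallel_run_step.jsonl",
    "parallel_run_step.jsonl",
]

# Keep only the names that match the two-wildcard "temp_*_*_" prefix.
matched = [name for name in candidates if fnmatch(name, pattern)]
print(matched)
```

The bare `parallel_run_step.jsonl` is not matched, which suggests the pattern is meant to select intermediate files rather than a final merged output; that reading is an inference from the pattern alone.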
97 changes: 97 additions & 0 deletions src/promptflow-parallel/promptflow/parallel/_config/parser.py
@@ -0,0 +1,97 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
import itertools
from argparse import ArgumentParser, Namespace
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

from promptflow.parallel._config.model import ParallelRunConfig, output_file_pattern


def parse(args: List[str]) -> ParallelRunConfig:
    parsed = _do_parse(args)
    return _to_parallel_run_config(parsed)


def _to_parallel_run_config(parsed_args: Namespace) -> ParallelRunConfig:
    return ParallelRunConfig(
        pf_model_dir=parsed_args.pf_model,
        # Only the first input asset (if any) is used as the input directory.
        input_dir=next(map(Path, iter(parsed_args.input_assets.values())), None),
        output_dir=parsed_args.output_uri_file or parsed_args.output,
        output_file_pattern=output_file_pattern(parsed_args.append_row_file_name),
        input_mapping=parsed_args.input_mapping,
        side_input_dir=parsed_args.pf_run_outputs,
        connections_override=_get_connection_overrides(parsed_args),
        debug_output_dir=parsed_args.pf_debug_info,
        logging_level=parsed_args.logging_level,
    )


def _get_connection_overrides(parsed_args: Namespace) -> Dict[str, str]:
    # Later sources win when the same key appears more than once.
    return dict(
        itertools.chain(
            _retrieve_connection_overrides(parsed_args.pf_connections),
            _retrieve_connection_overrides(parsed_args.pf_deployment_names),
            _retrieve_connection_overrides(parsed_args.pf_model_names),
        )
    )


def _retrieve_connection_overrides(arg: str) -> Iterable[Tuple[str, str]]:
    # Expected form: "key1=value1,key2=value2", optionally wrapped in double quotes.
    connection = arg.strip().strip('"') if arg else None
    if not connection:
        return
    connection_params = connection.split(",")
    for connection_param in connection_params:
        if connection_param.strip() == "":
            continue
        # Anything after a second "=" is dropped; an item without "=" raises ValueError.
        key, value = connection_param.split("=")[0:2]
        yield key.strip(), value.strip()


def _do_parse(args: List[str]) -> Namespace:
    parser = ArgumentParser(description="Prompt Flow Parallel Run Config")
    parser.add_argument("--amlbi_pf_model", dest="pf_model", type=Path, required=False, default=None)
    parser.add_argument("--amlbi_pf_connections", dest="pf_connections", required=False)
    parser.add_argument("--amlbi_pf_deployment_names", dest="pf_deployment_names", required=False)
    parser.add_argument("--amlbi_pf_model_names", dest="pf_model_names", required=False)
    parser.add_argument("--output_uri_file", dest="output_uri_file", type=Path, required=False, default=None)
    parser.add_argument("--output", dest="output", type=Path, required=False, default=None)
    parser.add_argument(
        "--append_row_file_name", dest="append_row_file_name", required=False, default="parallel_run_step.jsonl"
    )
    parser.add_argument("--amlbi_pf_run_outputs", dest="pf_run_outputs", type=Path, required=False, default=None)
    parser.add_argument("--amlbi_pf_debug_info", dest="pf_debug_info", type=Path, required=False, default=None)
    parser.add_argument("--logging_level", dest="logging_level", required=False, default="INFO")
    parsed_args, unknown_args = parser.parse_known_args(args)

    # Dynamically named arguments are not declared above, so they arrive as unknown args.
    setattr(parsed_args, "input_mapping", _parse_prefixed_args(unknown_args, "--pf_input_"))
    setattr(parsed_args, "input_assets", _parse_prefixed_args(unknown_args, "--input_asset_"))

    return parsed_args


def _parse_prefixed_args(args: List[str], prefix: str) -> Dict[str, str]:
    """Parse arguments that start with ``prefix`` into a dictionary.

    Both ``<prefix><name>=<value>`` and ``<prefix><name> <value>`` forms are accepted.

    Example:
    >>> argv = ["--pf_input_uri=uri1", "--pf_input_arg2", "arg2"]
    >>> _parse_prefixed_args(argv, "--pf_input_")
    {'uri': 'uri1', 'arg2': 'arg2'}
    """
    parsed = {}
    pre_arg_name = None
    for arg in args:
        if arg.startswith(prefix):
            if "=" in arg:
                # Split on the first "=" only, so values that contain "=" do not raise.
                arg_name, arg_value = arg.split("=", 1)
                if len(arg_name) > len(prefix):
                    parsed[arg_name[len(prefix) :]] = arg_value
            elif pre_arg_name is None:
                # Remember the name; the next non-prefixed argument is its value.
                pre_arg_name = arg[len(prefix) :]
                continue
        elif pre_arg_name is not None:
            parsed[pre_arg_name] = arg
            pre_arg_name = None
    return parsed
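
To make the argument conventions in this parser easier to follow, here is a minimal, self-contained sketch of the same idea: known options are declared on an `ArgumentParser`, while dynamically named options (`--pf_input_*`, `--input_asset_*`) are recovered from the leftovers returned by `parse_known_args`. The helpers `parse_prefixed` and `parse_overrides` are simplified stand-ins written for this illustration, not the functions from the commit, and every value in `argv` (node names, connection names, paths) is hypothetical.

```python
from argparse import ArgumentParser
from typing import Dict, List, Optional


def parse_prefixed(args: List[str], prefix: str) -> Dict[str, str]:
    """Simplified stand-in for _parse_prefixed_args (illustration only)."""
    parsed: Dict[str, str] = {}
    pending: Optional[str] = None
    for arg in args:
        if arg.startswith(prefix):
            # "<prefix>name=value" form: split on the first "=" only.
            name, sep, value = arg[len(prefix):].partition("=")
            if sep and name:
                parsed[name] = value
            elif not sep:
                # "<prefix>name value" form: the value is the next argument.
                pending = name
        elif pending is not None:
            parsed[pending] = arg
            pending = None
    return parsed


def parse_overrides(raw: Optional[str]) -> Dict[str, str]:
    """Simplified stand-in for the "k1=v1,k2=v2" override format."""
    overrides: Dict[str, str] = {}
    for item in (raw or "").strip().strip('"').split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            overrides[key.strip()] = value.strip()
    return overrides


parser = ArgumentParser()
parser.add_argument("--logging_level", default="INFO")
parser.add_argument("--amlbi_pf_connections", dest="pf_connections")

# Hypothetical command line, for illustration only.
argv = [
    "--logging_level", "DEBUG",
    "--amlbi_pf_connections", "node_a=conn_1, node_b=conn_2",
    "--pf_input_question=${data.question}",
    "--input_asset_data", "/mnt/inputs",
]

# Undeclared options end up in `unknown` instead of raising an error.
known, unknown = parser.parse_known_args(argv)
input_mapping = parse_prefixed(unknown, "--pf_input_")
input_assets = parse_prefixed(unknown, "--input_asset_")
overrides = parse_overrides(known.pf_connections)
print(input_mapping, input_assets, overrides)
```

Using `parse_known_args` rather than `parse_args` is what allows the set of flow inputs to vary from flow to flow without redefining the parser.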
@@ -0,0 +1,3 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------