Create Sagemaker pipeline schedules if specified #3271

Open: wants to merge 40 commits into develop (diff shown: changes from 30 commits)

Commits:
5570c0c Create Sagemaker pipeline schedules if specified (htahir1, Dec 20, 2024)
6f7bf28 Add property to check if orchestrator is schedulable (htahir1, Dec 20, 2024)
0212388 Add EventBridge rule for SageMaker pipeline execution (htahir1, Dec 20, 2024)
2125312 Update IAM policy and trust relationship for EventBridge (htahir1, Dec 20, 2024)
966f712 Refactor schedule metadata generation for Sagemaker orchestrator (htahir1, Dec 20, 2024)
dd7110c Add scheduling support for SageMaker orchestrator (htahir1, Dec 20, 2024)
fcf934e Remove trust relationship logic in Sagemaker orchestrator (htahir1, Dec 20, 2024)
358b3e8 Handle unsupported schedule in custom orchestrator (htahir1, Dec 20, 2024)
7ce08b6 Refactor yield statement to use 'yield from' syntax (htahir1, Dec 20, 2024)
72bdae1 Ensure IAM permissions for scheduled SageMaker pipelines (htahir1, Dec 20, 2024)
abf4610 Update authentication instructions for SageMaker orchestrator (htahir1, Dec 20, 2024)
67705c2 Refactor Sagemaker orchestrator metadata handling (htahir1, Dec 20, 2024)
d190f31 Add unit tests for SageMaker orchestrator metadata (htahir1, Dec 20, 2024)
f1cabc7 Add exception handling for pipeline preparation errors (htahir1, Dec 20, 2024)
726b47a Add timezone information to first execution message (htahir1, Dec 20, 2024)
80c8e8e Add timezone support to AWS SageMaker orchestrator (htahir1, Dec 20, 2024)
da8fd35 Update error handling in SagemakerOrchestrator (htahir1, Dec 20, 2024)
ab0c06d Update error handling messages for AWS in Sagemaker orchestrator (htahir1, Dec 20, 2024)
3c12d45 Refactor error handling in SagemakerOrchestrator (htahir1, Dec 20, 2024)
51a499a Handle insufficient permissions creating EventBridge rules (htahir1, Dec 20, 2024)
7d65a20 Update error message for EventBridge creation failure (htahir1, Dec 20, 2024)
e2ecaf3 Remove logging in SagemakerOrchestrator class (htahir1, Dec 20, 2024)
a5fd82b Refactor orchestrator metadata computation logic (htahir1, Dec 20, 2024)
d229460 Merge branch 'develop' into feature/add-sagemaker-schedule (htahir1, Dec 22, 2024)
a005e5a Update handling of scheduled pipeline updates in SageMaker.md (htahir1, Dec 22, 2024)
645903a Merge branch 'feature/add-sagemaker-schedule' of github.com:zenml-io/… (htahir1, Dec 22, 2024)
8811f3f Add optional IAM permissions for policy updates (htahir1, Dec 23, 2024)
44fa6c5 Remove redundant code for getting SageMaker session (htahir1, Dec 23, 2024)
3146b57 Add pipeline scheduler role and handling for scheduling errors (htahir1, Jan 9, 2025)
b52f7ed Merge remote-tracking branch 'origin/develop' into feature/add-sagema… (htahir1, Jan 9, 2025)
e52109b Refactor Sagemaker orchestrator test methods (htahir1, Jan 9, 2025)
0272c5e Update SageMaker orchestrator for scheduled pipelines (htahir1, Jan 9, 2025)
6826613 Auto-update of LLM Finetuning template (actions-user, Jan 9, 2025)
dcfb03e Auto-update of Starter template (actions-user, Jan 9, 2025)
87ef96b Add comment about rounding up to 1 minute for SageMaker (htahir1, Jan 9, 2025)
eb96d95 Auto-update of E2E template (actions-user, Jan 9, 2025)
9542ed8 Auto-update of NLP template (actions-user, Jan 9, 2025)
467f25b Validate and format cron expression for SageMaker (htahir1, Jan 9, 2025)
48f93ff Merge branch 'feature/add-sagemaker-schedule' of github.com:zenml-io/… (htahir1, Jan 9, 2025)
b995970 Update start time calculation for SageMaker Orchestrator (htahir1, Jan 9, 2025)
127 changes: 119 additions & 8 deletions docs/book/component-guide/orchestrators/sagemaker.md
@@ -59,7 +59,7 @@ There are three ways you can authenticate your orchestrator and link it to the I

{% tabs %}
{% tab title="Authentication via Service Connector" %}
The recommended way to authenticate your SageMaker orchestrator is by registering an [AWS Service Connector](../../how-to/infrastructure-deployment/auth-management/aws-service-connector.md) and connecting it to your SageMaker orchestrator:
The recommended way to authenticate your SageMaker orchestrator is by registering an [AWS Service Connector](../../how-to/infrastructure-deployment/auth-management/aws-service-connector.md) and connecting it to your SageMaker orchestrator. If you plan to use scheduled pipelines, ensure the credentials used by the service connector have the necessary EventBridge and IAM permissions listed in the [Required IAM Permissions](#required-iam-permissions) section:

```shell
zenml service-connector register <CONNECTOR_NAME> --type aws -i
@@ -72,7 +72,7 @@ zenml stack register <STACK_NAME> -o <ORCHESTRATOR_NAME> ... --set
{% endtab %}

{% tab title="Explicit Authentication" %}
Instead of creating a service connector, you can also configure your AWS authentication credentials directly in the orchestrator:
Instead of creating a service connector, you can also configure your AWS authentication credentials directly in the orchestrator. If you plan to use scheduled pipelines, ensure these credentials have the necessary EventBridge and IAM permissions listed in the [Required IAM Permissions](#required-iam-permissions) section:

```shell
zenml orchestrator register <ORCHESTRATOR_NAME> \
@@ -88,7 +88,7 @@ See the [`SagemakerOrchestratorConfig` SDK Docs](https://sdkdocs.zenml.io/latest
{% endtab %}

{% tab title="Implicit Authentication" %}
If you neither connect your orchestrator to a service connector nor configure credentials explicitly, ZenML will try to implicitly authenticate to AWS via the `default` profile in your local [AWS configuration file](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
If you neither connect your orchestrator to a service connector nor configure credentials explicitly, ZenML will try to implicitly authenticate to AWS via the `default` profile in your local [AWS configuration file](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). If you plan to use scheduled pipelines, ensure this profile has the necessary EventBridge and IAM permissions listed in the [Required IAM Permissions](#required-iam-permissions) section:

```shell
zenml orchestrator register <ORCHESTRATOR_NAME> \
@@ -153,10 +153,6 @@ Alternatively, for a more detailed view of log messages during SageMaker pipelin

![SageMaker CloudWatch Logs](../../.gitbook/assets/sagemaker-cloudwatch-logs.png)

### Run pipelines on a schedule

The ZenML Sagemaker orchestrator doesn't currently support running pipelines on a schedule. We maintain a public roadmap for ZenML, which you can find [here](https://zenml.io/roadmap). We welcome community contributions (see more [here](https://github.com/zenml-io/zenml/blob/main/CONTRIBUTING.md)) so if you want to enable scheduling for Sagemaker, please [do let us know](https://zenml.io/slack)!

### Configuration at pipeline or step level

When running your ZenML pipeline with the Sagemaker orchestrator, the configuration set when configuring the orchestrator as a ZenML component will be used by default. However, it is possible to provide additional configuration at the pipeline or step level. This allows you to run whole pipelines or individual steps with alternative configurations. For example, this allows you to run the training process with a heavier, GPU-enabled instance type, while running other steps with lighter instances.
@@ -339,4 +335,119 @@ This approach allows for more granular tagging, giving you flexibility in how yo

Note that if you wish to use this orchestrator to run steps on a GPU, you will need to follow [the instructions on this page](../../how-to/pipeline-development/training-with-gpus/README.md) to ensure that it works. It requires adding some extra settings customization and is essential to enable CUDA for the GPU to give its full acceleration.

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
### Scheduling Pipelines

The SageMaker orchestrator supports running pipelines on a schedule using AWS EventBridge. You can configure schedules in three ways:

* Using a cron expression
* Using a fixed interval
* Running once at a specific time

```python
from datetime import datetime, timedelta

from zenml import pipeline
from zenml.config.schedule import Schedule


@pipeline
def my_scheduled_pipeline():
    # Your pipeline steps here
    pass


@pipeline
def my_interval_pipeline():
    # Your pipeline steps here
    pass


@pipeline
def my_one_time_pipeline():
    # Your pipeline steps here
    pass


# Using a cron expression (runs daily at 2 AM UTC)
my_scheduled_pipeline.with_options(
    schedule=Schedule(cron_expression="0 2 * * *")
)()

# Using an interval (runs every 2 hours); an interval requires a start time
my_interval_pipeline.with_options(
    schedule=Schedule(start_time=datetime.now(), interval_second=timedelta(hours=2))
)()

# Running once at a specific time
my_one_time_pipeline.with_options(
    schedule=Schedule(run_once_start_time=datetime(2024, 12, 31, 23, 59))
)()
```
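Before a rule can be created, a ZenML `Schedule` has to be expressed as an EventBridge schedule expression. EventBridge rules accept `cron()` expressions with six fields (a trailing year field, and `?` in either day-of-month or day-of-week) and `rate()` expressions with one-minute granularity. The sketch below shows one way such a conversion could work. It is an illustration under these assumptions, not the orchestrator's actual code; the helper names `cron_to_eventbridge` and `interval_to_eventbridge` are hypothetical, and one-time runs are not covered.

```python
import math
import re
from datetime import timedelta


def cron_to_eventbridge(cron_expression: str) -> str:
    """Convert a 5-field Unix cron expression to EventBridge cron() syntax.

    EventBridge cron expressions have six fields (minutes, hours, day-of-month,
    month, day-of-week, year) and need `?` in one of the two day fields.
    """
    fields = cron_expression.split()
    if len(fields) != 5:
        raise ValueError(f"Expected 5 cron fields, got {len(fields)}")
    minute, hour, day_of_month, month, day_of_week = fields

    if day_of_week == "*":
        # Any day of the week: let day-of-month drive the schedule.
        day_of_week = "?"
    elif day_of_month == "*" and re.fullmatch(r"[A-Za-z,-]+", day_of_week):
        # Day names such as MON-FRI mean the same thing in both syntaxes.
        day_of_month = "?"
    else:
        # Numeric days of week differ (Unix: 0 = Sunday, EventBridge: 1 = Sunday)
        # and EventBridge rejects restrictions on both day fields; not handled here.
        raise ValueError(
            f"Unsupported day fields: day-of-month={day_of_month!r}, "
            f"day-of-week={day_of_week!r}"
        )

    # The trailing "*" is the year field that EventBridge requires.
    return f"cron({minute} {hour} {day_of_month} {month} {day_of_week} *)"


def interval_to_eventbridge(interval: timedelta) -> str:
    """Convert an interval to an EventBridge rate() expression in minutes."""
    if interval <= timedelta(0):
        raise ValueError("Interval must be positive")
    # rate() has a granularity of one minute, so round partial minutes up.
    minutes = max(1, math.ceil(interval.total_seconds() / 60))
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"
```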

When you deploy a scheduled pipeline, ZenML will:
1. Create an EventBridge rule with the specified schedule
2. Configure the necessary IAM permissions
3. Set up the SageMaker pipeline as the target
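Steps 1 and 3 correspond to the EventBridge `PutRule` and `PutTargets` APIs. As a minimal sketch, assuming boto3's `events` client and a hypothetical `zenml-<pipeline name>` rule-naming scheme, the function below only builds the request arguments, so it runs without AWS access; the calls that would send them are shown as comments. Step 2 (IAM) is not covered, and this is not the orchestrator's actual implementation.

```python
from typing import Any, Dict


def build_schedule_requests(
    pipeline_name: str,
    pipeline_arn: str,
    role_arn: str,
    schedule_expression: str,
) -> Dict[str, Dict[str, Any]]:
    """Build keyword arguments for EventBridge put_rule and put_targets.

    The "zenml-" rule-name prefix is an assumption chosen to match the
    arn:aws:events:*:*:rule/zenml-* resource in the IAM policy.
    """
    rule_name = f"zenml-{pipeline_name}"
    put_rule_kwargs = {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression,  # e.g. "rate(120 minutes)"
        "State": "ENABLED",
    }
    put_targets_kwargs = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": f"{rule_name}-target",
                "Arn": pipeline_arn,  # SageMaker pipeline that the rule starts
                "RoleArn": role_arn,  # Role that EventBridge assumes to start it
            }
        ],
    }
    return {"put_rule": put_rule_kwargs, "put_targets": put_targets_kwargs}


# With AWS credentials and boto3 available, the requests could be sent as:
#   import boto3
#   events = boto3.client("events")
#   requests = build_schedule_requests(name, pipeline_arn, role_arn, expression)
#   events.put_rule(**requests["put_rule"])
#   events.put_targets(**requests["put_targets"])
```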

{% hint style="info" %}
If you run the same pipeline with a schedule multiple times, the existing schedule will be updated with the new settings rather than creating a new schedule. This allows you to modify schedules by simply running the pipeline again with new schedule parameters.
Review comment (Contributor): I would not do this, to keep it consistent with all other orchestrators.

{% endhint %}

#### Required IAM Permissions

When using scheduled pipelines, you need to ensure your IAM role has the correct permissions and trust relationships. Here's a detailed breakdown of why each permission is needed:

1. **Trust Relationships**
Your execution role needs to trust both SageMaker and EventBridge services to allow them to assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "sagemaker.amazonaws.com",
          "events.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

`sagemaker.amazonaws.com` is required for SageMaker execution, and `events.amazonaws.com` is required for EventBridge to trigger pipelines.

2. **Required IAM Policies**
In addition to the basic SageMaker permissions, the AWS credentials used by the service connector (or provided directly to the orchestrator) need the following permissions to create and manage scheduled pipelines:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "events:PutRule",
        "events:PutTargets",
        "events:DeleteRule",
        "events:RemoveTargets",
        "events:DescribeRule",
        "events:ListTargetsByRule"
      ],
      "Resource": "arn:aws:events:*:*:rule/zenml-*"
    }
  ]
}
```

These actions are used as follows:
* `events:PutRule` creates the schedule rule, and `events:PutTargets` sets the SageMaker pipeline as its target
* `events:DeleteRule` and `events:RemoveTargets` clean up rules and targets
* `events:DescribeRule` and `events:ListTargetsByRule` verify that the rule and its target were set up
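The `arn:aws:events:*:*:rule/zenml-*` resource restricts these permissions to rules whose names start with `zenml-`. The hypothetical `is_allowed` helper below is a deliberately simplified matcher (it ignores Deny statements, conditions and the rest of IAM's evaluation logic) that illustrates which rule ARNs the policy covers; for real checks, the IAM policy simulator is the authoritative tool.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

EVENTBRIDGE_POLICY: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "events:PutRule",
                "events:PutTargets",
                "events:DeleteRule",
                "events:RemoveTargets",
                "events:DescribeRule",
                "events:ListTargetsByRule",
            ],
            "Resource": "arn:aws:events:*:*:rule/zenml-*",
        }
    ],
}


def is_allowed(policy: Dict[str, Any], action: str, resource_arn: str) -> bool:
    """Rough check: does any Allow statement match both action and resource?"""
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement["Action"]
        resources = statement["Resource"]
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        # IAM action names are case-insensitive; ARNs are matched as written.
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in actions)
        resource_ok = any(fnmatchcase(resource_arn, r) for r in resources)
        if action_ok and resource_ok:
            return True
    return False
```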

The following IAM permissions are optional but recommended to allow automatic policy updates for the execution role:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:PutRolePolicy",
        "iam:UpdateAssumeRolePolicy"
      ],
      "Resource": "arn:aws:iam::*:role/*"
    }
  ]
}
```

These actions are used as follows:
* `iam:GetRole` verifies that the role exists
* `iam:GetRolePolicy` checks existing policies
* `iam:PutRolePolicy` adds new policies
* `iam:UpdateAssumeRolePolicy` updates trust relationships
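The optional `iam:GetRole` and `iam:UpdateAssumeRolePolicy` permissions cover a read-modify-write of the role's trust policy: read the current document, add any missing service principals, and write it back. A minimal sketch of the modify step, using a hypothetical `ensure_trusted_services` helper, follows; it is not the orchestrator's implementation, it assumes `Statement` is a list, and the boto3 calls that would surround it are shown as comments.

```python
import copy
from typing import Any, Dict, List, Tuple


def ensure_trusted_services(
    trust_policy: Dict[str, Any], services: List[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of a trust policy that lets `services` assume the role,
    together with the services that were missing and had to be added."""
    policy = copy.deepcopy(trust_policy)  # Leave the caller's document untouched
    trusted = set()
    for statement in policy.get("Statement", []):
        actions = statement.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        if statement.get("Effect") != "Allow" or "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, dict):
            service = principal.get("Service", [])
            trusted.update([service] if isinstance(service, str) else service)

    missing = [s for s in services if s not in trusted]
    if missing:
        # Add one statement that trusts every missing service principal.
        policy.setdefault("Statement", []).append(
            {
                "Effect": "Allow",
                "Principal": {"Service": missing},
                "Action": "sts:AssumeRole",
            }
        )
    return policy, missing


# Possible use with boto3 (not executed here):
#   import json, boto3
#   iam = boto3.client("iam")
#   role = iam.get_role(RoleName=role_name)["Role"]
#   updated, missing = ensure_trusted_services(
#       role["AssumeRolePolicyDocument"],
#       ["sagemaker.amazonaws.com", "events.amazonaws.com"],
#   )
#   if missing:
#       iam.update_assume_role_policy(
#           RoleName=role_name, PolicyDocument=json.dumps(updated)
#       )
```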

These permissions enable:
* Creation and management of EventBridge rules for scheduling
* Setting up trust relationships between services
* Managing IAM policies required for the scheduled execution
* Cleanup of resources when schedules are removed

Without the EventBridge permissions, the scheduling functionality will fail. Without the IAM permissions, you'll need to manually ensure your execution role has the necessary permissions to start pipeline executions.
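When the optional IAM permissions are not granted, the role that EventBridge uses (the execution role, per the paragraph above) needs a policy that allows `sagemaker:StartPipelineExecution` on the pipeline. The sketch below builds such an inline policy for a single pipeline ARN; the ARN, role name and policy name are placeholders, and the `put_role_policy` call is shown as a comment rather than executed.

```python
import json
from typing import Any, Dict


def start_pipeline_execution_policy(pipeline_arn: str) -> Dict[str, Any]:
    """Inline policy that allows starting executions of one SageMaker pipeline."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:StartPipelineExecution",
                "Resource": pipeline_arn,
            }
        ],
    }


# Placeholder ARN; use the ARN of your own SageMaker pipeline.
policy = start_pipeline_execution_policy(
    "arn:aws:sagemaker:us-east-1:123456789012:pipeline/my-pipeline"
)
# With boto3 and iam:PutRolePolicy, it could be attached as an inline policy:
#   iam.put_role_policy(
#       RoleName=role_name,
#       PolicyName="zenml-start-pipeline-execution",
#       PolicyDocument=json.dumps(policy),
#   )
print(json.dumps(policy, indent=2))
```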

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
@@ -18,7 +18,7 @@ Schedules don't work for all orchestrators. Here is a list of all supported orch
| [KubernetesOrchestrator](../../../component-guide/orchestrators/kubernetes.md) | ✅ |
| [LocalOrchestrator](../../../component-guide/orchestrators/local.md) | ⛔️ |
| [LocalDockerOrchestrator](../../../component-guide/orchestrators/local-docker.md) | ⛔️ |
| [SagemakerOrchestrator](../../../component-guide/orchestrators/sagemaker.md) | ⛔️ |
| [SagemakerOrchestrator](../../../component-guide/orchestrators/sagemaker.md) | ✅ |
| [SkypilotAWSOrchestrator](../../../component-guide/orchestrators/skypilot-vm.md) | ⛔️ |
| [SkypilotAzureOrchestrator](../../../component-guide/orchestrators/skypilot-vm.md) | ⛔️ |
| [SkypilotGCPOrchestrator](../../../component-guide/orchestrators/skypilot-vm.md) | ⛔️ |
@@ -132,6 +132,15 @@ class SagemakerOrchestratorSettings(BaseSettings):
        ("processor_role", "execution_role"), ("processor_tags", "tags")
    )

    @property
    def is_schedulable(self) -> bool:
        """Whether the orchestrator is schedulable or not.

        Returns:
            Whether the orchestrator is schedulable or not.
        """
        return True

    @model_validator(mode="before")
    def validate_model(cls, data: Dict[str, Any]) -> Dict[str, Any]:
        """Check if model is configured correctly.
@@ -184,6 +193,7 @@ class SagemakerOrchestratorConfig(

    Attributes:
        execution_role: The IAM role ARN to use for the pipeline.
        scheduler_role: The IAM role ARN to use for the scheduler.
        aws_access_key_id: The AWS access key ID to use to authenticate to AWS.
            If not provided, the value from the default AWS config will be used.
        aws_secret_access_key: The AWS secret access key to use to authenticate
@@ -203,6 +213,7 @@ class SagemakerOrchestratorConfig(
    """

    execution_role: str
    scheduler_role: Optional[str] = None
    aws_access_key_id: Optional[str] = SecretField(default=None)
    aws_secret_access_key: Optional[str] = SecretField(default=None)
    aws_profile: Optional[str] = None