Parameterized cirrus workflows and task-batch-compute #105

Draft. Wants to merge 4 commits into base: main


@cvangerpen (Contributor) commented Mar 7, 2025

This change introduces user-defined templating for the following cirrus input configurations:

  • workflow definition YAMLs
  • workflow state machine JSONs
  • task-batch-compute definition YAMLs

The intention is to allow any environment-specific values to be abstracted out of these files and into input variables, so that a single YAML/JSON can be maintained and used across all environments. This brings the workflow and task-batch-compute modules in line with how the task module already handles definition YAMLs.

In addition to general variable abstraction, these changes also enable a few handy features:

  • Workflow IAM roles can now be customized via the role_statements attribute in their definition YAMLs.
  • Using this IAM role customization, workflows can now integrate with any AWS service that supports state machines, not just lambda and batch, provided the user supplies the necessary IAM permissions.
  • Workflows are no longer forced to reference at least one cirrus task.

Breaking Change

Introducing user-defined template variables in workflow state machine JSONs creates a risk that a user's template variable names clash with the names of the builtin task attribute variables. To prevent this, the builtin task attribute variables are now namespaced under a tasks. prefix, and the user's template variables are validated to ensure that the reserved tasks key is not used.
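
The reserved-key rule can be illustrated with a short sketch. This is not the module's actual validation, which is not shown in this description; it is a Python illustration of the rule only, and the names RESERVED_KEYS and validate_template_variables are hypothetical.

```python
# Hypothetical illustration of the reserved-key check described above.
# "tasks" is the only reserved key named in this PR description.
RESERVED_KEYS = {"tasks"}


def validate_template_variables(variables: dict) -> None:
    """Raise ValueError if a user-supplied variable uses a reserved key."""
    clashes = sorted(RESERVED_KEYS.intersection(variables))
    if clashes:
        raise ValueError(
            "reserved template variable name(s) used: " + ", ".join(clashes)
        )


# Accepted: no top-level key collides with the builtin namespace.
validate_template_variables({"bucket_name": "my-bucket", "region": "us-west-2"})
```

Only top-level keys are checked in this sketch, since the clash described above is between the user's top-level variable names and the builtin tasks namespace.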

To account for this change, you will need to modify any existing task output references in your state machine JSONs, such as ${mytask.foo.bar}, to be ${tasks.mytask.foo.bar} instead.

Here's an example updated state machine that uses both lambda and batch cirrus tasks:

{
  "Comment": "Mirror Workflow",
  "StartAt": "copy-assets",
  "States": {
    "copy-assets": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "copy-assets-pre-batch",
          "States": {
            "copy-assets-pre-batch": {
              "Type": "Task",
              "Resource": "${tasks.pre-batch.lambda.function_arn}",                 // UPDATED
              "Next": "copy-assets-batch"
            },
            "copy-assets-batch": {
              "Type": "Task",
              "Resource": "arn:aws:states:::batch:submitJob.sync",
              "Parameters": {
                "JobDefinition": "${tasks.copy-assets.batch.job_definition_arn}",   // UPDATED
                "JobName": "copy-assets",
                "JobQueue": "${tasks.copy-assets.batch.job_queue_arn}",             // UPDATED
                "Parameters": {
                  "url.$": "$.url",
                  "url_out.$": "$.url_out"
                }
              },
              "Next": "copy-assets-post-batch"
            },
            "copy-assets-post-batch": {
              "Type": "Task",
              "Resource": "${tasks.post-batch.lambda.function_arn}",                // UPDATED
              "End": true
            }
          }
        }
      ],
      "OutputPath": "$[0]",
      "Next": "push-items-to-s3"
    },
    "push-items-to-s3": {
      "Type": "Task",
      "Resource": "${tasks.push-items-to-s3.lambda.function_arn}",                  // UPDATED
      "End": true
    },
    "failure": {
      "Type": "Fail"
    }
  }
}

This namespace also makes the source of these attribute lookups clearer.
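
If you have many state machine JSONs to migrate, the rename could be scripted. The sketch below is a minimal Python example, not something shipped with this change, and namespace_task_refs is a hypothetical helper name. It assumes every ${...} reference in a pre-change file is a task attribute lookup, which should hold because user-defined variables were not supported in these files before this change. Review the resulting diff before committing it.

```python
import re

# Matches ${...} template references and captures the inner text.
# JSONPath values such as "$.url" are not matched because they lack braces.
_TEMPLATE_REF = re.compile(r"\$\{([^}]+)\}")


def namespace_task_refs(text: str) -> str:
    """Prefix un-namespaced ${...} references with 'tasks.'."""

    def _rewrite(match: re.Match) -> str:
        inner = match.group(1)
        # Leave references that are already namespaced untouched,
        # so running the helper twice is harmless.
        if inner.startswith("tasks."):
            return match.group(0)
        return "${tasks." + inner + "}"

    return _TEMPLATE_REF.sub(_rewrite, text)


old = '"Resource": "${pre-batch.lambda.function_arn}"'
print(namespace_task_refs(old))
```

The helper works on raw text rather than parsed JSON so that it leaves formatting and any inline comments in the file as they are.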

Related issue(s)

Proposed Changes

Testing

This change was validated by the following observations:

Checklist

  • I have deployed and validated this change
  • Changelog
    • I have added my changes to the changelog
    • No changelog entry is necessary
  • README migration
    • I have added any migration steps to the README
    • No migration is necessary

@cvangerpen cvangerpen force-pushed the cvg/parameterized-workflows-and-task-batch-compute branch from 6200e66 to 7e06d90 Compare March 7, 2025 13:54