Better Nightly Builds #3002
-
@bendnorman and I had a call. I think our high-level takeaways are:
Immediate next steps are:
-
Thought a bunch about how we might want to set up our build process with Batch. If the plan / steps sound reasonable, I'll turn this into GH issues and start working on them.
Actual high-level needs
We want to be able to run the ETL, validate its outputs, publish artifacts, and update the
We'd also like to be able to correlate the build artifacts with the code version that generated them.
Finally, we'd like to be able to change some behavior based on whether it's a nightly, stable, or ad-hoc run:
Nightly:
Stable:
Ad-hoc:
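A minimal sketch of how the script could tell the three run types apart and tie artifacts to the code that produced them, using only the git ref and commit SHA. The `nightly-YYYY-MM-DD` pattern comes from the example call further down. The stable tag pattern (`vYYYY.MM.DD`), the artifact folder layout, and the names `classify_build` / `artifact_prefix` are hypothetical.

```python
import re
from enum import Enum


class BuildType(Enum):
    NIGHTLY = "nightly"
    STABLE = "stable"
    AD_HOC = "ad-hoc"


# Nightly tags follow the nightly-YYYY-MM-DD pattern from the example call.
NIGHTLY_TAG = re.compile(r"^nightly-\d{4}-\d{2}-\d{2}$")
# ASSUMPTION: stable releases are tagged like v2023.11.02.
STABLE_TAG = re.compile(r"^v\d{4}\.\d{1,2}\.\d{1,2}$")


def classify_build(git_ref: str) -> BuildType:
    """Pick the run type from the ref name alone; anything unrecognized is ad-hoc."""
    if NIGHTLY_TAG.match(git_ref):
        return BuildType.NIGHTLY
    if STABLE_TAG.match(git_ref):
        return BuildType.STABLE
    return BuildType.AD_HOC


def artifact_prefix(bucket: str, git_ref: str, git_sha: str) -> str:
    """Hypothetical layout: one folder per build, named after the ref and short SHA."""
    return f"{bucket.rstrip('/')}/{git_ref}-{git_sha[:8]}"
```

Keying the output path on both the ref and the SHA means two builds of the same tag pointing at different commits would not overwrite each other.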
Desired end technical state
We'll still kick off the build process with GitHub Actions, which will configure/submit a Google Batch job. The Batch job will run a build script within a Docker container.
GHA workflow
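A rough sketch of the config-generation step the GHA workflow could run. It assembles the same structure as the JSON in the job description that follows and writes it to a file that `gcloud batch jobs submit JOB_NAME --location=... --config=FILE` can consume. `GITHUB_REF_NAME` is a real GitHub Actions environment variable; the function names are hypothetical, and the container `commands` and secret paths are left to the caller rather than guessed.

```python
from __future__ import annotations

import json
import os
from collections.abc import Mapping


def make_batch_job_config(
    image_tag: str,
    commands: list[str],
    secrets: Mapping[str, str],
    service_account: str,
) -> dict:
    """Build a Google Batch job config shaped like the JSON sketch in this post.

    Non-secret settings travel in `commands` as CLI args; secrets are only
    referenced by Secret Manager resource path, never by value.
    """
    runnable = {
        "container": {
            "imageUri": f"docker.io/catalystcoop/pudl-etl:{image_tag}",
            "commands": list(commands),
        },
        "environment": {"secretVariables": dict(secrets)},
    }
    return {
        "taskGroups": [{"taskSpec": {"runnables": [runnable]}}],
        "allocationPolicy": {"serviceAccount": {"email": service_account}},
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


def git_ref_from_gha(env: Mapping[str, str] = os.environ) -> str:
    """GitHub Actions exposes the triggering branch or tag name as GITHUB_REF_NAME."""
    return env.get("GITHUB_REF_NAME", "unknown")


def write_job_config(config: dict, path: str) -> None:
    """Write the config as JSON for `gcloud batch jobs submit --config`."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```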
Google Batch job description
This will have to be generated dynamically by a Python script that passes the various settings from the GHA context into a JSON file. The secrets will be kept in Google Secret Manager so that we don't have to pass them around. The non-secret settings will be passed into the main script as CLI args via the
{
"taskGroups": [
{
"taskSpec": {
"runnables": [
{
"container": {
"imageUri": "docker.io/catalystcoop/pudl-etl:<TAG>",
"commands": [
"micromamba",
...
]
},
"environment": {
"secretVariables": {
"PUDL_BOT_PAT": "projects/PROJECT_ID/secrets/SECRET_NAME/versions/VERSION",
...
}
}
}
]
}
}
],
"allocationPolicy": {
    "serviceAccount": {
"email": "some-special-service-account"
}
},
"logsPolicy": {
"destination": "CLOUD_LOGGING"
}
}
In-container script
This will be a Python script that replicates the functionality of
Example call:
$ ./run_the_dang_build.py --etl-config-file /path/to/etl_fast.yml --gcs-dest gs://foo --aws-dest s3://... --do-publish-datasette --current-git-ref nightly-YYYY-MM-DD --git-target-branch nightly
It will pick up secrets from the environment variables.
Path to power
There are a bunch of things that have to happen. Here's the order in which we should do them so that we get something useful out fast.
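For the in-container script described above, the flags from the example call could be parsed with `argparse` roughly like this, with secrets read from environment variables that Batch populates from `secretVariables`. Which flags are required, and the names `parse_args` / `require_secret`, are assumptions.

```python
from __future__ import annotations

import argparse
import os
from collections.abc import Mapping


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the flags shown in the example call (required-ness is a guess)."""
    parser = argparse.ArgumentParser(description="Run the PUDL build inside the container.")
    parser.add_argument("--etl-config-file", required=True, help="ETL settings YAML")
    parser.add_argument("--gcs-dest", help="gs:// prefix for build outputs")
    parser.add_argument("--aws-dest", help="s3:// prefix for build outputs")
    parser.add_argument("--do-publish-datasette", action="store_true")
    parser.add_argument("--current-git-ref", required=True, help="e.g. nightly-YYYY-MM-DD")
    parser.add_argument("--git-target-branch", help="e.g. nightly")
    return parser.parse_args(argv)


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Read a secret injected as an env var; fail fast if it is missing or empty."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"missing required secret env var: {name}")
    return value
```

In the script's entry point this would be `args = parse_args()` followed by e.g. `require_secret("PUDL_BOT_PAT")`. Failing at startup on a missing secret beats discovering it hours into the ETL.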
User flows
Running nightly builds
Running stable builds
Running ad-hoc builds
How to re-run a nightly build that failed?
Apply a fix, then push tag
Or, if you don't need to apply a code fix, just re-run the failed GHA workflow, or trigger a workflow run manually based on the
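The manual-trigger path could be wrapped with the GitHub CLI, whose `gh workflow run <workflow> --ref <ref>` dispatches a workflow on a given branch or tag. This is a sketch. It assumes the build workflow declares a `workflow_dispatch` trigger, and the workflow filename passed in is up to the caller (nothing here is the project's actual file).

```python
import subprocess


def rerun_command(workflow_file: str, git_ref: str) -> list[str]:
    """Build a `gh` call that manually dispatches the build workflow on a ref.

    Only works if the workflow declares a `workflow_dispatch` trigger.
    """
    return ["gh", "workflow", "run", workflow_file, "--ref", git_ref]


def rerun_build(workflow_file: str, git_ref: str) -> None:
    """Run the dispatch; raises CalledProcessError if `gh` reports a failure."""
    subprocess.run(rerun_command(workflow_file, git_ref), check=True)
```

For re-running only the failed jobs of an existing run instead of dispatching a new one, `gh run rerun <run-id> --failed` is the CLI equivalent of the "Re-run failed jobs" button.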
-
For reasons that I don't understand at all, the current VM deployment setup seems to have gotten very flaky when combined with the new nightly tagging / branch migration setup. That flakiness seems to be what currently stands in the way of getting the nightly/stable branches + tagging / trunk-based development / release-on-tag setup working. It might be helpful to have a slightly bigger outline of that chain of tasks and the order they need to happen in? Are there things other than:
It seems like pythonizing the build script can be deferred until the other deployment infrastructure has been updated.
Note that we'll distribute two different kinds of "build artifacts":
-
Another thing we should do in the new nightly builds setup is abort the build if there have been no changes to the codebase. This should be easy once we get the
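A minimal sketch of the "no changes, no build" check, assuming nightly tags are named `nightly-YYYY-MM-DD` (so plain string comparison orders them by date) and that the previous nightly tag marks the last commit we built. `git tag --list` and `git rev-list -n 1 <tag>` are standard git commands; the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess


def git(*args: str) -> str:
    """Run a git command and return its stripped stdout."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def pick_previous_nightly(tags: list[str], current_tag: str) -> str | None:
    """Most recent nightly tag older than current_tag (YYYY-MM-DD sorts as text)."""
    older = (t for t in tags if t.startswith("nightly-") and t < current_tag)
    return max(older, default=None)


def codebase_unchanged(current_tag: str) -> bool:
    """True if current_tag points at the same commit as the previous nightly tag."""
    previous = pick_previous_nightly(git("tag", "--list", "nightly-*").split(), current_tag)
    if previous is None:
        return False  # nothing to compare against, so go ahead and build
    # rev-list -n 1 resolves annotated tags to the commit they point at
    return git("rev-list", "-n", "1", current_tag) == git("rev-list", "-n", "1", previous)
```

Two caveats: `actions/checkout` fetches a single commit by default, so the workflow would need `fetch-depth: 0` for the older tags to be visible. And this treats the previous nightly as a successful build; if it failed, skipping would be the wrong call.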
-
Requirements:
Nice-to-haves:
Notes from call with Dazhong