Skip to content

Commit

Permalink
Add CI workflow that runs the full SDG pipeline
Browse files Browse the repository at this point in the history
This PR introduces a new e2e CI workflow that exercises the full SDG
pipeline. Some important details:

- This job runs on a dynamic github runner spawned on AWS

- The instance type has 4x NVIDIA A10G GPUs (96 GB total)

- a 4-bit quantized version of Mixtral-8x7b-instruct is used

- Training is currently skipped while we work out getting the new
  training library functional in CI

- The model is served via llama-cpp, as vllm is not yet functional in
  CI

- The job does not run automatically. It must be launched manually via
  the GitHub UI. When you launch it against a given PR, it will
  automatically add comments to that PR to make it easier to follow
  the progress and results. For more info on launching this workflow
  manually, see the instructions for similar workflows in the
  `instructlab/instructlab` repository:

  https://github.com/instructlab/instructlab/blob/main/docs/ci.md

Signed-off-by: Russell Bryant <[email protected]>
  • Loading branch information
russellb committed Jul 8, 2024
1 parent 23f2cb7 commit 20eb593
Showing 1 changed file with 179 additions and 0 deletions.
179 changes: 179 additions & 0 deletions .github/workflows/e2e-nvidia-a10g-x4.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# SPDX-License-Identifier: Apache-2.0

name: E2E (NVIDIA A10G x4 - full pipeline)

on:
workflow_dispatch:
inputs:
pr_or_branch:
description: 'pull request number or branch name'
required: true
default: 'main'

jobs:
start-runner:
name: Start external EC2 runner
runs-on: ubuntu-latest
outputs:
label: ${{ steps.start-ec2-runner.outputs.label }}
ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Start EC2 runner
id: start-ec2-runner
uses: machulav/ec2-github-runner@fcfb31a5760dad1314a64a0e172b78ec6fc8a17e # v2.3.6
with:
mode: start
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
ec2-image-id: ami-00c51d9c1374eda97
ec2-instance-type: g5.12xlarge
subnet-id: subnet-02d230cffd9385bd4
security-group-id: sg-06300447c4a5fbef3
iam-role-name: instructlab-ci-runner
aws-resource-tags: >
[
{"Key": "Name", "Value": "instructlab-ci-github-runner"},
{"Key": "GitHubRepository", "Value": "${{ github.repository }}"}
]
e2e:
name: E2E Test
needs: start-runner
runs-on: ${{ needs.start-runner.outputs.label }}

permissions:
pull-requests: write

steps:
- name: Checkout
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
with:
fetch-depth: 0

- name: Determine if pr_or_branch is a PR number
id: check_pr
run: |
if [[ "${{ github.event.inputs.pr_or_branch }}" =~ ^[0-9]+$ ]]; then
echo "is_pr=true" >> "$GITHUB_OUTPUT"
else
echo "is_pr=false" >> "$GITHUB_OUTPUT"
fi
- name: Check if gh cli is installed
id: gh_cli
run: |
if command -v gh &> /dev/null ; then
echo "gh_cli_installed=true" >> "$GITHUB_OUTPUT"
else
echo "gh_cli_installed=false" >> "$GITHUB_OUTPUT"
fi
- name: Install gh CLI
if: steps.gh_cli.outputs.gh_cli_installed == 'false'
run: |
sudo dnf install 'dnf-command(config-manager)' -y
sudo dnf config-manager --add-repo https://cli.github.com/packages/rpm/gh-cli.repo
sudo dnf install gh --repo gh-cli -y
- name: test gh CLI
run: |
gh --version
- name: set default repo
run: |
gh repo set-default ${{ github.server_url }}/${{ github.repository }}
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Add comment to PR
if: steps.check_pr.outputs.is_pr == 'true'
run: |
gh pr comment "${{ github.event.inputs.pr_or_branch }}" -b "${{ github.workflow }} workflow launched on this PR: [View run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Fetch and checkout PR
if: steps.check_pr.outputs.is_pr == 'true'
run: |
gh pr checkout ${{ github.event.inputs.pr_or_branch }}
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Checkout branch
if: steps.check_pr.outputs.is_pr == 'false'
run: |
git checkout ${{ github.event.inputs.pr_or_branch }}
- name: Install Packages
run: |
cat /etc/os-release
sudo dnf install -y gcc gcc-c++ make git python3.11 python3.11-devel
- name: Install ilab
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
export PATH="/home/ec2-user/.local/bin:/usr/local/cuda/bin:$PATH"
python3.11 -m venv --upgrade-deps venv
. venv/bin/activate
git clone https://github.com/instructlab/instructlab
cd instructlab
sed 's/\[.*\]//' requirements.txt > constraints.txt
python3.11 -m pip cache remove llama_cpp_python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" python3.11 -m pip install --force-reinstall --no-binary llama_cpp_python -c constraints.txt llama_cpp_python
python3.11 -m pip install bitsandbytes
python3.11 -m pip install .
- name: Install sdg
run: |
. venv/bin/activate
python3.11 -m pip install .
- name: Run e2e test
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
. venv/bin/activate
cd instructlab
SKIP_TRAIN=1 ./scripts/basic-workflow-tests.sh -cmFM
- name: Add comment to PR if the workflow failed
if: failure() && steps.check_pr.outputs.is_pr == 'true'
run: |
gh pr comment "${{ github.event.inputs.pr_or_branch }}" -b "e2e workflow failed on this PR: [View run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}), please investigate."
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Add comment to PR if the workflow succeeded
if: success() && steps.check_pr.outputs.is_pr == 'true'
run: |
gh pr comment "${{ github.event.inputs.pr_or_branch }}" -b "e2e workflow succeeded on this PR: [View run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}), congrats!"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

stop-runner:
name: Stop external EC2 runner
needs:
- start-runner
- e2e
runs-on: ubuntu-latest
if: ${{ always() }}
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Stop EC2 runner
uses: machulav/ec2-github-runner@v2
with:
mode: stop
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
label: ${{ needs.start-runner.outputs.label }}
ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}

0 comments on commit 20eb593

Please sign in to comment.