Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use process_retry on amd-smi #22

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ivarflakstad
Copy link
Member

Both rocm-smi and amd-smi list randomly fail on our runners.
This should increase the odds of getting past those failures.

@ivarflakstad ivarflakstad requested a review from ydshieh January 27, 2025 13:44
@@ -67,7 +67,7 @@ jobs:
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --env HIP_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: AMD-SMI
run: amd-smi list
run: python3 utils/process_retry.py 'amd-smi list'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to update the repository before running this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to update this one before merge.

@@ -120,7 +120,7 @@ jobs:
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT

- name: AMD-SMI
run: amd-smi list
run: python3 utils/process_retry.py 'amd-smi list'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this one being after name: Update clone

Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

(I would prefer, in general, opening separate RPs for different purpose. I believe there are some changes here are just about styles.)

@ivarflakstad
Copy link
Member Author

LGTM.

(I would prefer, in general, opening separate RPs for different purpose. I believe there are some changes here are just about styles.)

Yeah, most of those are from my IDE actually. It might be a bit too opinionated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants