Move Linux GPU CI pipeline to A10 #23235

snnn · 2025-01-01T03:41:28Z

Move the Linux GPU CI pipeline to the newer A10 machines.
Retire the onnxruntime-Linux-GPU-T4 machine pool.
Disable the run_lean_attention test because the new machines do not have enough shared memory for it.

```
skip loading trt attention kernel fmha_mhca_fp16_128_256_sm86_kernel because no enough shared memory
[E:onnxruntime:, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_0' Status Message: CUDA error cudaErrorInvalidValue:invalid argument
```
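The skip message in the log implies a check that compares each fused attention kernel's shared-memory requirement with what the GPU allows per thread block. The following is a rough Python sketch of that idea, not ONNX Runtime's actual loader. It assumes the per-block opt-in limits listed in the CUDA C++ Programming Guide; `KernelSpec`, `can_load`, `select_kernels`, and all kernel sizes are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

KIB = 1024

# Maximum shared memory per thread block (opt-in), keyed by compute capability,
# as tabulated in the CUDA C++ Programming Guide.
MAX_SHARED_MEM_PER_BLOCK = {
    (7, 5): 64 * KIB,   # Turing, e.g. T4
    (8, 0): 163 * KIB,  # A100
    (8, 6): 99 * KIB,   # GA10x, e.g. A10
    (8, 9): 99 * KIB,   # Ada
    (9, 0): 227 * KIB,  # Hopper
}


@dataclass(frozen=True)
class KernelSpec:
    """Hypothetical description of a precompiled fused kernel."""
    name: str
    required_smem_bytes: int  # hypothetical shared memory the kernel requests


def can_load(kernel: KernelSpec, compute_capability: tuple[int, int]) -> bool:
    """Return True if the kernel's shared-memory request fits the device limit."""
    limit = MAX_SHARED_MEM_PER_BLOCK.get(compute_capability)
    if limit is None:
        # Unknown architecture: conservatively treat the kernel as unloadable.
        return False
    return kernel.required_smem_bytes <= limit


def select_kernels(kernels: list[KernelSpec], compute_capability: tuple[int, int]):
    """Split kernels into (loadable, skipped) lists for one device."""
    loadable: list[KernelSpec] = []
    skipped: list[KernelSpec] = []
    for k in kernels:
        (loadable if can_load(k, compute_capability) else skipped).append(k)
    return loadable, skipped


if __name__ == "__main__":
    # Made-up sizes: one kernel fits in 99 KB, the other does not.
    kernels = [KernelSpec("small_kernel", 48 * KIB), KernelSpec("large_kernel", 128 * KIB)]
    _, skipped = select_kernels(kernels, (8, 6))
    for k in skipped:
        print(f"skip loading {k.name} because not enough shared memory")
```

Hard-coding the table keeps the sketch runnable without a GPU; a real loader would more likely query the device at runtime, for example with `cudaDeviceGetAttribute` and `cudaDevAttrMaxSharedMemoryPerBlockOptin`.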

snnn · 2025-01-02T23:56:32Z

/azp run Windows ARM64 QNN CI Pipeline

azure-pipelines · 2025-01-02T23:56:41Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

Files not reviewed (1)

tools/ci_build/github/linux/build_cuda_ci.sh: Language not supported

Comments suppressed due to low confidence (1)

tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml:140

[nitpick] The pool name 'Onnxruntime-Linux-A10-24G' uses a different capitalization pattern compared to the previous 'onnxruntime-Linux-GPU-T4'. Ensure consistent naming conventions.

```yaml
pool: Onnxruntime-Linux-A10-24G
```

snnn added 2 commits January 1, 2025 03:41

Move Linux GPU CI pipeline to A10

b9682a3

Update GPU pool to onnxruntime-Linux-GPU-A10-12G

577e398

jchen351 previously approved these changes Jan 2, 2025

update

b80c913

snnn dismissed jchen351’s stale review via b80c913 January 2, 2025 20:06

Update pool name in CI pipeline

c103782

snnn added 2 commits January 3, 2025 01:29

Merge remote-tracking branch 'origin/main' into snnn/retire_t4

8a5f257

update

c9e4757

snnn requested review from tianleiwu and Copilot January 3, 2025 03:25

Copilot AI reviewed Jan 3, 2025

tianleiwu approved these changes Jan 4, 2025

snnn merged commit b7ef81a into main Jan 5, 2025
94 of 96 checks passed

snnn deleted the snnn/retire_t4 branch January 5, 2025 03:11