-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ifpack2: Fused block jacobi #13837
base: develop
Are you sure you want to change the base?
Ifpack2: Fused block jacobi #13837
Conversation
1d16296
to
29fe448
Compare
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Using Repos:
Pull Request Author: brian-kelley |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
|
More performant path for block Jacobi case inside BTDS (GPU only, BlockCrs only). Fuses residual and solve into one kernel and doesn't convert vectors to SIMD-packed format. Also inverts diag blocks fully in shared to speed up numeric. Signed-off-by: Brian Kelley <[email protected]>
9b0761c
to
c1f0eb5
Compare
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Using Repos:
Pull Request Author: brian-kelley |
Not always equal to vector_length/2, since vector_length can theoretically be 1 Signed-off-by: Brian Kelley <[email protected]>
Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - A user has pushed a change to the PR before testing completed. NEW EVENT 'committed', ID C_kwDOAsJyMdoAKDkzYzBkMTFjMThjNGRkZmI5ZjA0MzVkNTA0NWJhY2ZjY2M0Zjc5MGQ... The Jenkins Jobs will be shutdown; Testing of this PR must occur again. |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Using Repos:
Pull Request Author: brian-kelley |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: PR_intel
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
More performant paths for block Jacobi case inside BTDS (GPU only, BlockCrs only).
Compared to #13805, this gives between 9% (bs = 7) and 33% (bs = 11) speedup on the overall solve and about 1.9x speedup in numeric setup for both block sizes. Note: this was measured on a single GPU run, so the speedup only applies to local computation. Multi-rank runs will speed up less due to time spent doing communication.
@trilinos/ifpack2
Related Issues
Follows #13805
Stakeholder Feedback
Will ask SPARC team to evaluate.
Testing
Tested in Ifpack2_BlockTriDiContainerUnitAndPerfTests, and ran this on OpenMP, Cuda and HIP. For Cuda tested double, float and complex_double.