h100 timings for CCO‐Surface50

CCO-surface50 Timings - 5/21/2024

This example is FFT dominant.

Directory: /home/bylaska/PWDFT/QA/CCO-Cu_surface50

The table below reports performance timings for this task on each machine as the number of CPU cores (ncpus) varies. All times are in seconds. The cputime column is the total time, which is broken down into the following components:

non-local: Timings for non-local operations.
ffm: Timings for ffm operations.
fmf: Timings for fmf operations.
fft: Timings for FFT (Fast Fourier Transform) operations.
diagonalize: Timings for diagonalization operations.

"In the NVIDIA binary, FFT operations are performed exclusively on the GPU, and BLAS3 operations are also executed on the GPU. Additionally, it's important to note that the GPUs become overloaded once ncpus reaches a threshold of 6."

machine	ncpus	cputime	non-local	ffm	fmf	fft	diagonalize
h100	8
h100	16
h100	24
h100	32
SYCL	12	4.995e+00	4.499e-01	4.335e-02	1.859e-02	4.185e+00	3.259e-03
SYCL	24	3.066e+00	4.563e-01	4.693e-02	1.231e-02	2.351e+00	3.308e-03
SYCL	48	2.300e+00	4.931e-01	5.110e-02	1.410e-02	1.581e+00	4.477e-03
h100	64
h100	200

The table presents the total and component times for different numbers of CPU cores (ncpus). The optimal timings for each component are indicated by bold values.
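To illustrate how the table can be read, the following is a minimal Python sketch that computes the share of the total time spent in FFTs and the speedup relative to the smallest core count. It uses only the three SYCL rows that have recorded timings. The names `parse_rows`, `fft_fraction`, and `speedup` are hypothetical helpers introduced for this example and are not part of PWDFT.

```python
# Column names follow the header of the timings table.
COLUMNS = ["machine", "ncpus", "cputime", "non-local",
           "ffm", "fmf", "fft", "diagonalize"]

# The SYCL rows copied from the table (the h100 rows have no timings yet).
TABLE = """
SYCL 12 4.995e+00 4.499e-01 4.335e-02 1.859e-02 4.185e+00 3.259e-03
SYCL 24 3.066e+00 4.563e-01 4.693e-02 1.231e-02 2.351e+00 3.308e-03
SYCL 48 2.300e+00 4.931e-01 5.110e-02 1.410e-02 1.581e+00 4.477e-03
"""


def parse_rows(text):
    """Parse whitespace-separated rows, skipping rows without timings."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # e.g. a row that lists only machine and ncpus
        row = {"machine": fields[0], "ncpus": int(fields[1])}
        for name, value in zip(COLUMNS[2:], fields[2:]):
            row[name] = float(value)
        rows.append(row)
    return rows


def fft_fraction(row):
    """Fraction of the total time (cputime) spent in FFT operations."""
    return row["fft"] / row["cputime"]


def speedup(rows):
    """Speedup of each row relative to the first row, keyed by ncpus."""
    base = rows[0]["cputime"]
    return {row["ncpus"]: base / row["cputime"] for row in rows}


rows = parse_rows(TABLE)
speedups = speedup(rows)
for row in rows:
    print(f"ncpus={row['ncpus']:3d}  "
          f"fft share={fft_fraction(row):.1%}  "
          f"speedup={speedups[row['ncpus']]:.2f}x")
```

In every SYCL row the FFT component accounts for more than half of cputime, which is consistent with the description of this example as FFT dominant.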
