
This repository contains the code and framework for the special course in 'Large Scale Computation of ... GPU Architecture'.


simonaertssen/large_scale_gpus


large_scale_gpus

We investigate how the topology of the computational hardware affects throughput, measured in double-precision floating-point operations per second (FLOPS). Two commercially available GPU-accelerated compute nodes are compared using dense matrix multiplication as a compute-bound benchmark problem. We find that the higher host-device memory bandwidth provided by NVLink-enabled CPUs, compared to PCIe connections, significantly improves overall performance. We also compare our own implementation with an NVIDIA benchmark and report substantial speedups, especially for very large matrices.
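To make the FLOPS metric and the role of host-device bandwidth concrete, here is a simple cost model for one double-precision GEMM, C = A·B. It is a sketch, not code from this repository. It assumes that A and B are copied to the GPU once and C is copied back once, that copies and compute do not overlap, and that the GPU runs at a fixed peak rate. The peak rate and both link bandwidths in the `__main__` block are hypothetical placeholders, not measured values.

```python
"""Simple cost model for a double-precision GEMM, C = A @ B, offloaded to a GPU.

Assumptions (hypothetical, not taken from this repository):
  * A (m x k) and B (k x n) are copied host->device once; C (m x n) device->host once.
  * Transfers and compute do not overlap.
  * The GPU runs at a fixed peak FLOP rate.
"""

BYTES_PER_DOUBLE = 8


def gemm_flops(m: int, n: int, k: int) -> int:
    """Each of the m*n outputs needs k multiplies and k adds."""
    return 2 * m * n * k


def gemm_transfer_bytes(m: int, n: int, k: int) -> int:
    """Bytes crossing the host-device link for one copy each of A, B and C."""
    return BYTES_PER_DOUBLE * (m * k + k * n + m * n)


def arithmetic_intensity(m: int, n: int, k: int) -> float:
    """FLOPs per byte transferred; equals n / 12 for square n x n matrices."""
    return gemm_flops(m, n, k) / gemm_transfer_bytes(m, n, k)


def estimated_seconds(m: int, n: int, k: int,
                      peak_flops: float, link_bytes_per_s: float) -> float:
    """Compute time plus transfer time, with no overlap between the two."""
    compute = gemm_flops(m, n, k) / peak_flops
    transfer = gemm_transfer_bytes(m, n, k) / link_bytes_per_s
    return compute + transfer


def achieved_flops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput as a benchmark would report it: useful FLOPs / wall-clock time."""
    return gemm_flops(m, n, k) / seconds


if __name__ == "__main__":
    n = 8192
    peak = 7.0e12  # hypothetical double-precision peak, FLOP/s
    # Hypothetical link bandwidths in bytes/s: a slower and a faster host-device link.
    for label, bw in [("slower link", 16e9), ("faster link", 75e9)]:
        t = estimated_seconds(n, n, n, peak, bw)
        print(f"{label}: {t:.3f} s, {achieved_flops(n, n, n, t) / 1e12:.2f} TFLOP/s")
    print(f"arithmetic intensity: {arithmetic_intensity(n, n, n):.1f} FLOP/byte")
```

Under these assumptions the FLOP count grows as n³ while the transferred bytes grow only as n², so the link matters most when transfers are not amortised over enough compute. The non-overlapping sum is deliberately pessimistic, because real implementations typically overlap copies with kernels.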
