TransferBench v1.54

gilbertlee-amd released this 21 Nov 23:32

02ce785

v1.54

Modified

Refactored TransferBench into a header-only library combined with a thin client to facilitate the
use of TransferBench as the backend for other applications
Optimized how data validation is handled - this should speed up Tests with many parallel transfers as data is only
generated once
Preset benchmarks no longer take any extra command-line arguments. Preset settings are accessed only via
environment variables. Details for each preset are printed
The a2a preset benchmark now defaults to using fine-grained memory and GFX unroll of 2
Refactored how Transfers are launched in parallel which has reduced some CPU-side overheads
CPU and DMA executor timing now uses CPU wall-clock time instead of the slowest Transfer time
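
To illustrate why the choice between wall-clock time and slowest Transfer time matters, here is a minimal sketch. It is not TransferBench code: the `Interval` struct and both helper functions are hypothetical names, and the per-Transfer start/stop timestamps are assumed to share one clock. When Transfers overlap fully the two measures agree, but when they run back to back the slowest-Transfer figure understates the elapsed time.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record of when one Transfer started and stopped (milliseconds)
struct Interval { double startMsec; double stopMsec; };

// Duration of the single slowest Transfer
double SlowestTransferMsec(std::vector<Interval> const& transfers)
{
  double slowest = 0.0;
  for (Interval const& t : transfers)
    slowest = std::max(slowest, t.stopMsec - t.startMsec);
  return slowest;
}

// Elapsed time from the earliest start to the latest stop
double WallClockMsec(std::vector<Interval> const& transfers)
{
  if (transfers.empty()) return 0.0;
  double first = transfers[0].startMsec;
  double last  = transfers[0].stopMsec;
  for (Interval const& t : transfers) {
    first = std::min(first, t.startMsec);
    last  = std::max(last,  t.stopMsec);
  }
  return last - first;
}
```

For two 10 ms Transfers that run one after the other, `SlowestTransferMsec` gives 10 while `WallClockMsec` gives 20, so bandwidth computed from the former would be overstated.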

Added

New one2all preset which sweeps over all subsets of parallel transfers from one GPU to others
Added new warnings for DMA execution relating to how HIP defaults to using agents from the source memory
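
As a rough picture of what sweeping over all subsets of destinations involves, the sketch below enumerates every non-empty subset of destination GPUs for one source GPU using a bitmask. This is an assumption-laden illustration, not the preset's implementation: `EnumerateTargetSubsets` is a hypothetical helper, and whether the real preset skips the empty subset or orders the subsets this way is not stated in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TransferBench): lists every non-empty
// subset of destination GPUs for the given source GPU.
std::vector<std::vector<int>> EnumerateTargetSubsets(int srcGpu, int numGpus)
{
  assert(numGpus >= 1 && numGpus <= 16);   // keep the subset count manageable
  assert(srcGpu >= 0 && srcGpu < numGpus);

  // Every GPU except the source is a candidate destination
  std::vector<int> targets;
  for (int gpu = 0; gpu < numGpus; gpu++)
    if (gpu != srcGpu) targets.push_back(gpu);

  // Each bit of the mask selects one candidate; mask 0 (no transfers) is skipped
  std::vector<std::vector<int>> subsets;
  unsigned const numMasks = 1u << targets.size();
  for (unsigned mask = 1; mask < numMasks; mask++) {
    std::vector<int> subset;
    for (std::size_t bit = 0; bit < targets.size(); bit++)
      if (mask & (1u << bit)) subset.push_back(targets[bit]);
    subsets.push_back(subset);
  }
  return subsets;
}
```

With N candidate destinations this produces 2^N - 1 subsets, so a sweep of this kind grows quickly with GPU count.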

Removed

CU scaling preset has been removed. Similar functionality already exists in the schmoo preset benchmark
Preparation of source data via GFX kernel has been removed (USE_PREP_KERNEL)
Removed GFX block-reordering (BLOCK_ORDER)
Removed NUM_CPU_DEVICES and NUM_GPU_DEVICES from the common env vars; they now appear only in the presets they apply to.
Removed SHARED_MEM_BYTES option for GFX executor
Removed USE_PCIE_INDEX

Fixed

Fixed a potential timing reporting issue when DMA-executed Transfers are serialized.
