v1.54
Modified
Refactored TransferBench into a header-only library combined with a thin client to facilitate the
use of TransferBench as the backend for other applications
Optimized how data validation is handled. This should speed up Tests with many parallel transfers, as data is only
generated once
Preset benchmarks no longer take any extra command-line arguments. Preset settings are now accessed only via
environment variables. Details for each preset are printed
The a2a preset benchmark now defaults to using fine-grained memory and GFX unroll of 2
Refactored how Transfers are launched in parallel, which has reduced some CPU-side overheads
CPU and DMA executor timing now uses CPU wall clock timing instead of slowest Transfer time
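
To illustrate the difference between the two timing approaches mentioned in the last item, here is a minimal sketch. It is not TransferBench code: `SlowestTransferMsec` and `WallClockMsec` are hypothetical helper names, and the callable passed to `WallClockMsec` stands in for whatever launches the Transfers and waits for them to finish.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical helper (assumption, not TransferBench's API):
// reports the duration of the slowest individual Transfer.
double SlowestTransferMsec(const std::vector<double>& perTransferMsec) {
  if (perTransferMsec.empty()) return 0.0;
  return *std::max_element(perTransferMsec.begin(), perTransferMsec.end());
}

// Hypothetical helper (assumption, not TransferBench's API):
// measures CPU wall clock time around a callable that launches all
// Transfers and blocks until they have completed.
template <typename Fn>
double WallClockMsec(Fn&& launchAndWait) {
  const auto start = std::chrono::steady_clock::now();
  launchAndWait();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The two numbers agree only when every Transfer overlaps fully with the slowest one. If Transfers run back to back, the wall clock measurement captures the whole span while the slowest-Transfer figure does not.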
Added
New one2all preset which sweeps over all subsets of parallel transfers from one GPU to others
Added new warnings for DMA execution relating to how HIP will default to using agents from the source memory
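
As a rough illustration of what sweeping over all subsets from one GPU to the others can look like, the sketch below enumerates target subsets with a bitmask. It is not taken from the one2all preset: `EnumerateTargetSubsets` is a hypothetical name, skipping the empty subset is an assumption, and fewer than 32 target GPUs are assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (assumption, not the preset's actual code):
// returns every non-empty subset of GPUs other than srcGpu.
// Assumes fewer than 32 target GPUs so a 32-bit mask suffices.
std::vector<std::vector<int>> EnumerateTargetSubsets(int numGpus, int srcGpu) {
  // Collect all candidate destination GPUs.
  std::vector<int> others;
  for (int g = 0; g < numGpus; ++g)
    if (g != srcGpu) others.push_back(g);

  // Each bit of the mask selects one destination; mask 0 (no targets) is skipped.
  std::vector<std::vector<int>> subsets;
  const std::uint32_t numMasks = 1u << others.size();
  for (std::uint32_t mask = 1; mask < numMasks; ++mask) {
    std::vector<int> subset;
    for (std::size_t bit = 0; bit < others.size(); ++bit)
      if (mask & (1u << bit)) subset.push_back(others[bit]);
    subsets.push_back(subset);
  }
  return subsets;
}
```

With N target GPUs this yields 2^N - 1 combinations, so the number of sweep points grows quickly as GPUs are added.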
Removed
CU scaling preset has been removed. Similar functionality already exists in the schmoo preset benchmark
Preparation of source data via GFX kernel has been removed (USE_PREP_KERNEL)
Removed GFX block-reordering (BLOCK_ORDER)
Removed NUM_CPU_DEVICES and NUM_GPU_DEVICES from the common env vars; they now apply only to the presets that use them.
Removed SHARED_MEM_BYTES option for GFX executor
Removed USE_PCIE_INDEX
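
Since preset settings such as NUM_GPU_DEVICES are now read from environment variables, a preset needs some way to fall back to a default when a variable is unset. The following is a minimal sketch of that pattern, not TransferBench's implementation: `GetEnvInt` is a hypothetical helper, and treating unparsable values as unset is an assumption.

```cpp
#include <cstdlib>

// Hypothetical helper (assumption, not TransferBench's API):
// reads an integer setting from the environment, returning
// defaultValue when the variable is unset, empty, or not a number.
int GetEnvInt(const char* name, int defaultValue) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return defaultValue;

  char* end = nullptr;
  const long parsed = std::strtol(value, &end, 10);
  if (*end != '\0') return defaultValue;  // trailing non-numeric characters
  return static_cast<int>(parsed);
}
```

A preset could then call something like `GetEnvInt("NUM_GPU_DEVICES", detectedGpuCount)`, where `detectedGpuCount` is a placeholder for whatever default that preset chooses.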
Fixed
Fixed a potential timing reporting issue when DMA-executed Transfers end up getting serialized.