TransferBench v1.54

@gilbertlee-amd gilbertlee-amd released this 21 Nov 23:32
02ce785

v1.54

Modified

  • Refactored TransferBench into a header-only library combined with a thin client to facilitate the
    use of TransferBench as the backend for other applications
  • Optimized how data validation is handled. This should speed up Tests with many parallel Transfers, as data is
    only generated once
  • Preset benchmarks no longer take any extra command line arguments. Preset settings are only accessed via
    environment variables. Details for each preset are printed
  • The a2a preset benchmark now defaults to using fine-grained memory and a GFX unroll of 2
  • Refactored how Transfers are launched in parallel, which reduces some CPU-side overhead
  • CPU and DMA executor timing now uses CPU wall-clock time instead of the slowest Transfer time
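
The two timing conventions named in the last item can differ noticeably. The sketch below is an illustration in Python, not TransferBench code: the function names and the (start, end) timestamps are hypothetical, chosen only to show how the slowest individual Transfer time compares with the wall-clock span of the whole group.

```python
def slowest_transfer_time(intervals):
    """Longest individual (end - start) duration among the Transfers."""
    return max(end - start for start, end in intervals)


def wall_clock_time(intervals):
    """Elapsed time from the earliest start to the latest end."""
    earliest_start = min(start for start, _ in intervals)
    latest_end = max(end for _, end in intervals)
    return latest_end - earliest_start


# Hypothetical (start, end) timestamps in milliseconds for three Transfers.
overlapped = [(0.0, 10.0), (0.0, 9.0), (0.0, 10.0)]      # all run concurrently
serialized = [(0.0, 10.0), (10.0, 19.0), (19.0, 29.0)]   # run back to back

for name, intervals in (("overlapped", overlapped), ("serialized", serialized)):
    print(name, slowest_transfer_time(intervals), wall_clock_time(intervals))
```

With these made-up numbers both measures agree (10.0 ms) when the Transfers fully overlap, but when they run back to back the slowest Transfer still takes 10.0 ms while the wall-clock span is 29.0 ms.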

Added

  • New one2all preset which sweeps over all subsets of parallel transfers from one GPU to others
  • Added new warnings for DMA execution about how HIP defaults to using agents from the source memory
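
As a rough picture of what sweeping over all subsets of destinations means for one2all, the following Python sketch enumerates every non-empty set of destination GPUs for a given source. It is an illustration only: the function name, the GPU indexing, the iteration order, and the exclusion of the empty subset are assumptions, not the preset's actual implementation.

```python
from itertools import combinations


def destination_subsets(source, num_gpus):
    """Yield every non-empty subset of GPU indices other than `source`."""
    others = [gpu for gpu in range(num_gpus) if gpu != source]
    for size in range(1, len(others) + 1):
        for subset in combinations(others, size):
            yield subset


# Hypothetical 4-GPU system with GPU 0 as the source.
subsets = list(destination_subsets(source=0, num_gpus=4))
for subset in subsets:
    print("GPU 0 ->", subset)
print(len(subsets), "subsets")
```

For N other GPUs there are 2^N - 1 non-empty subsets, so under these assumptions the sweep grows quickly with GPU count (7 subsets for 3 destinations, 127 for 7).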

Removed

  • CU scaling preset has been removed. Similar functionality already exists in the schmoo preset benchmark
  • Preparation of source data via GFX kernel has been removed (USE_PREP_KERNEL)
  • Removed GFX block-reordering (BLOCK_ORDER)
  • Moved NUM_CPU_DEVICES and NUM_GPU_DEVICES out of the common env vars and into only the presets they apply to.
  • Removed SHARED_MEM_BYTES option for GFX executor
  • Removed USE_PCIE_INDEX

Fixed

  • Fixed a potential timing reporting issue when DMA-executed Transfers end up being serialized.