Releases: ROCm/TransferBench

rocm-6.3.1

20 Dec 16:12
56a2d6f

ROCm release v6.3.1

TransferBench v1.58.00

05 Dec 20:46
fb713d0

v1.58.00

Fixed

  • Fixed broken specific DMA-engine copies

rocm-6.3.0

03 Dec 19:49
f6cc992

ROCm release v6.3.0

TransferBench v1.57.01

02 Dec 23:22
2c921db

v1.57.01

Added

  • Re-added "scaling" GPU GFX preset benchmark, which tests copies from GPU to other devices using a varying
    number of CUs.

TransferBench v1.57.00

28 Nov 00:12
062b581

v1.57.00

Modified

  • Removed use of the defaulted spaceship operator (a C++20 requirement) to enable compilation on more OSes
  • Changed how the version is reported. The client version is now just the last two digits; it increments only
    when no changes are made to the backend header-only library file and resets to 0 when the header is updated
  • GFX_SINGLE_TEAM=0 is set by default
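
The spaceship-operator change can be pictured with a minimal sketch. The `MemDevice` struct and its fields below are hypothetical and are not taken from the TransferBench source. A defaulted `operator<=>` needs a C++20 compiler, whereas hand-written comparisons built on `std::tie` compile with older standards:

```cpp
#include <cassert>
#include <tuple>

// Hypothetical struct for illustration only; not the actual TransferBench type.
struct MemDevice {
  int memType;
  int memIndex;

  // C++20-only form that a change like this would replace:
  //   auto operator<=>(MemDevice const&) const = default;

  // Portable replacement: lexicographic comparison via std::tie.
  bool operator<(MemDevice const& other) const {
    return std::tie(memType, memIndex) < std::tie(other.memType, other.memIndex);
  }
  bool operator==(MemDevice const& other) const {
    return std::tie(memType, memIndex) == std::tie(other.memType, other.memIndex);
  }
};

// Small usage check: ordering is by memType first, then memIndex.
inline void checkOrdering() {
  MemDevice a{0, 1}, b{0, 2}, c{1, 0};
  assert(a < b && b < c);
  assert((a == MemDevice{0, 1}));
}
```

Only the operators that are actually needed have to be written out, which is usually `operator<` for ordered containers and `operator==` for lookups.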

TransferBench v1.56

26 Nov 20:37
83fc9b3

v1.56

Fixed

  • Fixed bug when using interactive mode. Interactive mode now starts prior to all warmup iterations

TransferBench v1.55

26 Nov 20:37
9f68d14
Pre-release

v1.55

Fixed

  • Fixed missing header error when compiling on CentOS
  • Fixed issues when using multi-stream mode for GFX executor

TransferBench v1.54

21 Nov 23:32
02ce785

v1.54

Modified

  • Refactored TransferBench into a header-only library combined with a thin client to facilitate the
    use of TransferBench as the backend for other applications
  • Optimized how data validation is handled - this should speed up Tests with many parallel transfers as data is only
    generated once
  • Preset benchmarks no longer take any extra command line arguments. Preset settings are only accessed via
    environment variables. Details for each preset are printed
  • The a2a preset benchmark now defaults to using fine-grained memory and GFX unroll of 2
  • Refactored how Transfers are launched in parallel, which reduces some CPU-side overhead
  • CPU and DMA executor timing now use CPU wall clock timing instead of slowest Transfer time
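
The difference between the two timing approaches in the last item can be sketched as follows. This is an illustration under assumptions, not the TransferBench implementation: `TimingResult` and `timeTransfers` are hypothetical names, and the transfers are run back-to-back to mimic the case where they end up serialized. In that case the slowest individual transfer understates the total elapsed time, while the wall clock measurement captures it:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical result type for illustration.
struct TimingResult {
  double wallClockMsec;  // from just before the first transfer to just after the last
  double slowestMsec;    // longest individual transfer
};

// Runs each task back-to-back (as if serialized) and times it both ways.
inline TimingResult timeTransfers(std::vector<std::function<void()>> const& transfers) {
  using Clock = std::chrono::steady_clock;
  using Msec  = std::chrono::duration<double, std::milli>;

  double slowest = 0.0;
  auto const start = Clock::now();
  for (auto const& transfer : transfers) {
    auto const t0 = Clock::now();
    transfer();
    slowest = std::max(slowest, Msec(Clock::now() - t0).count());
  }
  double const wall = Msec(Clock::now() - start).count();

  // The wall clock interval encloses every per-transfer interval.
  assert(wall >= slowest);
  return {wall, slowest};
}
```

With N serialized transfers of similar duration, `wallClockMsec` is roughly N times `slowestMsec`, so a bandwidth figure derived from the slowest transfer alone would be overstated.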

Added

  • New one2all preset which sweeps over all subsets of parallel transfers from one GPU to others
  • Added new warnings for DMA execution relating to how HIP will default to using agents from the source memory
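
As a rough picture of what sweeping over all subsets means for the one2all preset, the sketch below enumerates every non-empty set of destination GPUs for a given source GPU using a bitmask. The function name `enumerateTargets` is hypothetical, and whether the real preset orders or filters subsets this way is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns every non-empty subset of destination GPU indices, excluding srcGpu.
// Hypothetical helper for illustration; not part of TransferBench.
inline std::vector<std::vector<int>> enumerateTargets(int srcGpu, int numGpus) {
  assert(numGpus > 0 && numGpus <= 31 && srcGpu >= 0 && srcGpu < numGpus);

  // Collect all GPUs other than the source.
  std::vector<int> others;
  for (int i = 0; i < numGpus; ++i)
    if (i != srcGpu) others.push_back(i);

  // Each bit of mask selects one entry of others; mask 0 (empty set) is skipped.
  std::vector<std::vector<int>> subsets;
  std::uint32_t const count = 1u << others.size();
  for (std::uint32_t mask = 1; mask < count; ++mask) {
    std::vector<int> subset;
    for (std::size_t bit = 0; bit < others.size(); ++bit)
      if (mask & (1u << bit)) subset.push_back(others[bit]);
    subsets.push_back(subset);
  }
  return subsets;
}
```

The number of subsets grows as 2^(N-1) - 1 for N GPUs, so a sweep of this kind covers 7 combinations on a 4-GPU system and 127 on an 8-GPU system.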

Removed

  • CU scaling preset has been removed. Similar functionality already exists in the schmoo preset benchmark
  • Preparation of source data via GFX kernel has been removed (USE_PREP_KERNEL)
  • Removed GFX block-reordering (BLOCK_ORDER)
  • Moved NUM_CPU_DEVICES and NUM_GPU_DEVICES out of the common env vars and into only the presets they apply to
  • Removed SHARED_MEM_BYTES option for GFX executor
  • Removed USE_PCIE_INDEX

Fixed

  • Fixed a potential timing reporting issue when DMA-executed Transfers end up getting serialized

TransferBench v1.53

11 Nov 06:59
b56d481

v1.53

Added

  • Added ability to specify NULL for sweep preset as source or destination memory type

TransferBench v1.52

09 Oct 16:49
600cf13

Added

  • Added USE_HSA_DMA env var to switch to using hsa_amd_memory_async_copy instead of hipMemcpyAsync for DMA execution
  • Added ability to set USE_GPU_DMA env var for a2a benchmark
  • Added check for large BAR enablement for GPU devices during topology check
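
An environment-variable switch such as USE_HSA_DMA could be read along the following lines. This is a sketch under assumptions: the `DmaBackend` enum and both helper functions are hypothetical, and the parsing rule (unset or 0 keeps the HIP path, any other integer selects the HSA path) is a guess rather than documented behavior:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical enum naming the two copy paths mentioned above.
enum class DmaBackend {
  Hip,  // hipMemcpyAsync
  Hsa   // hsa_amd_memory_async_copy
};

// Assumed parsing rule: unset or "0" keeps the HIP path, any other integer selects HSA.
inline DmaBackend selectDmaBackend(char const* envValue) {
  if (envValue == nullptr) return DmaBackend::Hip;
  return std::atoi(envValue) != 0 ? DmaBackend::Hsa : DmaBackend::Hip;
}

// Reads the actual environment variable named in the release note.
inline DmaBackend selectDmaBackendFromEnv() {
  return selectDmaBackend(std::getenv("USE_HSA_DMA"));
}
```

Keeping the parsing in a function that takes the raw string makes the rule easy to check without touching the process environment.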

Fixed

  • Fixed potential memory leak if HSA reports 0 hops between GPUs and CPUs