Releases: ROCm/TransferBench
TransferBench v1.41
Additions
- Adding schmoo preset config, which benchmarks local/remote reads/writes/copies
- Usage: ./TransferBench schmoo <numBytes=64M> <localIdx=0> <remoteIdx=1> <maxNumCUs=32>
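The usage line above can be read as four optional positional arguments. The sketch below assumes the values shown in the usage string (64M, 0, 1, 32) are the defaults applied when an argument is omitted. `schmoo_cmd` is a hypothetical helper, not part of TransferBench; it only prints the command line it would run.

```shell
#!/bin/sh
# schmoo_cmd is a hypothetical helper (not shipped with TransferBench).
# It builds a schmoo command line, substituting the values from the usage
# string for any omitted argument (assumed here to be the defaults).
schmoo_cmd() {
    num_bytes="${1:-64M}"    # numBytes
    local_idx="${2:-0}"      # localIdx
    remote_idx="${3:-1}"     # remoteIdx
    max_num_cus="${4:-32}"   # maxNumCUs
    echo "./TransferBench schmoo $num_bytes $local_idx $remote_idx $max_num_cus"
}

schmoo_cmd            # prints: ./TransferBench schmoo 64M 0 1 32
schmoo_cmd 128M 0 2   # prints: ./TransferBench schmoo 128M 0 2 32
```

Printing the command instead of executing it makes it easy to check the argument order before launching a run.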
Fixes
- Fixing some misreported timings when running with non-fixed number of iterations
TransferBench v1.40
Fixes
- Fixing XCC defaulting to 0 instead of random for preset configs, ignoring XCC_PREF_TABLE
TransferBench v1.39
Additions
- (Experimental) Adding support for Executor sub-index
Fixes
- Remove deprecated gcnArch code. ROCm version must include support for hipDeviceMallocUncached
TransferBench v1.38
Fixes
- Adding a missing threadfence; without it, non-fine-grained Transfers could report higher speeds
TransferBench v1.37
Changes
- USE_SINGLE_STREAM is enabled by default now. (Disable via USE_SINGLE_STREAM=0)
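As a minimal sketch of opting out for a single run, the variable can be scoped to one command instead of being exported for the whole session. `run_multi_stream` is a hypothetical wrapper, not part of TransferBench, and the `sh -c` command below is only a stand-in for the real invocation.

```shell
#!/bin/sh
# run_multi_stream is a hypothetical wrapper (not shipped with TransferBench).
# It runs the given command with USE_SINGLE_STREAM=0 set for that command
# only, leaving the calling shell's environment unchanged.
run_multi_stream() {
    env USE_SINGLE_STREAM=0 "$@"
}

# Stand-in command that echoes the value it sees; in practice the arguments
# would be the TransferBench command line.
run_multi_stream sh -c 'echo "USE_SINGLE_STREAM=$USE_SINGLE_STREAM"'
# prints: USE_SINGLE_STREAM=0
```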
Fixes
- Fix unrecognized token error when XCC_PREF_TABLE is unspecified
TransferBench v1.35
Additions
- USE_FINE_GRAIN also applies to a2a preset
TransferBench v1.34
Added
- GPU_KERNEL=3 is now the default for gfx942
TransferBench v1.33
Additions
- Adding ALWAYS_VALIDATE env var to allow validation after every iteration instead of just once at the end of all iterations
TransferBench v1.32
Modified
- Increased line limit from 2048 to 32768
TransferBench v1.31
Modified
- SHOW_ITERATIONS now shows XCC:CU instead of just CU ID
- SHOW_ITERATIONS output is also printed when USE_SINGLE_STREAM=1