Releases · ROCm/TransferBench

TransferBench v1.53

11 Nov 06:59

Added

Added ability to specify NULL for sweep preset as source or destination memory type

TransferBench v1.52

09 Oct 16:49

Added

Added USE_HSA_DMA env var to switch to using hsa_amd_memory_async_copy instead of hipMemcpyAsync for DMA execution
Added ability to set USE_GPU_DMA env var for a2a benchmark
Added check for large BAR enablement on GPU devices during topology check
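
As a rough illustration of how the two environment variables above might be combined with the a2a preset, the Python sketch below builds the environment overrides and command line without launching anything. The value "1" for enabling each variable, the bare "./TransferBench a2a" invocation, and the helper name build_a2a_invocation are assumptions made for illustration; the release notes do not spell them out.

```python
def build_a2a_invocation(use_gpu_dma=False, use_hsa_dma=False, binary="./TransferBench"):
    """Return (argv, env_overrides) for an a2a run; nothing is executed here."""
    env_overrides = {}
    # Assumption: "1" turns each option on. Check the TransferBench help
    # output for the values it actually accepts.
    if use_gpu_dma:
        env_overrides["USE_GPU_DMA"] = "1"
    if use_hsa_dma:
        env_overrides["USE_HSA_DMA"] = "1"
    return [binary, "a2a"], env_overrides


argv, env_overrides = build_a2a_invocation(use_gpu_dma=True, use_hsa_dma=True)
print(" ".join(argv), env_overrides)
```

To launch the benchmark, the overrides could be merged into a copy of os.environ and passed to subprocess.run together with argv.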

Fixed

Potential memory leak if HSA reports 0 hops between GPUs and CPUs

TransferBench v1.51

15 Aug 17:46

Modified

CSV output has been modified slightly to match normal terminal output
Output for non-single-stream mode has been changed to match single-stream mode (results per Executor)

Added

Support for sub-iterations via NUM_SUBITERATIONS, which allows additional looping within an iteration. If set to 0, this should loop infinitely (which may be useful for some debugging purposes)
Support for a variable number of subexecutors (currently for the GPU-GFX executor only). Setting subExecutors to 0 will run over a range of CU counts and report only the results of the best one found. This can be tuned for performance by setting the MIN_VAR_SUBEXEC and MAX_VAR_SUBEXEC environment variables to narrow the search space. The number of CUs used will be identical for all variable subExecutor transfers
Experimental new "healthcheck" preset config, which currently supports only the MI300 series. This preset runs through CPU to GPU bandwidth tests and all-to-all XGMI bandwidth tests and compares the results against expected values. Pass criteria limits can be modified (due to platform differences) via the environment variables LIMIT_UDIR (unidirectional), LIMIT_BDIR (bidirectional), and LIMIT_A2A (per GPU-GPU link bandwidth)
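
The variable-subexecutor behaviour described above (try a range of CU counts, keep only the best result) can be pictured with the following Python sketch. It is a conceptual illustration only, not TransferBench's implementation: best_subexecutor_count, fake_measure, and the bandwidth curve are all hypothetical, and min_cus / max_cus merely play the role that MIN_VAR_SUBEXEC and MAX_VAR_SUBEXEC are described as playing.

```python
def best_subexecutor_count(measure_gbps, min_cus, max_cus):
    """Try every CU count in [min_cus, max_cus]; return (best_cus, best_bandwidth)."""
    if min_cus < 1 or max_cus < min_cus:
        raise ValueError("expected 1 <= min_cus <= max_cus")
    best_cus, best_bw = None, float("-inf")
    for cus in range(min_cus, max_cus + 1):
        bw = measure_gbps(cus)
        if bw > best_bw:  # keep the first CU count that reaches the best bandwidth
            best_cus, best_bw = cus, bw
    return best_cus, best_bw


def fake_measure(cus):
    # Made-up curve: bandwidth rises with CU count, saturates at 8 CUs,
    # then dips slightly. Real measurements would come from the benchmark.
    return min(cus * 5.0, 40.0) - 0.1 * cus


print(best_subexecutor_count(fake_measure, 1, 16))
```

Narrowing the range, as the two environment variables are said to do, simply shrinks the loop and so shortens the search.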

Fixed

Fixed out-of-bounds memory access during topology detection that can happen if the number of CPUs is less than the number of NUMA domains
Fixed CU masking functionality on multi-XCD architectures (e.g. MI300)

TransferBench v1.50

03 Apr 16:27

Added

Adding new parallel copy preset benchmark (pcopy)
- Usage: ./TransferBench pcopy <numBytes=64M> <#CUs=8> <srcGpu=0> <minGpus=1> <maxGpus=#GPU-1>
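
To make the positional arguments and defaults in the usage line concrete, here is a small Python sketch that assembles a pcopy command line. The helper name pcopy_argv is hypothetical, and the sketch assumes the arguments are passed positionally in the order shown and that a size such as "64M" is accepted exactly as written in the usage string.

```python
def pcopy_argv(num_gpus, num_bytes="64M", num_cus=8, src_gpu=0,
               min_gpus=1, max_gpus=None, binary="./TransferBench"):
    """Build the argument list for the pcopy preset; nothing is executed here."""
    if max_gpus is None:
        # The usage string gives the default as "#GPU-1".
        max_gpus = num_gpus - 1
    args = [num_bytes, num_cus, src_gpu, min_gpus, max_gpus]
    return [binary, "pcopy"] + [str(a) for a in args]


print(" ".join(pcopy_argv(num_gpus=8)))
```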

Fixed

Removed non-copy DMA Transfers (these had previously been using hipMemset)
Fixed CPU executor when operating on null destination

TransferBench v1.49

02 Apr 22:38

Fixes

Enumerating previously missed DMA engines used only for CPU traffic in topology display

TransferBench v1.48

02 Feb 22:46

Fixes

Various fixes for TransferBenchCuda

Additions

Support for targeting specific DMA engines via executor subindex (e.g. D0.1)
Printing warnings when executors are overcommitted
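
The executor subindex notation mentioned above (e.g. D0.1) can be read as a type letter, a device index, and an optional engine subindex. The Python sketch below parses tokens of that shape. The grammar is inferred from the single example in these notes, and the names Executor and parse_executor are hypothetical; TransferBench's own parser may accept more forms.

```python
import re
from typing import NamedTuple, Optional


class Executor(NamedTuple):
    kind: str                # type letter, e.g. "D" as in the D0.1 example
    index: int               # device index
    subindex: Optional[int]  # specific engine, or None if not given


# Assumed shape: one letter, an index, and an optional ".subindex".
_EXECUTOR_RE = re.compile(r"^([A-Za-z])(\d+)(?:\.(\d+))?$")


def parse_executor(token):
    match = _EXECUTOR_RE.match(token)
    if match is None:
        raise ValueError(f"unrecognised executor token: {token!r}")
    kind, index, sub = match.groups()
    return Executor(kind.upper(), int(index), int(sub) if sub is not None else None)


print(parse_executor("D0.1"))
```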

Modifications

USE_REMOTE_READ supported for rwrite preset benchmark

TransferBench v1.47

09 Jan 20:52

Fixes

Fixing CUDA compilation

TransferBench v1.46

14 Dec 03:54

Fixes

Fixing GFX_UNROLL set to 13 (past 8) on gfx906 cards

Modifications

GFX_SINGLE_TEAM=1 by default
Adding field showing summation of individual Transfer bandwidths for Executors

TransferBench v1.45

05 Dec 06:41

Additions

Adding A2A_MODE to a2a preset (0 = copy, 1 = read-only, 2 = write-only)
Adding GFX_UNROLL to modify GFX kernel's unroll factor
Adding GFX_WAVE_ORDER to modify order in which wavefronts process data
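
The three A2A_MODE values listed above map naturally onto a small enumeration. The Python sketch below encodes them and produces the corresponding environment override. The names A2AMode and a2a_mode_env are hypothetical; only the numeric values and their meanings come from the release notes.

```python
from enum import IntEnum


class A2AMode(IntEnum):
    COPY = 0        # 0 = copy
    READ_ONLY = 1   # 1 = read-only
    WRITE_ONLY = 2  # 2 = write-only


def a2a_mode_env(mode):
    """Return the environment override for the given mode; rejects unknown values."""
    return {"A2A_MODE": str(int(A2AMode(mode)))}


print(a2a_mode_env(A2AMode.READ_ONLY))
```

Passing a value outside 0 to 2 raises ValueError from the enum constructor, which catches typos before the benchmark is launched.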

Modifications

Rewrote the GFX reduction kernel to support new wave ordering

TransferBench v1.44

01 Dec 21:00

Additions

Adding rwrite preset to benchmark remote parallel writes
Usage: ./TransferBench rwrite <numBytes=64M> <#CUs=8> <srcGpu=0> <minGpus=1> <maxGpus=3>
