Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CPU/GPU GSTATS regions consistent #59

Conversation

samhatfield
Copy link
Collaborator

The GPU porting of ecTrans introduced some new GSTATS regions. The following are now reported in the GSTATS summary of NODE.001_01 in the IFS when using ecTrans GPU mode:

410, 411, 412, 413, 430, 431, 440, 441

These are currently unlabelled so show up as rather anonymous in the profile:

 410                                               725        13.259        37.374      1.49      0.96
 411                                               361        96.194       128.168      2.55      0.64
 412                                               725        67.824       116.401      4.65      1.94
 413                                               361        40.342        82.748      1.65      0.84
 430                                               722         2.637         3.603      0.14      0.04
 431                                              1448         2.009         2.865      0.23      0.07
 440                                               361       190.696       236.205      4.70      0.91
 441                                               725       135.035       196.480      7.85      2.45

So the question is, which of these would we like to keep (probably some were simply for debugging), and of those which should be copied also the CPU version.

I would argue that 440 (DIR_TRANS) and 441 (INV_TRANS) are very useful and should be kept - see first commit. The others are not so important so could probably be deleted.

I also reordered some GSTATS close calls so they occur within the parent
Dr Hook region ({[]} is better than {[}]...).
@samhatfield samhatfield marked this pull request as draft January 30, 2024 17:51
@samhatfield
Copy link
Collaborator Author

Once this is concluded I will submit a PR so the new labels are reflected in the DestinE ifs-source develop branch.

@samhatfield
Copy link
Collaborator Author

Here are the meanings of all anonymous regions:
410 - ALLTOALLV of TRMTOL plus the preceding DATA declaration. Could be deleted as it doesn't really offer any new information on top of the existing 807 region.
411 - This one does just measure the ALLTOALLV. But again, the existing 806 region already captures this. Could be deleted.
412 - TRLTOG - doesn't really give any more information than existing 805 region.
413 - TRGTOL - again, 803/804 basically capture this region already.
430 - This region is declared twice for different purposes, in FTDIR_CTL and DIR_TRANS_CTL and measures different things. Let's just delete this.
431 - This seems to be targeting the host->device data transfers in LTINV. But it also captures the end of a DATA region in TRLTOG for some reason. Delete...

@samhatfield
Copy link
Collaborator Author

I will close this for now as redgreengpu is no longer the preferred target branch for GPU-related improvements.

@samhatfield samhatfield closed this May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant