Fixes various issues in the tuning scripts #2079

Closed
wants to merge 5 commits into from

@IMbackK IMbackK commented Jan 8, 2025

672afc6 fixes #2077 issue 1

8829b02 fixes #2077 Issue 4

39451b8 updates the tuning script to align with rocblas change 7972a13cb93611d12c47d72e3bf15acd8ca4b1ee

ff26d8a partially fixes #2077 Issue 2, but many bugs remain in the conversion from rocBLAS logs to Tensile benchmark configs, including but not limited to:

  1. Only the first data type configuration in the rocBLAS log file is used as the configuration for all benchmark problems, even though the log may contain several different configurations
    1. The different configurations should be forked rather than placed only in the common benchmark config parameters, but they are not
  2. Configurations where the input and output dtypes differ are broken
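To illustrate what forking by data type configuration could look like, here is a minimal Python sketch. It is not the code from the tuning scripts: the log line layout (rocblas-bench style `--flag value` pairs), the `DTYPE_FLAGS` tuple, and the helper names `parse_flags` and `group_by_dtype` are all assumptions made for illustration.

```python
import shlex
from collections import defaultdict

# Assumption: these flags together describe the data type configuration of a call.
DTYPE_FLAGS = ("--a_type", "--b_type", "--c_type", "--d_type", "--compute_type")


def parse_flags(line):
    """Return a {flag: value} dict for the 'flag value' pairs in one log line."""
    tokens = shlex.split(line)
    flags = {}
    for i, tok in enumerate(tokens[:-1]):
        # A flag starts with '-'; its value is the next token unless that is a long flag.
        if tok.startswith("-") and not tokens[i + 1].startswith("--"):
            flags[tok] = tokens[i + 1]
    return flags


def group_by_dtype(lines):
    """Group (m, n, k) problem sizes by data type configuration.

    Each key would become its own benchmark group (a "fork") instead of
    applying the first configuration found to every problem in the log.
    """
    groups = defaultdict(list)
    for line in lines:
        flags = parse_flags(line)
        key = tuple(flags.get(f) for f in DTYPE_FLAGS)
        size = tuple(int(flags[f]) for f in ("-m", "-n", "-k"))
        groups[key].append(size)
    return dict(groups)


# Hypothetical log excerpt with two different data type configurations.
SAMPLE_LOG = [
    "./rocblas-bench -f gemm_ex -m 1024 -n 1024 -k 512 --a_type f16_r "
    "--b_type f16_r --c_type f16_r --d_type f16_r --compute_type f32_r",
    "./rocblas-bench -f gemm_ex -m 256 -n 256 -k 256 --a_type f32_r "
    "--b_type f32_r --c_type f32_r --d_type f32_r --compute_type f32_r",
    "./rocblas-bench -f gemm_ex -m 64 -n 64 -k 64 --a_type f16_r "
    "--b_type f16_r --c_type f16_r --d_type f16_r --compute_type f32_r",
]

if __name__ == "__main__":
    for dtypes, sizes in group_by_dtype(SAMPLE_LOG).items():
        print(dtypes, sizes)
```

Keying on the full tuple of types, rather than on a single dtype, also keeps mixed input/output configurations in groups of their own.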

5922698 updates the parameters of TensileCreateLibrary to work with the current version and fixes the TODO in provision_verification that previously hardcoded Python 3.6, which is of course not useful in 2025

Overall, the tuning scripts are still in an extremely sorry state. They contain many pitfalls, such as:

  1. Various chips are hard-coded in various places, but only up to Arcturus, leaving Aldebaran and newer out in the cold, never mind gfx10+
  2. Very little error handling, combined with many faulty assumptions about the system environment
  3. Various flags such as --redo do not work due to faulty assumptions about the behavior of a POSIX shell
  4. Many, many more
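As a rough sketch of how the first two points could be addressed together, the chip names could live in a single lookup table that fails loudly on unknown input. The table `GFX_TARGETS` and the helper `gfx_target` below are hypothetical and do not reflect how the existing scripts are structured; the code name to gfx target pairs listed are the publicly documented ones.

```python
# Hypothetical single source of truth for chip code names; the existing
# scripts hard-code this kind of information in several separate places.
GFX_TARGETS = {
    "vega10": "gfx900",
    "vega20": "gfx906",
    "arcturus": "gfx908",
    "aldebaran": "gfx90a",
}


def gfx_target(chip):
    """Map a chip code name to its gfx target, or raise a descriptive error."""
    key = chip.strip().lower()
    if key.startswith("gfx"):
        # Already a gfx target (for example a gfx10+ part): pass it through.
        return key
    try:
        return GFX_TARGETS[key]
    except KeyError:
        known = ", ".join(sorted(GFX_TARGETS))
        raise ValueError(f"unknown chip {chip!r}; known chips: {known}") from None
```

Accepting gfx target names directly means newer hardware does not need a table entry, and an unknown code name stops the run with a clear message instead of silently falling back to a default.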

Author

IMbackK commented Jan 8, 2025

Please add a run of master_tuning_script.sh to your CI to prevent the tuning scripts from falling into this degree of disrepair again.

Please also consider releasing a full set of benchmark problem files for all the logic files in rocBLAS, together with a script to retune every solution contained in rocBLAS. That is, end-to-end documentation and scripts showing how to regenerate all logic files in rocBLAS from https://github.com/ROCm/Tensile/tree/develop/Tensile/Configs

Collaborator

babakpst commented Jan 9, 2025

@IMbackK Thanks for your comments and the PR. We no longer use these scripts in our tuning workflow, and, as a result, we do not update them. I will clean the repo soon. I appreciate your time and consideration.

@babakpst babakpst closed this Jan 9, 2025
Author

IMbackK commented Jan 9, 2025

Well, will there be a replacement for application-specific tuning?

Development

Successfully merging this pull request may close these issues.

[Issue]: Multiple issues in training scripts
2 participants