[Issue]: Problems with "agent profiling" mode in Rocprofiler-SDK #15
Comments
I suspect issues A and B may be related. Can you post the code where you call rocprofiler_configure_agent_profile_counting_service? Issue C has an internal patch that resolves it and should be published shortly. Issue D has a patch in the works that should be available soon. |
In addition to the problems discussed above, I'm now getting a segfault inside rocprof-sdk code. To reproduce the segfault please do the following:
Here is the backtrace from my runs: |
Hi @adanalis, I gave the code a try and was able to run the highlighted example successfully on ROCm 6.3 with a few changes. Firstly, we transitioned from agent_profile to agent_profile_counting_service with a change to the docs here: 4204042. You can run the command:
Additionally, there was a change to rocprofiler_sample_device_counting_service to allow returning data as a part of the API call: Change to API. You need to change line 56 in sdk_class.cpp to
and the read_sample() function to:
I also had to set I can push the code to your branch if you'd like. Please give that a try and let me know if you run into any issues, thanks! |
Thanks for your comments. I have since updated the code significantly and have incorporated the API changes you mentioned. You can find the latest version at my fork of PAPI under the branch 2024.06.rocprof_sdk: |
Awesome! Are you running into any more issues? |
The AMD internal repo has fixed most issues. However, the code released in 6.3.1 still has problems in device profiling mode. The only remaining issue that we are aware of is a core dump caused by DumpStackTraceAndExit() when the program exits abnormally. |
Hi @adanalis, What problems are you facing with device profiling mode? Could you provide the code that you're running into issues with? Thanks! |
The code can be accessed here: icl-utk-edu/papi#249 Setting the env variable RPSDK_MODE_AGENT_PROFILE=1 and running any of the tests under src/components/rocp_sdk/tests will result in zero values when using rocm-6.3.1, but correct values when using the container with the latest internal code. The culprit is the following call:
Specifically, when I enable debugging output, I see that with 6.3.1 I get results such as: versus the same code built with the container prints: |
Interesting. Are you building the latest code directly from the amd-mainline branch? Perhaps there was a patch that hasn't been cherry-picked into the release yet; let me check with the team. |
I have a container that Benjamin Welton built for us from the AMD repo. |
Hi @adanalis, The fix for the above issue should be in the next minor release, ROCm 6.3.2. Please let me know if there are any other issues I can help with, thanks! |
Problem Description
A) I only get non-zero values for the first event that I have added to
the profile.
B) I start two agents for two distinct GPUs, I submit my kernel on
only one GPU, but I get the same measurements from both agents.
C) When I get the measurements I have no way of distinguishing which
measurement came from which agent.
D) When using a watermark equal to zero, the buffer callback is triggered as soon as there is one entry in the buffer, but before all the entries have been written to the buffer. As a result, we see the entries "out of order." We would like the data to be accessible synchronously when we get a sample, without having to go through buffers.
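To make the behaviour in D concrete, here is a minimal user-side sketch of how partially flushed records could be reassembled into complete samples. It is only an illustration: `CounterRecord`, `SampleAssembler`, and all field names are hypothetical and are not part of rocprofiler-sdk, and it assumes each record can be tagged with a caller-chosen sample ID and agent index and that each counter appears at most once per sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one counter record delivered to a buffer callback.
// The field names are illustrative, not the rocprofiler-sdk record layout.
struct CounterRecord {
    std::uint64_t sample_id;    // tag chosen by the caller when requesting the sample
    std::uint32_t agent_index;  // GPU agent the sample was requested on (assumed available)
    std::uint32_t counter_id;   // position of the event in the profile
    double        value;
};

// Collects records that may arrive across several callbacks, in any order,
// and releases a sample only once every expected counter has been seen.
class SampleAssembler {
public:
    explicit SampleAssembler(std::size_t counters_per_sample)
        : expected_(counters_per_sample) {
        assert(expected_ > 0);
    }

    // Call with whatever records one buffer flush delivered.
    // Returns the samples that became complete, each sorted by counter_id.
    std::vector<std::vector<CounterRecord>> add(const std::vector<CounterRecord>& flushed) {
        std::vector<std::vector<CounterRecord>> complete;
        for (const CounterRecord& rec : flushed) {
            const Key key{rec.sample_id, rec.agent_index};
            std::vector<CounterRecord>& bucket = pending_[key];
            bucket.push_back(rec);
            if (bucket.size() == expected_) {
                std::sort(bucket.begin(), bucket.end(),
                          [](const CounterRecord& a, const CounterRecord& b) {
                              return a.counter_id < b.counter_id;
                          });
                complete.push_back(std::move(bucket));
                pending_.erase(key);
            }
        }
        return complete;
    }

    // Number of samples still waiting for some of their counters.
    std::size_t pending_samples() const { return pending_.size(); }

private:
    using Key = std::pair<std::uint64_t, std::uint32_t>;  // (sample_id, agent_index)
    std::size_t expected_;
    std::map<Key, std::vector<CounterRecord>> pending_;
};
```

With two events in the profile, a flush that carries only one record for a sample returns nothing from `add()`, and the sample is released once the second record arrives in a later flush. This only hides the ordering problem on the user side; it does not provide the synchronous access requested above.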
Operating System
Rocky Linux 9.4 (Blue Onyx)
CPU
AMD EPYC 7413 24-Core Processor
GPU
AMD Instinct MI210
ROCm Version
ROCm 6.2.0
ROCm Component
rocprofiler
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response