Add support for "perf record" and "perf report" to WA #971
base: master
Conversation
The class-level description should also be updated to describe the difference between the two modes and when one should be used versus the other.
wa/instruments/perf.py
Outdated
@@ -77,6 +77,17 @@ class PerfInstrument(Instrument):
              description="""Provides labels for pref output. If specified, the number of
                             labels must match the number of ``optionstring``\ s.
                             """),
    Parameter('perf_mode', kind=str, default='stat', allowed_values=['stat', 'record'],
Parameter can be renamed to just "mode"; given that it is for the perf instrument, calling it perf_mode is redundant.
Parameter can be renamed to just "mode" ... or (sub)command, given that it's how perf itself calls these?

$ perf help

usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

The most commonly used perf commands are:
[...]

See 'perf help COMMAND' for more information on a specific command.
Thanks, changed in latest version.
wa/instruments/perf.py
Outdated
"""), | ||
Parameter('report_optionstring', kind=list_or_string, default='', | ||
global_alias='report_perf_options', | ||
description="""Specifies options to be used for the 'perf report' command used |
Please format consistently with the other parameters.
Thanks, changed in latest version.
wa/instruments/perf.py
Outdated
                count = temp.strip('%')
                context.add_metric(metric, count, '%')

    def update_output(self, context):
Should go above internal methods.
Thanks, changed in latest version.
wa/instruments/perf.py
Outdated
                context.add_metric(metric, count, '%')

    def update_output(self, context):
        self.logger.info('Extracting reports from target...')
Given the logging inside the _stat and _output methods, this is redundant. Please keep info-level logging to a minimum; this should be converted to debug.
Thanks, changed in latest version.
wa/instruments/perf.py
Outdated
@@ -134,5 +146,38 @@ def update_output(self, context):
            metric = '{}_{}'.format(label, match.group(3))
            context.add_metric(metric, count, classifiers=classifiers)

    def _update_output_record(self, context):
        outdir = os.path.join(context.output_directory, 'perf')
        self.collector.get_trace(outdir)
Given that the above two lines happen in both the _stat and record cases, they should be moved out into update_output.
Thanks, changed in latest version.
wa/instruments/perf.py
Outdated

        self.logger.info('Processing perf report reports.')
        for host_file in os.listdir(outdir):
            label = host_file.split('.rpt')[0]
Use os.path.splitext instead.
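A small sketch of the difference (the file name here is hypothetical): os.path.splitext splits on the last extension generically, rather than depending on the literal '.rpt' substring.

```python
import os.path

host_file = "workload_iteration1.rpt"  # hypothetical report file name

# Original approach: tied to the literal '.rpt' suffix.
label_split = host_file.split('.rpt')[0]

# Suggested approach: strip whatever the extension happens to be.
label, ext = os.path.splitext(host_file)

print(label, ext)  # workload_iteration1 .rpt
```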
Thanks, changed in latest version.
wa/instruments/perf.py
Outdated
            event_type = words[5]
            event_type = event_type.strip("'")
            if words[0] != '#' and '%' in words[0] and len(words) == 5:
                metric = 'perf/{}/{}/{}'.format(event_type, words[1], words[2].strip("[]"))
Consider using classifiers rather than packing everything into the metric name.
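A minimal sketch of the two approaches (the field names and values are hypothetical; the classifiers keyword mirrors the context.add_metric(metric, count, classifiers=classifiers) call already present in the diff):

```python
# Hypothetical fields parsed from one 'perf report' line:
event_type, command, shared_object, overhead = "cycles", "swapper", "kernel.kallsyms", 41.86

# Current approach: pack everything into the metric name.
packed_name = "perf/{}/{}/{}".format(event_type, command, shared_object)

# Suggested approach: a short metric name plus key/value classifiers,
# which output processors can use to filter or group results.
metric = {
    "name": "overhead",
    "value": overhead,
    "classifiers": {
        "event_type": event_type,
        "command": command,
        "shared_object": shared_object,
    },
}

print(packed_name)  # perf/cycles/swapper/kernel.kallsyms
```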
I've added the classifiers but have also left the information packed into the metric name. It seems that some of the output processors, e.g. csv, default to classifier output off, and that would result in a lot of meaningless/duplicate entries in the results file under the default options, which doesn't seem like friendly behaviour.
I am very open to people's thoughts on what the right behaviour is here, though.
I had a thought today: change the metric to /, which should be a unique key and the most critical info, then use classifiers for the other entries.
wa/instruments/perf.py
Outdated
            if words[0] != '#' and '%' in words[0] and len(words) == 5:
                metric = 'perf/{}/{}/{}'.format(event_type, words[1], words[2].strip("[]"))
                temp = words[0]
                count = temp.strip('%')
temp is not necessary here. Also, a metric value should be a number, not a string:

count = numeric(words[0].strip('%'))

(note: numeric must be imported from wa.utils.types).
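wa.utils.types.numeric is not available in this snippet; the sketch below uses a rough stand-in that, I assume (this is not a verbatim copy of the real helper), returns an int when possible and a float otherwise:

```python
def numeric(value):
    # Stand-in for wa.utils.types.numeric (assumed behaviour):
    # prefer an int, fall back to a float.
    try:
        return int(value)
    except ValueError:
        return float(value)

words = ["41.86%"]  # hypothetical tokens from one report line
count = numeric(words[0].strip('%'))
print(count)  # 41.86
```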
Thanks, changed in latest version. (I initially thought about doing this, but expected it to be written out as a string again, so it didn't seem necessary.)
        self.collector.get_trace(outdir)

        self.logger.info('Processing perf report reports.')
        for host_file in os.listdir(outdir):
This is currently attempting to process every file in the perf directory; since we are now pulling the raw trace data into this directory, it fails trying to parse binary data. Should this be updated to only process .rpt files?
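A minimal sketch of such a filter (the listing contents are hypothetical); checking the extension skips binary perf.data traces pulled into the same directory:

```python
import os.path

def is_report(host_file):
    # Keep only '.rpt' report files; raw trace data has other extensions.
    return os.path.splitext(host_file)[1] == '.rpt'

listing = ["wkld.rpt", "perf.data", "other.rpt"]  # hypothetical directory contents
print([f for f in listing if is_report(f)])  # ['wkld.rpt', 'other.rpt']
```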
Thanks, changed in latest version.
As stated in the corresponding devlib PR (ARM-software/devlib#382): this "review" is posted as comments and should only be seen as such.
These include comments about details but also general comments about the architecture of this instrument and the way its eventual growth seems to be shaped. As an end user, I would personally prefer the perf instrument to be as generic and flexible as possible (in terms of running perf commands: not only stat or record, and without having to manually patch/inject redundant code for each subcommand).
However, I believe that this should not influence this PR. Instead, it could be merged for now and another (heavier) PR could be opened later on to refactor the perf instrument. Therefore, I leave it to the author of the PR and the maintainers to decide which of my comments add value to this PR and which ones may be ignored.
wa/instruments/perf.py
Outdated
            label = host_file.split('.rpt')[0]
            host_file_path = os.path.join(outdir, host_file)
            context.add_artifact(label, host_file_path, 'raw')
            with open(host_file_path) as fh:
This parser is extremely specific. I understand we don't want to spend the effort writing a perfectly generic parser for perf report that works in all cases, but this implementation is definitely too narrow. As an example, running on my host machine:
- Following sudo perf record -a sleep 1:

  $ sudo perf report 2>/dev/null | head -n 20
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:ppp'
  # Event count (approx.): 218123588
  #
  # Overhead  Command          Shared Object       Symbol
  # ........  ...............  ..................  ............................
      41.86%  swapper          [kernel.kallsyms]   [k] intel_idle
       2.06%  perf             [kernel.kallsyms]   [k] smp_call_function_single
       0.92%  Chrome_ChildIOT  libpthread-2.27.so  [.] __pthread_mutex_lock
       0.69%  swapper          [kernel.kallsyms]   [k] calc_global_load
       0.52%  swapper          [kernel.kallsyms]   [k] ioread32
       0.49%  CompositorTileW  libc-2.27.so        [.] __memset_avx2_erms
       0.41%  IPDL Background  [kernel.kallsyms]   [k] cpuacct_charge
       0.41%  IPDL Background  [vdso]              [.] __vdso_clock_gettime
       0.41%  IPDL Background  libxul.so           [.] 0x0000000000879a17

  $ sudo perf report --sort=dso 2>/dev/null | head -n 20
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:ppp'
  # Event count (approx.): 218123588
  #
  # Overhead  Shared Object
  # ........  ..........................
      67.43%  [kernel.kallsyms]
      10.79%  libxul.so
      10.40%  slack
       2.29%  libpthread-2.27.so
       1.67%  libc-2.27.so
       1.47%  firefox
       1.15%  libglib-2.0.so.0.5600.3
       0.84%  nouveau_dri.so
       0.60%  [vdso]

  $ sudo perf report --sort=cpu 2>/dev/null | head -n 20
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:ppp'
  # Event count (approx.): 218123588
  #
  # Overhead  CPU
  # ........  ...
      15.64%  006
       9.37%  003
       8.75%  011
       8.75%  000
       7.14%  009
       7.07%  007
       5.44%  004
       5.29%  015
       5.02%  012
- Following sudo perf record -g sleep 1:

  $ sudo perf report -g 2>/dev/null | head -n 30
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 14 of event 'cycles:ppp'
  # Event count (approx.): 1311183
  #
  # Children      Self  Command  Shared Object      Symbol
  # ........  ........  .......  .................  ..................................
      32.80%     0.00%  sleep    [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
              |
              ---entry_SYSCALL_64_after_hwframe
                 do_syscall_64
                 |
                 |--22.57%--sys_exit_group
                 |          do_group_exit
                 |          do_exit
                 |          mmput
                 |          exit_mmap
                 |          tlb_finish_mmu
                 |          arch_tlb_finish_mmu
                 |          tlb_flush_mmu_free
                 |          free_pages_and_swap_cache
                 |          release_pages
                 |
                  --10.23%--sys_execve
                            do_execveat_common.isra.34
                            search_binary_handler
Obviously, I don't expect this parser to handle all outputs of perf report (e.g. --call-graph, --children, --header, ...), but handling the extremely useful --sort would be a nice start; I believe that parsing the table(s) of samples into pandas.DataFrame(s) would make the following logic easier to work with. I actually have implemented such a parser (currently internal) which I could integrate here.
This is just a first implementation to have it parse the basic output. As you say, handling all the output options is difficult. If I update this to handle new report formats, I'll likely see about committing them, but I am not planning to write a more generic/complete one. If you have one that could be integrated here, that would be very good.
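As a rough illustration of what a table parser might look like, here is a minimal stdlib-only sketch handling only the default Overhead/Command/Shared Object/Symbol layout (all names are assumed; it shares the narrowness discussed above, e.g. a multi-word Command such as "IPDL Background" would be mis-split, which is exactly where a pandas-based parser would do better):

```python
def parse_report_lines(lines):
    """Turn default-format 'perf report' sample rows into dicts.

    Assumes the default Overhead/Command/Shared Object/Symbol layout;
    --sort or --call-graph output would need a different parser.
    """
    rows = []
    for line in lines:
        words = line.split()
        if not words or words[0].startswith('#'):
            continue  # skip header and comment lines
        if words[0].endswith('%'):
            rows.append({
                'overhead': float(words[0].rstrip('%')),
                'command': words[1],
                'shared_object': words[2],
                'symbol': ' '.join(words[3:]),
            })
    return rows

sample = [
    "# Overhead  Command  Shared Object  Symbol",
    "    41.86%  swapper  [kernel.kallsyms]  [k] intel_idle",
]
print(parse_report_lines(sample))
```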
            # Extract data for 'perf record'
            elif self.perf_mode == 'record':
                self._update_output_record(context)
# else: fail silently
You could use a dictionary of functions (a method-local dictionary of instance methods, a class-attribute dictionary of class methods, ... your call), accessed with self.perf_mode as a key: if the key is not present, a KeyError (IIRC) is raised. Furthermore, update_output becomes "subcommand-agnostic". This would even (depending on the implementation) allow assert-based unit testing by comparing the allowed_values of perf_mode with the keys() of this dictionary, programmatically expressing the link, currently implicit, between the supported perf subcommands and the need for an implemented _update_output_<...> method. Finally, it's more pythonic than this switch-like use of if-elif-elif-... .
(I know that self.perf_mode has been sanitized, but I still believe there's value in this change.)
I agree, I was surprised that perf support was limited to just
No description provided.