
Add support for "perf record" and "perf report" to WA #971

Open

wants to merge 2 commits into master

Conversation

markbellamy
No description provided.

@setrofim setrofim left a comment


Class-level description should also be updated to describe the difference between the two modes and when one should be used vs the other.

@@ -77,6 +77,17 @@ class PerfInstrument(Instrument):
description="""Provides labels for pref output. If specified, the number of
labels must match the number of ``optionstring``\ s.
"""),
Parameter('perf_mode', kind=str, default='stat', allowed_values=['stat', 'record'],
Contributor:

Parameter can be renamed to just "mode", given that it is for perf instrument, calling it perf_mode is redundant.

Contributor:

Parameter can be renamed to just "mode"

... or (sub)command, given that it's how perf itself calls these?

$ perf help

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   [...]

 See 'perf help COMMAND' for more information on a specific command.

Author:

Thanks, changed in latest version.

"""),
Parameter('report_optionstring', kind=list_or_string, default='',
global_alias='report_perf_options',
description="""Specifies options to be used for the 'perf report' command used
Contributor:

Please format consistently with the other parameters.

Author:

Thanks, changed in latest version.

count = temp.strip('%')
context.add_metric(metric, count, '%')

def update_output(self, context):
Contributor:

Should go above internal methods.

Author:

Thanks, changed in latest version.

context.add_metric(metric, count, '%')

def update_output(self, context):
self.logger.info('Extracting reports from target...')
Contributor:

Given the logging inside the _stat and _output methods, this is redundant. Please keep info-level logging to a minimum. Should be converted to debug.

Author:

Thanks, changed in latest version.

@@ -134,5 +146,38 @@ def update_output(self, context):
metric = '{}_{}'.format(label, match.group(3))
context.add_metric(metric, count, classifiers=classifiers)

def _update_output_record(self, context):
outdir = os.path.join(context.output_directory, 'perf')
self.collector.get_trace(outdir)
Contributor:

Given that the above two lines happen in both _stat and record cases, they should be moved out into update_output.

Author:

Thanks, changed in latest version.


self.logger.info('Processing perf report reports.')
for host_file in os.listdir(outdir):
label = host_file.split('.rpt')[0]
Contributor:

use os.path.splitext instead.
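The suggestion above, sketched with a hypothetical file name (`dhrystone.rpt` is illustrative, not from the PR):

```python
import os.path

# os.path.splitext splits off the final extension, which is more robust
# than str.split('.rpt') for deriving the label from a report file name.
host_file = 'dhrystone.rpt'  # hypothetical example file name
label, ext = os.path.splitext(host_file)
print(label)  # -> dhrystone
```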

Author:

Thanks, changed in latest version.

event_type = words[5]
event_type = event_type.strip("'")
if words[0] != '#' and '%' in words[0] and len(words) == 5:
metric = 'perf/{}/{}/{}'.format(event_type, words[1], words[2].strip("[]"))
Contributor:

Consider using classifiers rather than packing everything into the metric name.

Author:

I've added the classifiers but have also left everything packed in the metric name. It seems that some of the output processors, e.g. csv, default to classifier output off, and that would result in a lot of meaningless/duplicate entries in the results file under the default options, which doesn't seem like friendly behaviour.

I am very open to people's thoughts on what the right behaviour is here, though.

Author:

I had a thought today: change the metric to /, which should be a unique key carrying the most critical info, then use classifiers for the other entries.
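The scheme being discussed could look like the following hedged sketch; this is not the PR's code, and `build_metric` and the classifier keys `command`/`shared_object` are hypothetical names:

```python
# Hypothetical helper: keep a short metric name and attach the remaining
# 'perf report' fields as classifiers instead of packing them into the name.
def build_metric(words, event_type):
    # words is one parsed sample line, e.g.
    # ['41.86%', 'swapper', '[kernel.kallsyms]', '[k]', 'intel_idle']
    count = float(words[0].strip('%'))
    metric = 'perf/{}'.format(event_type)
    classifiers = {
        'command': words[1],
        'shared_object': words[2].strip('[]'),
    }
    return metric, count, classifiers
```

The result would then be passed to `context.add_metric(metric, count, '%', classifiers=classifiers)`.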

if words[0] != '#' and '%' in words[0] and len(words) == 5:
metric = 'perf/{}/{}/{}'.format(event_type, words[1], words[2].strip("[]"))
temp = words[0]
count = temp.strip('%')
Contributor:

temp not necessary here. Also, a metric value should be a number, not string.

count = numeric(words[0].strip('%'))

(note: numeric must be imported from wa.utils.types.)
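For illustration, a rough plain-Python stand-in for `numeric` (the real helper lives in `wa.utils.types`; this sketch only approximates its int-then-float conversion):

```python
# Approximate stand-in for wa.utils.types.numeric: prefer int, fall back
# to float, so metric values come out as numbers rather than strings.
def numeric(value):
    try:
        return int(value)
    except ValueError:
        return float(value)

count = numeric('41.86%'.strip('%'))
print(count)  # -> 41.86
```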

Author:

Thanks, changed in latest version. (I initially thought about doing this, but expected it to be written out as a string again, so it didn't seem necessary.)

self.collector.get_trace(outdir)

self.logger.info('Processing perf report reports.')
for host_file in os.listdir(outdir):
Contributor:

This is currently attempting to process every file in the perf directory, since we are now pulling the raw trace data to this directory this fails trying to parse binary data. Should this be updated to only process .rpt files?
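The fix being asked for can be sketched as a simple filter (`report_files` is a hypothetical helper name):

```python
import os

# Restrict processing to the generated '.rpt' report files so the raw
# binary perf trace data pulled into the same directory is skipped.
def report_files(outdir):
    return sorted(f for f in os.listdir(outdir) if f.endswith('.rpt'))
```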

Author:

Thanks, changed in latest version.

@ptosi (Contributor) left a comment:

As stated in the corresponding devlib PR (ARM-software/devlib#382):

this "review" is posted as comments and should only be seen as such.

These include comments about details but also general comments about the architecture of this instrument and the way its eventual growth seems to be shaped. As an end user, I would personally prefer the perf instrument to be as generic and flexible as possible (in terms of running perf commands, not only stat or record nor having to manually patch/inject redundant code for each subcommand).

However, I believe that this should not influence this PR. Instead, it could be merged for now and another (heavier) PR could be opened later on to refactor the perf instrument. Therefore, I leave it to the author of the PR and the maintainers to decide which of my comments add value to this PR and which ones may be ignored.


label = host_file.split('.rpt')[0]
host_file_path = os.path.join(outdir, host_file)
context.add_artifact(label, host_file_path, 'raw')
with open(host_file_path) as fh:
Contributor:

This parser is extremely specific. I understand we don't want to spend the effort writing a perfectly generic parser for perf report that works in all cases but this implementation is definitely too narrow. As an example, running on my host machine:

  • Following sudo perf record -a sleep 1:

    $ sudo perf report 2>/dev/null | head -n 20
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 1K of event 'cycles:ppp'
    # Event count (approx.): 218123588
    #
    # Overhead  Command          Shared Object               Symbol                                           
    # ........  ...............  ..........................  ............................
    #
        41.86%  swapper          [kernel.kallsyms]           [k] intel_idle
         2.06%  perf             [kernel.kallsyms]           [k] smp_call_function_single
         0.92%  Chrome_ChildIOT  libpthread-2.27.so          [.] __pthread_mutex_lock
         0.69%  swapper          [kernel.kallsyms]           [k] calc_global_load
         0.52%  swapper          [kernel.kallsyms]           [k] ioread32
         0.49%  CompositorTileW  libc-2.27.so                [.] __memset_avx2_erms
         0.41%  IPDL Background  [kernel.kallsyms]           [k] cpuacct_charge
         0.41%  IPDL Background  [vdso]                      [.] __vdso_clock_gettime
         0.41%  IPDL Background  libxul.so                   [.] 0x0000000000879a17
    
    $ sudo perf report --sort=dso 2>/dev/null | head -n 20
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 1K of event 'cycles:ppp'
    # Event count (approx.): 218123588
    #
    # Overhead  Shared Object
    # ........  ..........................
    #
        67.43%  [kernel.kallsyms]
        10.79%  libxul.so
        10.40%  slack
         2.29%  libpthread-2.27.so
         1.67%  libc-2.27.so
         1.47%  firefox
         1.15%  libglib-2.0.so.0.5600.3
         0.84%  nouveau_dri.so
         0.60%  [vdso]
    
    $ sudo perf report --sort=cpu 2>/dev/null | head -n 20
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 1K of event 'cycles:ppp'
    # Event count (approx.): 218123588
    #
    # Overhead  CPU
    # ........  ...
    #
        15.64%  006
         9.37%  003
         8.75%  011
         8.75%  000
         7.14%  009
         7.07%  007
         5.44%  004
         5.29%  015
         5.02%  012
    
  • Following sudo perf record -g sleep 1:

    $ sudo perf report -g 2>/dev/null | head -n 30
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 14  of event 'cycles:ppp'
    # Event count (approx.): 1311183
    #
    # Children      Self  Command  Shared Object      Symbol
    # ........  ........  .......  .................  ..................................
    #
        32.80%     0.00%  sleep    [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
                |
                ---entry_SYSCALL_64_after_hwframe
                   do_syscall_64
                   |
                   |--22.57%--sys_exit_group
                   |          do_group_exit
                   |          do_exit
                   |          mmput
                   |          exit_mmap
                   |          tlb_finish_mmu
                   |          arch_tlb_finish_mmu
                   |          tlb_flush_mmu_free
                   |          free_pages_and_swap_cache
                   |          release_pages
                   |
                    --10.23%--sys_execve
                              do_execveat_common.isra.34
                              search_binary_handler
    

Obviously, I don't expect this parser to handle all outputs of perf report (e.g. --call-graph, --children, --header, ...) but handling the extremely useful --sort would be a nice start; I believe that parsing the table(s) of samples into pandas.DataFrame(s) would make the following logic easier to work with. I actually have implemented such a parser (currently internal) which I could integrate here.
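A minimal sketch of what parsing the flat (non call-graph) tables could look like, assuming only the simple "overhead / key" layout shown above; this is neither the PR's parser nor the internal one mentioned, and `parse_report_table` is a hypothetical name:

```python
# Parse the flat sample table of 'perf report' (e.g. with --sort=dso):
# skip '#'-prefixed header lines and read "<overhead>%  <key...>" rows.
def parse_report_table(text):
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        words = line.split()
        if words[0].endswith('%'):
            rows.append((float(words[0].rstrip('%')), ' '.join(words[1:])))
    return rows
```

Rows with multiple columns (Command, Shared Object, Symbol) come back as one joined key here; splitting them per --sort field would be the next step.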

Author:

This is just a first implementation to parse the basic output. As you say, handling all the output options is difficult. If I update this to handle new report formats I'll likely see about committing them, but I am not planning to write a more generic/complete parser. If you have one that could be integrated here, that would be very good.

# Extract data for 'perf record'
elif self.perf_mode == 'record':
self._update_output_record(context)

Contributor:

Suggested change:
# else: fail silently

You could use a dictionary of functions (a method-local dictionary of instance methods, a class-attribute dictionary of class methods, ... your call), accessed with self.perf_mode as a key: if the key is not present, a KeyError (IIRC) is raised. Furthermore, update_output becomes "subcommand-agnostic". Depending on the implementation, this would even allow assert-based unit testing that compares the allowed_values of perf_mode against the keys() of this dictionary, expressing programmatically the currently implicit link between the supported perf subcommands and the need for an implemented _update_output_<...> method. Finally, it's more pythonic than this switch-like use of if-elif-elif-... .

(I know that self.perf_mode has been sanitized but yet, I believe there's value in this change).
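The dispatch-dictionary idea above, as a stripped-down hypothetical sketch (`PerfSketch` and the placeholder method bodies stand in for the real PerfInstrument and its update logic):

```python
class PerfSketch:
    """Illustrates dispatching on perf_mode via a dict of methods."""

    def __init__(self, mode):
        self.perf_mode = mode

    def update_output(self, context):
        # Subcommand-agnostic dispatch; an unsupported mode raises
        # KeyError instead of failing silently.
        handlers = {
            'stat': self._update_output_stat,
            'record': self._update_output_record,
        }
        return handlers[self.perf_mode](context)

    def _update_output_stat(self, context):
        return 'stat handled'    # placeholder for the real logic

    def _update_output_record(self, context):
        return 'record handled'  # placeholder for the real logic
```

A unit test could then assert that the mode parameter's allowed_values match the dictionary's keys.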

@markbellamy (Author):

> As an end user, I would personally prefer the perf instrument to be as generic and flexible as possible (in terms of running perf commands, not only stat or record nor having to manually patch/inject redundant code for each subcommand).

I agree. I was surprised that perf support was limited to just perf stat, and I think a much more generic implementation is the right direction. This implementation is just a "quick" one that adds 'perf record' support to the current implementation, as I wanted to use it. It seemed right to commit it back, as it might be useful to others as a stop-gap until wider perf support is added.

@ptosi ptosi mentioned this pull request May 15, 2019