linux/tools
Namhyung Kim 784e8adda4 perf sort: Fix the 'weight' sort key behavior
Currently, the 'weight' field in the perf sample has latency information
for some instructions like in memory accesses.  And perf tool has 'weight'
and 'local_weight' sort keys to display the info.

But it's somewhat confusing what it shows exactly.  In my understanding,
'local_weight' shows a weight in a single sample, and (global) 'weight'
shows a sum of the weights in the hist_entry.

For example:

  $ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M

  $ perf report --stdio -n -s +local_weight
  ...
  #
  # Overhead  Samples  Command  Shared Object     Symbol                     Local Weight
  # ........  .......  .......  ................  .........................  ............
  #
      21.23%      313  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   32
      12.43%      183  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   35
      11.97%      159  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   36
      10.40%      141  dd       [kernel.vmlinux]  [k] lockref_put_return     32
       7.63%      113  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   33
       6.37%       92  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   34
       6.15%       90  dd       [kernel.vmlinux]  [k] lockref_put_return     33
  ...

So let's look at the 'lockref_get_not_zero' symbols.  The top entry
shows that 313 samples were captured with 'local_weight' 32, so the
total weight should be 313 x 32 = 10016.  But it's not the case:

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  ......
  #
       1.36%        4  dd       [kernel.vmlinux]  36            144
       0.47%        4  dd       [kernel.vmlinux]  37            148
       0.42%        4  dd       [kernel.vmlinux]  32            128
       0.40%        4  dd       [kernel.vmlinux]  34            136
       0.35%        4  dd       [kernel.vmlinux]  36            144
       0.34%        4  dd       [kernel.vmlinux]  35            140
       0.30%        4  dd       [kernel.vmlinux]  36            144
       0.30%        4  dd       [kernel.vmlinux]  34            136
       0.30%        4  dd       [kernel.vmlinux]  32            128
       0.30%        4  dd       [kernel.vmlinux]  32            128
  ...

With the 'weight' sort key, it's divided to 4 samples even with the same
info ('comm', 'dso', 'sym' and 'local_weight').  I don't think this is
what we want.

I found this because of the way it aggregates the 'weight' value.  Since
it's not a period, we should not add them in the he->stat.  Otherwise,
two 32 'weight' entries will create a 64 'weight' entry.

After that, new 32 'weight' samples don't have a matching entry so it'd
create a new entry and make it a 64 'weight' entry again and again.
Later, they will be merged into 128 'weight' entries during the
hists__collapse_resort() with 4 samples, multiple times like above.

Let's keep the weight and display it differently.  For 'local_weight',
it can show the weight as is, and for (global) 'weight' it can display
the number multiplied by the number of samples.

With this change, I can see the expected numbers.

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  .....
  #
      21.23%      313  dd       [kernel.vmlinux]  32            10016
      12.43%      183  dd       [kernel.vmlinux]  35            6405
      11.97%      159  dd       [kernel.vmlinux]  36            5724
       7.63%      113  dd       [kernel.vmlinux]  33            3729
       6.37%       92  dd       [kernel.vmlinux]  34            3128
       4.17%       59  dd       [kernel.vmlinux]  37            2183
       0.08%        1  dd       [kernel.vmlinux]  269           269
       0.08%        1  dd       [kernel.vmlinux]  38            38

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:08:07 -03:00
..
accounting
arch tools headers cpufeatures: Sync with the kernel sources 2021-11-18 10:08:06 -03:00
bootconfig bootconfig: Cleanup dummy headers in tools/bootconfig 2021-10-10 22:16:02 -04:00
bpf bpftool: Install libbpf headers for the bootstrap version, too 2021-11-05 16:19:48 +01:00
build tools: Bump minimum LLVM C++ std to GNU++14 2021-11-04 09:31:30 -03:00
cgroup
counter tools/counter: Create Counter tools 2021-10-17 10:54:16 +01:00
debugging
edid
firewire
firmware
gpio
hv
iio
include tools headers UAPI: Sync linux/kvm.h with the kernel sources 2021-11-18 10:08:07 -03:00
io_uring tools/io_uring/io_uring-cp: sync with liburing example 2021-08-13 08:58:11 -06:00
kvm/kvm_stat KVM: kvm_stat: do not show halt_wait_ns 2021-10-18 14:07:18 -04:00
laptop
leds
lib tools/lib/lockdep: drop liblockdep 2021-11-12 11:07:17 -08:00
memory-model
objtool static_call,x86: Robustify trampoline patching 2021-11-11 13:09:31 +01:00
pci tools: PCI: Zero-initialize param 2021-08-05 11:01:30 +01:00
pcmcia
perf perf sort: Fix the 'weight' sort key behavior 2021-11-18 10:08:07 -03:00
power
rcu tools/rcu: Add an extract-stall script 2021-09-16 10:31:26 -07:00
scripts tools, build: Add RISC-V to HOSTARCH parsing 2021-11-01 17:08:21 +01:00
spi
testing New x86 features: 2021-11-13 10:01:10 -08:00
thermal/tmon tools/thermal/tmon: Add cross compiling support 2021-08-14 15:33:19 +02:00
time
tracing tools/latency-collector: Use correct size when writing queue_full_warning 2021-10-25 22:27:19 -04:00
usb usb: testusb: Fix for showing the connection speed 2021-09-14 10:31:41 +02:00
virtio tools/virtio: fix build 2021-08-11 06:44:24 -04:00
vm tools/vm/page-types.c: print file offset in hexadecimal 2021-11-06 13:30:40 -07:00
wmi
Makefile tools/lib/lockdep: drop liblockdep 2021-11-12 11:07:17 -08:00