Go to file
Linus Torvalds b30d7a77c5 perf tools changes and fixes for v6.5: 1st batch
Internal cleanup:
 
  - Refactor PMU data management to handle hybrid systems in a generic way.
    Do more work in the lexer so that legacy event types parse more easily.
    A side-effect of this is that if a PMU is specified, scanning sysfs is
    avoided improving start-up time.
 
  - Fix hybrid metrics, for example, the TopdownL1 works for both performance
    and efficiency cores on Intel machines.  To support this, sort and regroup
    events after parsing.
 
  - Add reference count checking for the 'thread' data structure.
 
  - Lots of fixes for memory leaks in various places thanks to the ASAN and
    Ian's refcount checker.
 
  - Reduce the binary size by replacing static variables with local or
    dynamically allocated memory.
 
  - Introduce shared_mutex for annotate data to reduce memory footprint.
 
  - Make filesystem access library functions more thread safe.
 
 Test:
 
  - Organize cpu_map tests into a single suite.
 
  - Add metric value validation test to check if the values are within correct
    value ranges.
 
  - Add perf stat stdio output test to check if event and metric names match.
 
  - Add perf data converter JSON output test.
 
  - Fix a lot of issues reported by shellcheck(1).  This is a preparation to
    enable shellcheck by default.
 
  - Make the large x86 new instructions test optional at build time using
    EXTRA_TESTS=1.
 
  - Add a test for libpfm4 events.
 
 perf script:
 
  - Add 'dsoff' outpuf field to display offset from the DSO.
 
     $ perf script -F comm,pid,event,ip,dsoff
        ls 2695501 cycles:      152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
        ls 2695501 cycles:  ffffffff99045b3e ([kernel.kallsyms])
        ls 2695501 cycles:  ffffffff9968e107 ([kernel.kallsyms])
        ls 2695501 cycles:  ffffffffc1f54afb ([kernel.kallsyms])
        ls 2695501 cycles:  ffffffff9968382f ([kernel.kallsyms])
        ls 2695501 cycles:  ffffffff99e00094 ([kernel.kallsyms])
        ls 2695501 cycles:      152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
        ls 2695501 cycles:  ffffffff992a6db0 ([kernel.kallsyms])
 
  - Adjust width for large PID/TID values.
 
 perf report:
 
  - Robustify reading addr2line output for srcline by checking sentinel output
    before the actual data and by using timeout of 1 second.
 
  - Allow config terms (like 'name=ABC') with breakpoint events.
 
     $ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1
 
 perf annotate:
 
  - Handle x86 instruction suffix like 'l' in 'movl' generally.
 
  - Parse instruction operands properly even with a whitespace.  This is needed
    for llvm-objdump output.
 
  - Support RISC-V binutils lookup using the triplet prefixes.
 
  - Add '<' and '>' key to navigate to prev/next symbols in TUI.
 
  - Fix instruction association and parsing for LoongArch.
 
 perf stat:
 
  - Add --per-cache aggregation option, optionally specify a cache level
    like `--per-cache=L2`.
 
     $ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
       taskset -c 0-15,64-79,128-143,192-207\
       perf bench sched messaging -p -t -l 100000 -g 8
 
       # Running 'sched/messaging' benchmark:
       # 20 sender and receiver threads per group
       # 8 groups == 320 threads run
 
       Total time: 7.648 [sec]
 
       Performance counter stats for 'system wide':
 
       S0-D0-L3-ID0             16         17,145,912      ls_dmnd_fills_from_sys.ext_cache_remote
       S0-D0-L3-ID8             16         14,977,628      ls_dmnd_fills_from_sys.ext_cache_remote
       S0-D0-L3-ID16            16            262,539      ls_dmnd_fills_from_sys.ext_cache_remote
       S0-D0-L3-ID24            16              3,140      ls_dmnd_fills_from_sys.ext_cache_remote
       S0-D0-L3-ID32            16             27,403      ls_dmnd_fills_from_sys.ext_cache_remote
       S0-D0-L3-ID40            16             17,026      ls_dmnd_fills_from_sys.ext_cache_remote
       S0-D0-L3-ID48            16              7,292      ls_dmnd_fills_from_sys.ext_cache_remote
       S0-D0-L3-ID56            16              2,464      ls_dmnd_fills_from_sys.ext_cache_remote
       S1-D1-L3-ID64            16         22,489,306      ls_dmnd_fills_from_sys.ext_cache_remote
       S1-D1-L3-ID72            16         21,455,257      ls_dmnd_fills_from_sys.ext_cache_remote
       S1-D1-L3-ID80            16             11,619      ls_dmnd_fills_from_sys.ext_cache_remote
       S1-D1-L3-ID88            16             30,978      ls_dmnd_fills_from_sys.ext_cache_remote
       S1-D1-L3-ID96            16             37,628      ls_dmnd_fills_from_sys.ext_cache_remote
       S1-D1-L3-ID104           16             13,594      ls_dmnd_fills_from_sys.ext_cache_remote
       S1-D1-L3-ID112           16             10,164      ls_dmnd_fills_from_sys.ext_cache_remote
       S1-D1-L3-ID120           16             11,259      ls_dmnd_fills_from_sys.ext_cache_remote
 
             7.779171484 seconds time elapsed
 
   - Change default (no event/metric) formatting for default metrics so that
     events are hidden and the metric and group appear.
 
      Performance counter stats for 'ls /':
 
                   1.85 msec task-clock                       #    0.594 CPUs utilized
                      0      context-switches                 #    0.000 /sec
                      0      cpu-migrations                   #    0.000 /sec
                     97      page-faults                      #   52.517 K/sec
              2,187,173      cycles                           #    1.184 GHz
              2,474,459      instructions                     #    1.13  insn per cycle
                531,584      branches                         #  287.805 M/sec
                 13,626      branch-misses                    #    2.56% of all branches
                             TopdownL1                 #     23.5 %  tma_backend_bound
                                                       #     11.5 %  tma_bad_speculation
                                                       #     39.1 %  tma_frontend_bound
                                                       #     25.9 %  tma_retiring
 
  - Allow --cputype option to have any PMU name (not just hybrid).
 
  - Fix output value not to added when it runs multiple times with -r option.
 
 perf list:
 
  - Show metricgroup description from JSON file called metricgroups.json.
 
  - Allow 'pfm' argument to list only libpfm4 events and check each event is
    supported before showing it.
 
 JSON vendor events:
 
  - Avoid event grouping using "NO_GROUP_EVENTS" constraints.  The topdown
    events are correctly grouped even if no group exists.
 
  - Add "Default" metric group to print it in the default output.  And use
    "DefaultMetricgroupName" to indicate the real metric group name.
 
  - Add AmpereOne core PMU events.
 
 Misc:
 
  - Define man page date correctly.
 
  - Track exception level properly on ARM CoreSight ETM.
 
  - Allow anonymous struct, union or enum when retrieving type names from DWARF.
 
  - Fix incorrect filename when calling `perf inject --jit`.
 
  - Handle PLT size correctly on LoongArch.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZJxT3gAKCRCMstVUGiXM
 g3//AQDyH3tbAVxU6JkvEOjjDvK7MWeXef7GQh8MP8D9Wkxk1AD9HgyxZWXn+mer
 wxzBMntnxlr9+mkBerrVwUzYMd/IJQk=
 =hPh8
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next

Pull perf tools updates from Namhyung Kim:
 "Internal cleanup:

   - Refactor PMU data management to handle hybrid systems in a generic
     way.

     Do more work in the lexer so that legacy event types parse more
     easily. A side-effect of this is that if a PMU is specified,
     scanning sysfs is avoided improving start-up time.

   - Fix hybrid metrics, for example, the TopdownL1 works for both
     performance and efficiency cores on Intel machines. To support
     this, sort and regroup events after parsing.

   - Add reference count checking for the 'thread' data structure.

   - Lots of fixes for memory leaks in various places thanks to the ASAN
     and Ian's refcount checker.

   - Reduce the binary size by replacing static variables with local or
     dynamically allocated memory.

   - Introduce shared_mutex for annotate data to reduce memory
     footprint.

   - Make filesystem access library functions more thread safe.

  Test:

   - Organize cpu_map tests into a single suite.

   - Add metric value validation test to check if the values are within
     correct value ranges.

   - Add perf stat stdio output test to check if event and metric names
     match.

   - Add perf data converter JSON output test.

   - Fix a lot of issues reported by shellcheck(1). This is a
     preparation to enable shellcheck by default.

   - Make the large x86 new instructions test optional at build time
     using EXTRA_TESTS=1.

   - Add a test for libpfm4 events.

  perf script:

   - Add 'dsoff' outpuf field to display offset from the DSO.

      $ perf script -F comm,pid,event,ip,dsoff
         ls 2695501 cycles:      152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
         ls 2695501 cycles:  ffffffff99045b3e ([kernel.kallsyms])
         ls 2695501 cycles:  ffffffff9968e107 ([kernel.kallsyms])
         ls 2695501 cycles:  ffffffffc1f54afb ([kernel.kallsyms])
         ls 2695501 cycles:  ffffffff9968382f ([kernel.kallsyms])
         ls 2695501 cycles:  ffffffff99e00094 ([kernel.kallsyms])
         ls 2695501 cycles:      152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
         ls 2695501 cycles:  ffffffff992a6db0 ([kernel.kallsyms])

   - Adjust width for large PID/TID values.

  perf report:

   - Robustify reading addr2line output for srcline by checking sentinel
     output before the actual data and by using timeout of 1 second.

   - Allow config terms (like 'name=ABC') with breakpoint events.

      $ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1

  perf annotate:

   - Handle x86 instruction suffix like 'l' in 'movl' generally.

   - Parse instruction operands properly even with a whitespace. This is
     needed for llvm-objdump output.

   - Support RISC-V binutils lookup using the triplet prefixes.

   - Add '<' and '>' key to navigate to prev/next symbols in TUI.

   - Fix instruction association and parsing for LoongArch.

  perf stat:

   - Add --per-cache aggregation option, optionally specify a cache
     level like `--per-cache=L2`.

      $ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
        taskset -c 0-15,64-79,128-143,192-207\
        perf bench sched messaging -p -t -l 100000 -g 8

        # Running 'sched/messaging' benchmark:
        # 20 sender and receiver threads per group
        # 8 groups == 320 threads run

        Total time: 7.648 [sec]

        Performance counter stats for 'system wide':

        S0-D0-L3-ID0             16         17,145,912      ls_dmnd_fills_from_sys.ext_cache_remote
        S0-D0-L3-ID8             16         14,977,628      ls_dmnd_fills_from_sys.ext_cache_remote
        S0-D0-L3-ID16            16            262,539      ls_dmnd_fills_from_sys.ext_cache_remote
        S0-D0-L3-ID24            16              3,140      ls_dmnd_fills_from_sys.ext_cache_remote
        S0-D0-L3-ID32            16             27,403      ls_dmnd_fills_from_sys.ext_cache_remote
        S0-D0-L3-ID40            16             17,026      ls_dmnd_fills_from_sys.ext_cache_remote
        S0-D0-L3-ID48            16              7,292      ls_dmnd_fills_from_sys.ext_cache_remote
        S0-D0-L3-ID56            16              2,464      ls_dmnd_fills_from_sys.ext_cache_remote
        S1-D1-L3-ID64            16         22,489,306      ls_dmnd_fills_from_sys.ext_cache_remote
        S1-D1-L3-ID72            16         21,455,257      ls_dmnd_fills_from_sys.ext_cache_remote
        S1-D1-L3-ID80            16             11,619      ls_dmnd_fills_from_sys.ext_cache_remote
        S1-D1-L3-ID88            16             30,978      ls_dmnd_fills_from_sys.ext_cache_remote
        S1-D1-L3-ID96            16             37,628      ls_dmnd_fills_from_sys.ext_cache_remote
        S1-D1-L3-ID104           16             13,594      ls_dmnd_fills_from_sys.ext_cache_remote
        S1-D1-L3-ID112           16             10,164      ls_dmnd_fills_from_sys.ext_cache_remote
        S1-D1-L3-ID120           16             11,259      ls_dmnd_fills_from_sys.ext_cache_remote

              7.779171484 seconds time elapsed

   - Change default (no event/metric) formatting for default metrics so
     that events are hidden and the metric and group appear.

       Performance counter stats for 'ls /':

                    1.85 msec task-clock                       #    0.594 CPUs utilized
                       0      context-switches                 #    0.000 /sec
                       0      cpu-migrations                   #    0.000 /sec
                      97      page-faults                      #   52.517 K/sec
               2,187,173      cycles                           #    1.184 GHz
               2,474,459      instructions                     #    1.13  insn per cycle
                 531,584      branches                         #  287.805 M/sec
                  13,626      branch-misses                    #    2.56% of all branches
                              TopdownL1                 #     23.5 %  tma_backend_bound
                                                        #     11.5 %  tma_bad_speculation
                                                        #     39.1 %  tma_frontend_bound
                                                        #     25.9 %  tma_retiring

   - Allow --cputype option to have any PMU name (not just hybrid).

   - Fix output value not to added when it runs multiple times with -r
     option.

  perf list:

   - Show metricgroup description from JSON file called
     metricgroups.json.

   - Allow 'pfm' argument to list only libpfm4 events and check each
     event is supported before showing it.

  JSON vendor events:

   - Avoid event grouping using "NO_GROUP_EVENTS" constraints. The
     topdown events are correctly grouped even if no group exists.

   - Add "Default" metric group to print it in the default output. And
     use "DefaultMetricgroupName" to indicate the real metric group
     name.

   - Add AmpereOne core PMU events.

  Misc:

   - Define man page date correctly.

   - Track exception level properly on ARM CoreSight ETM.

   - Allow anonymous struct, union or enum when retrieving type names
     from DWARF.

   - Fix incorrect filename when calling `perf inject --jit`.

   - Handle PLT size correctly on LoongArch"

* tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits)
  perf test: Skip metrics w/o event name in stat STD output linter
  perf test: Reorder event name checks in stat STD output linter
  perf pmu: Remove a hard coded cpu PMU assumption
  perf pmus: Add notion of default PMU for JSON events
  perf unwind: Fix map reference counts
  perf test: Set PERF_EXEC_PATH for script execution
  perf script: Initialize buffer for regs_map()
  perf tests: Fix test_arm_callgraph_fp variable expansion
  perf symbol: Add LoongArch case in get_plt_sizes()
  perf test: Remove x permission from lib/stat_output.sh
  perf test: Rerun failed metrics with longer workload
  perf test: Add skip list for metrics known would fail
  perf test: Add metric value validation test
  perf jit: Fix incorrect file name in DWARF line table
  perf annotate: Fix instruction association and parsing for LoongArch
  perf annotation: Switch lock from a mutex to a sharded_mutex
  perf sharded_mutex: Introduce sharded_mutex
  tools: Fix incorrect calculation of object size by sizeof
  perf subcmd: Fix missing check for return value of malloc() in add_cmdname()
  perf parse-events: Remove unneeded semicolon
  ...
2023-06-30 11:35:41 -07:00
arch Tracing updates for 6.5: 2023-06-30 10:33:17 -07:00
block - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
certs KEYS: Add missing function documentation 2023-04-24 16:15:52 +03:00
crypto sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
Documentation Probes updates for v6.5: 2023-06-30 10:44:53 -07:00
drivers RISC-V Patches for the 6.5 Merge Window, Part 1 2023-06-30 09:37:26 -07:00
fs \n 2023-06-29 13:39:51 -07:00
include Probes updates for v6.5: 2023-06-30 10:44:53 -07:00
init - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in 2023-06-28 10:59:38 -07:00
io_uring \n 2023-06-29 13:31:44 -07:00
ipc Merge branch 'work.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2023-02-24 19:20:07 -08:00
kernel Probes updates for v6.5: 2023-06-30 10:44:53 -07:00
lib Probes updates for v6.5: 2023-06-30 10:44:53 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm memblock: small updates for 6.5-rc1 2023-06-29 23:21:20 -07:00
net Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
rust rust: error: impl Debug for Error with errname() integration 2023-06-13 01:24:42 +02:00
samples Probes updates for v6.5: 2023-06-30 10:44:53 -07:00
scripts powerpc updates for 6.5 2023-06-30 09:20:08 -07:00
security powerpc updates for 6.5 2023-06-30 09:20:08 -07:00
sound ARM: SoC changes for 6.5 2023-06-29 15:28:33 -07:00
tools perf tools changes and fixes for v6.5: 1st batch 2023-06-30 11:35:41 -07:00
usr initramfs: Check negative timestamp to prevent broken cpio archive 2023-04-16 17:37:01 +09:00
virt - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore linux-kselftest-kunit-6.4-rc1 2023-04-24 12:31:32 -07:00
.mailmap drm changes for 6.5-rc1: 2023-06-29 11:00:17 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS - Address -Wmissing-prototype warnings 2023-06-26 16:43:54 -07:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS RISC-V Patches for the 6.5 Merge Window, Part 1 2023-06-30 09:37:26 -07:00
Makefile hardening updates for v6.5-rc1 2023-06-27 21:24:18 -07:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.