linux/arch/x86/events
Breno Leitao 599522d9d2 perf/x86/amd: Do not WARN() on every IRQ
Zen 4 systems running buggy microcode can hit a WARN_ON() in the PMI
handler, as shown below, several times while perf runs. A simple
`perf top` run is enough to render the system unusable:

  WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0

This happens because the Performance Counter Global Status Register
(PerfCntGlobalStatus) has one or more bits set which are considered
reserved according to the "AMD64 Architecture Programmer’s Manual,
Volume 2: System Programming, 24593":

  https://www.amd.com/system/files/TechDocs/24593.pdf

To make this less intrusive, warn just once if any reserved bit is set
and prompt the user to update the microcode. Also mask the status value
down to the bits the handler knows about, so that overflow events
continue to be handled for the counters that are known to be sane.
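
The warn-once-and-mask approach described above can be sketched in user-space C. This is an illustration only, not the kernel change itself: `counter_mask()` and `sanitize_status()` are hypothetical names, the `static bool` stands in for a kernel warn-once primitive, and the sketch assumes the only valid status bits are the low N counter-overflow bits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: mask with the low num_counters bits set. */
static uint64_t counter_mask(unsigned int num_counters)
{
	assert(num_counters >= 1 && num_counters <= 64);
	return num_counters == 64 ? UINT64_MAX
				  : (UINT64_C(1) << num_counters) - 1;
}

/*
 * Hypothetical helper: return the status value with every bit outside
 * valid_mask cleared. The first time any such bit is seen, print a
 * single warning; later calls stay silent but still mask the value.
 */
static uint64_t sanitize_status(uint64_t status, uint64_t valid_mask)
{
	static bool warned;	/* stand-in for a warn-once primitive */
	uint64_t reserved = status & ~valid_mask;

	if (reserved && !warned) {
		warned = true;
		fprintf(stderr,
			"Reserved status bits are set (0x%" PRIx64
			"), please consider updating microcode\n",
			reserved);
	}

	/* Only the known counter bits reach the overflow handling. */
	return status & valid_mask;
}
```

With six counters, a raw value of 0x8000000000000005 would be reduced to 0x5, so the overflow bits for counters 0 and 2 are still processed while the stray high bit is reported once and dropped.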

Going forward, the following microcode patch levels are recommended
for Zen 4 processors in order to avoid such issues with reserved bits:

  Family=0x19 Model=0x11 Stepping=0x01: Patch=0x0a10113e
  Family=0x19 Model=0x11 Stepping=0x02: Patch=0x0a10123e
  Family=0x19 Model=0xa0 Stepping=0x01: Patch=0x0aa00116
  Family=0x19 Model=0xa0 Stepping=0x02: Patch=0x0aa00212

Commit f2eb058afc57 ("linux-firmware: Update AMD cpu microcode") from
the linux-firmware tree has binaries that meet the minimum required
patch levels.
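
As a rough illustration of how the recommended levels listed above could be checked, the sketch below looks up a CPU's family, model and stepping in a table and compares the running patch level against the recommended minimum. The struct, table and function names are hypothetical, and the sketch assumes that a numerically greater or equal patch level for the same family/model/stepping satisfies the recommendation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one recommended minimum patch level. */
struct min_ucode {
	uint8_t family;
	uint8_t model;
	uint8_t stepping;
	uint32_t patch;
};

/* Values copied from the list in this commit message. */
static const struct min_ucode zen4_min_ucode[] = {
	{ 0x19, 0x11, 0x01, 0x0a10113e },
	{ 0x19, 0x11, 0x02, 0x0a10123e },
	{ 0x19, 0xa0, 0x01, 0x0aa00116 },
	{ 0x19, 0xa0, 0x02, 0x0aa00212 },
};

#define ZEN4_MIN_UCODE_LEN (sizeof(zen4_min_ucode) / sizeof(zen4_min_ucode[0]))

/*
 * Returns 1 if patch meets the listed minimum, 0 if it is older,
 * and -1 if the family/model/stepping is not in the table.
 */
static int ucode_meets_minimum(const struct min_ucode *table, size_t n,
			       uint8_t family, uint8_t model,
			       uint8_t stepping, uint32_t patch)
{
	assert(n == 0 || table != NULL);

	for (size_t i = 0; i < n; i++) {
		if (table[i].family == family &&
		    table[i].model == model &&
		    table[i].stepping == stepping)
			return patch >= table[i].patch ? 1 : 0;
	}
	return -1;
}
```

A caller would pass the values it read from the running system; a result of -1 means the table says nothing about that part, not that the microcode is known to be good.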

  [ sandipan: - add message to prompt users to update microcode
              - rework commit message and call out required microcode levels ]

Fixes: 7685665c39 ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/3540f985652f41041e54ee82aa53e7dbd55739ae.1694696888.git.sandipan.das@amd.com/
2023-09-25 11:30:31 +02:00
amd                 2023-09-25 11:30:31 +02:00  perf/x86/amd: Do not WARN() on every IRQ
intel               2023-09-05 21:50:21 +02:00  perf/x86/uncore: Correct the number of CHAs on EMR
zhaoxin             2023-02-11 11:18:12 +01:00  x86/perf/zhaoxin: Add stepping check for ZXC
core.c              2023-07-26 12:28:46 +02:00  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
Kconfig             2022-05-25 15:54:26 +02:00  perf/x86/Kconfig: Fix indentation in the Kconfig file
Makefile            2022-08-27 00:05:44 +02:00  perf/x86: Move branch classifier
msr.c               2023-08-09 21:51:06 +02:00  x86/cpu: Fix Gracemont uarch
perf_event_flags.h  2022-09-07 21:54:01 +02:00  x86/perf: Assert all platform event flags are within PERF_EVENT_FLAG_ARCH
perf_event.h        2023-08-09 21:51:07 +02:00  perf/x86/intel: Add Crestmont PMU
probe.c             2021-02-10 14:44:54 +01:00  perf/x86/rapl: Add msr mask support
probe.h             2021-02-10 14:44:54 +01:00  perf/x86/rapl: Add msr mask support
rapl.c              2023-08-09 21:51:06 +02:00  x86/cpu: Fix Gracemont uarch
utils.c             2022-09-29 12:20:56 +02:00  perf/x86/utils: Fix uninitialized var in get_branch_type()