fc02cb2b37
7309 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
fc02cb2b37 |
Core:
- Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGAzX4ACgkQMUZtbf5S IrvW3g//Q0ZLrOuHK9pZ8sCXMMhDj8qL6ajm0otMddHWA/+1UglwVBKFhsajfxOf wJ/5LZis+XKLpLqKTU5chKVfn39HuDGe/D3l+egi01Gv5BW0+XzEhagfyR5tJX5z wsGG5CXO/we/laVSzRiFtwwVEKHKN20YC+tIQwYOYP5Wy3q4G7qDsFhT7GqgsGCS n74QUEAIB5Tz0ODWFqLtbsySzIurXrskibwt5T9bvAAlPw/lCU68mmG+NVJ7VddO lBbNkLMOo8yW9Ci20H09SrYd4jZTmMARo9tsFO1tAvAMk7qpn0Wd8pnOYTjFFoMD +qjiFSVMh7E0JGb8Y7NCvwaB99suAK5rfGP68Xwe62DfP7vYWEx4pZGxBP19F4ld 6Kn1ME33BX9rUF9tBecf0bdKfJUwB2Q2Xou/b9laG04bwiqsc9iG5FQq1C46lnLZ QdzNiS1My4dJMczkWt66HF3Kx30ibwHfvKMIHjf4PqkzEatkv6Y6SBZ57KXL+Lde 0BQSFhbf0tm2Gf55etzrczLElI3uqHSFWUNZZ2Bt6WmzO1e6tpV9nAtRWF4C/dFg QDpLJtOOOY65uq+qz09zoPfv2lem868SrCAuFrVn99bEpYjx/CGNFDeEI02l6jyr 84eUxd364UcbIk3fc+eTGdXHLQNVk30G0AHVBBxaWNIidwfqXeE= =srde -----END PGP SIGNATURE----- Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies" * tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits) Revert "net: avoid double accounting for pure zerocopy skbs" selftests: net: add arp_ndisc_evict_nocarrier net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter net: arp: introduce arp_evict_nocarrier sysctl parameter libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c net: avoid double accounting for pure zerocopy skbs tcp: rename sk_wmem_free_skb netdevsim: fix uninit value in nsim_drv_configure_vfs() selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose ... |
||
Jakub Kicinski
|
b7b98f8689 |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-11-01 We've added 181 non-merge commits during the last 28 day(s) which contain a total of 280 files changed, 11791 insertions(+), 5879 deletions(-). The main changes are: 1) Fix bpf verifier propagation of 64-bit bounds, from Alexei. 2) Parallelize bpf test_progs, from Yucong and Andrii. 3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus. 4) Improve bpf selftests on s390, from Ilya. 5) bloomfilter bpf map type, from Joanne. 6) Big improvements to JIT tests especially on Mips, from Johan. 7) Support kernel module function calls from bpf, from Kumar. 8) Support typeless and weak ksym in light skeleton, from Kumar. 9) Disallow unprivileged bpf by default, from Pawan. 10) BTF_KIND_DECL_TAG support, from Yonghong. 11) Various bpftool cleanups, from Quentin. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits) libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose bpf: Factor out helpers for ctx access checking bpf: Factor out a helper to prepare trampoline for struct_ops prog selftests, bpf: Fix broken riscv build riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h tools, build: Add RISC-V to HOSTARCH parsing riscv, bpf: Increase the maximum number of iterations selftests, bpf: Add one test for sockmap with strparser selftests, bpf: Fix test_txmsg_ingress_parser error ... ==================== Link: https://lore.kernel.org/r/20211102013123.9005-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
160729afc8 |
- Use the proper interface for the job: get_unaligned() instead of
memcpy() in the insn decoder - A randconfig build fix -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/wogACgkQEsHwGGHe VUoQIw//WdNg7rD++X4GG5l73lGt5ajerqnxjpipiAQTy029cUx0OzeYlWeHR2QH p+zLb3xzghjHn0Gviv9omadcPjHjXbqU6vlR3b95JARM5NnJEKRE7nho/w3mRfaT gWBzo6awh5SXLlo7DYESHRfvyr/Ryjl6LvgBFXprO33ST+0RMsWW/J4bx63xEIUF TKIYtm994O/qQBNLIEu/CB2cOAxtGZrVfRfVK+8QJcUy9xwgP0Oa9I6o9LvzaoJ1 UEvOkL1w6TttRsxgoHz/gskj8+LbXQD9LWVQ55u/HpRDhpNAe4f+RI73Fsgr7Av9 irbrhKwXherKCk9lHgaXQ6XgrrkZyvDY/pvdlj3RlnDt0jsJa6R4gwBGCOXmTgkU 5MF0hHr5kGgXAIJ7AVmYIaTBiLs99/JpF9+9lLW9UuJE2oKj2GxMot3YGTOokj1h u7Y32cta6Ve96ZHHtIXObY5c+LD3OQaljdBayLFaJuTVB6TqVc3dfsEzSNNf/duS 56K28CQEIpPGMe/KW6uZW9eYzQsGv+Jux1X3p650Z/e9A5wVCbdmdEshtACbXSac FVhaybv8ksJKNQmHi3xqbDUpFSMlbXZB3UfpCoQoGR20IfN1H+L7h64Xro5bvbXd LResoLmpnyU3gs3gn9xRYsb4fBr4KYW9jFwzTZSEH3h/Si/Hm2c= =Wj9y -----END PGP SIGNATURE----- Merge tag 'x86_misc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 changes from Borislav Petkov: - Use the proper interface for the job: get_unaligned() instead of memcpy() in the insn decoder - A randconfig build fix * tag 'x86_misc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Use get_unaligned() instead of memcpy() x86/Kconfig: Fix an unused variable error in dell-smm-hwmon |
||
Linus Torvalds
|
91e1c99e17 |
perf updates:
core: - Allow ftrace to instrument parts of the perf core code - Add a new mem_hops field to perf_mem_data_src which allows to represent intra-node/package or inter-node/off-package details to prepare for next generation systems which have more hieararchy within the node/pacakge level. tools: - Update for the new mem_hops field in perf_mem_data_src arch: - A set of constraints fixes for the Intel uncore PMU - The usual set of small fixes and improvements for x86 and PPC -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/GkQTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoaD8D/wLhXR8RxtF4W9HJmHA+5XFsPtg+isp ZNU2kOs4gZskFx75NQaRv5ikA8y68TKdIx+NuQvRLYItaMveTToLSsJ55bfGMxIQ JHqDvANUNxBmAACnbYQlqf9WgB0i/3fCUHY5lpmN0waKjaswz7WNpycv4ccShVZr PKbgEjkeFBhplCqqOF0X5H3V+4q85+nZONm1iSNd4S7/3B6OCxOf1u78usL1bbtW yJAMSuTeOVUZCJm7oVywKW/ZlCscT135aKr6xe5QTrjlPuRWzuLaXNezdMnMyoVN HVv8a0ClACb8U5KiGfhvaipaIlIAliWJp2qoiNjrspDruhH6Yc+eNh1gUhLbtNpR 4YZR5jxv4/mS13kzMMQg00cCWQl7N4whPT+ZE9pkpshGt+EwT+Iy3U+v13wDfnnp MnDggpWYGEkAck13t/T6DwC3qBIsVujtpiG+tt/ERbTxiuxi1ccQTGY3PDjtHV3k tIMH5n7l4jEpfl8VmoSUgz/2h1MLZnQUWp41GXkjkaOt7uunQZen+nAwqpTm28KV 7U6U0h1q6r7HxOZRxkPPe4HSV+aBNH3H1LeNBfEd3hDCFGf6MY6vLow+2BE9ybk7 Y6LPbRqq0SN3sd5MND0ZvQEt5Zgol8CMlX+UKoLEEv7RognGbIxkgpK7exv5pC9w nWj7TaMfpRzPgw== =Oj0G -----END PGP SIGNATURE----- Merge tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Core: - Allow ftrace to instrument parts of the perf core code - Add a new mem_hops field to perf_mem_data_src which allows to represent intra-node/package or inter-node/off-package details to prepare for next generation systems which have more hieararchy within the node/pacakge level. Tools: - Update for the new mem_hops field in perf_mem_data_src Arch: - A set of constraints fixes for the Intel uncore PMU - The usual set of small fixes and improvements for x86 and PPC" * tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses tools/perf: Add mem_hops field in perf_mem_data_src structure perf: Add mem_hops field in perf_mem_data_src structure perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line perf/core: Allow ftrace for functions in kernel/event/core.c perf/x86: Add new event for AUX output counter index perf/x86: Add compiler barrier after updating BTS perf/x86/intel/uncore: Fix Intel SPR M3UPI event constraints perf/x86/intel/uncore: Fix Intel SPR M2PCIE event constraints perf/x86/intel/uncore: Fix Intel SPR IIO event constraints perf/x86/intel/uncore: Fix Intel SPR CHA event constraints perf/x86/intel/uncore: Fix Intel ICX IIO event constraints perf/x86/intel/uncore: Fix invalid unit check perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server |
||
Hengqi Chen
|
2502e74bb5 |
perf bpf: Switch to new btf__raw_data API
Replace the call to btf__get_raw_data with new API btf__raw_data. The old APIs will be deprecated in libbpf v0.7+. No functionality change. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022130623.1548429-3-hengqi.chen@gmail.com |
||
Kajol Jain
|
cae1d75906 |
tools/perf: Add mem_hops field in perf_mem_data_src structure
Going forward, future generation systems can have more hierarchy within the node/package level but currently we don't have any data source encoding field in perf, which can be used to represent this level of data. Add a new field called 'mem_hops' in the perf_mem_data_src structure which can be used to represent intra-node/package or inter-node/off-package details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value can be used to present different hop levels data. Also add corresponding macros to define mem_hop field values and shift value. Currently we define macro for HOPS_0 which corresponds to data coming from another core but same node. Add functionality to represent mem_hop field data in perf_mem__lvl_scnprintf function with the help of added string array called mem_hops. For ex: Encodings for mem_hops fields with L2 cache: L2 - local L2 L2 | REMOTE | HOPS_0 - remote core, same node L2 Since with the addition of HOPS field, now remote can be used to denote cache access from the same node but different core, a check is added in the c2c_decode_stats function to set mrem only when HOPS is zero along with set remote field. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211006140654.298352-4-kjain@linux.ibm.com |
||
Kajol Jain
|
f4c6217f7f |
perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line
Add a comment about PERF_MEM_LVL_* namespace being depricated to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Remove an extra line present in perf_mem__lvl_scnprintf function. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211006140654.298352-2-kjain@linux.ibm.com |
||
Alexey Bayduraev
|
8e820f9623 |
perf report: Output non-zero offset for decompressed records
Print offset of PERF_RECORD_COMPRESSED record instead of zero for decompressed records in raw trace dump (-D option of perf-report): 0x17cf08 [0x28]: event: 9 instead of: 0 [0x28]: event: 9 The fix is not critical, because currently file_pos for compressed events is used in perf_session__process_event only to show offsets in the raw dump. This patch was separated from patchset: https://lore.kernel.org/lkml/cover.1629186429.git.alexey.v.bayduraev@linux.intel.com/ and was already rewieved. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210929091445.18274-1-alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Borislav Petkov
|
f96b467583 |
x86/insn: Use get_unaligned() instead of memcpy()
Use get_unaligned() instead of memcpy() to access potentially unaligned
memory, which, when accessed through a pointer, leads to undefined
behavior. get_unaligned() describes much better what is happening there
anyway even if memcpy() does the job.
In addition, since perf tool builds with -Werror, it would fire with:
util/intel-pt-decoder/../../../arch/x86/lib/insn.c: In function '__insn_get_emulate_prefix':
tools/include/../include/asm-generic/unaligned.h:10:15: error: packed attribute is unnecessary [-Werror=packed]
10 | const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
because -Werror=packed would complain if the packed attribute would have
no effect on the layout of the structure.
In this case, that is intentional so disable the warning only for that
compilation unit.
That part is Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
No functional changes.
Fixes:
|
||
Like Xu
|
a827c007c7 |
perf config: Refine error message to eliminate confusion
If there is no configuration file at first, the user can write any pair of "key.subkey=value" to the newly created configuration file, while value validation against a valid configurable key is *deferred* until the next execution or the implied execution of "perf config ... ". For example: $ rm ~/.perfconfig $ perf config call-graph.dump-size=65529 $ cat ~/.perfconfig # this file is auto-generated. [call-graph] dump-size = 65529 $ perf config call-graph.dump-size=2048 callchain: Incorrect stack dump size (max 65528): 65529 Error: wrong config key-value pair call-graph.dump-size=65529 The user might expect that the second value 2048 is valid and can be updated to the configuration file, but the error message is very confusing because the first value 65529 is not reported as an error during the last configuration. It is recommended not to change the current behavior of delayed validation (as more effort is needed), but to refine the original error message to *clearly indicate* that the cause of the error is the configuration file. Signed-off-by: Like Xu <likexu@tencent.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210924115817.58689-1-likexu@tencent.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Andrii Nakryiko
|
219d720e6d |
perf bpf: Ignore deprecation warning when using libbpf's btf__get_from_id()
Perf code re-implements libbpf's btf__load_from_kernel_by_id() API as a weak function, presumably to dynamically link against old version of libbpf shared library. Unfortunately this causes compilation warning when perf is compiled against libbpf v0.6+. For now, just ignore deprecation warning, but there might be a better solution, depending on perf's needs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: kernel-team@fb.com LPU-Reference: 20210914170004.4185659-1-andrii@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Michael Petlan
|
57f0ff059e |
perf machine: Initialize srcline string member in add_location struct
It's later supposed to be either a correct address or NULL. Without the initialization, it may contain an undefined value which results in the following segmentation fault: # perf top --sort comm -g --ignore-callees=do_idle terminates with: #0 0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6 #1 0x00007ffff55e3802 in strdup () from /lib64/libc.so.6 #2 0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489 #3 hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564 #4 0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420, sample_self=sample_self@entry=true) at util/hist.c:657 #5 0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288 #6 0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38) at util/hist.c:1056 #7 iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056 #8 0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0) at util/hist.c:1231 #9 0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0) at builtin-top.c:842 #10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202 #11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244 #12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323 #13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339 #14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341 #15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339 #16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114 #17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0 #18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6 If you look at the frame #2, the code is: 488 if (he->srcline) { 489 he->srcline = strdup(he->srcline); 490 if (he->srcline == NULL) 491 goto err_rawdata; 492 } If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish), it gets strdupped and strdupping a rubbish random string causes the problem. Also, if you look at the commit |
||
Namhyung Kim
|
4a86d41404 |
perf tools: Allow build-id with trailing zeros
Currently perf saves a build-id with size but old versions assumes the
size of 20. In case the build-id is less than 20 (like for MD5), it'd
fill the rest with 0s.
I saw a problem when old version of perf record saved a binary in the
build-id cache and new version of perf reads the data. The symbols
should be read from the build-id cache (as the path no longer has the
same binary) but it failed due to mismatch in the build-id.
symsrc__init: build id mismatch for /home/namhyung/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf.
The build-id event in the data has 20 byte build-ids, but it saw a
different size (16) when it reads the build-id of the elf file in the
build-id cache.
$ readelf -n ~/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000010 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 53e4c2f42a4c61a2d632d92a72afa08f
Let's fix this by allowing trailing zeros if the size is different.
Fixes:
|
||
Adrian Hunter
|
99fc5941b8 |
perf tools: Fix hybrid config terms list corruption
A config terms list was spliced twice, resulting in a never-ending loop when the list was traversed. Fix by using list_splice_init() and copying and freeing the lists as necessary. This patch also depends on patch "perf tools: Factor out copy_config_terms() and free_config_terms()" Example on ADL: Before: # perf record -e '{intel_pt//,cycles/aux-sample-size=4096/pp}' uname & # jobs [1]+ Running perf record -e "{intel_pt//,cycles/aux-sample-size=4096/pp}" uname # perf top -E 10 PerfTop: 4071 irqs/sec kernel: 6.9% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 24 CPUs) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 97.60% perf [.] __evsel__get_config_term 0.25% [kernel] [k] kallsyms_expand_symbol.constprop.13 0.24% perf [.] kallsyms__parse 0.15% [kernel] [k] _raw_spin_lock 0.14% [kernel] [k] number 0.13% [kernel] [k] advance_transaction 0.08% [kernel] [k] format_decode 0.08% perf [.] map__process_kallsym_symbol 0.08% perf [.] rb_insert_color 0.08% [kernel] [k] vsnprintf exiting. # kill %1 After: # perf record -e '{intel_pt//,cycles/aux-sample-size=4096/pp}' uname & Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.060 MB perf.data ] # perf script | head perf-exec 604 [001] 1827.312293: psb: psb offs: 0 ffffffffb8415e87 pt_config_start+0x37 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856a3bd event_sched_in.isra.133+0xfd ([kernel.kallsyms]) => ffffffffb856a9a0 perf_pmu_nop_void+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856b10e merge_sched_in+0x26e ([kernel.kallsyms]) => ffffffffb856a2c0 event_sched_in.isra.133+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856a45d event_sched_in.isra.133+0x19d ([kernel.kallsyms]) => ffffffffb8568b80 perf_event_set_state.part.61+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb8568b86 perf_event_set_state.part.61+0x6 ([kernel.kallsyms]) => ffffffffb85662a0 perf_event_update_time+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856a35c event_sched_in.isra.133+0x9c ([kernel.kallsyms]) => ffffffffb8567610 perf_log_itrace_start+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856a377 event_sched_in.isra.133+0xb7 ([kernel.kallsyms]) => ffffffffb8403b40 x86_pmu_add+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb8403b86 x86_pmu_add+0x46 ([kernel.kallsyms]) => ffffffffb8403940 collect_events+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb8403a7b collect_events+0x13b ([kernel.kallsyms]) => ffffffffb8402cd0 collect_event+0x0 ([kernel.kallsyms]) Fixes: |
||
Adrian Hunter
|
a7d212fc6c |
perf tools: Factor out copy_config_terms() and free_config_terms()
Factor out copy_config_terms() and free_config_terms() so that they can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https //lore.kernel.org/r/20210909125508.28693-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
eb34363ae1 |
perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields
Some fields are missing and text_poke is duplicated. Fix that up. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20210911120550.12203-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
218e7b775d |
perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions
The btf__get_from_id() function was deprecated in favour of btf__load_from_kernel_by_id(), but it is still avaiable, so use it to provide a weak function btf__load_from_kernel_by_id() for older libbpf when building perf with LIBBPF_DYNAMIC=1, i.e. using the system's libbpf package. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kim Phillips
|
291dcb98d7 |
perf report: Add support to print a textual representation of IBS raw sample data
Perf records IBS (Instruction Based Sampling) extra sample data when 'perf record --raw-samples' is used with an IBS-compatible event, on a machine that supports IBS. IBS support is indicated in CPUID_Fn80000001_ECX bit #10. Up until now, users have been able to see the extra sample data solely in raw hex format using 'perf report --dump-raw-trace'. From there, users could decode the data either manually, or by using an external script. Enable the built-in 'perf report --dump-raw-trace' to do the decoding of the extra sample data bits, so manual or external script decoding isn't necessary. Example usage: $ sudo perf record -c 10000001 -a --raw-samples -e ibs_fetch/rand_en=1/,ibs_op/cnt_ctl=1/ -C 0,1 taskset -c 0,1 7za b -mmt2 | perf report --dump-raw-trace Stdout contains IBS Fetch samples, e.g.: ibs_fetch_ctl: 02170007ffffffff MaxCnt 1048560 Cnt 1048560 Lat 7 En 1 Val 1 Comp 1 IcMiss 0 PhyAddrValid 1 L1TlbPgSz 4KB L1TlbMiss 0 L2TlbMiss 0 RandEn 1 L2Miss 0 IbsFetchLinAd: 000056016b2ead40 IbsFetchPhysAd: 000000115cedfd40 c_ibs_ext_ctl: 0000000000000000 IbsItlbRefillLat 0 ..and IBS Op samples, e.g.: ibs_op_ctl: 0000009e009e8968 MaxCnt 10000000 En 1 Val 1 CntCtl 1=uOps CurCnt 158 IbsOpRip: 000056016b2ea73d ibs_op_data: 00000000000b0002 CompToRetCtr 2 TagToRetCtr 11 BrnRet 0 RipInvalid 0 BrnFuse 0 Microcode 0 ibs_op_data2: 0000000000000002 CacheHitSt 0=M-state RmtNode 0 DataSrc 2=Local node cache ibs_op_data3: 0000000000c60002 LdOp 0 StOp 1 DcL1TlbMiss 0 DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 0 DcL2TlbHit2M 0 DcMiss 0 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0 DcMissNoMabAlloc 0 DcLinAddrValid 1 DcPhyAddrValid 1 DcL2TlbHit1G 0 L2Miss 0 SwPf 0 OpMemWidth 4 bytes OpDcMissOpenMemReqs 0 DcMissLat 0 TlbRefillLat 0 IbsDCLinAd: 00007f133c319ce0 IbsDCPhysAd: 0000000270485ce0 Committer notes: Fixed up this: util/amd-sample-raw.c: In function ‘evlist__amd_sample_raw’: util/amd-sample-raw.c:125:42: error: ‘ bytes’ directive output may be truncated writing 6 bytes into a region of size between 4 and 7 [-Werror=format-truncation=] 125 | " OpMemWidth %2d bytes", 1 << (reg.op_mem_width - 1)); | ^~~~~~ In file included from /usr/include/stdio.h:866, from util/amd-sample-raw.c:7: /usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 21 and 24 bytes into a destination of size 21 71 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 72 | __glibc_objsize (__s), __fmt, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 73 | __va_arg_pack ()); | ~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors As that %2d won't limit the number of chars to 2, just state that 2 is the minimal width: $ cat printf.c #include <stdio.h> #include <stdlib.h> int main(int argc, char *argv[]) { char bf[64]; int len = snprintf(bf, sizeof(bf), "%2d", atoi(argv[1])); printf("strlen(%s): %u\n", bf, len); return 0; } $ ./printf 1 strlen( 1): 2 $ ./printf 12 strlen(12): 2 $ ./printf 123 strlen(123): 3 $ ./printf 1234 strlen(1234): 4 $ ./printf 12345 strlen(12345): 5 $ ./printf 123456 strlen(123456): 6 $ And since we probably don't want that output to be truncated, just assume the worst case, as the compiler did, and add a few more chars to that buffer. Also use sizeof(var) instead of sizeof(dup-of-wanted-format-string) to avoid bugs when changing one but not the other. I also had to change this: -#include <asm/amd-ibs.h> +#include "../../arch/x86/include/asm/amd-ibs.h" To make it build on other architectures, just like intel-pt does. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https //lore.kernel.org/r/20210817221509.88391-4-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kim Phillips
|
9fe8895a27 |
perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings
To be used by IBS raw data display: It needs the recorder's cpuid in order to determine which errata workarounds to apply to the data, and the pmu_mappings are needed in order to figure out which PMU sample type is IBS Fetch vs. IBS Op. When not available from perf.data, we assume local operation, and retrieve cpuid and pmu mappings directly from the running system. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https //lore.kernel.org/r/20210817221509.88391-2-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Remi Bernon
|
d2930ede52 |
perf symbol: Look for ImageBase in PE file to compute .text offset
Instead of using the file offset in the debug file. This fixes a regression from |
||
Linus Torvalds
|
2d338201d5 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
"147 patches, based on
|
||
Andy Shevchenko
|
7fc5b57132 |
tools: rename bitmap_alloc() to bitmap_zalloc()
Rename bitmap_alloc() to bitmap_zalloc() in tools to follow the bitmap API in the kernel. No functional changes intended. Link: https://lkml.kernel.org/r/20210814211713.180533-14-yury.norov@gmail.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Suggested-by: Yury Norov <yury.norov@gmail.com> Acked-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Alexey Klimov <aklimov@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
27151f1778 |
perf tools changes for v5.15:
New features: - Improvements for the flamegraph python script, including: - Display perf.data header - Display PIDs of user stacks - Added option to change color scheme - Default to blue/green color scheme to improve accessibility - Correctly identify kernel stacks when debuginfo is available - Improvements for 'perf bench futex': - Add --mlockall parameter - Add --broadcast and --pi to the 'requeue' sub benchmark - Add support for PMU aliases. - Introduce an ARM Coresight ETE decoder. - Add a 'perf bench' entry for evlist open/close operations, to help quantify improvements with multithreading 'perf record'. - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf script's python scripting. - Add a 'perf test' entry for PMU aliases. - Add a 'perf test' entry for 'perf record/perf report/perf script' pipe mode. Fixes: - perf script dlfilter (API for filtering via dynamically loaded shared object introduced in v5.14) fixes and a 'perf test' entry for it. - Fix get_current_dir_name() compilation on Android. - Fix issues with asciidoc and double dashes uses. - Fix memory leaks in the BTF handling code. - Fix leftover problems in the Documentation from the infrastructure originally lifted from the git codebase. - Fix *probe_vfs_getname.sh 'perf test' failures. - Handle fd gaps in 'perf test's test__dso_data_reopen(). - Make sure to show disasembly warnings for 'perf annotate --stdio'. - Fix output from pipe to file and vice-versa in 'perf record/report/script'. - Correct 'perf data -h' output. - Fix wrong comm in system-wide mode with 'perf record --delay'. - Do not allow --for-each-cgroup without cpu in 'perf stat' - Make 'perf test --skip' work on shell tests. - Fix libperf's verbose printing. Misc improvements: - Preparatory patches for multithreading varios 'perf record' phases (synthesizing, opening, recording, etc). - Add sparse context/locking annotations in compiler-types.h, also to help with the multithreading effort. - Optimize the generation of the arch specific erno tables used in 'perf trace'. - Optimize libperf's perf_cpu_map__max(). - Improve ARM's CoreSight warnings. - Report collisions in AUX records. - Improve warnings for the LLVM 'perf test' entry. - Improve the PMU events 'perf test' codebase. - perf test: Do not compare overheads in the zstd comp test - Better support annotation on ARM. - Update 'perf trace's cmd string table to decode sys_bpf() first arg. Vendor events: - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and Elhart Lake. - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake servers. Hardware tracing: - Improvements for the ARM hardware tracing auxtrace support. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYTO8OAAKCRCyPKLppCJ+ J/N0AQCTAfxKsVyh9vjIY5vwHvNkYMDM4Wvs7TmlfXbVgKLP3AEA0wYlTyar/teI NC6jQpipxpWASFUJbLSmU8qn/XFXwQo= =4r0w -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool updates from Arnaldo Carvalho de Melo: "New features: - Improvements for the flamegraph python script, including: - Display perf.data header - Display PIDs of user stacks - Added option to change color scheme - Default to blue/green color scheme to improve accessibility - Correctly identify kernel stacks when debuginfo is available - Improvements for 'perf bench futex': - Add --mlockall parameter - Add --broadcast and --pi to the 'requeue' sub benchmark - Add support for PMU aliases. - Introduce an ARM Coresight ETE decoder. - Add a 'perf bench' entry for evlist open/close operations, to help quantify improvements with multithreading 'perf record'. - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf script's python scripting. - Add a 'perf test' entry for PMU aliases. - Add a 'perf test' entry for 'perf record/perf report/perf script' pipe mode. Fixes: - perf script dlfilter (API for filtering via dynamically loaded shared object introduced in v5.14) fixes and a 'perf test' entry for it. - Fix get_current_dir_name() compilation on Android. - Fix issues with asciidoc and double dashes uses. - Fix memory leaks in the BTF handling code. - Fix leftover problems in the Documentation from the infrastructure originally lifted from the git codebase. - Fix *probe_vfs_getname.sh 'perf test' failures. - Handle fd gaps in 'perf test's test__dso_data_reopen(). - Make sure to show disasembly warnings for 'perf annotate --stdio'. - Fix output from pipe to file and vice-versa in 'perf record/report/script'. - Correct 'perf data -h' output. - Fix wrong comm in system-wide mode with 'perf record --delay'. - Do not allow --for-each-cgroup without cpu in 'perf stat' - Make 'perf test --skip' work on shell tests. - Fix libperf's verbose printing. Misc improvements: - Preparatory patches for multithreading various 'perf record' phases (synthesizing, opening, recording, etc). - Add sparse context/locking annotations in compiler-types.h, also to help with the multithreading effort. - Optimize the generation of the arch specific erno tables used in 'perf trace'. - Optimize libperf's perf_cpu_map__max(). - Improve ARM's CoreSight warnings. - Report collisions in AUX records. - Improve warnings for the LLVM 'perf test' entry. - Improve the PMU events 'perf test' codebase. - perf test: Do not compare overheads in the zstd comp test - Better support annotation on ARM. - Update 'perf trace's cmd string table to decode sys_bpf() first arg. Vendor events: - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and Elhart Lake. - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake servers. Hardware tracing: - Improvements for the ARM hardware tracing auxtrace support" * tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits) perf tests: Add test for PMU aliases perf pmu: Add PMU alias support perf session: Report collisions in AUX records perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event perf build: Report failure for testing feature libopencsd perf cs-etm: Show a warning for an unknown magic number perf cs-etm: Print the decoder name perf cs-etm: Create ETE decoder perf cs-etm: Update OpenCSD decoder for ETE perf cs-etm: Fix typo perf cs-etm: Save TRCDEVARCH register perf cs-etm: Refactor out ETMv4 header saving perf cs-etm: Initialise architecture based on TRCIDR1 perf cs-etm: Refactor initialisation of decoder params. tools build: Fix feature detect clean for out of source builds perf evlist: Add evlist__for_each_entry_from() macro perf evsel: Handle precise_ip fallback in evsel__open_cpu() perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu() perf evsel: Move test_attr__open() to success path in evsel__open_cpu() perf evsel: Move ignore_missing_thread() to fallback code ... |
||
Kan Liang
|
13d60ba073 |
perf pmu: Add PMU alias support
A perf uncore PMU may have two PMU names, a real name and an alias. The alias is exported at /sys/bus/event_source/devices/uncore_*/alias. The perf tool should support the alias as well. Add alias_name in the struct perf_pmu to store the alias. For the PMU which doesn't have an alias. It's NULL. Introduce two X86 specific functions to retrieve the real name and the alias separately. Only go through the sysfs to retrieve the mapping between the real name and the alias once. The result is cached in a list, uncore_pmu_list. Nothing changed for the other ARCHs. With the patch, the perf tool can monitor the PMU with either the real name or the alias. Use the real name, $ perf stat -e uncore_cha_2/event=1/ -x, 4044879584,,uncore_cha_2/event=1/,2528059205,100.00,, Use the alias, $ perf stat -e uncore_type_0_2/event=1/ -x, 3659675336,,uncore_type_0_2/event=1/,2287306455,100.00,, Committer notes: Rename 'struct perf_pmu_alias_name' to 'pmu_alias', the 'perf_' prefix should be used for libperf, things inside just tools/perf/ are being moved away from that prefix. Also 'pmu_alias' is shorter and reflects the abstraction. Also don't use 'pmu' as the name for variables for that type, we should use that for the 'struct perf_pmu' variables, avoiding confusion. Use 'pmu_alias' for 'struct pmu_alias' variables. Co-developed-by: Jin Yao <yao.jin@linux.intel.com> Co-developed-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Link: http://lore.kernel.org/lkml/20210902065955.1299-2-yao.jin@linux.intel.com Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Suzuki K Poulose
|
c68b421d8e |
perf session: Report collisions in AUX records
Just like the other flags in the AUX records, report a summary of the Collisions if there were any. Signed-off-by: Suzuki Poulouse <suzuki.poulose@arm.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org LPU-Reference: 20210728091219.527886-1-suzuki.poulose@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Stephen Brennan
|
538d9c1829 |
perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
perf_events may sometimes throttle an event due to creating too many samples during a given timer tick. As of now, the perf tool will not report on throttling, which means this is a silent error. Implement a callback for the throttle and unthrottle events within the Python scripting engine, which can allow scripts to detect and report when events may have been lost due to throttling. The simplest script to report throttle events is: def throttle(*args): print("throttle" + repr(args)) def unthrottle(*args): print("unthrottle" + repr(args)) Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210901210815.133251-1-stephen.s.brennan@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
a80aea64aa |
perf cs-etm: Show a warning for an unknown magic number
Currently perf reports "Cannot allocate memory" which isn't very helpful for a potentially user facing issue. If we add a new magic number in the future, perf will be able to report unrecognised magic numbers. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https //lore.kernel.org/r/20210806134109.1182235-10-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
56c62f52b6 |
perf cs-etm: Print the decoder name
Use the real name of the decoder instead of hard-coding "ETM" to avoid confusion when the trace is ETE. This also now distinguishes between ETMv3 and ETMv4. Reviewed-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https //lore.kernel.org/r/20210806134109.1182235-9-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
779f414a48 |
perf cs-etm: Create ETE decoder
If the magic number indicates ETE instantiate a OCSD_BUILTIN_DCD_ETE decoder instead of OCSD_BUILTIN_DCD_ETMV4I. ETE is the new trace feature for Armv9. Testing performed ================= * Old files with v0 and v1 headers for ETMv4 still open correctly * New files with new magic number open on new versions of perf * New files with new magic number fail to open on old versions of perf * Decoding with the ETE decoder results in the same output as the ETMv4 decoder as long as there are no new ETE packet types Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https //lore.kernel.org/r/20210806134109.1182235-8-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
212095f7ca |
perf cs-etm: Update OpenCSD decoder for ETE
OpenCSD v1.1.1 has a bug fix for the installation of the ETE decoder headers. This also means that including headers separately for each decoder is unnecessary so remove these. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https //lore.kernel.org/r/20210806134109.1182235-7-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
51ba881131 |
perf cs-etm: Save TRCDEVARCH register
When ETE is present save the TRCDEVARCH register and set a new magic number. It will be used to configure the decoder in a later commit. Old versions of perf will not be able to open files with this new magic number, but old files will still work with newer versions of perf. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https //lore.kernel.org/r/20210806134109.1182235-5-james.clark@arm.com [ Addressed some cosmetic suggestions by Suzuki Poulouse ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
f4aef1ea26 |
perf cs-etm: Initialise architecture based on TRCIDR1
Currently the architecture is hard coded as ARCH_V8, but from ETMv4.4 onwards this should be ARCH_AA64. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https //lore.kernel.org/r/20210806134109.1182235-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
991f69e9e0 |
perf cs-etm: Refactor initialisation of decoder params.
The initialisation of the decoder params is duplicated between creation of the packet printer and packet decoder. Put them both into one function so that future changes only need to be made in one place. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https //lore.kernel.org/r/20210806134109.1182235-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
79e7ed56d7 |
perf evlist: Add evlist__for_each_entry_from() macro
This patch adds a new iteration macro for evlist that resumes iteration from a given evsel in the evlist. This macro will be used in the workqueue series. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/2386505f8b598adf0dbcd04ec21804c6bcf00826.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
28667a5269 |
perf evsel: Handle precise_ip fallback in evsel__open_cpu()
This is another patch in the effort to separate the fallback mechanisms from the open itself. In case of precise_ip fallback, the original precise_ip will be stored in the evsel (it was stored in a local variable) and the open will be retried. Since the precise_ip fallback will be the first in the chain of fallbacks, there should be no functional change with this patch. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/74208c433d2024a6c4af9c0b140b54ed6b5ea810.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
91233d003b |
perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
I don't see why bpf_counter__install_pe() should get called even if fd = -1, so I'm moving it to the success path. This will be useful in following patches to separate the actual open and the related operations from the fallback mechanisms. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/64f8a1b0a838a6e6049cd43c1beafd432999ae57.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
ebfb045a41 |
perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
test_attr__open() ignores the fd if -1, therefore it is safe to move it to the success path (fd >= 0). Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/b3baf11360ca96541c9631730614fd7d217496fc.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
da7c3b4622 |
perf evsel: Move ignore_missing_thread() to fallback code
This patch moves ignore_missing_thread outside the perf_event_open loop. Doing so, we need to move the retry_open flag a few places higher, with minimal impact. Furthermore, thread need not be decreased since it won't get increased by the for loop (since we're jumping back inside), but we need to check that the nthreads decrease didn't put thread out of range. The goal is to have fallbacks handled in one place only, since in the future parallel code, these would be handled separately. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/4eca51443c786baaf6811b7cd8e73aafd97f7606.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
71efc48a4c |
perf evsel: Separate rlimit increase from evsel__open_cpu()
This is a preparatory patch for the workqueue patches with the goal to separate from evlist__open_cpu() the actual opening (which could be performed in parallel), from the existing fallback mechanisms, which should be handled sequentially. This patch separates the rlimit increase from evsel__open_cpu(). Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/2f256de8ec37b9809a5cef73c2fa7bce416af5d3.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
d21fc5f077 |
perf evsel: Separate missing feature detection from evsel__open_cpu()
This is a preparatory patch for the workqueue patches with the goal to separate in evlist__open_cpu() the actual opening, which could be performed in parallel, from the existing fallback mechanisms, which should be handled sequentially. This patch separates the missing feature detection in evsel__open_cpu() into a new evsel__detect_missing_features() function. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/cba0b7d939862473662adeedb0f9c9b69566ee9a.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
6efd06e374 |
perf evsel: Add evsel__prepare_open()
This function will prepare the evsel and disable the missing features. It will be used in one of the following patches. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/fa5e78bbb92c848226f044278fdcf777b3ce4583.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
588f4ac763 |
perf evsel: Separate missing feature disabling from evsel__open_cpu
This is a preparatory patch for the patches in the workqueue series with the goal to separate in evlist__open_cpu() the actual opening, which could be performed in parallel, from the existing fallback mechanisms, which should be handled sequentially. This patch separates the disabling of missing features from evlist__open_cpu() into a new function evsel__disable_missing_features((). Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/48138bd2932646dde315505da733c2ca635ad2ee.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
46def08f5d |
perf evsel: Save open flags in evsel in prepare_open()
This patch caches the flags used in perf_event_open() inside evsel, so that they can be set in __evsel__prepare_open() (this will be useful in patches in the workqueue series, when the fallback mechanisms will be handled outside the open itself). This also optimizes the code, by not having to recompute them everytime. Since flags are now saved in evsel, the flags argument in perf_event_open() is removed. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/d9f63159098e56fa518eecf25171d72e6f74df37.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
d45ce03434 |
perf evsel: Separate open preparation from open itself
This is a preparatory patch for the following patches with the goal to separate in evlist__open_cpu the actual perf_event_open, which could be performed in parallel, from the existing fallback mechanisms, which should be handled sequentially. This patch separates the first lines of evsel__open_cpu into a new __evsel__prepare_open function. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/e14118b934c338dbbf68b8677f20d0d7dbf9359a.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
bc0496043e |
perf evsel: Remove retry_sample_id goto label
As far as I can tell, there is no good reason, apart from optimization to have the retry_sample_id separate from fallback_missing_features. Probably, this label was added to avoid reapplying patches for missing features that had already been applied. However, missing features that have been added later have not used this optimization, always jumping to fallback_missing_features and reapplying all missing features. This patch removes that label, replacing it with fallback_missing_features. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/340af0d03408d6621fd9c742e311db18b3585b3b.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Riccardo Mancini
|
5d4da30f76 |
perf mmap: Add missing bitops.h header
MMAP_CPU_MASK_BYTES uses the BITS_TO_LONGS macro, which is defined in linux/bitops.h. However, this header is not included directly, but gets imported indirectly in files using the macro. This patch adds the missing include. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/c5b91ee432a2e28e7f16337c740b43b4d0b0e86c.1629490974.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
40a72c6472 |
perf tools: Fix LLVM download hint link
http://llvm.org/apt returns 404, it has moved to https://apt.llvm.org/ Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20210831145501.2135754-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
792adb1aa9 |
perf tools: Fix LLVM test failure when running in verbose mode
A CI system might want to run all tests in verbose mode so that there is enough information to diagnose issues. This LLVM test is the only test that uses "-v" to signify to not skip the test if the preconditions aren't met (LLVM isn't installed). This means that running the test in verbose mode without LLVM installed causes a test failure. For consistency with the other tests, remove this verbose/skip check. An alternate solution would be to make _all_ tests not skip when run in verbose mode, but I don't think that would be intuitive. Also change the search_program() call to search_program_and_warn(). Previously the hint about installing LLVM was only printed by the actual test because this check was skipped in verbose mode. To maintain the old behaviour, the precondition check must also print the full warning. Previous output: $ ./perf test llvm 40: LLVM search and compile : 40.1: Basic BPF llvm compile : Skip $ ./perf test -v llvm 40: LLVM search and compile : 40.1: Basic BPF llvm compile : --- start --- test child forked, pid 2085835 ERROR: unable to find clang. Hint: Try to install latest clang/llvm to support BPF. Check your $PATH ... test child finished with -1 ---- end ---- LLVM search and compile subtest 1: FAILED! New output (non verbose mode is identical, verbose changes from fail to skip): $ ./perf test llvm 40: LLVM search and compile : 40.1: Basic BPF llvm compile : Skip $ ./perf test -v llvm 40: LLVM search and compile : 40.1: Basic BPF llvm compile : --- start --- test child forked, pid 2087680 ERROR: unable to find clang. Hint: Try to install latest clang/llvm to support BPF. Check your $PATH ... No clang, skip this test test child finished with -2 ---- end ---- LLVM search and compile subtest 1: Skip Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20210831145501.2135754-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
a8a2d5c0b3 |
perf tools: Refactor LLVM test warning for missing binary
The same warning is duplicated in two places so refactor it into a single function "search_program_and_warn". This will be used a third time in a later commit. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20210831145501.2135754-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
bbc49f1202 |
perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
When perf runs in compat mode (kernel in 64-bit mode and the perf is in 32-bit mode), the 64-bit value atomicity in the user space cannot be assured, E.g. on some architectures, the 64-bit value accessing is split into two instructions, one is for the low 32-bit word accessing and another is for the high 32-bit word. This patch introduces weak functions compat_auxtrace_mmap__read_head() and compat_auxtrace_mmap__write_tail(), as their naming indicates, when perf tool works in compat mode, it uses these two functions to access the AUX head and tail. These two functions can allow the perf tool to work properly in certain conditions, e.g. when perf tool works in snapshot mode with only using AUX head pointer, or perf tool uses the AUX buffer and the incremented tail is not bigger than 4GB. When perf tool cannot handle the case when the AUX tail is bigger than 4GB, the function compat_auxtrace_mmap__write_tail() returns -1 and tells the caller to bail out for the error. These two functions are declared as weak attribute, this allows to implement arch specific functions if any arch can support the 64-bit value atomicity in compat mode. Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Russell King (oracle)" <linux@armlinux.org.uk> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20210829102238.19693-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |