linux

iv/linux

Author	SHA1	Message	Date
Song Yoong Siang	1347b41931	net: stmmac: Add Tx HWTS support to XDP ZC This patch enables transmit hardware timestamp support to XDP zero copy via XDP Tx metadata framework. This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel Tiger Lake platform. Below are the test steps and results. Command on DUT: sudo ./xdp_hw_metadata <interface name> sudo hwstamp_ctl -i <interface name> -t 1 -r 1 Command on Link Partner: echo -n xdp \| nc -u -q1 <destination IPv4 addr> 9091 Result: xsk_ring_cons__peek: 1 0x55bbbf08b6d0: rx_desc[2]->addr=8c100 addr=8c100 comp_addr=8c100 EoP No rx_hash err=-95 rx_timestamp: 1677762688429141540 (sec:1677762688.4291) HW RX-time: 1677762688429141540 (sec:1677762688.4291) delta to User RX-time sec:0.0003 (250.665 usec) XDP RX-time: 1677762688429375597 (sec:1677762688.4294) delta to User RX-time sec:0.0000 (16.608 usec) 0x55bbbf08b6d0: ping-pong with csum=561c (want f488) csum_start=34 csum_offset=6 0x55bbbf08b6d0: complete tx idx=2 addr=2008 tx_timestamp: 1677762688431127273 (sec:1677762688.4311) HW TX-complete-time: 1677762688431127273 (sec:1677762688.4311) delta to User TX-complete-time sec:0.0083 (8331.655 usec) XDP RX-time: 1677762688429375597 (sec:1677762688.4294) delta to User TX-complete-time sec:0.0101 (10083.331 usec) HW RX-time: 1677762688429141540 (sec:1677762688.4291) delta to HW TX-complete-time sec:0.0020 (1985.733 usec) 0x55bbbf08b6d0: complete rx idx=130 addr=8c100 Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-6-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	ec706a860e	net/mlx5e: Implement AF_XDP TX timestamp and checksum offload TX timestamp: - requires passing clock, not sure I'm passing the correct one (from cq->mdev), but the timestamp value looks convincing TX checksum: - looks like device does packet parsing (and doesn't accept custom start/offset), so I'm ignoring user offsets Cc: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-5-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	9276009d35	tools: ynl: Print xsk-features from the sample In a similar fashion we do for the other bit masks. Fix mask parsing (>= vs >) while we are it. Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-4-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	48eb03dd26	xsk: Add TX timestamp and TX checksum offload support This change actually defines the (initial) metadata layout that should be used by AF_XDP userspace (xsk_tx_metadata). The first field is flags which requests appropriate offloads, followed by the offload-specific fields. The supported per-device offloads are exported via netlink (new xsk-flags). The offloads themselves are still implemented in a bit of a framework-y fashion that's left from my initial kfunc attempt. I'm introducing new xsk_tx_metadata_ops which drivers are supposed to implement. The drivers are also supposed to call xsk_tx_metadata_request/xsk_tx_metadata_complete in the right places. Since xsk_tx_metadata_{request,_complete} are static inline, we don't incur any extra overhead doing indirect calls. The benefit of this scheme is as follows: - keeps all metadata layout parsing away from driver code - makes it easy to grep and see which drivers implement what - don't need any extra flags to maintain to keep track of what offloads are implemented; if the callback is implemented - the offload is supported (used by netlink reporting code) Two offloads are defined right now: 1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset 2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata area upon completion (tx_timestamp field) XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass metadata pointer). The struct is forward-compatible and can be extended in the future by appending more fields. Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Stanislav Fomichev	341ac980ea	xsk: Support tx_metadata_len For zerocopy mode, tx_desc->addr can point to an arbitrary offset and carry some TX metadata in the headroom. For copy mode, there is no way currently to populate skb metadata. Introduce new tx_metadata_len umem config option that indicates how many bytes to treat as metadata. Metadata bytes come prior to tx_desc address (same as in RX case). The size of the metadata has mostly the same constraints as XDP: - less than 256 bytes - 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte timestamp in the completion) - non-zero This data is not interpreted in any way right now. Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-11-29 14:59:40 -08:00
Jakub Kicinski	83f2df9d66	tools: ynl-gen: always construct struct ynl_req_state struct ynl_req_state carries reply-related info from generated code into generic YNL code. While we don't need reply info to execute a request without a reply, we still need to pass in the struct, because it's also where we get the pointer to struct ynl_sock from. Passing NULL results in crashes if kernel returns an error or an unexpected reply. Fixes: `dc0956c98f` ("tools: ynl-gen: move the response reading logic into YNL") Link: https://lore.kernel.org/r/20231126225858.2144136-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:44:02 -08:00
Jakub Kicinski	cbeb989e41	ethtool: don't propagate EOPNOTSUPP from dumps The default dump handler needs to clear ret before returning. Otherwise if the last interface returns an inconsequential error this error will propagate to user space. This may confuse user space (ethtool CLI seems to ignore it, but YNL doesn't). It will also terminate the dump early for mutli-skb dump, because netlink core treats EOPNOTSUPP as a real error. Fixes: `728480f124` ("ethtool: default handlers for GET requests") Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231126225806.2143528-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:43:27 -08:00
Jakub Kicinski	987b71f86c	Merge branch 'fine-tune-flow-control-and-speed-configurations-in-microchip-ksz8xxx-dsa-driver' Oleksij Rempel says: ==================== Fine-Tune Flow Control and Speed Configurations in Microchip KSZ8xxx DSA Driver This patch set focuses on enhancing the configurability of flow control, speed, and duplex settings in the Microchip KSZ8xxx DSA driver. The first patch allows more granular control over the CPU port's flow control, speed, and duplex settings. The second patch introduces a method for port configurations for port with integrated PHYs, primarily concerning flow control based on duplex mode. ==================== Link: https://lore.kernel.org/r/20231127145101.3039399-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:38:36 -08:00
Oleksij Rempel	71cd5ce7e2	net: dsa: microchip: make phylink_mac_link_up() not optional Last part of the driver do now support phylink_mac_link_up(). So, make it not optional. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20231127145101.3039399-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:38:34 -08:00
Oleksij Rempel	2f58148c41	net: dsa: microchip: ksz8: Add function to configure ports with integrated PHYs This patch introduces the function 'ksz8_phy_port_link_up' to the Microchip KSZ8xxx driver. This function is responsible for setting up flow control and duplex settings for the ports that are integrated with PHYs. The KSZ8795 switch supports asymmetric pause control, which can't be fully utilized since a single bit controls both RX and TX pause. Despite this, the flow control can be adjusted based on the auto-negotiation process, taking into account the capabilities of both link partners. On the other hand, the KSZ8873's PORT_FORCE_FLOW_CTRL bit can be set by the hardware bootstrap, which ignores the auto-negotiation result. Therefore, even in auto-negotiation mode, we need to ensure that this bit is correctly set. When auto-negotiation isn't in use, we enforce symmetric pause control for the KSZ8795 switch. Please note, forcing flow control disable on a port while still advertising pause support isn't possible. While this scenario might not be practical or desired, it's important to be aware of this limitation when working with the KSZ8873 and similar devices. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20231127145101.3039399-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:38:34 -08:00
Oleksij Rempel	87f062ed85	net: dsa: microchip: ksz8: Make flow control, speed, and duplex on CPU port configurable Allow flow control, speed, and duplex settings on the CPU port to be configurable. Previously, the speed and duplex relied on default switch values, which limited flexibility. Additionally, flow control was hardcoded and only functional in duplex mode. This update enhances the configurability of these parameters. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20231127145101.3039399-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:38:34 -08:00
Jakub Kicinski	bed7b22e13	Merge branch 'gve-add-support-for-non-4k-page-sizes' John Fraker says: ==================== gve: Add support for non-4k page sizes. This patch series adds support for non-4k page sizes to the driver. Prior to this patch series, the driver assumes a 4k page size in many small ways, and will crash in a kernel compiled for a different page size. This changeset aims to be a minimal changeset that unblocks certain arm platforms with large page sizes. ==================== Link: https://lore.kernel.org/r/20231128002648.320892-1-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:32:38 -08:00
John Fraker	da7d4b42ca	gve: Remove dependency on 4k page size. Prior to this change, gve crashes when attempting to run in kernels with page sizes other than 4k. This change removes unnecessary references to PAGE_SIZE and replaces them with more meaningful constants. Signed-off-by: Jordan Kimbrough <jrkim@google.com> Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231128002648.320892-6-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:32:36 -08:00
John Fraker	513072fb4b	gve: Add page size register to the register_page_list command. This register is required on platforms with page sizes greater than 4k. This is because the tx side of the driver vmaps the entire queue page list of pages into a single flat address space, then uses the entire space. Without communicating the guest page size to the backend, the backend will only access the first 4k of each page in the queue page list. Signed-off-by: Jordan Kimbrough <jrkim@google.com> Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231128002648.320892-5-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:32:36 -08:00
John Fraker	ce260cb114	gve: Remove obsolete checks that rely on page size. These checks are safe to remove as they are no longer enforced by the backend. Retaining them would require updating them to work differently with page sizes larger than 4k. Signed-off-by: Jordan Kimbrough <jrkim@google.com> Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231128002648.320892-4-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:32:36 -08:00
John Fraker	8ae980d241	gve: Deprecate adminq_pfn for pci revision 0x1. adminq_pfn assumes a page size of 4k, causing this mechanism to break in kernels compiled with different page sizes. A new PCI device revision was needed for the device to be able to communicate with the driver how to set up the admin queue prior to having access to the admin queue. Signed-off-by: Jordan Kimbrough <jrkim@google.com> Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231128002648.320892-3-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:32:36 -08:00
John Fraker	955f4d3bf0	gve: Perform adminq allocations through a dma_pool. This allows the adminq to be smaller than a page, paving the way for non 4k page support. This is to support platforms where PAGE_SIZE is not 4k, such as some ARM platforms. Signed-off-by: Jordan Kimbrough <jrkim@google.com> Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231128002648.320892-2-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-29 08:32:35 -08:00
Linus Torvalds	3b47bc037b	Some pin control fixes for the v6.7 cycle: - Fix a really interesting potential core bug in the list iterator requireing the use of READ_ONCE() discovered when testing kernel compiles with clang. - Check devm_kcalloc() return value and an array bounds in the STM32 driver. - Fix an exotic string truncation issue in the s32cc driver, found by the kernel test robot (impressive!) - Fix an undocumented struct member in the cy8c95x0 driver. - Fix a symbol overlap with MIPS in the Lochnagar driver, MIPS defines a global symbol "RST" which is a bit too generic and collide with stuff. OK this one should be renamed too, we will fix that as well. - Fix erroneous branch taking in the Realtek driver. - Fix the mail address in MAINTAINERS for the s32g2 driver. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmVnKK8ACgkQQRCzN7AZ XXPr1BAAuG5cy4goUT+ycz9WOH65D3gaxqyNxMOHSI7XrbyTQWzPkR+HCcFHORQH hdCRsTy8WoSnN3UJ/ySaEHl8D8nKl51M8KKqARNpaLnROyKWmGlQQdFZZScIUcpH 3YXKMngBg8buYKL9ZVDJrgCKxpAKVIXxkPDLRFMtfDIWBivvZcx4g7yeuX8SkZyT hQEkx4VrN9mdA30ZY6s6avhDWyxL/CF669fk7qaQUaT3dZVQdFQ9lx/3yzk7Fm5D z2hXwQ5ghLRQS/ZbxNS2oHixU9sA5xUVzaj3Rn3Of43NSAmrj8JWuqC2pyZ6V70e jJSEldBqqY5tiP/0mh2irqi/Uo2CQqVbYtmEtqWxwxWcTv21orO5Hl8CAldXAQZt v0kVbdsPtePiXfwP/EqsE3qBd3GFeLJKxtK3qZsYSdcDjbqPoxHqGBGSJUHoNfB0 ZFh0PDrFknRGO7ZHJKUdYJ4PQklFSp/JmVtbbZI+6/yafNX/bO9w1RpHOZQ5Je/s 7k9G1dfmJOmP+PprxuqjZTyZ0b+JPqfsiZje3LAnLCEyezJtKqlZwmDGjzhWM2Gv 5yTDIuxmtgI1ynuQVlePPuFZw6hdGCrAZYvKnzb0Wjw582P/XsSyXC4mnyq/dD4S wxZrsyJFbUcn5LoST1fmio3u6M/tV8eYzLO5Zvg7oWtPAv0Kqlk= =oW3z -----END PGP SIGNATURE----- Merge tag 'pinctrl-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Fix a really interesting potential core bug in the list iterator requireing the use of READ_ONCE() discovered when testing kernel compiles with clang. - Check devm_kcalloc() return value and an array bounds in the STM32 driver. - Fix an exotic string truncation issue in the s32cc driver, found by the kernel test robot (impressive!) - Fix an undocumented struct member in the cy8c95x0 driver. - Fix a symbol overlap with MIPS in the Lochnagar driver, MIPS defines a global symbol "RST" which is a bit too generic and collide with stuff. OK this one should be renamed too, we will fix that as well. - Fix erroneous branch taking in the Realtek driver. - Fix the mail address in MAINTAINERS for the s32g2 driver. * tag 'pinctrl-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: dt-bindings: pinctrl: s32g2: change a maintainer email address pinctrl: realtek: Fix logical error when finding descriptor pinctrl: lochnagar: Don't build on MIPS pinctrl: avoid reload of p state in list iteration pinctrl: cy8c95x0: Fix doc warning pinctrl: s32cc: Avoid possible string truncation pinctrl: stm32: fix array read out of bound pinctrl: stm32: Add check for devm_kcalloc	2023-11-29 06:45:22 -08:00
Andrii Nakryiko	40d0eb0259	Merge branch 'selftests-bpf-use-pkg-config-to-determine-ld-flags' Akihiko Odaki says: ==================== selftests/bpf: Use pkg-config to determine ld flags When linking statically, libraries may require other dependencies to be included to ld flags. In particular, libelf may require libzstd. Use pkg-config to determine such dependencies. V4 -> V5: Introduced variables LIBELF_CFLAGS and LIBELF_LIBS. (Daniel Borkmann) Added patch "selftests/bpf: Choose pkg-config for the target". V3 -> V4: Added "2> /dev/null". V2 -> V3: Added missing "echo". V1 -> V2: Implemented fallback, referring to HOSTPKG_CONFIG. ==================== Link: https://lore.kernel.org/r/20231125084253.85025-1-akihiko.odaki@daynix.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2023-11-28 22:15:30 -08:00
Akihiko Odaki	8998a479fd	selftests/bpf: Use pkg-config for libelf When linking statically, libraries may require other dependencies to be included to ld flags. In particular, libelf may require libzstd. Use pkg-config to determine such dependencies. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231125084253.85025-4-akihiko.odaki@daynix.com	2023-11-28 22:15:29 -08:00
Akihiko Odaki	18f6f9de98	selftests/bpf: Override PKG_CONFIG for static builds A library may need to depend on additional archive files for static builds so pkg-config should be instructed to list them. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231125084253.85025-3-akihiko.odaki@daynix.com	2023-11-28 22:15:29 -08:00
Akihiko Odaki	2ce344b689	selftests/bpf: Choose pkg-config for the target pkg-config is used to build sign-file executable. It should use the library for the target instead of the host as it is called during tests. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231125084253.85025-2-akihiko.odaki@daynix.com	2023-11-28 22:15:29 -08:00
Andrii Nakryiko	d4e7dd4842	Merge branch 'bpf-add-link_info-support-for-uprobe-multi-link' Jiri Olsa says: ==================== bpf: Add link_info support for uprobe multi link hi, this patchset adds support to get bpf_link_info details for uprobe_multi links and adding support for bpftool link to display them. v4 changes: - move flags field up in bpf_uprobe_multi_link [Andrii] - include zero terminating byte in path_size [Andrii] - return d_path error directly [Yonghong] - use SEC(".probes") for semaphores [Yonghong] - fix ref_ctr_offsets leak in test [Yonghong] - other smaller fixes [Yonghong] thanks, jirka --- ==================== Link: https://lore.kernel.org/r/20231125193130.834322-1-jolsa@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2023-11-28 21:50:10 -08:00
Jiri Olsa	a7795698f8	bpftool: Add support to display uprobe_multi links Adding support to display details for uprobe_multi links, both plain: # bpftool link -p ... 24: uprobe_multi prog 126 uprobe.multi path /home/jolsa/bpf/test_progs func_cnt 3 pid 4143 offset ref_ctr_offset cookies 0xd1f88 0xf5d5a8 0xdead 0xd1f8f 0xf5d5aa 0xbeef 0xd1f96 0xf5d5ac 0xcafe and json: # bpftool link -p [{ ... },{ "id": 24, "type": "uprobe_multi", "prog_id": 126, "retprobe": false, "path": "/home/jolsa/bpf/test_progs", "func_cnt": 3, "pid": 4143, "funcs": [{ "offset": 860040, "ref_ctr_offset": 16111016, "cookie": 57005 },{ "offset": 860047, "ref_ctr_offset": 16111018, "cookie": 48879 },{ "offset": 860054, "ref_ctr_offset": 16111020, "cookie": 51966 } ] } ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20231125193130.834322-7-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	147c69307b	selftests/bpf: Add link_info test for uprobe_multi link Adding fill_link_info test for uprobe_multi link. Setting up uprobes with bogus ref_ctr_offsets and cookie values to test all the bpf_link_info::uprobe_multi fields. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20231125193130.834322-6-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	1703612885	selftests/bpf: Use bpf_link__destroy in fill_link_info tests The fill_link_info test keeps skeleton open and just creates various links. We are wrongly calling bpf_link__detach after each test to close them, we need to call bpf_link__destroy. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/bpf/20231125193130.834322-5-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	e56fdbfb06	bpf: Add link_info support for uprobe multi link Adding support to get uprobe_link details through bpf_link_info interface. Adding new struct uprobe_multi to struct bpf_link_info to carry the uprobe_multi link details. The uprobe_multi.count is passed from user space to denote size of array fields (offsets/ref_ctr_offsets/cookies). The actual array size is stored back to uprobe_multi.count (allowing user to find out the actual array size) and array fields are populated up to the user passed size. All the non-array fields (path/count/flags/pid) are always set. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20231125193130.834322-4-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	4930b7f53a	bpf: Store ref_ctr_offsets values in bpf_uprobe array We will need to return ref_ctr_offsets values through link_info interface in following change, so we need to keep them around. Storing ref_ctr_offsets values directly into bpf_uprobe array. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20231125193130.834322-3-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jiri Olsa	48f0dfd8d3	libbpf: Add st_type argument to elf_resolve_syms_offsets function We need to get offsets for static variables in following changes, so making elf_resolve_syms_offsets to take st_type value as argument and passing it to elf_sym_iter_new. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20231125193130.834322-2-jolsa@kernel.org	2023-11-28 21:50:09 -08:00
Jakub Kicinski	f1be1e04c7	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-11-27 (i40e, iavf) This series contains updates to i40e and iavf drivers. Ivan Vecera performs more cleanups on i40e and iavf drivers; removing unused fields, defines, and unneeded fields. Petr Oros utilizes iavf_schedule_aq_request() helper to replace open coded equivalents. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: use iavf_schedule_aq_request() helper iavf: Remove queue tracking fields from iavf_adminq_ring i40e: Remove queue tracking fields from i40e_adminq_ring i40e: Remove AQ register definitions for VF types i40e: Delete unused and useless i40e_pf fields ==================== Link: https://lore.kernel.org/r/20231127211037.1135403-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-28 20:10:30 -08:00
Heiner Kallweit	cd04b44bf0	r8169: remove multicast filter limit Once upon a time, when r8169 was new, the multicast filter limit code was copied from RTL8139 driver. There the filter limit is even user-configurable. The filtering is hash-based and we don't have perfect filtering. Actually the mc filtering on RTL8125 still seems to be the same as used on 8390/NE2000. So it's not clear to me which benefit it should bring when switching to all-multi mode once a certain number of filter bits is set. More the opposite: Filtering out at least some unwanted mc traffic is better than no filtering. Also the available chip documentation doesn't mention any restriction. Therefore remove the filter limit. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/57076c05-3730-40d1-ab9a-5334b263e41a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-28 20:09:36 -08:00
Dan Carpenter	1bc9d12e1c	ice: fix error code in ice_eswitch_attach() Set the "err" variable on this error path. Fixes: `fff292b47a` ("ice: add VF representors one by one") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://lore.kernel.org/r/e0349ee5-76e6-4ff4-812f-4aa0d3f76ae7@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-28 20:09:11 -08:00
Yu Xiao	4540c29ab9	nfp: ethtool: support TX/RX pause frame on/off Add support for ethtool -A tx on/off and rx on/off. Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231127055116.6668-1-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-28 20:08:29 -08:00
Yoshihiro Shimoda	9870257a0a	ravb: Fix races between ravb_tx_timeout_work() and net related ops Fix races between ravb_tx_timeout_work() and functions of net_device_ops and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that since ravb_close() is under the rtnl lock and calls cancel_work_sync(), ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock may happen in ravb_tx_timeout_work() like below: CPU0 CPU1 ravb_tx_timeout() schedule_work() ... __dev_close_many() // Under rtnl lock ravb_close() cancel_work_sync() // Waiting ravb_tx_timeout_work() rtnl_lock() // This is possible to cause a deadlock If rtnl_trylock() fails, rescheduling the work with sleep for 1 msec. Fixes: `c156633f13` ("Renesas Ethernet AVB driver proper") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20231127122420.3706751-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-11-28 19:54:24 -08:00
Linus Torvalds	18d46e76d7	for-6.7-rc3-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmVmHvEACgkQxWXV+ddt WDtczA//UdxSPxQwJY1oOQj3k5Zgb/zOThfyD4x5wrDxFAYwVh1XhuyxV2XT8qyq ipp+mi9doVykZKLSY+oRAeqjGlF0nIYm1z2PGSU7JXz4VsEj970rsDs3ePwH5TW6 V8/6VraOIIvmOnTft7aiuM8CXUjyndalNl7RvHu+v6grgAAgQAaly/3CmRIsm/Ui 2Wb6/J8ciAOBZ8TkFMr0PiTJd+CjUL+1Y9IaYEywujf0nVNJSgHp+R2CpwLDvM/5 1x6LdRrUKmnY7mhOvC7QwWfQmgsPnj3OuR+3+L+8jULTvcpwka2KEcpCH8/s6mUK +4XhKQ4xXOJr8M+KmAUpy1yZZ30G6cDSwnfCWbWRCfR03396tTb08kb6G21fR+NL o2qEUOe4DoMbYX/5zd9xEVqbwyGhAIXB0fJ7KJ0RqbaNBh/roRALBVCseP2CFwJE P0DE9phjeIGQf3ybdfP7XqnMfk520bqoeV49Akbn2us2SrV1+O9Yjqmj2pbTnljE M30Jh/btaiTFtsGB3MBDRRnGhf7F2l1dsmdmMVhdOK8HMY6obcJUdv6YXVLAjBDn ATWtUVVizOpHvSZL0G/+1fXqHhLqOnHLY4A97uMjcElK5WJfuYZv8vZK7GVKC/jW y5F4w/FPxU8dmhorMGksya2CLMvUsv5dikyAzGHirjEAdyrK1jg= =85Pb -----END PGP SIGNATURE----- Merge tag 'for-6.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few fixes and message updates: - for simple quotas, handle the case when a snapshot is created and the target qgroup already exists - fix a warning when file descriptor given to send ioctl is not writable - fix off-by-one condition when checking chunk maps - free pages when page array allocation fails during compression read, other cases were handled - fix memory leak on error handling path in ref-verify debugging feature - copy missing struct member 'version' in 64/32bit compat send ioctl - tree-checker verifies inline backref ordering - print messages to syslog on first mount and last unmount - update error messages when reading chunk maps" * tag 'for-6.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: send: ensure send_fd is writable btrfs: free the allocated memory if btrfs_alloc_page_array() fails btrfs: fix 64bit compat send ioctl arguments not initializing version member btrfs: make error messages more clear when getting a chunk map btrfs: fix off-by-one when checking chunk map includes logical address btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod() btrfs: add dmesg output for first mount and last unmount of a filesystem btrfs: do not abort transaction if there is already an existing qgroup btrfs: tree-checker: add type and sequence check for inline backrefs	2023-11-28 11:16:04 -08:00
Paolo Abeni	a379972973	Merge branch 'net-page_pool-add-netlink-based-introspection' Jakub Kicinski says: ==================== net: page_pool: add netlink-based introspection We recently started to deploy newer kernels / drivers at Meta, making significant use of page pools for the first time. We immediately run into page pool leaks both real and false positive warnings. As Eric pointed out/predicted there's no guarantee that applications will read / close their sockets so a page pool page may be stuck in a socket (but not leaked) forever. This happens a lot in our fleet. Most of these are obviously due to application bugs but we should not be printing kernel warnings due to minor application resource leaks. Conversely the page pool memory may get leaked at runtime, and we have no way to detect / track that, unless someone reconfigures the NIC and destroys the page pools which leaked the pages. The solution presented here is to expose the memory use of page pools via netlink. This allows for continuous monitoring of memory used by page pools, regardless if they were destroyed or not. Sample in patch 15 can print the memory use and recycling efficiency: $ ./page-pool eth0[2] page pools: 10 (zombies: 0) refs: 41984 bytes: 171966464 (refs: 0 bytes: 0) recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201) v4: - use dev_net(netdev)->loopback_dev - extend inflight doc v3: https://lore.kernel.org/all/20231122034420.1158898-1-kuba@kernel.org/ - ID is still here, can't decide if it matters - rename destroyed -> detach-time, good enough? - fix build for netsec v2: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org - hopefully fix build with PAGE_POOL=n v1: https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/ - The main change compared to the RFC is that the API now exposes outstanding references and byte counts even for "live" page pools. The warning is no longer printed if page pool is accessible via netlink. RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20231126230740.2148636-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:43 +01:00
Jakub Kicinski	637567e4a3	tools: ynl: add sample for getting page-pool information Regenerate the tools/ code after netdev spec changes. Add sample to query page-pool info in a concise fashion: $ ./page-pool eth0[2] page pools: 10 (zombies: 0) refs: 41984 bytes: 171966464 (refs: 0 bytes: 0) recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201) Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	be0096676e	net: page_pool: mute the periodic warning for visible page pools Mute the periodic "stalled pool shutdown" warning if the page pool is visible to user space. Rolling out a driver using page pools to just a few hundred hosts at Meta surfaces applications which fail to reap their broken sockets. Obviously it's best if the applications are fixed, but we don't generally print warnings for application resource leaks. Admins can now depend on the netlink interface for getting page pool info to detect buggy apps. While at it throw in the ID of the pool into the message, in rare cases (pools from destroyed netns) this will make finding the pool with a debugger easier. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	d49010adae	net: page_pool: expose page pool stats via netlink Dump the stats into netlink. More clever approaches like dumping the stats per-CPU for each CPU individually to see where the packets get consumed can be implemented in the future. A trimmed example from a real (but recently booted system): $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \ --dump page-pool-stats-get [{'info': {'id': 19, 'ifindex': 2}, 'alloc-empty': 48, 'alloc-fast': 3024, 'alloc-refill': 0, 'alloc-slow': 48, 'alloc-slow-high-order': 0, 'alloc-waive': 0, 'recycle-cache-full': 0, 'recycle-cached': 0, 'recycle-released-refcnt': 0, 'recycle-ring': 0, 'recycle-ring-full': 0}, {'info': {'id': 18, 'ifindex': 2}, 'alloc-empty': 66, 'alloc-fast': 11811, 'alloc-refill': 35, 'alloc-slow': 66, 'alloc-slow-high-order': 0, 'alloc-waive': 0, 'recycle-cache-full': 1145, 'recycle-cached': 6541, 'recycle-released-refcnt': 0, 'recycle-ring': 1275, 'recycle-ring-full': 0}, {'info': {'id': 17, 'ifindex': 2}, 'alloc-empty': 73, 'alloc-fast': 62099, 'alloc-refill': 413, ... Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	69cb4952b6	net: page_pool: report when page pool was destroyed Report when page pool was destroyed. Together with the inflight / memory use reporting this can serve as a replacement for the warning about leaked page pools we currently print to dmesg. Example output for a fake leaked page pool using some hacks in netdevsim (one "live" pool, and one "leaked" on the same dev): $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \ --dump page-pool-get [{'id': 2, 'ifindex': 3}, {'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}] Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	7aee8429ee	net: page_pool: report amount of memory held by page pools Advanced deployments need the ability to check memory use of various system components. It makes it possible to make informed decisions about memory allocation and to find regressions and leaks. Report memory use of page pools. Report both number of references and bytes held. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	d2ef6aa077	net: page_pool: add netlink notifications for state changes Generate netlink notifications about page pool state changes. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	950ab53b77	net: page_pool: implement GET in the netlink API Expose the very basic page pool information via netlink. Example using ynl-py for a system with 9 queues: $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \ --dump page-pool-get [{'id': 19, 'ifindex': 2, 'napi-id': 147}, {'id': 18, 'ifindex': 2, 'napi-id': 146}, {'id': 17, 'ifindex': 2, 'napi-id': 145}, {'id': 16, 'ifindex': 2, 'napi-id': 144}, {'id': 15, 'ifindex': 2, 'napi-id': 143}, {'id': 14, 'ifindex': 2, 'napi-id': 142}, {'id': 13, 'ifindex': 2, 'napi-id': 141}, {'id': 12, 'ifindex': 2, 'napi-id': 140}, {'id': 11, 'ifindex': 2, 'napi-id': 139}, {'id': 10, 'ifindex': 2, 'napi-id': 138}] Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	839ff60df3	net: page_pool: add nlspec for basic access to page pools Add a Netlink spec in YAML for getting very basic information about page pools. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	7cc9e6d77f	eth: link netdev to page_pools in drivers Link page pool instances to netdev for the drivers which already link to NAPI. Unless the driver is doing something very weird per-NAPI should imply per-netdev. Add netsec as well, Ilias indicates that it fits the mold. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	02b3de80c5	net: page_pool: stash the NAPI ID for easier access To avoid any issues with race conditions on accessing napi and having to think about the lifetime of NAPI objects in netlink GET - stash the napi_id to which page pool was linked at creation time. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	083772c9f9	net: page_pool: record pools per netdev Link the page pools with netdevs. This needs to be netns compatible so we have two options. Either we record the pools per netns and have to worry about moving them as the netdev gets moved. Or we record them directly on the netdev so they move with the netdev without any extra work. Implement the latter option. Since pools may outlast netdev we need a place to store orphans. In time honored tradition use loopback for this purpose. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	f17c69649c	net: page_pool: id the page pools To give ourselves the flexibility of creating netlink commands and ability to refer to page pool instances in uAPIs create IDs for page pools. Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Jakub Kicinski	23cfaf67ba	net: page_pool: factor out uninit We'll soon (next change in the series) need a fuller unwind path in page_pool_create() so create the inverse of page_pool_init(). Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 15:48:39 +01:00
Heiner Kallweit	91d3d14997	r8169: prevent potential deadlock in rtl8169_close ndo_stop() is RTNL-protected by net core, and the worker function takes RTNL as well. Therefore we will deadlock when trying to execute a pending work synchronously. To fix this execute any pending work asynchronously. This will do no harm because netif_running() is false in ndo_stop(), and therefore the work function is effectively a no-op. However we have to ensure that no task is running or pending after rtl_remove_one(), therefore add a call to cancel_work_sync(). Fixes: `abe5fc42f9` ("r8169: use RTNL to protect critical sections") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/12395867-1d17-4cac-aa7d-c691938fcddf@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-11-28 12:52:49 +01:00

... 3 4 5 6 7 ...

1234857 Commits