linux

iv/linux

Author	SHA1	Message	Date
Donald Hunter	8c1b74a26d	doc: netlink: Add hyperlinks to generated Netlink docs Update ynl-gen-rst to generate hyperlinks to definitions, attribute sets and sub-messages from all the places that reference them. Note that there is a single label namespace for all of the kernel docs. Hyperlinks within a single netlink doc need to be qualified by the family name to avoid collisions. The label format is 'family-type-name' which gives, for example, 'rt-link-attribute-set-link-attrs' as the link id. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240329135021.52534-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-01 21:26:02 -07:00
Donald Hunter	4cc1730a90	doc: netlink: Change generated docs to limit TOC to depth 3 The tables of contents in the generated Netlink docs include individual attribute definitions. This can make the contents exceedingly long and repeats a lot of what is on the rest of the pages. See for example: https://docs.kernel.org/networking/netlink_spec/tc.html Add a depth limit to the contents directive in generated .rst files to limit the contents depth to 3 levels. This reduces the contents to: - Family - Summary - Operations - op-one - op-two - ... - Definitions - struct-one - struct-two - enum-one - ... - Attribute sets - attrs-one - attrs-two - ... Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240329135021.52534-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-01 21:26:01 -07:00
David S. Miller	3b4cf29bda	Merge branch 'net-rps-misc' Eric Dumazet says: ==================== net: rps: misc changes Make RPS/RFS a bit more efficient with better cache locality and heuristics. Aso shrink include/linux/netdevice.h a bit. v2: fixed a build issue in patch 6/8 with CONFIG_RPS=n (Jakub and kernel build bots) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:32 +01:00
Eric Dumazet	d3ae5f4632	net: rps: move received_rps field to a better location Commit `14d898f3c1` ("dev: Move received_rps counter next to RPS members in softnet data") was unfortunate: received_rps is dirtied by a cpu and never read by other cpus in fast path. Its presence in the hot RPS cache line (shared by many cpus) is hurting RPS/RFS performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:32 +01:00
Eric Dumazet	c62fdf5b11	net: rps: add rps_input_queue_head_add() helper process_backlog() can batch increments of sd->input_queue_head, saving some memory bandwidth. Also add READ_ONCE()/WRITE_ONCE() annotations around sd->input_queue_head accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:32 +01:00
Eric Dumazet	36b83ffcf2	net: rps: change input_queue_tail_incr_save() input_queue_tail_incr_save() is incrementing the sd queue_tail and save it in the flow last_qtail. Two issues here : - no lock protects the write on last_qtail, we should use appropriate annotations. - We can perform this write after releasing the per-cpu backlog lock, to decrease this lock hold duration (move away the cache line miss) Also move input_queue_head_incr() and rps helpers to include/net/rps.h, while adding rps_ prefix to better reflect their role. v2: Fixed a build issue (Jakub and kernel build bots) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:32 +01:00
Eric Dumazet	f7efd01fe2	net: enqueue_to_backlog() cleanup We can remove a goto and a label by reversing a condition. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:32 +01:00
Eric Dumazet	a7ae7b0b2e	net: make softnet_data.dropped an atomic_t If under extreme cpu backlog pressure enqueue_to_backlog() has to drop a packet, it could do this without dirtying a cache line and potentially slowing down the target cpu. Move sd->dropped into a separate cache line, and make it atomic. In non pressure mode, this field is not touched, no need to consume valuable space in a hot cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:32 +01:00
Eric Dumazet	95e48d862a	net: enqueue_to_backlog() change vs not running device If the device attached to the packet given to enqueue_to_backlog() is not running, we drop the packet. But we accidentally increase sd->dropped, giving false signals to admins: sd->dropped should be reserved to cpu backlog pressure, not to temporary glitches at device dismantles. While we are at it, perform the netif_running() test before we get the rps lock, and use REASON_DEV_READY drop reason instead of NOT_SPECIFIED. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:31 +01:00
Eric Dumazet	2fe50a4d72	net: move dev_xmit_recursion() helpers to net/core/dev.h Move dev_xmit_recursion() and friends to net/core/dev.h They are only used from net/core/dev.c and net/core/filter.c. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:31 +01:00
Eric Dumazet	b9495b564d	net: move kick_defer_list_purge() to net/core/dev.h kick_defer_list_purge() is defined in net/core/dev.c and used from net/core/skubff.c Because we need softnet_data, include <linux/netdevice.h> from net/core/dev.h Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 11:28:31 +01:00
David S. Miller	d823265dd4	Merge branch 'ice-pfcp-filter' Alexander Lobakin says: ==================== ice: add PFCP filter support Add support for creating PFCP filters in switchdev mode. Add pfcp module that allows to create a PFCP-type netdev. The netdev then can be passed to tc when creating a filter to indicate that PFCP filter should be created. To add a PFCP filter, a special netdev must be created and passed to tc command: ip link add pfcp0 type pfcp tc filter add dev eth0 ingress prio 1 flower pfcp_opts \ 1:12ab/ff:fffffffffffffff0 skip_hw action mirred egress redirect \ dev pfcp0 Changes in iproute2 [1] are required to use pfcp_opts in tc. ICE COMMS package is required as it contains PFCP profiles. Part of this patchset modifies IP_TUNNEL__OPTs, which were previously stored in a __be16. All possible values have already been used, making it impossible to add new ones. 1-3: add new bitmap_{read,write}(), which is used later in the IP tunnel flags code (from Alexander's ARM64 MTE series[2]); * 4-14: some bitmap code preparations also used later in IP tunnels; * 15-17: convert IP tunnel flags from __be16 to a bitmap; * 18-21: add PFCP module and support for it in ice. [1] https://lore.kernel.org/netdev/20230614091758.11180-1-marcin.szycik@linux.intel.com [2] https://lore.kernel.org/linux-kernel/20231218124033.551770-1-glider@google.com ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:29 +01:00
Marcin Szycik	784feaa65d	ice: Add support for PFCP hardware offload in switchdev Add support for creating PFCP filters in switchdev mode. Add support for parsing PFCP-specific tc options: S flag and SEID. To create a PFCP filter, a special netdev must be created and passed to tc command: ip link add pfcp0 type pfcp tc filter add dev eth0 ingress prio 1 flower pfcp_opts \ 1:123/ff:fffffffffffffff0 skip_hw action mirred egress redirect \ dev pfcp0 Changes in iproute2 [1] are required to be able to use pfcp_opts in tc. ICE COMMS package is required to create a filter as it contains PFCP profiles. Link: https://lore.kernel.org/netdev/20230614091758.11180-1-marcin.szycik@linux.intel.com [1] Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:29 +01:00
Marcin Szycik	2312dfdfab	ice: refactor ICE_TC_FLWR_FIELD_ENC_OPTS FLOW_DISSECTOR_KEY_ENC_OPTS can be used for multiple headers, but currently it is treated as GTP-exclusive in ice. Rename ICE_TC_FLWR_FIELD_ENC_OPTS to ICE_TC_FLWR_FIELD_GTP_OPTS and check for tunnel type earlier. After this refactor, it is easier to add new headers using FLOW_DISSECTOR_KEY_ENC_OPTS - instead of checking tunnel type in ice_tc_count_lkups() and ice_tc_fill_tunnel_outer(), it needs to be checked only once, in ice_parse_tunnel_attr(). Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:29 +01:00
Michal Swiatkowski	6dd514f481	pfcp: always set pfcp metadata In PFCP receive path set metadata needed by flower code to do correct classification based on this metadata. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Wojciech Drewek	76c8764ef3	pfcp: add PFCP module Packet Forwarding Control Protocol (PFCP) is a 3GPP Protocol used between the control plane and the user plane function. It is specified in TS 29.244[1]. Note that this module is not designed to support this Protocol in the kernel space. There is no support for parsing any PFCP messages. There is no API that could be used by any userspace daemon. Basically it does not support PFCP. This protocol is sophisticated and there is no need for implementing it in the kernel. The purpose of this module is to allow users to setup software and hardware offload of PFCP packets using tc tool. When user requests to create a PFCP device, a new socket is created. The socket is set up with port number 8805 which is specific for PFCP [29.244 4.2.2]. This allow to receive PFCP request messages, response messages use other ports. Note that only one PFCP netdev can be created. Only IPv4 is supported at this time. [1] https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3111 Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	5b2be2ab76	net: net_test: add tests for IP tunnel flags conversion helpers Now that there are helpers for converting IP tunnel flags between the old __be16 format and the bitmap format, make sure they work as expected by adding a couple of tests to the networking testing suite. The helpers are all inline, so no dependencies on the related CONFIG_* (or a standalone module) are needed. Cover three possible cases: 1. No bits past BIT(15) are set, VTI/SIT bits are not set. This conversion is almost a direct assignment. 2. No bits past BIT(15) are set, but VTI/SIT bit is set. During the conversion, it must be transformed into BIT(16) in the bitmap, but still compatible with the __be16 format. 3. The bitmap has bits past BIT(15) set (not the VTI/SIT one). The result will be truncated. Note that currently __IP_TUNNEL_FLAG_NUM is 17 (incl. special), which means that the result of this case is currently semi-false-positive. When BIT(17) is finally here, it will be adjusted accordingly. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	5832c4a77d	ip_tunnel: convert __be16 tunnel flags to bitmaps Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL__BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_ occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made. Most noticeable bloat-o-meter difference (.text): vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [] ip_tunnel.ko: 390/-900 (-510) [] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [] ip6_tunnel.ko: 118/-10 (108) [] gre_flags_to_tnl_flags() grew, but still is inlined [] ip_tunnel_find() got uninlined, hence such decrease The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_() calls here into direct operations on scalars. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	117aef12a7	ip_tunnel: use a separate struct to store tunnel params in the kernel Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure to store params inside the kernel, IPv4 tunnel code uses the same ip_tunnel_parm which is being used to talk with the userspace. This makes it difficult to alter or add any fields or use a different format for whatever data. Define struct ip_tunnel_parm_kern, a 1:1 copy of ip_tunnel_parm for now, and use it throughout the code. Define the pieces, where the copy user <-> kernel happens, as standalone functions, and copy the data there field-by-field, so that the kernel-side structure could be easily modified later on and the users wouldn't have to care about this. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	7adaf37f7f	lib/bitmap: add compile-time test for __assign_bit() optimization Commit `dc34d50366` ("lib: test_bitmap: add compile-time optimization/evaluations assertions") initially missed __assign_bit(), which led to that quite a time passed before I realized it doesn't get optimized at compilation time. Now that it does, add test for that just to make sure nothing will break one day. To make things more interesting, use bitmap_complement() and bitmap_full(), thus checking their compile-time evaluation as well. And remove the misleading comment mentioning the workaround removed recently in favor of adding the whole file to GCov exceptions. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	b44759705f	bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}() Now that we have generic bitmap_read() and bitmap_write(), which are inline and try to take care of non-bound-crossing and aligned cases to keep them optimized, collapse bitmap_{get,set}_value8() into simple wrappers around the former ones. bloat-o-meter shows no difference in vmlinux and -2 bytes for gpio-pca953x.ko, which says the optimization didn't suffer due to that change. The converted helpers have the value width embedded and always compile-time constant and that helps a lot. Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	a37fbe666c	bitmap: introduce generic optimized bitmap_size() The number of times yet another open coded `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge. Some generic helper is long overdue. Add one, bitmap_size(), but with one detail. BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both divident and divisor are compile-time constants or when the divisor is not a pow-of-2. When it is however, the compilers sometimes tend to generate suboptimal code (GCC 13): 48 83 c0 3f add $0x3f,%rax 48 c1 e8 06 shr $0x6,%rax 48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx %BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does full division of `nbits + 63` by it and then multiplication by 8. Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC: 8d 50 3f lea 0x3f(%rax),%edx c1 ea 03 shr $0x3,%edx 81 e2 f8 ff ff 1f and $0x1ffffff8,%edx Now it shifts `nbits + 63` by 3 positions (IOW performs fast division by 8) and then masks bits[2:0]. bloat-o-meter: add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617) Clang does it better and generates the same code before/after starting from -O1, except that with the ALIGN() approach it uses %edx and thus still saves some bytes: add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520) Note that we can't expand DIV_ROUND_UP() by adding a check and using this approach there, as it's used in array declarations where expressions are not allowed. Add this helper to tools/ as well. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	10a04ff09b	tools: move alignment-related macros to new <linux/align.h> Currently, tools have ALIGN() macros scattered across the unrelated headers, as there are only 3 of them and they were added separately each time on an as-needed basis. Anyway, let's make it more consistent with the kernel headers and allow using those macros outside of the mentioned headers. Create <linux/align.h> inside the tools/ folder and include it where needed. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	4ca532d646	btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits() bitmap_set_bits() does not start with the FS' prefix and may collide with a new generic helper one day. It operates with the FS-specific types, so there's no change those two could do the same thing. Just add the prefix to exclude such possible conflict. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:28 +01:00
Alexander Lobakin	3f5ef5109f	fs/ntfs3: add prefix to bitmap_size() and use BITS_TO_U64() bitmap_size() is a pretty generic name and one may want to use it for a generic bitmap API function. At the same time, its logic is NTFS-specific, as it aligns to the sizeof(u64), not the sizeof(long) (although it uses ideologically right ALIGN() instead of division). Add the prefix 'ntfs3_' used for that FS (not just 'ntfs_' to not mix it with the legacy module) and use generic BITS_TO_U64() while at it. Suggested-by: Yury Norov <yury.norov@gmail.com> # BITS_TO_U64() Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Alexander Lobakin	c1023f5634	s390/cio: rename bitmap_size() -> idset_bitmap_size() bitmap_size() is a pretty generic name and one may want to use it for a generic bitmap API function. At the same time, its logic is not "generic", i.e. it's not just `nbits -> size of bitmap in bytes` converter as it would be expected from its name. Add the prefix 'idset_' used throughout the file where the function resides. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Alexander Lobakin	8fab6a9d72	linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros Since commit `b03fc1173c` ("bitops: let optimize out non-atomic bitops on compile-time constants"), the non-atomic bitops are macros which can be expanded by the compilers into compile-time expressions, which will result in better optimized object code. Unfortunately, turned out that passing `volatile` to those macros discards any possibility of optimization, as the compilers then don't even try to look whether the passed bitmap is known at compilation time. In addition to that, the mentioned linkmode helpers are marked with `inline`, not `__always_inline`, meaning that it's not guaranteed some compiler won't uninline them for no reason, which will also effectively prevent them from being optimized (it's a well-known thing the compilers sometimes uninline `2 + 2`). Convert linkmode_*_bit() from inlines to macros. Their calling convention are 1:1 with the corresponding bitops, so that it's not even needed to enumerate and map the arguments, only the names. No changes in vmlinux' object code (compiled by LLVM for x86_64) whatsoever, but that doesn't necessarily means the change is meaningless. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Alexander Lobakin	5259401ef8	bitops: let the compiler optimize {__,}assign_bit() Since commit `b03fc1173c` ("bitops: let optimize out non-atomic bitops on compile-time constants"), the compilers are able to expand inline bitmap operations to compile-time initializers when possible. However, during the round of replacement if-__set-else-__clear with __assign_bit() as per Andy's advice, bloat-o-meter showed +1024 bytes difference in object code size for one module (even one function), where the pattern: DECLARE_BITMAP(foo) = { }; // on the stack, zeroed if (a) __set_bit(const_bit_num, foo); if (b) __set_bit(another_const_bit_num, foo); ... is heavily used, although there should be no difference: the bitmap is zeroed, so the second half of __assign_bit() should be compiled-out as a no-op. I either missed the fact that __assign_bit() has bitmap pointer marked as `volatile` (as we usually do for bitops) or was hoping that the compilers would at least try to look past the `volatile` for __always_inline functions. Anyhow, due to that attribute, the compilers were always compiling the whole expression and no mentioned compile-time optimizations were working. Convert __assign_bit() to a macro since it's a very simple if-else and all of the checks are performed inside __set_bit() and __clear_bit(), thus that wrapper has to be as transparent as possible. After that change, despite it showing only -20 bytes change for vmlinux (due to that it's still relatively unpopular), no drastic code size changes happen when replacing if-set-else-clear for onstack bitmaps with __assign_bit(), meaning the compiler now expands them to the actual operations will all the expected optimizations. Atomic assign_bit() is less affected due to its nature, but let's convert it to a macro as well to keep the code consistent and not leave a place for possible suboptimal codegen. Moreover, with certain kernel configuration it actually gives some saves (x86): do_ip_setsockopt 4154 4099 -55 Suggested-by: Yury Norov <yury.norov@gmail.com> # assign_bit(), too Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Alexander Lobakin	7d8296b250	bitops: make BYTES_TO_BITS() treewide-available Avoid open-coding that simple expression each time by moving BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export it to the rest of the kernel. Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always equals to %BITS_PER_BYTE, regardless of the target architecture. Do the same for the tools ecosystem as well (incl. its version of bitops.h). The previous implementation had its implicit type of long, while the new one is int, so adjust the format literal accordingly in the perf code. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Alexander Lobakin	72cc1980a0	bitops: add missing prototype check Commit `8238b45798` ("wait_on_bit: add an acquire memory barrier") added a new bitop, test_bit_acquire(), with proper wrapping in order to try to optimize it at compile-time, but missed the list of bitops used for checking their prototypes a bit below. The functions added have consistent prototypes, so that no more changes are required and no functional changes take place. Fixes: `8238b45798` ("wait_on_bit: add an acquire memory barrier") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Alexander Potapenko	f3e28876b6	lib/test_bitmap: use pr_info() for non-error messages pr_err() messages may be treated as errors by some log readers, so let us only use them for test failures. For non-error messages, replace them with pr_info(). Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Alexander Potapenko	991e558364	lib/test_bitmap: add tests for bitmap_{read,write}() Add basic tests ensuring that values can be added at arbitrary positions of the bitmap, including those spanning into the adjacent unsigned longs. Two new performance tests, test_bitmap_read_perf() and test_bitmap_write_perf(), can be used to assess future performance improvements of bitmap_read() and bitmap_write(): [ 0.431119][ T1] test_bitmap: Time spent in test_bitmap_read_perf: 615253 [ 0.433197][ T1] test_bitmap: Time spent in test_bitmap_write_perf: 916313 (numbers from a Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz machine running QEMU). Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Syed Nayyar Waris	63c15822b8	lib/bitmap: add bitmap_{read,write}() The two new functions allow reading/writing values of length up to BITS_PER_LONG bits at arbitrary position in the bitmap. The code was taken from "bitops: Introduce the for_each_set_clump macro" by Syed Nayyar Waris with a number of changes and simplifications: - instead of using roundup(), which adds an unnecessary dependency on <linux/math.h>, we calculate space as BITS_PER_LONG-offset; - indentation is reduced by not using else-clauses (suggested by checkpatch for bitmap_get_value()); - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read() and bitmap_write(); - some redundant computations are omitted. Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Syed Nayyar Waris <syednwaris@gmail.com> Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Link: https://lore.kernel.org/lkml/fe12eedf3666f4af5138de0e70b67a07c7f40338.1592224129.git.syednwaris@gmail.com/ Suggested-by: Yury Norov <yury.norov@gmail.com> Co-developed-by: Alexander Potapenko <glider@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Jakub Kicinski	d79b28fd34	Merge branch 'add-property-in-dwmac-stm32-documentation' Christophe Roullier says: ==================== Add property in dwmac-stm32 documentation Introduce property in dwmac-stm32 documentation - st,ext-phyclk: is present since 2020 in driver so need to explain it and avoid dtbs check issue : views/kernel/upstream/net-next/arch/arm/boot/dts/st/stm32mp157c-dk2.dtb: ethernet@5800a000: Unevaluated properties are not allowed ('st,ext-phyclk' was unexpected) Furthermore this property will be use in upstream of MP13 dwmac glue. (next step) ==================== Link: https://lore.kernel.org/r/20240328185337.332703-1-christophe.roullier@foss.st.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 15:42:14 -07:00
Christophe Roullier	929107d3d2	dt-bindings: net: dwmac: Document STM32 property st,ext-phyclk The Linux kernel dwmac-stm32 driver currently supports three DT properties used to configure whether PHY clock are generated by the MAC or supplied to the MAC from the PHY. Originally there were two properties, st,eth-clk-sel and st,eth-ref-clk-sel, each used to configure MAC clocking in different bus mode and for different MAC clock frequency. Since it is possible to determine the MAC 'eth-ck' clock frequency from the clock subsystem and PHY bus mode from the 'phy-mode' property, two disparate DT properties are no longer required to configure MAC clocking. Linux kernel commit `1bb694e208` ("net: ethernet: stmmac: simplify phy modes management for stm32") introduced a third, unified, property st,ext-phyclk. This property covers both use cases of st,eth-clk-sel and st,eth-ref-clk-sel DT properties, as well as a new use case for 25 MHz clock generated by the MAC. The third property st,ext-phyclk is so far undocumented, document it. Below table summarizes the clock requirement and clock sources for supported PHY interface modes. __________________________________________________________________________ \|PHY_MODE \| Normal \| PHY wo crystal\| PHY wo crystal \|No 125Mhz from PHY\| \| \| \| 25MHz \| 50MHz \| \| --------------------------------------------------------------------------- \| MII \| - \| eth-ck \| n/a \| n/a \| \| \| \| st,ext-phyclk \| \| \| --------------------------------------------------------------------------- \| GMII \| - \| eth-ck \| n/a \| n/a \| \| \| \| st,ext-phyclk \| \| \| --------------------------------------------------------------------------- \| RGMII \| - \| eth-ck \| n/a \| eth-ck \| \| \| \| st,ext-phyclk \| \| st,eth-clk-sel or\| \| \| \| \| \| st,ext-phyclk \| --------------------------------------------------------------------------- \| RMII \| - \| eth-ck \| eth-ck \| n/a \| \| \| \| st,ext-phyclk \| st,eth-ref-clk-sel \| \| \| \| \| \| or st,ext-phyclk \| \| --------------------------------------------------------------------------- Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Link: https://lore.kernel.org/r/20240328185337.332703-2-christophe.roullier@foss.st.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 15:42:11 -07:00
Johannes Berg	e8058a49e6	netlink: introduce type-checking attribute iteration There are, especially with multi-attr arrays, many cases of needing to iterate all attributes of a specific type in a netlink message or a nested attribute. Add specific macros to support that case. Also convert many instances using this spatch: @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } Although I had to undo one bad change this made, and I also adjusted some other code for whitespace and to use direct variable initialization now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 15:06:02 -07:00
Jakub Kicinski	9494dc0b08	Merge branch 'udp-small-changes-on-receive-path' Eric Dumazet says: ==================== udp: small changes on receive path This series is based on an observation I made in UDP receive path. The sock_def_readable() costs are pretty high, especially when epoll is used to generate EPOLLIN events. First patch annotates races on sk->sk_rcvbuf reads. Second patch replaces an atomic_add_return() with a less expensive atomic_add() Third patch avoids calling sock_def_readable() when possible. Fourth patch adds sk_wake_async_rcu() to get better inlining and code generation. ==================== Link: https://lore.kernel.org/r/20240328144032.1864988-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 15:03:14 -07:00
Eric Dumazet	1abe267f17	net: add sk_wake_async_rcu() helper While looking at UDP receive performance, I saw sk_wake_async() was no longer inlined. This matters at least on AMD Zen1-4 platforms (see SRSO) This might be because rcu_read_lock() and rcu_read_unlock() are no longer nops in recent kernels ? Add sk_wake_async_rcu() variant, which must be called from contexts already holding rcu lock. As SOCK_FASYNC is deprecated in modern days, use unlikely() to give a hint to the compiler. sk_wake_async_rcu() is properly inlined from __udp_enqueue_schedule_skb() and sock_def_readable(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240328144032.1864988-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 15:03:11 -07:00
Eric Dumazet	612b1c0dec	udp: avoid calling sock_def_readable() if possible sock_def_readable() is quite expensive (particularly when ep_poll_callback() is in the picture). We must call sk->sk_data_ready() when : - receive queue was empty, or - SO_PEEK_OFF is enabled on the socket, or - sk->sk_data_ready is not sock_def_readable. We still need to call sk_wake_async(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240328144032.1864988-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 15:03:11 -07:00
Eric Dumazet	6a1f12dd85	udp: relax atomic operation on sk->sk_rmem_alloc atomic_add_return() is more expensive than atomic_add() and seems overkill in UDP rx fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240328144032.1864988-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 15:03:10 -07:00
Eric Dumazet	6055796995	udp: annotate data-race in __udp_enqueue_schedule_skb() sk->sk_rcvbuf is read locklessly twice, while other threads could change its value. Use a READ_ONCE() to annotate the race. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240328144032.1864988-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 15:03:10 -07:00
Jakub Kicinski	46dc11bee2	Merge branch 'address-remaining-wtautological-constant-out-of-range-compare' Arnd Bergmann says: ==================== address remaining -Wtautological-constant-out-of-range-compare The warning option was introduced a few years ago but left disabled by default. All of the actual bugs that this has found have been fixed in the meantime, and this series should address the remaining false-positives, as tested on arm/arm64/x86 randconfigs as well as allmodconfig builds for all architectures supported by clang. ==================== Link: https://lore.kernel.org/r/20240328143051.1069575-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:47:58 -07:00
Arnd Bergmann	a5535e5336	mlx5: stop warning for 64KB pages When building with 64KB pages, clang points out that xsk->chunk_size can never be PAGE_SIZE: drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c:19:22: error: result of comparison of constant 65536 with expression of type 'u16' (aka 'unsigned short') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (xsk->chunk_size > PAGE_SIZE \|\| ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~ In older versions of this code, using PAGE_SIZE was the only possibility, so this would have never worked on 64KB page kernels, but the patch apparently did not address this case completely. As Maxim Mikityanskiy suggested, 64KB chunks are really not all that useful, so just shut up the warning by adding a cast. Fixes: `282c0c798f` ("net/mlx5e: Allow XSK frames smaller than a page") Link: https://lore.kernel.org/netdev/20211013150232.2942146-1-arnd@kernel.org/ Link: https://lore.kernel.org/lkml/a7b27541-0ebb-4f2d-bd06-270a4d404613@app.fastmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240328143051.1069575-9-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:47:58 -07:00
Suraj Gupta	06c2a5cd48	net: axienet: Fix kernel doc warnings Add description of mdio enable, mdio disable and mdio wait functions. Add description of skb pointer in axidma_bd data structure. Remove 'phy_node' description in axienet local data structure since it is not a valid struct member. Correct description of struct axienet_option. Fix below kernel-doc warnings in drivers/net/ethernet/xilinx/: 1) xilinx_axienet_mdio.c:1: warning: no structured comments found 2) xilinx_axienet.h:379: warning: Function parameter or struct member 'skb' not described in 'axidma_bd' 3) xilinx_axienet.h:538: warning: Excess struct member 'phy_node' description in 'axienet_local' 4) xilinx_axienet.h:1002: warning: expecting prototype for struct axiethernet_option. Prototype was for struct axienet_option instead Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://lore.kernel.org/r/20240328110713.12885-1-suraj.gupta2@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:41:22 -07:00
Su Hui	1ab6fe64d2	octeontx2-pf: remove unused variables req_hdr and rsp_hdr Clang static checker(scan-buid): drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c:503:2: warning: Value stored to 'rsp_hdr' is never read [deadcode.DeadStores] Remove these unused variables to save some space. Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240328020723.4071539-1-suhui@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:39:23 -07:00
Krzysztof Kozlowski	e93af72286	nfc: st95hf: drop driver owner assignment Core in spi_register_driver() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240327174810.519676-4-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:32:51 -07:00
Krzysztof Kozlowski	e3c95d5619	nfc: mrvl: spi: drop driver owner assignment Core in spi_register_driver() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240327174810.519676-3-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:32:51 -07:00
Krzysztof Kozlowski	3439412061	net: wwan: mhi: drop driver owner assignment Core in mhi_driver_register() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Link: https://lore.kernel.org/r/20240327174810.519676-2-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:32:51 -07:00
Krzysztof Kozlowski	648bb2bf44	net: microchip: encx24j600: drop driver owner assignment Core in spi_register_driver() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240327174810.519676-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:32:50 -07:00
Donald Hunter	bd3ce405fe	tools/net/ynl: Add extack policy attribute decoding The NLMSGERR_ATTR_POLICY extack attribute has been ignored by ynl up to now. Extend extack decoding to include _POLICY and the nested NL_POLICY_TYPE_ATTR_* attributes. For example: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/rt_link.yaml \ --create --do newlink --json '{ "ifname": "12345678901234567890", "linkinfo": {"kind": "bridge"} }' Netlink error: Numerical result out of range nl_len = 104 (88) nl_flags = 0x300 nl_type = 2 error: -34 extack: {'msg': 'Attribute failed policy validation', 'policy': {'max-length': 15, 'type': 'string'}, 'bad-attr': '.ifname'} Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240328155636.64688-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-29 12:26:02 -07:00

1 2 3 4 5 ...

1264886 Commits