IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
* reorder buffer filter checks can cause bad shift/UBSAN
warning with newer HW, avoid the check (mac80211)
* add Kconfig dependency for iwlwifi for PTP clock usage
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmTkrKMACgkQ10qiO8sP
aAD6/A//bR8/98qCxJqeZurTw83JwWDo7J5Twthk4ik4cpi35s+7bqVYc34a1vIT
poIiIxffZRUSsHOoMXpTP6xd7gLP9Hz6Ba8Jd7X9NG+/lfdpJlWGLDFri3JpNREi
Jjqq8XOzB7+c+TwrK85j7nxY1JXLqLzXOxetAtuIZEqVrmC+T6+nnAw1ITLfCVL/
fVaAFx3mDxTLi/qbAYnYPBKY5Kq1KH0F3j389vfIVPzTXBmLcYppZ17Lz38io0pZ
MC0NO3W0U4MFX3i5P2Q/yjlUsuwJJROqJmnAOcy9+McQU9nFHrIAV3Na9G1mZbyS
xmdcbjokhcE95ht8JI9yHESAbtACaSM4jq4W03zlgvvj3TrVt+1bDjjXLexqb6kc
YinRoVc3Dn+Nvw8DrDa/1PMuO1YAlVg3ZYXRyjlL4dcLbz0SOiEHFyTIjIKflZYZ
TbstNDbygIxBBtmVx/aJBZZoFo3G6e5FGQrVZ0uvDPqVaaFvs3ESv+ooXek9gl06
OlzngnrTJO52Ky4quCTtmR16+J3GeAUjZNUsQKqNHu28zzBsF+CP8j+OCUTPQTe0
YZuqBqnSDPUcllBxXEDp2pPm102q0DvHeKXg7cugdY2zyzvjgrxBSBwh5xwUA4sT
RMI55Gok6gcdzQ1GHHRzLK6UkKzgmAbTjMypJkVV6Qh7dG6RNJo=
=edA4
-----END PGP SIGNATURE-----
Merge tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Two fixes:
- reorder buffer filter checks can cause bad shift/UBSAN
warning with newer HW, avoid the check (mac80211)
- add Kconfig dependency for iwlwifi for PTP clock usage
* tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
wifi: iwlwifi: mvm: add dependency for PTP clock
====================
Link: https://lore.kernel.org/r/20230822124206.43926-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Highlights include:
Stable fixes
- NFS: Fix a use after free in nfs_direct_join_group()
Bugfixes
- NFS: Fix a sysfs server name memory leak
- NFS: Fix a lock recovery hang in NFSv4.0
- NFS: Fix page free in the error path for nfs42_proc_getxattr
- NFS: Fix page free in the error path for __nfs4_get_acl_uncached
- SUNRPC/rdma: Fix receive buffer dma-mapping after a server disconnect
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmTjgEEACgkQZwvnipYK
APItFA//WzGcKbujlMXpiRdvUg6k6CfG/ikBRB1UwQEyZjK/tVZ96qt6UuHGNMbz
b8GaGls7NRYJKezAcMSW9QMMPYVyG0PLwxOW6BPwsZS61Zn6HMeM1YRboaZEid7f
JrUNhbUXHl6bVWrBNEtcr3IN/5ERU4sGCAa4A3uWdNxGyffD/avrK06/bfmE/SJi
+7LVPp0M9rM5X5Z1c407TbWfg+L81Q9t0tTz7II3Ba9i2BzQ0uhQhyVUQAGF767u
Vua4XWTRoqG1es+tA4iuwZ3KtaqXoaMRDWPLGTkmBrY+pAo+u4IPzY5LCwfUu6kI
vttkZU5b0b05+UomJ1d+Muzr8uEjRmBhIHZsP6lgVVmuNzqkDb0gCGkfix87J+RO
0QmDZ9D0ftJxsb8fSdp8iy8NqmqJ6X4FhsylRtANEuCrf8+zrkUlBJi47CCwpYDD
8gq6SoTfA8MmiSgzrBuYkJe2HSx7c2csDl3xp5KrJX2IHODjbzlHC05fNadTWc6W
0jQvq1cJ2xBYDNSxkG0Trsd3lTTao3rZC4M7imVVjTTOHS8X1LNCLkbZ7LVnA8rn
0F+lp/h1qs/daXSp0aMG5wyvZNkx5rsJ23o+InNCjiCh3cDvoi9mg6DN5bQK8Foy
Iqd2MTgxrMaF/FUbdGLdnFX4GQkgFPng8TpdX8sqqm1JHUprpqg=
=nd41
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
- fix a use after free in nfs_direct_join_group() (Cc: stable)
- fix sysfs server name memory leak
- fix lock recovery hang in NFSv4.0
- fix page free in the error path for nfs42_proc_getxattr() and
__nfs4_get_acl_uncached()
- SUNRPC/rdma: fix receive buffer dma-mapping after a server disconnect
* tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
xprtrdma: Remap Receive buffers after a reconnect
NFSv4: fix out path in __nfs4_get_acl_uncached
NFSv4.2: fix error handling in nfs42_proc_getxattr
NFS: Fix sysfs server name memory leak
NFS: Fix a use after free in nfs_direct_join_group()
NFSv4: Fix dropped lock for racing OPEN and delegation return
abort early so task can exit faster if a fatal signal is pending,
no need to continue validation in that case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Prefer `strscpy_pad` as it's a more robust interface whilst maintaing
zero-padding behavior.
There may have existed a bug here due to both `tbl->repl.name` and
`info->name` having a size of 32 as defined below:
| #define XT_TABLE_MAXNAMELEN 32
This may lead to buffer overreads in some situations -- `strscpy` solves
this by guaranteeing NUL-termination of the dest buffer.
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Use `strscpy_pad` over `strncpy` for NUL-terminated strings.
We can also drop the + 1 from `NFT_OSF_MAXGENRELEN + 1` since `strscpy`
will guarantee NUL-termination.
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
When compiling with gcc 13 and CONFIG_FORTIFY_SOURCE=y, the following
warning appears:
In function ‘fortify_memcpy_chk’,
inlined from ‘size_entry_mwt’ at net/bridge/netfilter/ebtables.c:2118:2:
./include/linux/fortify-string.h:592:25: error: call to ‘__read_overflow2_field’
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Werror=attribute-warning]
592 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The compiler is complaining:
memcpy(&offsets[1], &entry->watchers_offset,
sizeof(offsets) - sizeof(offsets[0]));
where memcpy reads beyong &entry->watchers_offset to copy
{watchers,target,next}_offset altogether into offsets[]. Silence the
warning by wrapping these three up via struct_group().
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
When the ifdown function in the dst_ops structure is referenced, the input
parameter 'how' is always true. In the current implementation of the
ifdown interface, ip6_dst_ifdown does not use the input parameter 'how',
xfrm6_dst_ifdown and xfrm4_dst_ifdown functions use the input parameter
'unregister'. But false judgment on 'unregister' in xfrm6_dst_ifdown and
xfrm4_dst_ifdown is false, so remove the input parameter 'how' in ifdown
function.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct cfg80211_rnr_elems.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-5-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct cfg80211_pmsr_request.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-4-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct cfg80211_mbssid_elems.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-3-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct cfg80211_cqm_config.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-2-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct cfg80211_acl_data.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230817211531.4193219-1-keescook@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor.
Instead of requiring a list of pages or bvecs, we can just use an
iov_iter directly, and avoid extra allocations.
We assume that the pages represented by the iter are pinned such that
they shouldn't incur page faults, which is the case for the iov_iters
created by netfs.
While working on this, Al Viro informed me that he was going to change
iov_iter_get_pages to auto-advance the iterator as that pattern is more
or less required for ITER_PIPE anyway. We emulate that here for now by
advancing in the _next op and tracking that amount in the "lastlen"
field.
In the event that _next is called twice without an intervening
_advance, we revert the iov_iter by the remaining lastlen before
calling iov_iter_get_pages.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Have get_reply check for the presence of sparse read ops in the
request and set the sparse_read boolean in the msg. That will queue the
messenger layer to use the sparse read codepath instead of the normal
data receive.
Add a new sparse_read operation for the OSD client, driven by its own
state machine. The messenger will repeatedly call the sparse_read
operation, and it will pass back the necessary info to set up to read
the next extent of data, while zero-filling the sparse regions.
The state machine will stop at the end of the last extent, and will
attach the extent map buffer to the ceph_osd_req_op so that the caller
can use it.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add 2 new fields to ceph_connection_v1_info to track the necessary info
in sparse reads. Skip initializing the cursor for a sparse read.
Break out read_partial_message_section into a wrapper around a new
read_partial_message_chunk function that doesn't zero out the crc first.
Add new helper functions to drive receiving into the destinations
provided by the sparse_read state machine.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add a new init_sgs_pages helper that populates the scatterlist from
an arbitrary point in an array of pages.
Change setup_message_sgs to take an optional pointer to an array of
pages. If that's set, then the scatterlist will be set using that
array instead of the cursor.
When given a sparse read on a secure connection, decrypt the data
in-place rather than into the final destination, by passing it the
in_enc_pages array.
After decrypting, run the sparse_read state machine in a loop, copying
data from the decrypted pages until it's complete.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add support for a new sparse_read ceph_connection operation. The idea is
that the client driver can define this operation use it to do special
handling for incoming reads.
The alloc_msg routine will look at the request and determine whether the
reply is expected to be sparse. If it is, then we'll dispatch to a
different set of state machine states that will repeatedly call the
driver's sparse_read op to get length and placement info for reading the
extent map, and the extents themselves.
This necessitates adding some new field to some other structs:
- The msg gets a new bool to track whether it's a sparse_read request.
- A new field is added to the cursor to track the amount remaining in the
current extent. This is used to cap the read from the socket into the
msg_data
- Handing a revoke with all of this is particularly difficult, so I've
added a new data_len_remain field to the v2 connection info, and then
use that to skip that much on a revoke. We may want to expand the use of
that to the normal read path as well, just for consistency's sake.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When the OSD sends back a sparse read reply, it contains an array of
these structures. Define the structure and add a couple of helpers for
dealing with them.
Also add a place in struct ceph_osd_req_op to store the extent buffer,
and code to free it if it's populated when the req is torn down.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In a later patch, we're going to need to search for a request in
the rbtree, but taking the o_mutex is inconvenient as we already
hold the con mutex at the point where we need it.
Add a new spinlock that we take when inserting and erasing entries from
the o_requests tree. Search of the rbtree can be done with either the
mutex or the spinlock, but insertion and removal requires both.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The commit 06470f7468c8 ("mac80211: add API to allow filtering frames in BA sessions")
added reorder_buf_filtered to mark frames filtered by firmware, and it
can only work correctly if hw.max_rx_aggregation_subframes <= 64 since
it stores the bitmap in a u64 variable.
However, new HE or EHT devices can support BlockAck number up to 256 or
1024, and then using a higher subframe index leads UBSAN warning:
UBSAN: shift-out-of-bounds in net/mac80211/rx.c:1129:39
shift exponent 215 is too large for 64-bit type 'long long unsigned int'
Call Trace:
<IRQ>
dump_stack_lvl+0x48/0x70
dump_stack+0x10/0x20
__ubsan_handle_shift_out_of_bounds+0x1ac/0x360
ieee80211_release_reorder_frame.constprop.0.cold+0x64/0x69 [mac80211]
ieee80211_sta_reorder_release+0x9c/0x400 [mac80211]
ieee80211_prepare_and_rx_handle+0x1234/0x1420 [mac80211]
ieee80211_rx_list+0xaef/0xf60 [mac80211]
ieee80211_rx_napi+0x53/0xd0 [mac80211]
Since only old hardware that supports <=64 BlockAck uses
ieee80211_mark_rx_ba_filtered_frames(), limit the use as it is, so add a
WARN_ONCE() and comment to note to avoid using this function if hardware
capability is not suitable.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20230818014004.16177-1-pkshih@realtek.com
[edit commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
sk_getsockopt() runs locklessly. This means sk->sk_lingertime
can be read while other threads are changing its value.
Other reads also happen without socket lock being held,
and must be annotated.
Remove preprocessor logic using BITS_PER_LONG, compilers
are smart enough to figure this by themselves.
v2: fixed a clang W=1 (-Wtautological-constant-out-of-range-compare) warning
(Jakub)
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add extack info for IPv4 address add/delete, which would be useful for
users to understand the problem without having to read kernel code.
No extack message for the ifa_local checking in __inet_insert_ifa() as
it has been checked in find_matching_ifa().
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After deleting an IPv6 address on an interface and cleaning up the
related preferred source entries, it is important to ensure that all
routes associated with the deleted address are properly cleared. The
current implementation of rt6_remove_prefsrc() only checks the preferred
source addresses bound to the current device. However, there may be
routes that are bound to other devices but still utilize the same
preferred source address.
To address this issue, it is necessary to also delete entries that are
bound to other interfaces but share the same source address with the
current device. Failure to delete these entries would leave routes that
are bound to the deleted address unclear. Here is an example reproducer
(I have omitted unrelated routes):
+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 1:2:3:4::5/64 dev dummy1
+ ip route add 7:7:7:0::1 dev dummy1 src 1:2:3:4::5
+ ip route add 7:7:7:0::2 dev dummy2 src 1:2:3:4::5
+ ip -6 route show
1:2:3:4::/64 dev dummy1 proto kernel metric 256 pref medium
7:7:7::1 dev dummy1 src 1:2:3:4::5 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium
As Ido reminds, in IPv6, the preferred source address is looked up in
the same VRF as the first nexthop device, which is different with IPv4.
So, while removing the device checking, we also need to add an
ipv6_chk_addr() check to make sure the address does not exist on the other
devices of the rt nexthop device's VRF.
After fix:
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 metric 1024 pref medium
Reported-by: Thomas Haller <thaller@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2170513
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
addrconf_prefix_rcv returned early without releasing the inet6_dev
pointer when the PIO lifetime is less than accept_ra_min_lft.
Fixes: 5027d54a9c30 ("net: change accept_ra_min_rtr_lft to affect all RA lifetimes")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting IP_RECVERR and IPV6_RECVERR options to zero currently
purges the socket error queue, which was probably not expected
for zerocopy and tx_timestamp users.
I discovered this issue while preparing commit 6b5f43ea0815
("inet: move inet->recverr to inet->inet_flags"), I presume this
change does not need to be backported to stable kernels.
Add skb_errqueue_purge() helper to purge error messages only.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP sendmsg() is lockless, so ip_select_ident_segs()
can very well be run from multiple cpus [1]
Convert inet->inet_id to an atomic_t, but implement
a dedicated path for TCP, avoiding cost of a locked
instruction (atomic_add_return())
Note that this patch will cause a trivial merge conflict
because we added inet->flags in net-next tree.
v2: added missing change in
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
(David Ahern)
[1]
BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
ip_select_ident_segs include/net/ip.h:542 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
ip_select_ident_segs include/net/ip.h:541 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x184d -> 0x184e
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
==================================================================
Fixes: 23f57406b82d ("ipv4: avoid using shared IP generator for connected sockets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
veth and vxcan need to make sure the ifindexes of the peer
are not negative, core does not validate this.
Using iproute2 with user-space-level checking removed:
Before:
# ./ip link add index 10 type veth peer index -1
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
-1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff
Now:
$ ./ip link add index 10 type veth peer index -1
Error: ifindex can't be negative.
This problem surfaced in net-next because an explicit WARN()
was added, the root cause is older.
Fixes: e6f8f1a739b6 ("veth: Allow to create peer link with given ifindex")
Fixes: a8f820a380a2 ("can: add Virtual CAN Tunnel driver (vxcan)")
Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_queue_purge() and __skb_queue_purge() become wrappers
around the new generic functions.
New SKB_DROP_REASON_QUEUE_PURGE drop reason is added,
but users can start adding more specific reasons.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA-
unmapping the Receive buffers, but rpcrdma_post_recvs() neglected
to remap them after a new connection had been established. The
result was immediate failure of the new connection with the Receives
flushing with LOCAL_PROT_ERR.
Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add SMC_NLA_LGR_R_V2_MAX_CONNS and SMC_NLA_LGR_R_V2_MAX_LINKS
to SMCR v2 linkgroup netlink attribute SMC_NLA_LGR_R_V2 for
linkgroup's detail info showing.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support max links per lgr negotiation in clc handshake for SMCR v2.1,
which is one of smc v2.1 features. Server makes decision for the final
value of max links based on the client preferred max links and
self-preferred max links. Here use the minimum value of the client
preferred max links and server preferred max links.
Client Server
Proposal(max links(client preferred))
-------------------------------------->
Accept(max links(accepted value))
accepted value=min(client preferred, server preferred)
<-------------------------------------
Confirm(max links(accepted value))
------------------------------------->
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support max connections per lgr negotiation for SMCR v2.1,
which is one of smc v2.1 features. Server makes decision for
the final value of max conns based on the client preferred
max conns and self-preferred max conns. Here use the minimum
value of client preferred max conns and server preferred max
conns.
Client Server
Proposal(max conns(client preferred))
------------------------------------>
Accept(max conns(accepted value))
accepted value=min(client preferred, server preferred)
<-----------------------------------
Confirm(max conns(accepted value))
----------------------------------->
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support SMC v2.x features validate for SMC v2.1. This is the frame
code for SMC v2.x features validate, and will take effects only when
the negotiated release version is v2.1 or later.
For Server, v2.x features' validation should be done in smc_clc_srv_
v2x_features_validate when receiving v2.1 or later CLC Proposal Message,
such as max conns, max links negotiation, the decision of the final
value of max conns and max links should be made in this function.
And final check for server when receiving v2.1 or later CLC Confirm
Message should be done in smc_clc_v2x_features_confirm_check.
For client, v2.x features' validation should be done in smc_clc_clnt_
v2x_features_validate when receiving v2.1 or later CLC Accept Message,
for example, the decision to accpt the accepted value or to decline
should be made in this function.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add vendor unique experimental options area in clc handshake. In clc
accept and confirm msg, vendor unique experimental options use the
16-Bytes reserved field, which defined in struct smc_clc_fce_gid_ext
in previous version. Because of the struct smc_clc_first_contact_ext
is widely used and limit the scope of modification, this patch moves
the 16-Bytes reserved field out of struct smc_clc_fce_gid_ext, and
followed with the struct smc_clc_first_contact_ext in a new struct
names struct smc_clc_first_contact_ext_v2x.
For SMC-R first connection, in previous version, the struct smc_clc_
first_contact_ext and the 16-Bytes reserved field has already been
included in clc accept and confirm msg. Thus, this patch use struct
smc_clc_first_contact_ext_v2x instead of the struct smc_clc_first_
contact_ext and the 16-Bytes reserved field in SMC-R clc accept and
confirm msg is compatible with previous version.
For SMC-D first connection, in previous version, only the struct smc_
clc_first_contact_ext is included in clc accept and confirm msg, and
the 16-Bytes reserved field is not included. Thus, when the negotiated
smc release version is the version before v2.1, we still use struct
smc_clc_first_contact_ext for compatible consideration. If the negotiated
smc release version is v2.1 or later, use struct smc_clc_first_contact_
ext_v2x instead.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support smc release version negotiation in clc handshake based on
SMC v2, where no negotiation process for different releases, but
for different versions. The latest smc release version was updated
to v2.1. And currently there are two release versions of SMCv2, v2.0
and v2.1. In the release version negotiation, client sends the preferred
release version by CLC Proposal Message, server makes decision for which
release version to use based on the client preferred release version and
self-supported release version (here choose the minimum release version
of the client preferred and server latest supported), then the decision
returns to client by CLC Accept Message. Client confirms the decision by
CLC Confirm Message.
Client Server
Proposal(preferred release version)
------------------------------------>
Accept(accpeted release version)
min(client preferred, server latest supported)
<------------------------------------
Confirm(accpeted release version)
------------------------------------>
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We changed tcp_poll() over time, bug never updated dccp.
Note that we also could remove dccp instead of maintaining it.
Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230818015820.2701595-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enabling BIG TCP on a low end platform apparently increased
chances of getting flows locked on one busy TX queue.
A similar problem was handled in commit 9b462d02d6dd
("tcp: TCP Small Queues and strange attractors"),
but the strategy worked for either bulk flows,
or 'large enough' RPC. BIG TCP changed how large
RPC needed to be to enable the work around:
If RPC fits in a single skb, TSQ never triggers.
Root cause for the problem is a busy TX queue,
with delayed TX completions.
This patch changes how we set skb->ooo_okay to detect
the case TX completion was not done, but incoming ACK
already was processed and emptied rtx queue.
Update the comment to explain the tricky details.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230817182353.2523746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*prot->memory_pressure is read/writen locklessly, we need
to add proper annotations.
A recent commit added a new race, it is time to audit all accesses.
Fixes: 2d0c88e84e48 ("sock: Fix misuse of sk_under_memory_pressure()")
Fixes: 4d93df0abd50 ("[SCTP]: Rewrite of sctp buffer management code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Abel Wu <wuyun.abel@bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cited fixes commit introduced linecard notifications for register,
however it didn't add them for unregister. Fix that by adding them.
Fixes: c246f9b5fd61 ("devlink: add support to create line card and expose to user")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230817125240.2144794-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- bump version strings, by Simon Wunderlich
- Remove unused declarations, by Yue Haibing
- Clean up MTU handling, by Sven Eckelmann (2 patches)
- Clean up/remove (obsolete) functions, by Sven Eckelmann (3 patches)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmTc+z8WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoeEDD/44gdQmazggfYU2KXn42Pq8CLWR
Nsr4ljmTXiCVh1a/Gw3pRvT7vjH12Jv+nlzoEXzv+iAHKnchVvq1zAguIMSQJymQ
6MbZNHYMckwb6zBXxYGC3qaDI3Zf67ECMEmQkX6ZvwYbXYUiAR+/P7UPlzc145Ur
jWS0wgm9kn8631ZQoWsETIapaN8X9N9Iqt1kDLm29bMlQoEmE6XdpGgpv4JWFpwp
8lDqwcI+v4dt/QsNd07xka0+3lpnQB6bYeyYDjWY1ae04OME5B81PqbbehGNYigp
VP2yepBtfWmWM2nCLqM9tK98ALw+vxtFN9K0M/9p7jd9TyVEWephD5SBrA9aJiKL
ruOIu1YNe1nQ1CFIcc28MiK/1o8ljDUVBr8Mt4OhZ+Pn9fhOlWLy1ZqahiFmley/
DcmwZfreyyNZ1BIrVSsLHGgvJ+X3j4llNFdtk7oB8LplBL3wMVpvrLSwDH34aZbS
dNuWanCEjoqpBEnMLD3J6BgFhvfBjuANXDmpWR6fvXz4WNFCyirvInldG1aJFHKL
HyUPG6ANlT7q6A+vgWHDVsEcxIuiCZlBuNlj/Z2Sg6DAq4WTPcdg+/BYnPusDe4t
M0tAHnkn/dCOCkA0Iq4vPZREgj/xvip+yTfBp5R8hMNibL5TOGoYOLVk/q0X5Dah
ON1TAvgsFbymf129Jg==
=f1Uk
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-pullrequest-20230816' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Remove unused declarations, by Yue Haibing
- Clean up MTU handling, by Sven Eckelmann (2 patches)
- Clean up/remove (obsolete) functions, by Sven Eckelmann (3 patches)
* tag 'batadv-next-pullrequest-20230816' of git://git.open-mesh.org/linux-merge:
batman-adv: Drop per algo GW section class code
batman-adv: Keep batadv_netlink_notify_* static
batman-adv: Drop unused function batadv_gw_bandwidth_set
batman-adv: Check hardif MTU against runtime MTU
batman-adv: Avoid magic value for minimum MTU
batman-adv: Remove unused declarations
batman-adv: Start new development cycle
====================
Link: https://lore.kernel.org/r/20230816164000.190884-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fix issues with adjusted MTUs (2 patches), by Sven Eckelmann
- Fix header access for memory reallocation case, by Remi Pommarel
- Fix two memory leaks (2 patches), by Remi Pommarel
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmTc+eYWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeobgFD/9oBvQKj9rObbnHIgxe3ZZ4x4po
FOln4eLv83YZwVP54BC0X8COymO+gd3tBbGg9U1s9kpn+hIOXi7zI8xnmS/jrKGB
t8tDQ/1S9laCfanfDoHDdQ96ifJfQR6Mp7ZH1e64L22Ag5hKjVoGeQp2Mf5X2S+S
7ZFdhofr/ZNi0Tz4Y+Jw9bh3W5TMnwSFfexSIfUJmh+06RGRRspOx2WbbgArMs12
hP4UST0cIfIr0CinBQz+LiyT90GgC6r+xjkQrP3LgzYegC7eBW+bQxLgCtnk+Hic
+t+aS3SnityZzFyaNJrULX7/u8WJumW4udu0jDl9raAWIJBUV5pNr7sNagQ45mvE
NZ4/VnWGg6MnjdPC6CIuU6AuCLZYn1NiE6mp1vuFxMpqmiJUhMjwjTp8DaLpQZCV
vDYca/bBuDMbTIl5LxQ965svbNVDiAS6gNHbrVs2k3bq3Ji7QS1M7MVR3npehGT0
xInqQNO7QJ0c+/PaFMTZwi0LKk8qwvHLggsZyKXJ6i6YBH7YG1LTbEtqLT7rH3nv
sHfTyPzw7b0oFh7/rSMzvf1P9yiMG1ZCY622uhU1M+CxA2Axr1Lcq1yc3umDP4Ds
BM3JKilHsj/mM7g/HobBs7eg+BL9/KjNKFbj12Bc0xV7IXC4SNTlKSJUN0ZlGSsH
RP7dNXOzS0LUErpARw==
=EsR1
-----END PGP SIGNATURE-----
Merge tag 'batadv-net-pullrequest-20230816' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Fix issues with adjusted MTUs (2 patches), by Sven Eckelmann
- Fix header access for memory reallocation case, by Remi Pommarel
- Fix two memory leaks (2 patches), by Remi Pommarel
* tag 'batadv-net-pullrequest-20230816' of git://git.open-mesh.org/linux-merge:
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
batman-adv: Fix TT global entry leak when client roamed back
batman-adv: Do not get eth header before batadv_check_management_packet
batman-adv: Don't increase MTU when set by user
batman-adv: Trigger events for auto adjusted MTU
====================
Link: https://lore.kernel.org/r/20230816163318.189996-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since v6.5-rc1 MM-tree is merged and contains a new flag SLAB_NO_MERGE
in commit d0bf7d5759c1 ("mm/slab: introduce kmem_cache flag SLAB_NO_MERGE")
now is the time to use this flag for networking as proposed
earlier see link.
The SKB (sk_buff) kmem_cache slab is critical for network performance.
Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
performance by amortising the alloc/free cost.
For the bulk API to perform efficiently the slub fragmentation need to
be low. Especially for the SLUB allocator, the efficiency of bulk free
API depend on objects belonging to the same slab (page).
When running different network performance microbenchmarks, I started
to notice that performance was reduced (slightly) when machines had
longer uptimes. I believe the cause was 'skbuff_head_cache' got
aliased/merged into the general slub for 256 bytes sized objects (with
my kernel config, without CONFIG_HARDENED_USERCOPY).
For SKB kmem_cache network stack have other various reasons for
not merging, but it varies depending on kernel config (e.g.
CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE
for this kmem_cache to get most out of kmem_cache_{alloc,free}_bulk APIs.
When CONFIG_SLUB_TINY is configured the bulk APIs are essentially
disabled. Thus, for this case drop the SLAB_NO_MERGE flag.
Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/169211265663.1491038.8580163757548985946.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/sfc/tc.c
fa165e194997 ("sfc: don't unregister flow_indr if it was never registered")
3bf969e88ada ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>