IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Go-To-Vport action on RX is not allowed when the vport is uplink.
In such case, the packet should be dropped.
Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When FW provides ICM addresses for drop RX/TX, the provided capability
is 64 bits that contain its GVMI as well as the ICM address itself.
In case of TX DROP this GVMI is different from the GVMI that the
domain is operating on.
This patch fixes the action to use these GVMI IDs, as provided by FW.
Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To enable multicast packets which are offloaded in bridge multicast
offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should
be set. Add this bit to FTE for the bridge multicast offload rules.
Fixes: 18c2916cee12 ("net/mlx5: Bridge, snoop igmp/mld packets")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The processing of traffic in hairpin queues occurs in HW/FW and does not
involve the cpus, hence the upper bound on max num channels does not
apply to them. Using this bound for the hairpin RQT max_table_size is
wrong. It could be too small, and cause the error below [1]. As the
RQT size provided on init does not get modified later, use the same
value for both actual and max table sizes.
[1]
mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1200): CREATE_RQT(0x916) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x538faf), err(-22)
Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Indirection (*) is of lower precedence than postfix increment (++). Logic
in napi_poll context would cause an out-of-bound read by first increment
the pointer address by byte address space and then dereference the value.
Rather, the intended logic was to dereference first and then increment the
underlying value.
Fixes: 92214be5979c ("net/mlx5e: Update doorbell for port timestamping CQ before the software counter")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The sd_group field moved in the HW spec from the MPIR register
to the vport context.
Align the query accordingly.
Fixes: f5e956329960 ("net/mlx5: Expose Management PCIe Index Register (MPIR)")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit moved the code of mlx5e_create_tises() and changed the
loop to create TISes over MLX5_MAX_PORTS constant value, instead of
getting the correct lag ports supported by the device, which can cause
FW errors on devices with less than MLX5_MAX_PORTS ports.
Change that back to mlx5e_get_num_lag_ports(mdev).
Also IPoIB interfaces create there own TISes, they don't use the eth
TISes, pass a flag to indicate that.
This fixes the following errors that might appear in kernel log:
mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22)
mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22
Fixes: b25bd37c859f ("net/mlx5: Move TISes from priv to mdev HW resources")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When a qdisc is deleted from a net device the stack instructs the
underlying driver to remove its flow offload callback from the
associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack
then continues to replay the removal of the filters in the block for
this driver by iterating over the chains in the block and invoking the
'reoffload' operation of the classifier being used. In turn, the
classifier in its 'reoffload' operation prepares and emits a
'FLOW_CLS_DESTROY' command for each filter.
However, the stack does not do the same for chain templates and the
underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when
a qdisc is deleted. This results in a memory leak [1] which can be
reproduced using [2].
Fix by introducing a 'tmplt_reoffload' operation and have the stack
invoke it with the appropriate arguments as part of the replay.
Implement the operation in the sole classifier that supports chain
templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}'
command based on whether a flow offload callback is being bound to a
filter block or being unbound from one.
As far as I can tell, the issue happens since cited commit which
reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains()
in __tcf_block_put(). The order cannot be reversed as the filter block
is expected to be freed after flushing all the chains.
[1]
unreferenced object 0xffff888107e28800 (size 2048):
comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
hex dump (first 32 bytes):
b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..|......[......
01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................
backtrace:
[<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff81ab374e>] __kmalloc+0x4e/0x90
[<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0
[<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
[<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
[<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
[<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
[<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
[<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
[<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
[<ffffffff83ac6270>] netlink_unicast+0x540/0x820
[<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
[<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0
[<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0
[<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0
unreferenced object 0xffff88816d2c0400 (size 1024):
comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
hex dump (first 32 bytes):
40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8.....
10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m....
backtrace:
[<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90
[<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0
[<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460
[<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0
[<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0
[<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
[<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
[<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
[<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
[<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
[<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
[<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
[<ffffffff83ac6270>] netlink_unicast+0x540/0x820
[<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
[<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[2]
# tc qdisc add dev swp1 clsact
# tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32
# tc qdisc del dev swp1 clsact
# devlink dev reload pci/0000:06:00.0
Fixes: bbf73830cd48 ("net: sched: traverse chains in block with tcf_get_next_chain()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code in netvsc_drv_init() incorrectly assumes that PAGE_SIZE
is 4 Kbytes, which is wrong on ARM64 with 16K or 64K page size. As a
result, the default VMBus ring buffer size on ARM64 with 64K page size
is 8 Mbytes instead of the expected 512 Kbytes. While this doesn't break
anything, a typical VM with 8 vCPUs and 8 netvsc channels wastes 120
Mbytes (8 channels * 2 ring buffers/channel * 7.5 Mbytes/ring buffer).
Unfortunately, the module parameter specifying the ring buffer size
is in units of 4 Kbyte pages. Ideally, it should be in units that
are independent of PAGE_SIZE, but backwards compatibility prevents
changing that now.
Fix this by having netvsc_drv_init() hardcode 4096 instead of using
PAGE_SIZE when calculating the ring buffer size in bytes. Also
use the VMBUS_RING_SIZE macro to ensure proper alignment when running
with page size larger than 4K.
Cc: <stable@vger.kernel.org> # 5.15.x
Fixes: 7aff79e297ee ("Drivers: hv: Enable Hyper-V code to be built on ARM64")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240122162028.348885-1-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The most visible fix here is the ath11k crash fix which was introduced
in v6.7. We also have a fix for iwlwifi memory corruption and few
smaller fixes in the stack.
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmWuipMRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZt17wgAhrkxpwRpMuRrV6VxHl9m+NXk7is2vni2
JZbqlvMIw1Hm+40K9D0WgFdNZUeAtBcd567MAbiqdzqRNB9DtEvnsXIKlKINwxIA
QFskkXR1f0sj79Hz3q7iWQq+jxDvAU5tge/WU65Na7+224sdyzBg7DZab8/buOsm
1xdx69MtGNU+dm4+V1Xp8h9jB7WAjq7N+ZhC6YfH6QSCL7JSL9Co/NC098gBnAEx
cm59vPOxk8+QoHKDjjmClTIhxOEgR6pSM8T3Dne9OYO8ONhxqdVSgd0Br+mEZgQ4
r61i88zK6ZmVZYckk6fhuGCLiKC6CFwS0eCLDQnKK1ufyRxDi84Y/Q==
=Cwmf
-----END PGP SIGNATURE-----
Merge tag 'wireless-2024-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.8-rc2
The most visible fix here is the ath11k crash fix which was introduced
in v6.7. We also have a fix for iwlwifi memory corruption and few
smaller fixes in the stack.
* tag 'wireless-2024-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix race condition on enabling fast-xmit
wifi: iwlwifi: fix a memory corruption
wifi: mac80211: fix potential sta-link leak
wifi: cfg80211/mac80211: remove dependency on non-existing option
wifi: cfg80211: fix missing interfaces when dumping
wifi: ath11k: rely on mac80211 debugfs handling for vif
wifi: p54: fix GCC format truncation warning with wiphy->fw_version
====================
Link: https://lore.kernel.org/r/20240122153434.E0254C433C7@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I analyze the potential sleeping issue of the following processes:
Thread A Thread B
... netlink_create //ref = 1
do_mq_notify ...
sock = netlink_getsockbyfilp ... //ref = 2
info->notify_sock = sock; ...
... netlink_sendmsg
... skb = netlink_alloc_large_skb //skb->head is vmalloced
... netlink_unicast
... sk = netlink_getsockbyportid //ref = 3
... netlink_sendskb
... __netlink_sendskb
... skb_queue_tail //put skb to sk_receive_queue
... sock_put //ref = 2
... ...
... netlink_release
... deferred_put_nlk_sk //ref = 1
mqueue_flush_file
spin_lock
remove_notification
netlink_sendskb
sock_put //ref = 0
sk_free
...
__sk_destruct
netlink_sock_destruct
skb_queue_purge //get skb from sk_receive_queue
...
__skb_queue_purge_reason
kfree_skb_reason
__kfree_skb
...
skb_release_all
skb_release_head_state
netlink_skb_destructor
vfree(skb->head) //sleeping while holding spinlock
In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
vmalloc, and is put to sk_receive_queue queue, also the skb is not freed.
When the mqueue executes flush, the sleeping bug will occur. Use
vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.
Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub reported that ASSERT_EQ(cpu, i) in so_incoming_cpu.c seems to
fire somewhat randomly.
# # RUN so_incoming_cpu.before_reuseport.test3 ...
# # so_incoming_cpu.c:191:test3:Expected cpu (32) == i (0)
# # test3: Test terminated by assertion
# # FAIL so_incoming_cpu.before_reuseport.test3
# not ok 3 so_incoming_cpu.before_reuseport.test3
When the test failed, not-yet-accepted CLOSE_WAIT sockets received
SYN with a "challenging" SEQ number, which was sent from an unexpected
CPU that did not create the receiver.
The test basically does:
1. for each cpu:
1-1. create a server
1-2. set SO_INCOMING_CPU
2. for each cpu:
2-1. set cpu affinity
2-2. create some clients
2-3. let clients connect() to the server on the same cpu
2-4. close() clients
3. for each server:
3-1. accept() all child sockets
3-2. check if all children have the same SO_INCOMING_CPU with the server
The root cause was the close() in 2-4. and net.ipv4.tcp_tw_reuse.
In a loop of 2., close() changed the client state to FIN_WAIT_2, and
the peer transitioned to CLOSE_WAIT.
In another loop of 2., connect() happened to select the same port of
the FIN_WAIT_2 socket, and it was reused as the default value of
net.ipv4.tcp_tw_reuse is 2.
As a result, the new client sent SYN to the CLOSE_WAIT socket from
a different CPU, and the receiver's sk_incoming_cpu was overwritten
with unexpected CPU ID.
Also, the SYN had a different SEQ number, so the CLOSE_WAIT socket
responded with Challenge ACK. The new client properly returned RST
and effectively killed the CLOSE_WAIT socket.
This way, all clients were created successfully, but the error was
detected later by 3-2., ASSERT_EQ(cpu, i).
To avoid the failure, let's make sure that (i) the number of clients
is less than the number of available ports and (ii) such reuse never
happens.
Fixes: 6df96146b202 ("selftest: Add test for SO_INCOMING_CPU.")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240120031642.67014-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On CPUs with weak memory models, reads and updates performed by tcp_push
to the sk variables can get reordered leaving the socket throttled when
it should not. The tasklet running tcp_wfree() may also not observe the
memory updates in time and will skip flushing any packets throttled by
tcp_push(), delaying the sending. This can pathologically cause 40ms
extra latency due to bad interactions with delayed acks.
Adding a memory barrier in tcp_push removes the bug, similarly to the
previous commit bf06200e732d ("tcp: tsq: fix nonagle handling").
smp_mb__after_atomic() is used to not incur in unnecessary overhead
on x86 since not affected.
Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
22.04 and Apache Tomcat 9.0.83 running the basic servlet below:
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class HelloWorldServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
response.setContentType("text/html;charset=utf-8");
OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
String s = "a".repeat(3096);
osw.write(s,0,s.length());
osw.flush();
}
}
Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
values is observed while, with the patch, the extra latency disappears.
No patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
...
50.000% 0.91ms
75.000% 1.13ms
90.000% 1.46ms
99.000% 1.74ms
99.900% 1.89ms
99.990% 41.95ms <<< 40+ ms extra latency
99.999% 48.32ms
100.000% 48.96ms
With patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
...
50.000% 0.90ms
75.000% 1.13ms
90.000% 1.45ms
99.000% 1.72ms
99.900% 1.83ms
99.990% 2.11ms <<< no 40+ ms extra latency
99.999% 2.53ms
100.000% 2.62ms
Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
affected by this issue and the patch doesn't introduce any additional
delay.
Fixes: 7aa5470c2c09 ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Syzcaller UBSAN crash occurs in rds_cmsg_recv(),
which reads inc->i_rx_lat_trace[j + 1] with index 4 (3 + 1),
but with array size of 4 (RDS_RX_MAX_TRACES).
Here 'j' is assigned from rs->rs_rx_trace[i] and in-turn from
trace.rx_trace_pos[i] in rds_recv_track_latency(),
with both arrays sized 3 (RDS_MSG_RX_DGRAM_TRACE_MAX). So fix the
off-by-one bounds check in rds_recv_track_latency() to prevent
a potential crash in rds_cmsg_recv().
Found by syzcaller:
=================================================================
UBSAN: array-index-out-of-bounds in net/rds/recv.c:585:39
index 4 is out of range for type 'u64 [4]'
CPU: 1 PID: 8058 Comm: syz-executor228 Not tainted 6.6.0-gd2f51b3516da #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x136/0x150 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0xd5/0x130 lib/ubsan.c:348
rds_cmsg_recv+0x60d/0x700 net/rds/recv.c:585
rds_recvmsg+0x3fb/0x1610 net/rds/recv.c:716
sock_recvmsg_nosec net/socket.c:1044 [inline]
sock_recvmsg+0xe2/0x160 net/socket.c:1066
__sys_recvfrom+0x1b6/0x2f0 net/socket.c:2246
__do_sys_recvfrom net/socket.c:2264 [inline]
__se_sys_recvfrom net/socket.c:2260 [inline]
__x64_sys_recvfrom+0xe0/0x1b0 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
==================================================================
Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
Closes: https://lore.kernel.org/linux-rdma/CALGdzuoVdq-wtQ4Az9iottBqC5cv9ZhcE5q8N7LfYFvkRsOVcw@mail.gmail.com/
Signed-off-by: Sharath Srinivasan <sharath.srinivasan@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HW has the capability to check each frame if it is a PTP frame,
which domain it is, which ptp frame type it is, different ip address in
the frame. And if one of these checks fail then the frame is not
timestamp. Most of these checks were disabled except checking the field
minorVersionPTP inside the PTP header. Meaning that once a partner sends
a frame compliant to 8021AS which has minorVersionPTP set to 1, then the
frame was not timestamp because the HW expected by default a value of 0
in minorVersionPTP. This is exactly the same issue as on lan8841.
Fix this issue by removing this check so the userspace can decide on this.
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of multiple kernel module instances using the same dpll device:
if only one registers dpll device, then only that one can register
directly connected pins with a dpll device. When unregistered parent is
responsible for determining if the muxed pin can be registered with it
or not, the drivers need to be loaded in serialized order to work
correctly - first the driver instance which registers the direct pins
needs to be loaded, then the other instances could register muxed type
pins.
Allow registration of a pin with a parent even if the parent was not
yet registered, thus allow ability for unserialized driver instance
load order.
Do not WARN_ON notification for unregistered pin, which can be invoked
for described case, instead just return error.
Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions")
Reviewed-by: Jan Glaza <jan.glaza@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If parent pin was unregistered but child pin was not, the userspace
would see the "zombie" pins - the ones that were registered with
a parent pin (dpll_pin_on_pin_register(..)).
Technically those are not available - as there is no dpll device in the
system. Do not dump those pins and prevent userspace from any
interaction with them. Provide a unified function to determine if the
pin is available and use it before acting/responding for user requests.
Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions")
Reviewed-by: Jan Glaza <jan.glaza@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a kernel module is unbound but the pin resources were not entirely
freed (other kernel module instance of the same PCI device have had kept
the reference to that pin), and kernel module is again bound, the pin
properties would not be updated (the properties are only assigned when
memory for the pin is allocated), prop pointer still points to the
kernel module memory of the kernel module which was deallocated on the
unbind.
If the pin dump is invoked in this state, the result is a kernel crash.
Prevent the crash by storing persistent pin properties in dpll subsystem,
copy the content from the kernel module when pin is allocated, instead of
using memory of the kernel module.
Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions")
Reviewed-by: Jan Glaza <jan.glaza@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If pin type is not expected, or pin properities failed to allocate
memory, the unwind error path shall not destroy pin's xarrays, which
were not yet initialized.
Add new goto label and use it to fix broken error path.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunjian Wang says:
====================
fixes for tun
There are few places on the receive path where packet receives and
packet drops were not accounted for. This patchset fixes that issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The TUN can be used as vhost-net backend, and it is necessary to
count the packets transmitted from TUN to vhost-net/virtio-net.
However, there are some places in the receive path that were not
taken into account when using XDP. It would be beneficial to also
include new accounting for successfully received bytes using
dev_sw_netstats_rx_add.
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 8ae1aff0b331 ("tuntap: split out XDP logic") includes
dropped counter for XDP_DROP, XDP_ABORTED, and invalid XDP actions.
Unfortunately, that commit missed the dropped counter when error
occurs during XDP_TX and XDP_REDIRECT actions. This patch fixes
this issue.
Fixes: 8ae1aff0b331 ("tuntap: split out XDP logic")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark reports a BUG() when a net namespace is removed.
kernel BUG at net/core/dev.c:11520!
Physical interfaces moved outside of init_net get "refunded"
to init_net when that namespace disappears. The main interface
name may get overwritten in the process if it would have
conflicted. We need to also discard all conflicting altnames.
Recent fixes addressed ensuring that altnames get moved
with the main interface, which surfaced this problem.
Reported-by: Марк Коренберг <socketpair@gmail.com>
Link: https://lore.kernel.org/all/CAEmTpZFZ4Sv3KwqFOY2WKDHeZYdi0O7N5H1nTvcGp=SAEavtDg@mail.gmail.com/
Fixes: 7663d522099e ("net: check for altname conflicts when changing netdev's netns")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
idpf registers multiple netdevs (virtual ports) for one PCI function,
but it does not provide a way for userspace to distinguish them with
sysfs attributes. Per Documentation/ABI/testing/sysfs-class-net, it is
a bug not to set dev_port for independent ports on the same PCI bus,
device and function.
Without dev_port set, systemd-udevd's default naming policy attempts
to assign the same name ("ens2f0") to all four idpf netdevs on my test
system and obviously fails, leaving three of them with the initial
eth<N> name.
With this patch, systemd-udevd is able to assign unique names to the
netdevs (e.g. "ens2f0", "ens2f0d1", "ens2f0d2", "ens2f0d3").
The Intel-provided out-of-tree idpf driver already sets dev_port. In
this patch I chose to do it in the same place in the idpf_cfg_netdev
function.
Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generic sk_busy_loop_end() only looks at sk->sk_receive_queue
for presence of packets.
Problem is that for UDP sockets after blamed commit, some packets
could be present in another queue: udp_sk(sk)->reader_queue
In some cases, a busy poller could spin until timeout expiration,
even if some packets are available in udp_sk(sk)->reader_queue.
v3: - make sk_busy_loop_end() nicer (Willem)
v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats.
- add a sk_is_inet() check in sk_is_udp() (Willem feedback)
- add a sk_is_inet() check in sk_is_tcp().
Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported an uninit-value bug below. [0]
llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2
(0x0011), and syzbot abused the latter to trigger the bug.
write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16)
llc_conn_handler() initialises local variables {saddr,daddr}.mac
based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes
them to __llc_lookup().
However, the initialisation is done only when skb->protocol is
htons(ETH_P_802_2), otherwise, __llc_lookup_established() and
__llc_lookup_listener() will read garbage.
The missing initialisation existed prior to commit 211ed865108e
("net: delete all instances of special processing for token ring").
It removed the part to kick out the token ring stuff but forgot to
close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv().
Let's remove llc_tr_packet_type and complete the deprecation.
[0]:
BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90
__llc_lookup_established+0xe9d/0xf90
__llc_lookup net/llc/llc_conn.c:611 [inline]
llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791
llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
__netif_receive_skb_one_core net/core/dev.c:5527 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641
netif_receive_skb_internal net/core/dev.c:5727 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5786
tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x8ef/0x1490 fs/read_write.c:584
ksys_write+0x20f/0x4c0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Local variable daddr created at:
llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783
llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
Fixes: 211ed865108e ("net: delete all instances of special processing for token ring")
Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the vlan_changelink function, a loop is used to parse the nested
attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to
obtain the struct ifla_vlan_qos_mapping. These two nested attributes are
checked in the vlan_validate_qos_map function, which calls
nla_validate_nested_deprecated with the vlan_map_policy.
However, this deprecated validator applies a LIBERAL strictness, allowing
the presence of an attribute with the type IFLA_VLAN_QOS_UNSPEC.
Consequently, the loop in vlan_changelink may parse an attribute of type
IFLA_VLAN_QOS_UNSPEC and believe it carries a payload of
struct ifla_vlan_qos_mapping, which is not necessarily true.
To address this issue and ensure compatibility, this patch introduces two
type checks that skip attributes whose type is not IFLA_VLAN_QOS_MAPPING.
Fixes: 07b5b17e157b ("[VLAN]: Use rtnl_link API")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240118130306.1644001-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan says:
====================
bnxt_en: Bug fixes
This series contains 5 miscellaneous fixes. The fixes include adding
delay for FLR, buffer memory leak, RSS table size calculation,
ethtool self test kernel warning, and mqprio crash.
====================
Link: https://lore.kernel.org/r/20240117234515.226944-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We call bnxt_half_open_nic() to setup the chip partially to run
loopback tests. The rings and buffers are initialized normally
so that we can transmit and receive packets in loopback mode.
That means page pool buffers are allocated for the aggregation ring
just like the normal case. NAPI is not needed because we are just
polling for the loopback packets.
When we're done with the loopback tests, we call bnxt_half_close_nic()
to clean up. When freeing the page pools, we hit a WARN_ON()
in page_pool_unlink_napi() because the NAPI state linked to the
page pool is uninitialized.
The simplest way to avoid this warning is just to initialize the
NAPIs during half open and delete the NAPIs during half close.
Trying to skip the page pool initialization or skip linking of
NAPI during half open will be more complicated.
This fix avoids this warning:
WARNING: CPU: 4 PID: 46967 at net/core/page_pool.c:946 page_pool_unlink_napi+0x1f/0x30
CPU: 4 PID: 46967 Comm: ethtool Tainted: G S W 6.7.0-rc5+ #22
Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.3.8 08/31/2021
RIP: 0010:page_pool_unlink_napi+0x1f/0x30
Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 47 18 48 85 c0 74 1b 48 8b 50 10 83 e2 01 74 08 8b 40 34 83 f8 ff 74 02 <0f> 0b 48 c7 47 18 00 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90
RSP: 0018:ffa000003d0dfbe8 EFLAGS: 00010246
RAX: ff110003607ce640 RBX: ff110010baf5d000 RCX: 0000000000000008
RDX: 0000000000000000 RSI: ff110001e5e522c0 RDI: ff110010baf5d000
RBP: ff11000145539b40 R08: 0000000000000001 R09: ffffffffc063f641
R10: ff110001361eddb8 R11: 000000000040000f R12: 0000000000000001
R13: 000000000000001c R14: ff1100014553a080 R15: 0000000000003fc0
FS: 00007f9301c4f740(0000) GS:ff1100103fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f91344fa8f0 CR3: 00000003527cc005 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x81/0x140
? page_pool_unlink_napi+0x1f/0x30
? report_bug+0x102/0x200
? handle_bug+0x44/0x70
? exc_invalid_op+0x13/0x60
? asm_exc_invalid_op+0x16/0x20
? bnxt_free_ring.isra.123+0xb1/0xd0 [bnxt_en]
? page_pool_unlink_napi+0x1f/0x30
page_pool_destroy+0x3e/0x150
bnxt_free_mem+0x441/0x5e0 [bnxt_en]
bnxt_half_close_nic+0x2a/0x40 [bnxt_en]
bnxt_self_test+0x21d/0x450 [bnxt_en]
__dev_ethtool+0xeda/0x2e30
? native_queued_spin_lock_slowpath+0x17f/0x2b0
? __link_object+0xa1/0x160
? _raw_spin_unlock_irqrestore+0x23/0x40
? __create_object+0x5f/0x90
? __kmem_cache_alloc_node+0x317/0x3c0
? dev_ethtool+0x59/0x170
dev_ethtool+0xa7/0x170
dev_ioctl+0xc3/0x530
sock_do_ioctl+0xa8/0xf0
sock_ioctl+0x270/0x310
__x64_sys_ioctl+0x8c/0xc0
do_syscall_64+0x3e/0xf0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Fixes: 294e39e0d034 ("bnxt: hook NAPIs to page pools")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240117234515.226944-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The existing formula used in the driver to calculate the number of RSS
table entries is to round up the number of RX rings to the next integer
multiples of 64 (e.g. 64, 128, 192, ..). This is incorrect. The valid
values supported by the chip are 64, 128, 256, 512 only (power of 2
starting from 64). When the number of RX rings is greater than 128, the
entry size will likely be wrong. Firmware will round down the invalid
value (e.g. 192 rounded down to 128) provided by the driver, causing some
RSS rings to not receive any packets.
We already have an existing function bnxt_calc_nr_ring_pages() to
do this calculation. Use it in bnxt_get_nr_rss_ctxs() to calculate the
number of RSS contexts correctly for P5_PLUS chips.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Fixes: 7b3af4f75b81 ("bnxt_en: Add RSS support for 57500 chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240117234515.226944-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_hwrm_get_rings() can abort and return error when there are not
enough ring resources. It aborts without releasing the HWRM DMA buffer,
causing a dma_pool_destroy warning when the driver is unloaded:
bnxt_en 0000:99:00.0: dma_pool_destroy bnxt_hwrm, 000000005b089ba8 busy
Fixes: f1e50b276d37 ("bnxt_en: Fix trimming of P5 RX and TX rings")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240117234515.226944-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first message to firmware may fail if the device is undergoing FLR.
The driver has some recovery logic for this failure scenario but we must
wait 100 msec for FLR to complete before proceeding. Otherwise the
recovery will always fail.
Fixes: ba02629ff6cb ("bnxt_en: log firmware status on firmware init failure")
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240117234515.226944-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When tests are run by runner.sh, bond_options.sh gets killed before
it can complete:
make -C tools/testing/selftests run_tests TARGETS="drivers/net/bonding"
[...]
# timeout set to 120
# selftests: drivers/net/bonding: bond_options.sh
# TEST: prio (active-backup miimon primary_reselect 0) [ OK ]
# TEST: prio (active-backup miimon primary_reselect 1) [ OK ]
# TEST: prio (active-backup miimon primary_reselect 2) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 0) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 1) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 2) [ OK ]
#
not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds
This test includes many sleep statements, at least some of which are
related to timers in the operation of the bonding driver itself. Increase
the test timeout to allow the test to complete.
I ran the test in slightly different VMs (including one without HW
virtualization support) and got runtimes of 13m39.760s, 13m31.238s, and
13m2.956s. Use a ~1.5x "safety factor" and set the timeout to 1200s.
Fixes: 42a8d4aaea84 ("selftests: bonding: add bonding prio option test")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240116104402.1203850a@kernel.org/#t
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240118001233.304759-1-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A crash was found when dumping SMC-D connections. It can be reproduced
by following steps:
- run nginx/wrk test:
smc_run nginx
smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
- continuously dump SMC-D connections in parallel:
watch -n 1 'smcss -D'
BUG: kernel NULL pointer dereference, address: 0000000000000030
CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G E 6.7.0+ #55
RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
Call Trace:
<TASK>
? __die+0x24/0x70
? page_fault_oops+0x66/0x150
? exc_page_fault+0x69/0x140
? asm_exc_page_fault+0x26/0x30
? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
? __kmalloc_node_track_caller+0x35d/0x430
? __alloc_skb+0x77/0x170
smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
smc_diag_dump+0x26/0x60 [smc_diag]
netlink_dump+0x19f/0x320
__netlink_dump_start+0x1dc/0x300
smc_diag_handler_dump+0x6a/0x80 [smc_diag]
? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
sock_diag_rcv_msg+0x121/0x140
? __pfx_sock_diag_rcv_msg+0x10/0x10
netlink_rcv_skb+0x5a/0x110
sock_diag_rcv+0x28/0x40
netlink_unicast+0x22a/0x330
netlink_sendmsg+0x1f8/0x420
__sock_sendmsg+0xb0/0xc0
____sys_sendmsg+0x24e/0x300
? copy_msghdr_from_user+0x62/0x80
___sys_sendmsg+0x7c/0xd0
? __do_fault+0x34/0x160
? do_read_fault+0x5f/0x100
? do_fault+0xb0/0x110
? __handle_mm_fault+0x2b0/0x6c0
__sys_sendmsg+0x4d/0x80
do_syscall_64+0x69/0x180
entry_SYSCALL_64_after_hwframe+0x6e/0x76
It is possible that the connection is in process of being established
when we dump it. Assumed that the connection has been registered in a
link group by smc_conn_create() but the rmb_desc has not yet been
initialized by smc_buf_create(), thus causing the illegal access to
conn->rmb_desc. So fix it by checking before dump.
Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous releases - regressions:
- Revert "net: rtnetlink: Enslave device before bringing it up",
breaks the case inverse to the one it was trying to fix
- net: dsa: fix oob access in DSA's netdevice event handler
dereference netdev_priv() before check its a DSA port
- sched: track device in tcf_block_get/put_ext() only for clsact
binder types
- net: tls, fix WARNING in __sk_msg_free when record becomes full
during splice and MORE hint set
- sfp-bus: fix SFP mode detect from bitrate
- drv: stmmac: prevent DSA tags from breaking COE
Previous releases - always broken:
- bpf: fix no forward progress in in bpf_iter_udp if output
buffer is too small
- bpf: reject variable offset alu on registers with a type
of PTR_TO_FLOW_KEYS to prevent oob access
- netfilter: tighten input validation
- net: add more sanity check in virtio_net_hdr_to_skb()
- rxrpc: fix use of Don't Fragment flag on RESPONSE packets,
avoid infinite loop
- amt: do not use the portion of skb->cb area which may get clobbered
- mptcp: improve validation of the MPTCPOPT_MP_JOIN MCTCP option
Misc:
- spring cleanup of inactive maintainers
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmWpnvoACgkQMUZtbf5S
Irvskg/+Or5tETxOmpQXxnj6ECZyrSp0Jcyd7+TIcos/7JfPdn3Kebl004SG4h/s
bwKDOIIP1iSjQ+0NFsPjyYIVd6wFuCElSB7npV5uQAT6ptXx7A4Ym68/rVxodI8T
6hiYV/mlPuZF8JjRhtp/VJL8sY1qnG7RIUB4oH3y9HQNfwZX0lIWChuUilHuWfbq
zQ2Iu97tMkoIBjXrkIT3Qaj0aFxYbjCOrg9zy+FZ69a7Rmrswr//7amlCH6saNTx
Ku7Wl8FXhe7O23OiM6GSl7AechSM1aJ5kOS3orseej0+aSp9eH3ekYGmbsQr6sjz
ix/eZ7V7SUkJK3bEH5haeymk4TDV3lHE8SziMbosK4wVbHOyPwEmqCxppADYJLZs
WycHZKcTBluFBOxknAofH7m5Hh0ToXkeTfpptSSGtRB4WncAOMsMapr3yS4WXg/q
AnOo/tzCBgMrnSJtD/kjqgUiCk8vYoLc8lBR9K74l0zqI1sf13OfuTHvEgqIS6z1
Ir/ewlAV6fCH8gQbyzjKUVlyjZS4+vFv19xg/2GgLf+LdyzcCOxUZkND3/DE6+OA
Dgf9gtABYU4hGXMUfTfml3KCBTF65QmY8dIh17zraNylYUHEJ2lI4D+sdiqWUrXb
mXPBJh4nOPwIV5t2gT80skNwF3aWPr6l4ieY2codSbP04rO74S8=
=YhQQ
-----END PGP SIGNATURE-----
Merge tag 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
Previous releases - regressions:
- Revert "net: rtnetlink: Enslave device before bringing it up",
breaks the case inverse to the one it was trying to fix
- net: dsa: fix oob access in DSA's netdevice event handler
dereference netdev_priv() before check its a DSA port
- sched: track device in tcf_block_get/put_ext() only for clsact
binder types
- net: tls, fix WARNING in __sk_msg_free when record becomes full
during splice and MORE hint set
- sfp-bus: fix SFP mode detect from bitrate
- drv: stmmac: prevent DSA tags from breaking COE
Previous releases - always broken:
- bpf: fix no forward progress in in bpf_iter_udp if output buffer is
too small
- bpf: reject variable offset alu on registers with a type of
PTR_TO_FLOW_KEYS to prevent oob access
- netfilter: tighten input validation
- net: add more sanity check in virtio_net_hdr_to_skb()
- rxrpc: fix use of Don't Fragment flag on RESPONSE packets, avoid
infinite loop
- amt: do not use the portion of skb->cb area which may get clobbered
- mptcp: improve validation of the MPTCPOPT_MP_JOIN MCTCP option
Misc:
- spring cleanup of inactive maintainers"
* tag 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
i40e: Include types.h to some headers
ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_work
selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
selftests: mlxsw: qos_pfc: Remove wrong description
mlxsw: spectrum_router: Register netdevice notifier before nexthop
mlxsw: spectrum_acl_tcam: Fix stack corruption
mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
ethtool: netlink: Add missing ethnl_ops_begin/complete
selftests: bonding: Add more missing config options
selftests: netdevsim: add a config file
libbpf: warn on unexpected __arg_ctx type when rewriting BTF
selftests/bpf: add tests confirming type logic in kernel for __arg_ctx
bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
bpf: extract bpf_ctx_convert_map logic and make it more reusable
libbpf: feature-detect arg:ctx tag support in kernel
ipvs: avoid stat macros calls from preemptible context
netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description
netfilter: nf_tables: skip dead set elements in netlink dump
netfilter: nf_tables: do not allow mismatch field size and set key length
...
(controllers set the flag, but there is no client to use it). Also,
CLASS_SPD support gets simplified to prepare removal in the future.
Class based instantiation is not recommended these days anyhow.
Furthermore, I2C core now creates a debugfs directory per I2C adapter.
Current bus driver users were converted to use it. Then, there are also
quite some driver updates. Standing out are patches for the wmt-driver
which is refactored to support more variants. This is the rebased pull
request where a large series for the designware driver was dropped.
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmWph0UPHHdzYUBrZXJu
ZWwub3JnAAoJEBQN5MwUoCm2kbIQAJotSmX0mM+nNPReYCMMiloxoxUwgpiErNwY
WDrYQSezthAJ1LDsGOEeLcE4f4I+UcUHBO1BoERtOZg3cGtE0Ii5N845sp100S9O
ktyaKS5utoErymThWFFrnZX60/8yKXUMzZmNzy96560gPcxbFyyyVhKfBSPzK9T+
O8CGu7GRNqgWHlvH3yqGeCbreWYrYVSrluEpBu6807cp3zDxrU+autOnsewm5+md
ka3DdqrbxJSblYK8fJKESAUgkRmZgYKbgl0iiCuqX+ib6I4OA3Z68ny7dl0fY3Ws
vwt7d88SaBKDdJmUZyb/sm4aJsW69GN+ECZolxrn4TIw45k4tes2s6Ma5+TV3E9h
Fd1RuqduFEqQ7cj31UPe2x8rgj5Fo5nbjCWxdZv+/3zF8+cHwi8iwkp2PScsPCsa
fmCdehUE5DrgobsRNANe6XJzxY5wp2VNpGEWKeaQz2Z0/d9T1YFS7a8aewvhXoPC
isZboi6GQh2XoE8UgGJa29VUuaIkUW513DwCGw8mz1yKN+kHGcsRXXjkjaZoQn3U
MMvh/zkI2Hpy/m2R8PWeIq5XhLJvmlZ19JJzUHJIjXh9Fn9EVtXhlUleh6mzMfeM
n8NOg7Eukep2sBgmaufkUKz2Jtogs59YDSXZEvqJjIkPM2Wi0hA18Qj+pilES1ff
3ckk3mxY
=8D3Q
-----END PGP SIGNATURE-----
Merge tag 'i2c-for-6.8-rc1-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"This removes the currently unused CLASS_DDC support (controllers set
the flag, but there is no client to use it).
Also, CLASS_SPD support gets simplified to prepare removal in the
future. Class based instantiation is not recommended these days
anyhow.
Furthermore, I2C core now creates a debugfs directory per I2C adapter.
Current bus driver users were converted to use it.
Finally, quite some driver updates. Standing out are patches for the
wmt-driver which is refactored to support more variants.
This is the rebased pull request where a large series for the
designware driver was dropped"
* tag 'i2c-for-6.8-rc1-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits)
MAINTAINERS: use proper email for my I2C work
i2c: stm32f7: add support for stm32mp25 soc
i2c: stm32f7: perform I2C_ISR read once at beginning of event isr
dt-bindings: i2c: document st,stm32mp25-i2c compatible
i2c: stm32f7: simplify status messages in case of errors
i2c: stm32f7: perform most of irq job in threaded handler
i2c: stm32f7: use dev_err_probe upon calls of devm_request_irq
i2c: i801: Add lis3lv02d for Dell XPS 15 7590
i2c: i801: Add lis3lv02d for Dell Precision 3540
i2c: wmt: Reduce redundant: REG_CR setting
i2c: wmt: Reduce redundant: function parameter
i2c: wmt: Reduce redundant: clock mode setting
i2c: wmt: Reduce redundant: wait event complete
i2c: wmt: Reduce redundant: bus busy check
i2c: mux: reg: Remove class-based device auto-detection support
i2c: make i2c_bus_type const
dt-bindings: at24: add ROHM BR24G04
eeprom: at24: use of_match_ptr()
i2c: cpm: Remove linux,i2c-index conversion from be32
i2c: imx: Make SDA actually optional for bus recovering
...
Subsytem:
New driver:
- Analog Devices MAX31335
- Nuvoton ma35d1
- Texas Instrument TPS6594 PMIC RTC
Drivers:
- cmos: use ACPI alarm instead of HPET on recent AMD platforms
- nuvoton: add NCT3015Y-R and NCT3018Y-R support
- rv8803: proper suspend/resume and wakeup-source support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmWpkqQACgkQY6TcMGxw
OjJ2eQ//cDaUPgvOUQ2+qQiZOyOrjnck+QEyXZr2mcnNfRnMPiafJX09v3W4BMPh
0oOVNF5BbGofTgJRnTC9BCRQnq8XZUwXaMr1B+mLQF4V9Xq3896pILkdRmd0q7EW
3qwKCNP1vkYx2hGyWB9wVAMESAdUIFHCLxJWeQZ3ESGUacMoON0cdFa96TUx4fKa
m29ybMTHRHKpnZsIYpegxG42lWp84IPvTbtySbT52dr2ucLToVos/dX23juQ40D0
nyUa8Q+g6aLoTxjZPcFwK6dHJJIwWz56s40IbMRGr6dVfRis5QZfIQ/cB8ULj48L
AkCtN6kptVsov/W2R9ZriTf5p53K7Fmwz+dhccW7SxA82cGyswWOD3BNzzOYnzPY
pKSVeTnR7mD2IC28pGrekSpg3ExuNu/4+WHnz0EwpjgyXmtlxnfK6oV9nf8bIUsn
JsY3tNOvenv8NQSQ1at3GugeQ4bGMbxay6pL9zm5EZjYGMDX0z7IcPFr9KeYC5tJ
60dYWCGuB2JF0WocuAXSNrj2l9VFFqhn7OdDVuiB00rBROQcKKV/H30EnmAwRUI1
8SZhVJ0TIeoG6XsajhbapvOH8EWUnKPn1mgJxSiyVR6Hz93oZldIPK3cUtfjcltx
xl7LdVeseE1CXGh4g222CI0MCX6x1QPQ5L8jhgc67dyTUx/wQ4Q=
=7u4i
-----END PGP SIGNATURE-----
Merge tag 'rtc-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There are three new drivers this cycle. Also the cmos driver is
getting fixes for longstanding wakeup issues on AMD.
New drivers:
- Analog Devices MAX31335
- Nuvoton ma35d1
- Texas Instrument TPS6594 PMIC RTC
Drivers:
- cmos: use ACPI alarm instead of HPET on recent AMD platforms
- nuvoton: add NCT3015Y-R and NCT3018Y-R support
- rv8803: proper suspend/resume and wakeup-source support"
* tag 'rtc-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (26 commits)
rtc: nuvoton: Compatible with NCT3015Y-R and NCT3018Y-R
rtc: da9063: Use dev_err_probe()
rtc: da9063: Use device_get_match_data()
rtc: da9063: Make IRQ as optional
rtc: max31335: Fix comparison in max31335_volatile_reg()
rtc: max31335: use regmap_update_bits_check
rtc: max31335: remove unecessary locking
rtc: max31335: add driver support
dt-bindings: rtc: max31335: add max31335 bindings
rtc: rv8803: add wakeup-source support
rtc: ac100: remove misuses of kernel-doc
rtc: class: Remove usage of the deprecated ida_simple_xx() API
rtc: MAINTAINERS: drop Alessandro Zummo
rtc: ma35d1: remove hardcoded UIE support
dt-bindings: rtc: qcom-pm8xxx: fix inconsistent example
rtc: rv8803: Add power management support
rtc: ds3232: avoid unused-const-variable warning
rtc: lpc24xx: add missing dependency
rtc: tps6594: Add driver for TPS6594 RTC
rtc: Add driver for Nuvoton ma35d1 rtc controller
...
- a new driver for Adafruit Seesaw gamepad device
- Zforce touchscreen will handle standard device properties for axis
swap/inversion
- handling of advanced sensitivity settings in Microchip CAP11xx
capacitive sensor driver
- more drivers have been converted to use newer gpiod API
- support for dedicated wakeup IRQs in gpio-keys dirver
- support for slider gestures and OTP variants in iqs269a driver
- atkbd will report keyboard version as 0xab83 in cases when GET ID
command was skipped (to deal with problematic firmware on newer
laptops), restoring the previous behavior
- other assorted cleanups and changes
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCZalqoQAKCRBAj56VGEWX
nOqHAP4u4b/r4w2aeULy3kpESgbUQ1vXpFUus/6AHTw1FAtbsgD/XxV3ZWKZ0H8J
VfZ81yXvT3WstJM7p7YNP0GGXJ/HRQg=
=l0Z8
-----END PGP SIGNATURE-----
Merge tag 'input-for-v6.8-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a new driver for Adafruit Seesaw gamepad device
- Zforce touchscreen will handle standard device properties for axis
swap/inversion
- handling of advanced sensitivity settings in Microchip CAP11xx
capacitive sensor driver
- more drivers have been converted to use newer gpiod API
- support for dedicated wakeup IRQs in gpio-keys dirver
- support for slider gestures and OTP variants in iqs269a driver
- atkbd will report keyboard version as 0xab83 in cases when GET ID
command was skipped (to deal with problematic firmware on newer
laptops), restoring the previous behavior
- other assorted cleanups and changes
* tag 'input-for-v6.8-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (44 commits)
Input: atkbd - use ab83 as id when skipping the getid command
Input: driver for Adafruit Seesaw Gamepad
dt-bindings: input: bindings for Adafruit Seesaw Gamepad
Input: da9063_onkey - avoid explicitly setting input's parent
Input: da9063_onkey - avoid using OF-specific APIs
Input: iqs269a - add support for OTP variants
dt-bindings: input: iqs269a: Add bindings for OTP variants
Input: iqs269a - add support for slider gestures
dt-bindings: input: iqs269a: Add bindings for slider gestures
Input: gpio-keys - filter gpio_keys -EPROBE_DEFER error messages
Input: zforce_ts - accept standard touchscreen properties
dt-bindings: touchscreen: neonode,zforce: Use standard properties
dt-bindings: touchscreen: convert neonode,zforce to json-schema
dt-bindings: input: convert drv266x to json-schema
Input: da9063 - use dev_err_probe()
Input: da9063 - drop redundant prints in probe()
Input: da9063 - simplify obtaining OF match data
Input: as5011 - convert to GPIO descriptor
Input: omap-keypad - drop optional GPIO support
Input: tca6416-keypad - drop unused include
...
- Core: add concept of controller_id to deal with clear Controller/Manager
hierarchy
- bunch of qcom driver refactoring for qcom_swrm_stream_alloc_ports(),
qcom_swrm_stream_alloc_ports() and setting controller id to hw master id
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmWpV0gACgkQfBQHDyUj
g0ebow/+Ot1EFq8ADVjCnR4noelvlvfwhNHwM5PSN+GMkSmkTOI6lr4mueDFkEfT
S2yFuKCtO7gIjV+EWt5G0uWAGM33E0aWwcJ+ubw95p+m1lilo2EYgQU3NDTfb+SK
dUxSbwVDHl2jYTQHSV4UzuA66J9xyK7x5WdUuPvLVSzF5thXvu5g1hcXyd0zqDDT
3i+WKkvhquOFOjsexSEAjXsbfa8XNM2R/7iiGlxZUdTh2NwSR5dyQ8WuyhEkw0Gb
23KEOw9wPj1Gcp0JTQ1tSr7C/Ag9jC0QBZLI4ggdivXPUO9NuUHT1owHxLrwXr9R
uz0zzbBBNvPGOOeW9C8CetcSXVvZAVmgPHZumiuXBZMcuDLyrS0BoQyNzl/yTflB
ZBlXbzt7C9l6udHxG+4gLy2Xur0wwzBrH8ZqjItkp6MZTBUxcoMgIY6Y24e0DHsn
cwgsuRblsnL7+JHcq5DL8KYBOnBmexxMCaFtKWGdlU9dWOQtPPe20yIwQACzxUrC
VpvthWTk4P2KZB2hlP8xGbX9IZmINABN0vwQgf2wfUnRGzmyx7wdLXElXXahim4A
cyIIjk9AiM2wglUQxKccKqdntg2SAwMeUE0YuMbBzAlzR8fgIklWzFvahOHLED0L
kvPg1fr/dM5TN9PraptbrML+D5IHXCyCTThZB7ujTTMF4ekwdCo=
=N74M
-----END PGP SIGNATURE-----
Merge tag 'soundwire-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
- Core: add concept of controller_id to deal with clear Controller /
Manager hierarchy
- bunch of qcom driver refactoring for qcom_swrm_stream_alloc_ports(),
qcom_swrm_stream_alloc_ports() and setting controller id to hw master
id
* tag 'soundwire-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: amd: drop bus freq calculation and set 'max_clk_freq'
soundwire: generic_bandwidth_allocation use bus->params.max_dr_freq
soundwire: qcom: set controller id to hw master id
soundwire: fix initializing sysfs for same devices on different buses
soundwire: bus: introduce controller_id
soundwire: stream: constify sdw_port_config when adding devices
soundwire: qcom: move sconfig in qcom_swrm_stream_alloc_ports() out of critical section
soundwire: qcom: drop unneeded qcom_swrm_stream_alloc_ports() cleanup
- revert the changes aiming to use a read-write semaphore to protect the
list of GPIO devices due to calls to legacy API taking that lock from
atomic context in old code
- fix inverted logic in DEFINE_FREE() for GPIO device references
- check the return value of bgpio_init() in gpio-mlxbf3
- fix node address in the DT bindings example for gpio-xilinx
- fix signedness bug in gpio-rtd
- fix kernel-doc warnings in gpio-en7523
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmWpGfgACgkQEacuoBRx
13K92A//TonLW0SmSJ5OM0jn9pqUhYLsSxbknu9M9RVmX+GTIq9YK+S5SiCbRS50
rXn4q3WSqbocM+9R0MUlJJJK8UkGT+JU330vPlJl4gZAy0bK8aoeOqg7vb7nY12e
0/PvroMpTHoBEIp5XloK4Wjxp3dnc1mGfozSJ7af9FXuY9i1cv8KhnrI8krKbszr
quuWlZRrtMxKeR3FnsZVn8WNvEWqB5gAN1QJrX0h4sOe0/ZFVfdpZaCW4j1YZtux
aF6d/9XyEnoR/yJpAE/Rt5Wy727RkilRvSiihnijnJd5JqdMU89CdXdgUF2oFOZ6
rTAh6XrXiJaaAst4H+5rRe37PR0FOBEAwY1pWUlk9caikfPujNhzPmjvXbdAW7N9
xe8J+2hTxYfOPYPNtSD/yjcWOudgOb8iiaRREWcRzR96vX/K6R94nF/8hOOEnm+F
q/xbon1Rt8zhtjZ5lylGsmKoBlKDHlch+c3BXSBf4AxahMNgkebgx8pbkl3jUfg8
30k+Z5wl08t7oRgeydBKfw7Y42LElBMqAFikjEK8ACyhEGC/e5XD8H0xlWGaKqU5
sGhd5VS5Y2BQjsrpWkpdQGyOGquz86z5AjwDAAQp3nLgCNrs1UJ8imMPn5c2MPXl
JAQhMg+zM1NlUhLrXwChYpQ08vypgFUsbGVQ+SDnEbFUo9DbHpg=
=g2QY
-----END PGP SIGNATURE-----
Merge tag 'gpio-fixes-for-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"Apart from some regular driver fixes there's a relatively big revert
of the locking changes that were introduced to GPIOLIB in this merge
window.
This is because it turned out that some legacy GPIO interfaces - that
need to translate a number from the global GPIO numberspace to the
address of the relevant descriptor, thus running a GPIO device lookup
and taking the GPIO device list lock - are still used in old code from
atomic context resulting in "scheduling while atomic" errors.
I'll try to make the read-only part of the list access entirely
lockless using SRCU but this will take some time so let's go back to
the old global spinlock for now.
Summary:
- revert the changes aiming to use a read-write semaphore to protect
the list of GPIO devices due to calls to legacy API taking that
lock from atomic context in old code
- fix inverted logic in DEFINE_FREE() for GPIO device references
- check the return value of bgpio_init() in gpio-mlxbf3
- fix node address in the DT bindings example for gpio-xilinx
- fix signedness bug in gpio-rtd
- fix kernel-doc warnings in gpio-en7523"
* tag 'gpio-fixes-for-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: revert the attempt to protect the GPIO device list with an rwsem
gpio: EN7523: fix kernel-doc warnings
gpiolib: Fix scope-based gpio_device refcounting
gpio: mlxbf3: add an error code check in mlxbf3_gpio_probe
dt-bindings: gpio: xilinx: Fix node address in gpio
gpio: rtd: Fix signedness bug in probe