IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit 93f479d3ad05497f29f2bed58e4a6c6a4f0a548c upstream.
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of
letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1a7ac6f3ba6e157adcd0ca94d92a401f1943f56 upstream.
When ping_group_range is updated, 'ping' uses the DGRAM ICMP socket,
instead of an IP raw socket. In this case, 'ping' is unable to bind its
socket to a local address owned by a vrflite.
Before the patch:
$ sysctl -w net.ipv4.ping_group_range='0 2147483647'
$ ip link add blue type vrf table 10
$ ip link add foo type dummy
$ ip link set foo master blue
$ ip link set foo up
$ ip addr add 192.168.1.1/24 dev foo
$ ip addr add 2001::1/64 dev foo
$ ip vrf exec blue ping -c1 -I 192.168.1.1 192.168.1.2
ping: bind: Cannot assign requested address
$ ip vrf exec blue ping6 -c1 -I 2001::1 2001::2
ping6: bind icmp socket: Cannot assign requested address
CC: stable@vger.kernel.org
Fixes: 1b69c6d0ae90 ("net: Introduce L3 Master device abstraction")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f00432063db1a0db484e85193eccc6845435b80e upstream.
We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.
Reported-by: Felix Fu <foyjog@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect
Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable@vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Meena Shanmugam <meenashanmugam@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89f42494f92f448747bd8a7ab1ae8b5d5520577d upstream.
Avoid socket state races due to repeated calls to ->connect() using the
same socket. If connect() returns 0 due to the connection having
completed, but we are in fact in a closing state, then we may leave the
XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[meenashanmugam: Backported to 5.10: Fixed merge conflict in xs_tcp_setup_socket]
Signed-off-by: Meena Shanmugam <meenashanmugam@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3be232f11a3cc9b0ef0795e39fa11bdb8e422a06 upstream.
If we have already set up the socket and are waiting for it to connect,
then don't immediately close and retry.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Meena Shanmugam <meenashanmugam@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4dfa9b438ee34caca4e6a4e5e961641807367f6f ]
In order to limit the ability for an observer to recognize the source
ports sequence used to contact a set of destinations, we should
periodically shuffle the secret. 10 seconds looks effective enough
without causing particular issues.
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3740651bf7e200109dd42d5b2fb22226b26f960a ]
The commit cited below claims to fix a use-after-free condition after
tls_device_down. Apparently, the description wasn't fully accurate. The
context stayed alive, but ctx->netdev became NULL, and the offload was
torn down without a proper fallback, so a bug was present, but a
different kind of bug.
Due to misunderstanding of the issue, the original patch dropped the
refcount_dec_and_test line for the context to avoid the alleged
premature deallocation. That line has to be restored, because it matches
the refcount_inc_not_zero from the same function, otherwise the contexts
that survived tls_device_down are leaked.
This patch fixes the described issue by restoring refcount_dec_and_test.
After this change, there is no leak anymore, and the fallback to
software kTLS still works.
Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3c46e41b32b6266cf60b0985c61748f53bf1c61 ]
Non blocking sendmsg will return -EAGAIN when any signal pending
and no send space left, while non blocking recvmsg return -EINTR
when signal pending and no data received. This may makes confused.
As TCP returns -EAGAIN in the conditions described above. Align the
behavior of smc with TCP.
Fixes: 846e344eb722 ("net/smc: add receive timeout check")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20220512030820.73848-1-guangguan.wang@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b796475fd7882663a870456466a4fb315cc1bd6 ]
Currently pedit tries to ensure that the accessed skb offset
is writable via skb_unclone(). The action potentially allows
touching any skb bytes, so it may end-up modifying shared data.
The above causes some sporadic MPTCP self-test failures, due to
this code:
tc -n $ns2 filter add dev ns2eth$i egress \
protocol ip prio 1000 \
handle 42 fw \
action pedit munge offset 148 u8 invert \
pipe csum tcp \
index 100
The above modifies a data byte outside the skb head and the skb is
a cloned one, carrying a TCP output packet.
This change addresses the issue by keeping track of a rough
over-estimate highest skb offset accessed by the action and ensuring
such offset is really writable.
Note that this may cause performance regressions in some scenarios,
but hopefully pedit is not in the critical path.
Fixes: db2c24175d14 ("act_pedit: access skb->data safely")
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Tested-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/1fcf78e6679d0a287dd61bb0f04730ce33b3255d.1652194627.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5076fe4049cadef1f040eda4aaa001bb5424225 ]
netlink_recvmsg() does not need to change transport header.
If transport header was needed, it should have been reset
by the producer (netlink_dump()), not the consumer(s).
The following trace probably happened when multiple threads
were using MSG_PEEK.
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
write to 0xffff88811e9f15b2 of 2 bytes by task 32012 on cpu 1:
skb_reset_transport_header include/linux/skbuff.h:2760 [inline]
netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
__sys_recvfrom+0x204/0x2c0 net/socket.c:2097
__do_sys_recvfrom net/socket.c:2115 [inline]
__se_sys_recvfrom net/socket.c:2111 [inline]
__x64_sys_recvfrom+0x74/0x90 net/socket.c:2111
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
write to 0xffff88811e9f15b2 of 2 bytes by task 32005 on cpu 0:
skb_reset_transport_header include/linux/skbuff.h:2760 [inline]
netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978
____sys_recvmsg+0x162/0x2f0
___sys_recvmsg net/socket.c:2674 [inline]
__sys_recvmsg+0x209/0x3f0 net/socket.c:2704
__do_sys_recvmsg net/socket.c:2714 [inline]
__se_sys_recvmsg net/socket.c:2711 [inline]
__x64_sys_recvmsg+0x42/0x50 net/socket.c:2711
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffff -> 0x0000
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 32005 Comm: syz-executor.4 Not tainted 5.18.0-rc1-syzkaller-00328-ge1f700ebd6be-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220505161946.2867638-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 86af062f40a73bf63321694e6bf637144f0383fe ]
Currently MBSSID parameters in struct ieee80211_bss_conf
are not reset upon connection. This could be problematic
with some drivers in a scenario where the device first
connects to a non-transmit BSS and then connects to a
transmit BSS of a Multi BSS AP. The MBSSID parameters
which are set after connecting to a non-transmit BSS will
not be reset and the same parameters will be passed on to
the driver during the subsequent connection to a transmit
BSS of a Multi BSS AP.
For example, firmware running on the ath11k device uses the
Multi BSS data for tracking the beacon of a non-transmit BSS
and reports the driver when there is a beacon miss. If we do
not reset the MBSSID parameters during the subsequent
connection to a transmit BSS, then the driver would have
wrong MBSSID data and FW would be looking for an incorrect
BSSID in the MBSSID beacon of a Multi BSS AP and reports
beacon loss leading to an unstable connection.
Reset the MBSSID parameters upon every connection to solve this
problem.
Fixes: 78ac51f81532 ("mac80211: support multi-bssid")
Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>
Link: https://lore.kernel.org/r/20220428052744.27040-1-quic_mpubbise@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a063f2fba3fa633a599253b62561051ac185fa99 ]
The receiving interface might have used GRO to receive more fragments than
MAX_SKB_FRAGS fragments. In this case, these will not be stored in
skb_shinfo(skb)->frags but merged into the frag list.
batman-adv relies on the function skb_split to split packets up into
multiple smaller packets which are not larger than the MTU on the outgoing
interface. But this function cannot handle frag_list entries and is only
operating on skb_shinfo(skb)->frags. If it is still trying to split such an
skb and xmit'ing it on an interface without support for NETIF_F_FRAGLIST,
then validate_xmit_skb() will try to linearize it. But this fails due to
inconsistent information. And __pskb_pull_tail will trigger a BUG_ON after
skb_copy_bits() returns an error.
In case of entries in frag_list, just linearize the skb before operating on
it with skb_split().
Reported-by: Felix Kaechele <felix@kaechele.ca>
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Tested-by: Felix Kaechele <felix@kaechele.ca>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 103a2f3255a95991252f8f13375c3a96a75011cd upstream.
Set a size limit of 8 bytes of the written buffer to "hdev->name"
including the terminating null byte, as the size of "hdev->name" is 8
bytes. If an id value which is greater than 9999 is allocated,
then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)"
function call would lead to a truncation of the id value in decimal
notation.
Set an explicit maximum id parameter in the id allocation function call.
The id allocation function defines the maximum allocated id value as the
maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined
as 10000.
Signed-off-by: Itay Iellin <ieitayie@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4071bf121d59944d5cd2238de0642f3d7995a997 upstream.
There are sleep in atomic bug that could cause kernel panic during
firmware download process. The root cause is that nlmsg_new with
GFP_KERNEL parameter is called in fw_dnld_timeout which is a timer
handler. The call trace is shown below:
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
Call Trace:
kmem_cache_alloc_node
__alloc_skb
nfc_genl_fw_download_done
call_timer_fn
__run_timers.part.0
run_timer_softirq
__do_softirq
...
The nlmsg_new with GFP_KERNEL parameter may sleep during memory
allocation process, and the timer handler is run as the result of
a "software interrupt" that should not call any other function
that could sleep.
This patch changes allocation mode of netlink message from GFP_KERNEL
to GFP_ATOMIC in order to prevent sleep in atomic bug. The GFP_ATOMIC
flag makes memory allocation operation could be used in atomic context.
Fixes: 9674da8759df ("NFC: Add firmware upload netlink command")
Fixes: 9ea7187c53f6 ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da5c0f119203ad9728920456a0f52a6d850c01cd upstream.
The device_is_registered() in nfc core is used to check whether
nfc device is registered in netlink related functions such as
nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
is protected by device_lock, there is still a race condition between
device_del() and device_is_registered(). The root cause is that
kobject_del() in device_del() is not protected by device_lock.
(cleanup task) | (netlink task)
|
nfc_unregister_device | nfc_fw_download
device_del | device_lock
... | if (!device_is_registered)//(1)
kobject_del//(2) | ...
... | device_unlock
The device_is_registered() returns the value of state_in_sysfs and
the state_in_sysfs is set to zero in kobject_del(). If we pass check in
position (1), then set zero in position (2). As a result, the check
in position (1) is useless.
This patch uses bool variable instead of device_is_registered() to judge
whether the nfc device is registered, which is well synchronized.
Fixes: 3e256b8f8dfa ("NFC: add nfc subsystem core")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72ed3ee9fa0b461ad086403a8b5336154bd82234 upstream.
As a carry over from the CAN_RAW socket (which allows to change the CAN
interface while mantaining the filter setup) the re-binding of the
CAN_ISOTP socket needs to take care about CAN ID address information and
subscriptions. It turned out that this feature is so limited (e.g. the
sockopts remain fix) that it finally has never been needed/used.
In opposite to the stateless CAN_RAW socket the switching of the CAN ID
subscriptions might additionally lead to an interrupted ongoing PDU
reception. So better remove this unneeded complexity.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220422082337.1676-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3d0562d4dc039bca39445e1cddde7951662e17d upstream.
This reverts commit 7073ea8799a8cf73db60270986f14e4aae20fa80.
We must not try to connect the socket while the transport is under
construction, because the mechanisms to safely tear it down are not in
place. As the code stands, we end up leaking the sockets on a connection
error.
Reported-by: wanghai (M) <wanghai38@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 743b83f15d4069ea57c3e40996bf4a1077e0cdc1 upstream.
Check if the incoming interface is available and NFT_BREAK
in case neither skb->sk nor input device are set.
Because nf_sk_lookup_slow*() assume packet headers are in the
'in' direction, use in postrouting is not going to yield a meaningful
result. Same is true for the forward chain, so restrict the use
to prerouting, input and output.
Use in output work if a socket is already attached to the skb.
Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a0df71948e9548de819a6f1da68f5f1742258a52 ]
Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.
If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.
This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4bfe744ff1644fbc0a991a2677dc874475dd6776 ]
I had this bug sitting for too long in my pile, it is time to fix it.
Thanks to Doug Porter for reminding me of it!
We had various attempts in the past, including commit
0cbe6a8f089e ("tcp: remove SOCK_QUEUE_SHRUNK"),
but the issue is that TCP stack currently only generates
EPOLLOUT from input path, when tp->snd_una has advanced
and skb(s) cleaned from rtx queue.
If a flow has a big RTT, and/or receives SACKs, it is possible
that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
and no more data can be sent until tp->snd_una finally advances.
What is needed is to also check if POLLOUT needs to be generated
whenever tp->snd_nxt is advanced, from output path.
This bug triggers more often after an idle period, as
we do not receive ACK for at least one RTT. tcp_notsent_lowat
could be a fraction of what CWND and pacing rate would allow to
send during this RTT.
In a followup patch, I will remove the bogus call
to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
from tcp_check_space(). Fact that we have decided to generate
an EPOLLOUT does not mean the application has immediately
refilled the transmit queue. This optimistic call
might have been the reason the bug seemed not too serious.
Tested:
200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]
$ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
$ cat bench_rr.sh
SUM=0
for i in {1..10}
do
V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
echo $V
SUM=$(($SUM + $V))
done
echo SUM=$SUM
Before patch:
$ bench_rr.sh
130000000
80000000
140000000
140000000
140000000
140000000
130000000
40000000
90000000
110000000
SUM=1140000000
After patch:
$ bench_rr.sh
430000000
590000000
530000000
450000000
450000000
350000000
450000000
490000000
480000000
460000000
SUM=4680000000 # This is 410 % of the value before patch.
Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Doug Porter <dsp@fb.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31c417c948d7f6909cb63f0ac3298f3c38f8ce20 ]
As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in
collect_md mode is racy for [IP6]GRE[TAP] devices. Consider the
following sequence of events:
1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link
add ... external". "ip" ignores "[o]seq" if "external" is specified,
so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e.
it uses lockless TX);
2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g.
bpf_skb_set_tunnel_key() in an eBPF program attached to this device;
3. gre_fb_xmit() or __gre6_xmit() processes these skb's:
gre_build_header(skb, tun_hlen,
flags, protocol,
tunnel_id_to_key32(tun_info->key.tun_id),
(flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
: 0); ^^^^^^^^^^^^^^^^^
Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may
try to do this tunnel->o_seqno++ in parallel, which is racy. Fix it by
making o_seqno atomic_t.
As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless
xmit"), making o_seqno atomic_t increases "chance for packets being out
of order at receiver" when NETIF_F_LLTX is on.
Maybe a better fix would be:
1. Do not ignore "oseq" in external mode. Users MUST specify "oseq" if
they want the kernel to allow sequencing of outgoing packets;
2. Reject all outgoing TUNNEL_SEQ packets if the device was not created
with "oseq".
Unfortunately, that would break userspace.
We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us
do it in separate patches to keep this fix minimal.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.")
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fde98ae91f79cab4e020f40c35ed23cbdc59661c ]
For IP6GRE and IP6GRETAP devices, currently o_seqno starts from 1 in
native mode. According to RFC 2890 2.2., "The first datagram is sent
with a sequence number of 0." Fix it.
It is worth mentioning that o_seqno already starts from 0 in collect_md
mode, see the "if (tunnel->parms.collect_md)" clause in __gre6_xmit(),
where tunnel->o_seqno is passed to gre_build_header() before getting
incremented.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff827beb706ed719c766acf36449801ded0c17fc ]
For GRE and GRETAP devices, currently o_seqno starts from 1 in native
mode. According to RFC 2890 2.2., "The first datagram is sent with a
sequence number of 0." Fix it.
It is worth mentioning that o_seqno already starts from 0 in collect_md
mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to
gre_build_header() before getting incremented.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e2e65e2e56c6ceb4ea1719360080c0af083229e ]
In the current implementation, when TCP initiates a connection
to an unavailable [ip,port], ECONNREFUSED will be stored in the
TCP socket, but SMC will not. However, some apps (like curl) use
getsockopt(,,SO_ERROR,,) to get the error information, which makes
them miss the error message and behave strangely.
Fixes: 50717a37db03 ("net/smc: nonblocking connect rework")
Signed-off-by: liuyacan <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 165e3e17fe8fe6a8aab319bc6e631a2e23b9a857 ]
A null pointer reference issue can be triggered when the response of a
stream reconf request arrives after the timer is triggered, such as:
send Incoming SSN Reset Request --->
CPU0:
reconf timer is triggered,
go to the handler code before hold sk lock
<--- reply with Outgoing SSN Reset Request
CPU1:
process Outgoing SSN Reset Request,
and set asoc->strreset_chunk to NULL
CPU0:
continue the handler code, hold sk lock,
and try to hold asoc->strreset_chunk, crash!
In Ying Xu's testing, the call trace is:
[ ] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ ] RIP: 0010:sctp_chunk_hold+0xe/0x40 [sctp]
[ ] Call Trace:
[ ] <IRQ>
[ ] sctp_sf_send_reconf+0x2c/0x100 [sctp]
[ ] sctp_do_sm+0xa4/0x220 [sctp]
[ ] sctp_generate_reconf_event+0xbd/0xe0 [sctp]
[ ] call_timer_fn+0x26/0x130
This patch is to fix it by returning from the timer handler if asoc
strreset_chunk is already set to NULL.
Fixes: 7b9438de0cd4 ("sctp: add stream reconf timer")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b253a0680ceadc5d7b4acca7aa2d870326cad8ad ]
If an ACK (s)acks multiple skbs, we favor the information
from the most recently sent skb by choosing the skb with
the highest prior_delivered count. But in the interval
between receiving ACKs, we send multiple skbs with the same
prior_delivered, because the tp->delivered only changes
when we receive an ACK.
We used RACK's solution, copying tcp_rack_sent_after() as
tcp_skb_sent_after() helper to determine "which packet was
sent last?". Later, we will use tcp_skb_sent_after() instead
in RACK.
Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1650422081-22153-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b0b9e4c2c895227c8852488b3f09839233bba54 ]
In tcp_create_openreq_child we adjust tcp_header_len for md5 using the
remote address in newsk. But that address is still 0 in newsk at this
point, and it is only set later by the callers (tcp_v[46]_syn_recv_sock).
Use the address from the request socket instead.
Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220421005026.686A45EC01F2@us226.sjc.aristanetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit babc3dc9524f0bcb5a0ec61f3c3639b11508fad6 ]
This patch fixes spurious EEXIST errors.
Extend d2df92e98a34 ("netfilter: nft_set_rbtree: handle element
re-addition after deletion") to deal with elements with same end flags
in the same transation.
Reset the overlap flag as described by 7c84d41416d8 ("netfilter:
nft_set_rbtree: Detect partial overlaps on insertion").
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Fixes: d2df92e98a34 ("netfilter: nft_set_rbtree: handle element re-addition after deletion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fc06b2867f4cea543505acfb194c2be4ebf0c7d3 ]
The device_node pointer is returned by of_parse_phandle() with refcount
incremented. We should use of_node_put() on it when done.
of_node_put() will check for NULL value.
Fixes: a20f997010c4 ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eba1a872cb73314280d5448d934935b23e30b7ca ]
The memory size of ip_vs_conn_tab changed after we use hlist
instead of list.
Fixes: 731109e78415 ("ipvs: use hlist instead of list")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d73497081710c876c3c61444445512989e102152 upstream.
The first attempt to fix a the 'impossible' WARN_ON_ONCE(1) in
isotp_tx_timer_handler() focussed on the identical CAN IDs created by
the syzbot reproducer and lead to upstream fix/commit 3ea566422cbd
("can: isotp: sanitize CAN ID checks in isotp_bind()"). But this did
not catch the root cause of the wrong tx.state in the tx_timer handler.
In the isotp 'first frame' case a timeout monitoring needs to be started
before the 'first frame' is send. But when this sending failed the timeout
monitoring for this specific frame has to be disabled too.
Otherwise the tx_timer is fired with the 'warn me' tx.state of ISOTP_IDLE.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220405175112.2682-1-socketcan@hartkopp.net
Reported-by: syzbot+2339c27f5c66c652843e@syzkaller.appspotmail.com
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cefa91b2332d7009bc0be5d951d6cbbf349f90f8 upstream.
Given a sufficiently large number of actions, while copying and
reserving memory for a new action of a new flow, if next_offset is
greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does
not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE
bytes increasing actions_len by req_size. This can then lead to an OOB
write access, especially when further actions need to be copied.
Fix it by rearranging the flow action size check.
KASAN splat below:
==================================================================
BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch]
Write of size 65360 at addr ffff888147e4001c by task handler15/836
CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27
...
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x5a
print_report.cold+0x5e/0x5db
? __lock_text_start+0x8/0x8
? reserve_sfa_size+0x1ba/0x380 [openvswitch]
kasan_report+0xb5/0x130
? reserve_sfa_size+0x1ba/0x380 [openvswitch]
kasan_check_range+0xf5/0x1d0
memcpy+0x39/0x60
reserve_sfa_size+0x1ba/0x380 [openvswitch]
__add_action+0x24/0x120 [openvswitch]
ovs_nla_add_action+0xe/0x20 [openvswitch]
ovs_ct_copy_action+0x29d/0x1130 [openvswitch]
? __kernel_text_address+0xe/0x30
? unwind_get_return_address+0x56/0xa0
? create_prof_cpu_mask+0x20/0x20
? ovs_ct_verify+0xf0/0xf0 [openvswitch]
? prep_compound_page+0x198/0x2a0
? __kasan_check_byte+0x10/0x40
? kasan_unpoison+0x40/0x70
? ksize+0x44/0x60
? reserve_sfa_size+0x75/0x380 [openvswitch]
__ovs_nla_copy_actions+0xc26/0x2070 [openvswitch]
? __zone_watermark_ok+0x420/0x420
? validate_set.constprop.0+0xc90/0xc90 [openvswitch]
? __alloc_pages+0x1a9/0x3e0
? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0
? unwind_next_frame+0x991/0x1e40
? __mod_node_page_state+0x99/0x120
? __mod_lruvec_page_state+0x2e3/0x470
? __kasan_kmalloc_large+0x90/0xe0
ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch]
ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch]
...
Cc: stable@vger.kernel.org
Fixes: f28cd2af22a0 ("openvswitch: fix flow actions reallocation")
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9cb7c013420f98fa6fd12fc6a5dc055170c108db ]
Reads and Writes to ip6_rt_gc_expire always have been racy,
as syzbot reported lately [1]
There is a possible risk of under-flow, leading
to unexpected high value passed to fib6_run_gc(),
although I have not observed this in the field.
Hosts hitting ip6_dst_gc() very hard are under pretty bad
state anyway.
[1]
BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc
read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
dst_alloc+0x9b/0x160 net/core/dst.c:86
ip6_dst_alloc net/ipv6/route.c:344 [inline]
icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
mld_send_cr net/ipv6/mcast.c:2119 [inline]
mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30
read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
dst_alloc+0x9b/0x160 net/core/dst.c:86
ip6_dst_alloc net/ipv6/route.c:344 [inline]
icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
mld_send_cr net/ipv6/mcast.c:2119 [inline]
mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30
value changed: 0x00000bb3 -> 0x00000ba9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d3bad-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec5b0f605b105457f257f2870acad4a5d463984b ]
While investigating a related syzbot report,
I found that whenever call to tcf_exts_init()
from u32_init_knode() is failing, we end up
with an elevated refcount on ht->refcnt
To avoid that, only increase the refcount after
all possible errors have been evaluated.
Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab198e1d0dd8dc4bc7575fb50758e2cbd51e14e1 ]
Feng reported an skb_under_panic BUG triggered by running
test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:
[ 82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
<...>
[ 82.607380] Call Trace:
[ 82.609389] <TASK>
[ 82.611136] skb_push.cold.109+0x10/0x10
[ 82.614289] __gre6_xmit+0x41e/0x590
[ 82.617169] ip6gre_tunnel_xmit+0x344/0x3f0
[ 82.620526] dev_hard_start_xmit+0xf1/0x330
[ 82.623882] sch_direct_xmit+0xe4/0x250
[ 82.626961] __dev_queue_xmit+0x720/0xfe0
<...>
[ 82.633431] packet_sendmsg+0x96a/0x1cb0
[ 82.636568] sock_sendmsg+0x30/0x40
<...>
The following sequence of events caused the BUG:
1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
calculated based on old flags (see ip6gre_calc_hlen());
2. packet_snd() reserves header room for skb A, assuming
tunnel->tun_hlen is 4;
3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
4. __gre6_xmit() detects the new tunnel key, and recalculates
"tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
TUNNEL_SEQ);
5. gre_build_header() calls skb_push() with insufficient reserved header
room, triggering the BUG.
As sugguested by Cong, fix it by moving the call to skb_cow_head() after
the recalculation of tun_hlen.
Reproducer:
OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o
ip netns add at_ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns at_ns0
ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip link set dev veth1 up mtu 1500
ip addr add dev veth1 172.16.1.200/24
ip netns exec at_ns0 ip addr add ::11/96 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip addr add dev veth1 ::22/96
ip link set dev veth1 up
ip netns exec at_ns0 \
ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
local ::11 remote ::22
ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
ip netns exec at_ns0 ip link set dev ip6gretap00 up
ip link add dev ip6gretap11 type ip6gretap external
ip addr add dev ip6gretap11 10.1.1.200/24
ip addr add dev ip6gretap11 fc80::200/24
ip link set dev ip6gretap11 up
tc qdisc add dev ip6gretap11 clsact
tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel
ping6 -c 3 -w 10 -q ::11
Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode")
Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f40c064e933d7787ca7411b699504d7a2664c1f5 ]
Do not update tunnel->tun_hlen in data plane code. Use a local variable
instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit().
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29e8e659f984be00d75ec5fef4e37c88def72712 ]
packet_sock xmit could be dev_queue_xmit, which also returns negative
errors. So only checking positive errors is not enough, or userspace
sendmsg may return success while packet is not send out.
Move the net_xmit_errno() assignment in the braces as checkpatch.pl said
do not use assignment in if condition.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a74e99323746353bba11562a2f2d0aa8102f402 ]
Since commit e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown
and fallback"), for a fallback connection, __smc_release() does not call
sock_put() if its state is already SMC_CLOSED.
When calling smc_shutdown() after falling back, its state is set to
SMC_CLOSED but does not call sock_put(), so this patch calls it.
Reported-and-tested-by: syzbot+6e29a053eb165bd50de5@syzkaller.appspotmail.com
Fixes: e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown and fallback")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee3b0826b4764f6c13ad6db67495c5a1c38e9025 ]
A recent patch[1] from Eric Dumazet flipped the order in which the
keepalive timer and the keepalive worker were cancelled in order to fix a
syzbot reported issue[2]. Unfortunately, this enables the mirror image bug
whereby the timer races with rxrpc_exit_net(), restarting the worker after
it has been cancelled:
CPU 1 CPU 2
=============== =====================
if (rxnet->live)
<INTERRUPT>
rxnet->live = false;
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_queue_work(&rxnet->peer_keepalive_work);
del_timer_sync(&rxnet->peer_keepalive_timer);
Fix this by restoring the removed del_timer_sync() so that we try to remove
the timer twice. If the timer runs again, it should see ->live == false
and not restart the worker.
Fixes: 1946014ca3b1 ("rxrpc: fix a race in rxrpc_exit_net()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5bd8baab087dff657e05387aee802e70304cc813 ]
Commit ebe48d368e97 ("esp: Fix possible buffer overflow in ESP
transformation") tried to fix skb_page_frag_refill usage in ESP by
capping allocsize to 32k, but that doesn't completely solve the issue,
as skb_page_frag_refill may return a single page. If that happens, we
will write out of bounds, despite the check introduced in the previous
patch.
This patch forces COW in cases where we would end up calling
skb_page_frag_refill with a size larger than a page (first in
esp_output_head with tailen, then in esp_output_tail with
skb->data_len).
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 82e31755e55fbcea6a9dfaae5fe4860ade17cbc0 upstream.
There are race conditions that may lead to UAF bugs in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call
ax25_release() to deallocate ax25_dev.
One of the UAF bugs caused by ax25_release() is shown below:
(Thread 1) | (Thread 2)
ax25_dev_device_up() //(1) |
... | ax25_kill_by_device()
ax25_bind() //(2) |
ax25_connect() | ...
ax25_std_establish_data_link() |
ax25_start_t1timer() | ax25_dev_device_down() //(3)
mod_timer(&ax25->t1timer,..) |
| ax25_release()
(wait a time) | ...
| ax25_dev_put(ax25_dev) //(4)FREE
ax25_t1timer_expiry() |
ax25->ax25_dev->values[..] //USE| ...
... |
We increase the refcount of ax25_dev in position (1) and (2), and
decrease the refcount of ax25_dev in position (3) and (4).
The ax25_dev will be freed in position (4) and be used in
ax25_t1timer_expiry().
The fail log is shown below:
==============================================================
[ 106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60
[ 106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0
[ 106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574
[ 106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14
[ 106.116942] Call Trace:
...
[ 106.116942] ax25_t1timer_expiry+0x1c/0x60
[ 106.116942] call_timer_fn+0x122/0x3d0
[ 106.116942] __run_timers.part.0+0x3f6/0x520
[ 106.116942] run_timer_softirq+0x4f/0xb0
[ 106.116942] __do_softirq+0x1c2/0x651
...
This patch adds del_timer_sync() in ax25_release(), which could ensure
that all timers stop before we deallocate ax25_dev.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[OP: backport to 5.10: adjust context]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>