IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit 1158f79f82d437093aeed87d57df0548bdd68146 ]
VRF devices are the loopbacks for VRFs, and a loopback can not be
assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should
be '||' not '&&'.
Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach")
Reported-by: Pudak, Filip <Filip.Pudak@windriver.com>
Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9381fe8c849cfbe50245ac01fc077554f6eaa0e2 ]
The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in
tls_set_sw_offload(). The return value of crypto_aead_ivsize()
for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes
memory space will trigger slab-out-of-bounds bug as following:
==================================================================
BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls]
Read of size 16 at addr ffff888114e84e60 by task tls/10911
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x5e/0x5db
? decrypt_internal+0x385/0xc40 [tls]
kasan_report+0xab/0x120
? decrypt_internal+0x385/0xc40 [tls]
kasan_check_range+0xf9/0x1e0
memcpy+0x20/0x60
decrypt_internal+0x385/0xc40 [tls]
? tls_get_rec+0x2e0/0x2e0 [tls]
? process_rx_list+0x1a5/0x420 [tls]
? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls]
decrypt_skb_update+0x9d/0x400 [tls]
tls_sw_recvmsg+0x3c8/0xb50 [tls]
Allocated by task 10911:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
tls_set_sw_offload+0x2eb/0xa20 [tls]
tls_setsockopt+0x68c/0x700 [tls]
__sys_setsockopt+0xfe/0x1b0
Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size
when memcpy() iv value in TLS_1_3_VERSION scenario.
Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7496b59f588dd52886fdbac7633608097543a0a5 ]
The socket layer requires that we use the socket lock to protect changes
to the sock->sk_write_pending field and others.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a80a8461868905823609be97f91776a26befe839 ]
Currently, tasks marked as "swapper" tasks get put to the front of
non-priority rpc_queues, and are sorted earlier than non-swapper tasks on
the transport's ->xmit_queue.
This is pointless as currently *all* tasks for a mount that has swap
enabled on *any* file are marked as "swapper" tasks. So the net result
is that the non-priority rpc_queues are reverse-ordered (LIFO).
This scheduling boost is not necessary to avoid deadlocks, and hurts
fairness, so remove it. If there were a need to expedite some requests,
the tk_priority mechanism is a more appropriate tool.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a721035477fb5fb8abc738fbe410b07c12af3dc5 ]
When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything. So it
must not block waiting for memory.
xprt_dynamic_alloc_slot can block indefinitely. This can tie up all
workqueue threads and NFS can deadlock. So when called from a
workqueue, set __GFP_NORETRY.
The rdma alloc_slot already does not block. However it sets the error
to -EAGAIN suggesting this will trigger a sleep. It does not. As we
can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN
causes immediate retry.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c487216bec83b0c5a8803e5c61433d33ad7b104d ]
When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything.
So it must not block waiting for memory.
mempools are particularly a problem as memory can only be released back
to the mempool by an async rpc task running. If all available
workqueue threads are waiting on the mempool, no thread is available to
return anything.
rpc_malloc() can block, and this might cause deadlocks.
So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if
blocking is acceptable.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f22881de730ebd472e15bcc2c0d1d46e36a87b9c ]
In calipso_map_cat_ntoh(), in the for loop, if the return value of
netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when
netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses
of bitmap[byte_offset] occurs.
The bug was found during fuzzing. The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0
Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252
CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x21c/0x230
show_stack+0x1c/0x60
dump_stack_lvl+0x64/0x7c
print_address_description.constprop.0+0x70/0x2d0
__kasan_report+0x158/0x16c
kasan_report+0x74/0x120
__asan_load1+0x80/0xa0
netlbl_bitmap_walk+0x3c/0xd0
calipso_opt_getattr+0x1a8/0x230
calipso_sock_getattr+0x218/0x340
calipso_sock_getattr+0x44/0x60
netlbl_sock_getattr+0x44/0x80
selinux_netlbl_socket_setsockopt+0x138/0x170
selinux_socket_setsockopt+0x4c/0x60
security_socket_setsockopt+0x4c/0x90
__sys_setsockopt+0xbc/0x2b0
__arm64_sys_setsockopt+0x6c/0x84
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x88/0x200
do_el0_svc+0x88/0xa0
el0_svc+0x128/0x1b0
el0t_64_sync_handler+0x9c/0x120
el0t_64_sync+0x16c/0x170
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f63d24baff787e13b723d86fe036f84bdbc35045 ]
This fixes the following trace caused by receiving
HCI_EV_DISCONN_PHY_LINK_COMPLETE which does call hci_conn_del without
first checking if conn->type is in fact AMP_LINK and in case it is
do properly cleanup upper layers with hci_disconn_cfm:
==================================================================
BUG: KASAN: use-after-free in hci_send_acl+0xaba/0xc50
Read of size 8 at addr ffff88800e404818 by task bluetoothd/142
CPU: 0 PID: 142 Comm: bluetoothd Not tainted
5.17.0-rc5-00006-gda4022eeac1a #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x59
print_address_description.constprop.0+0x1f/0x150
kasan_report.cold+0x7f/0x11b
hci_send_acl+0xaba/0xc50
l2cap_do_send+0x23f/0x3d0
l2cap_chan_send+0xc06/0x2cc0
l2cap_sock_sendmsg+0x201/0x2b0
sock_sendmsg+0xdc/0x110
sock_write_iter+0x20f/0x370
do_iter_readv_writev+0x343/0x690
do_iter_write+0x132/0x640
vfs_writev+0x198/0x570
do_writev+0x202/0x280
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RSP: 002b:00007ffce8a099b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3
0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 14 00 00 00 0f 05
<48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
RDX: 0000000000000001 RSI: 00007ffce8a099e0 RDI: 0000000000000015
RAX: ffffffffffffffda RBX: 00007ffce8a099e0 RCX: 00007f788fc3cf77
R10: 00007ffce8af7080 R11: 0000000000000246 R12: 000055e4ccf75580
RBP: 0000000000000015 R08: 0000000000000002 R09: 0000000000000001
</TASK>
R13: 000055e4ccf754a0 R14: 000055e4ccf75cd0 R15: 000055e4ccf4a6b0
Allocated by task 45:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
hci_chan_create+0x9a/0x2f0
l2cap_conn_add.part.0+0x1a/0xdc0
l2cap_connect_cfm+0x236/0x1000
le_conn_complete_evt+0x15a7/0x1db0
hci_le_conn_complete_evt+0x226/0x2c0
hci_le_meta_evt+0x247/0x450
hci_event_packet+0x61b/0xe90
hci_rx_work+0x4d5/0xc50
process_one_work+0x8fb/0x15a0
worker_thread+0x576/0x1240
kthread+0x29d/0x340
ret_from_fork+0x1f/0x30
Freed by task 45:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0xfb/0x130
kfree+0xac/0x350
hci_conn_cleanup+0x101/0x6a0
hci_conn_del+0x27e/0x6c0
hci_disconn_phylink_complete_evt+0xe0/0x120
hci_event_packet+0x812/0xe90
hci_rx_work+0x4d5/0xc50
process_one_work+0x8fb/0x15a0
worker_thread+0x576/0x1240
kthread+0x29d/0x340
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff88800c0f0500
The buggy address is located 24 bytes inside of
which belongs to the cache kmalloc-128 of size 128
The buggy address belongs to the page:
128-byte region [ffff88800c0f0500, ffff88800c0f0580)
flags: 0x100000000000200(slab|node=0|zone=1)
page:00000000fe45cd86 refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0xc0f0
raw: 0000000000000000 0000000080100010 00000001ffffffff
0000000000000000
raw: 0100000000000200 ffffea00003a2c80 dead000000000004
ffff8880078418c0
page dumped because: kasan: bad access detected
ffff88800c0f0400: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
Memory state around the buggy address:
>ffff88800c0f0500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88800c0f0480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88800c0f0580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
==================================================================
ffff88800c0f0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Reported-by: Sönke Huster <soenke.huster@eknoes.de>
Tested-by: Sönke Huster <soenke.huster@eknoes.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 155fb43b70b5fce341347a77d1af2765d1e8fbb8 ]
Property list (altname is a link "property") is wrapped
in a nlattr. nlattrs length is 16bit so practically
speaking the list of properties can't be longer than
that, otherwise user space would have to interpret
broken netlink messages.
Prevent the problem from occurring by checking the length
of the property list before adding new entries.
Reported-by: George Shuklin <george.shuklin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d26cff5bdbebdf98ba48217c078ff102536f134 ]
George reports that altnames can eat up kernel memory.
We should charge that memory appropriately.
Reported-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 530e0d46c61314c59ecfdb8d3bcb87edbc0f85d3 ]
The N_As value describes the time a CAN frame needs on the wire when
transmitted by the CAN controller. Even very short CAN FD frames need
arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s).
Having N_As to be zero (the former default) leads to 'no CAN frame
separation' when STmin is set to zero by the receiving node. This 'burst
mode' should not be enabled by default as it could potentially dump a high
number of CAN frames into the netdev queue from the soft hrtimer context.
This does not affect the system stability but is just not nice and
cooperative.
With this N_As/frame_txtime value the 'burst mode' is disabled by default.
As user space applications usually do not set the frame_txtime element
of struct can_isotp_options the new in-kernel default is very likely
overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked.
To make sure that a N_As value of zero is only set intentional the
value '0' is now interpreted as 'do not change the current value'.
When a frame_txtime of zero is required for testing purposes this
CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime.
Link: https://lore.kernel.org/all/20220309120416.83514-2-socketcan@hartkopp.net
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6bf536eb5c8ca011d1ff57b5c5f7c57ceac06a37 ]
rmbe_update_limit is used to limit announcing receive
window updating too frequently. RFC7609 request a minimal
increase in the window size of 10% of the receive buffer
space. But current implementation used:
min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2)
and SOCK_MIN_SNDBUF / 2 == 2304 Bytes, which is almost
always less then 10% of the receive buffer space.
This causes the receiver always sending CDC message to
update its consumer cursor when it consumes more then 2K
of data. And as a result, we may encounter something like
"TCP silly window syndrome" when sending 2.5~8K message.
This patch fixes this using max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2).
With this patch and SMC autocorking enabled, qperf 2K/4K/8K
tcp_bw test shows 45%/75%/40% increase in throughput respectively.
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c51e12e218f20b7d976158fdc18019627326f7a ]
In case user space sends a packet destined to a broadcast address when a
matching broadcast route is not configured, the kernel will create a
unicast neighbour entry that will never be resolved [1].
When the broadcast route is configured, the unicast neighbour entry will
not be invalidated and continue to linger, resulting in packets being
dropped.
Solve this by invalidating unresolved neighbour entries for broadcast
addresses after routes for these addresses are internally configured by
the kernel. This allows the kernel to create a broadcast neighbour entry
following the next route lookup.
Another possible solution that is more generic but also more complex is
to have the ARP code register a listener to the FIB notification chain
and invalidate matching neighbour entries upon the addition of broadcast
routes.
It is also possible to wave off the issue as a user space problem, but
it seems a bit excessive to expect user space to be that intimately
familiar with the inner workings of the FIB/neighbour kernel code.
[1] https://lore.kernel.org/netdev/55a04a8f-56f3-f73c-2aea-2195923f09d1@huawei.com/
Reported-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4f9bf2a2f5aacf988e6d5e56b961ba45c5a25248 ]
Commit
9652dc2eb9e40 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves is also acquired with enabled bottom halves
followed by disabling bottom halves. This is the reverse locking order.
It has been observed with inet_listen_hashbucket:🔒
local_bh_disable() + spin_lock(&ilb->lock):
inet_listen()
inet_csk_listen_start()
sk->sk_prot->hash() := inet_hash()
local_bh_disable()
__inet_hash()
spin_lock(&ilb->lock);
acquire(&ilb->lock);
Reverse order: spin_lock(&ilb2->lock) + local_bh_disable():
tcp_seq_next()
listening_get_next()
spin_lock(&ilb2->lock);
acquire(&ilb2->lock);
tcp4_seq_show()
get_tcp4_sock()
sock_i_ino()
read_lock_bh(&sk->sk_callback_lock);
acquire(softirq_ctrl) // <---- whoops
acquire(&sk->sk_callback_lock)
Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock. Split inet_unhash() and acquire the
listen_hashbucket lock without disabling bottom halves; the inet_ehash
lock with disabled bottom halves.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de
Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 145c7a793838add5e004e7d49a67654dc7eba147 ]
This fixes minor data-races in ip6_mc_input() and
batadv_mcast_mla_rtr_flags_softif_get_ipv6()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5666ee154f4696c011dfa8544aaf5591b6b87515 ]
When adding 6GHz channels to scan request based on reported
co-located APs, don't add channels that have only APs with
"non-transmitted" BSSes if they only match the wildcard SSID since
they will be found by probing the "transmitted" BSS.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20220202104617.f6ddf099f934.I231e55885d3644f292d00dfe0f42653269f2559e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4421a582718ab81608d8486734c18083b822390d ]
Menglong Dong reports that the documentation for the dst_port field in
struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
field is a zero-padded 16-bit integer in network byte order. The value
appears to the BPF user as if laid out in memory as so:
offsetof(struct bpf_sock, dst_port) + 0 <port MSB>
+ 8 <port LSB>
+16 0x00
+24 0x00
32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
the offset into the field is 0.
32-bit wide loads from dst_port are especially confusing. The loaded value,
after converting to host byte order with bpf_ntohl(dst_port), contains the
port number in the upper 16-bits.
Remove the confusion by splitting the field into two 16-bit fields. For
backward compatibility, allow 32-bit wide loads from offsetof(struct
bpf_sock, dst_port).
While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
Reported-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220130115518.213259-2-jakub@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f19c44452b58a84d95e209b847f5495d91c9983a upstream.
IPv6 nd target mask was not getting populated in flow dump.
In the function __ovs_nla_put_key the icmp code mask field was checked
instead of icmp code key field to classify the flow as neighbour discovery.
ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0),
skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0),
ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
eth(src=00:00:00:00:00:00/00:00:00:00:00:00,
dst=00:00:00:00:00:00/00:00:00:00:00:00),
eth_type(0x86dd),
ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no),
icmpv6(type=135,code=0),
nd(target=2001::2/::,
sll=00:00:00:00:00:00/00:00:00:00:00:00,
tll=00:00:00:00:00:00/00:00:00:00:00:00),
packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2
Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c)
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6340dcbd619450c1bb55eb999e554e4f0e6dab0a upstream.
The commit b37a46683739 ("netdevice: add the case if dev is NULL") changed
the way how the NULL check for net_devices have to be handled when trying
to reduce its reference counter. Before this commit, it was the
responsibility of the caller to check whether the object is NULL or not.
But it was changed to behave more like kfree. Now the callee has to handle
the NULL-case.
The batman-adv code was scanned via cocinelle for similar places. These
were changed to use the paradigm
@@
identifier E, T, R, C;
identifier put;
@@
void put(struct T *E)
{
+ if (!E)
+ return;
kref_put(&E->C, R);
}
Functions which were used in other sources files were moved to the header
to allow the compiler to inline the NULL check and the kref_put call.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e382fea8ae54f5bb62869c6b69b33993d43adeca ]
In commit 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when
reading from socket") a new check for recvmsg flags has been
introduced that only checked for the flags that are handled in
isotp_recvmsg() itself.
This accidentally removed the MSG_PEEK feature flag which is processed
later in the call chain in __skb_try_recv_from_queue().
Add MSG_PEEK to the set of valid flags to restore the feature.
Fixes: 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket")
Link: https://github.com/linux-can/can-utils/issues/347#issuecomment-1079554254
Link: https://lore.kernel.org/all/20220328113611.3691-1-socketcan@hartkopp.net
Reported-by: Derek Will <derekrobertwill@gmail.com>
Suggested-by: Derek Will <derekrobertwill@gmail.com>
Tested-by: Derek Will <derekrobertwill@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 4a7f62f91933c8ae5308f9127fd8ea48188b6bc3 upstream.
The rxrpc_call struct has a timer used to handle various timed events
relating to a call. This timer can get started from the packet input
routines that are run in softirq mode with just the RCU read lock held.
Unfortunately, because only the RCU read lock is held - and neither ref or
other lock is taken - the call can start getting destroyed at the same time
a packet comes in addressed to that call. This causes the timer - which
was already stopped - to get restarted. Later, the timer dispatch code may
then oops if the timer got deallocated first.
Fix this by trying to take a ref on the rxrpc_call struct and, if
successful, passing that ref along to the timer. If the timer was already
running, the ref is discarded.
The timer completion routine can then pass the ref along to the call's work
item when it queues it. If the timer or work item where already
queued/running, the extra ref is discarded.
Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html
Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7781607938c8371d4c2b243527430241c62e39c2 ]
When the link layer is terminating, x25->neighbour will be set to NULL
in x25_disconnect(). As a result, it could cause null-ptr-deref bugs in
x25_sendmsg(),x25_recvmsg() and x25_connect(). One of the bugs is
shown below.
(Thread 1) | (Thread 2)
x25_link_terminated() | x25_recvmsg()
x25_kill_by_neigh() | ...
x25_disconnect() | lock_sock(sk)
... | ...
x25->neighbour = NULL //(1) |
... | x25->neighbour->extended //(2)
The code sets NULL to x25->neighbour in position (1) and dereferences
x25->neighbour in position (2), which could cause null-ptr-deref bug.
This patch adds lock_sock() in x25_kill_by_neigh() in order to synchronize
with x25_sendmsg(), x25_recvmsg() and x25_connect(). What`s more, the
sock held by lock_sock() is not NULL, because it is extracted from x25_list
and uses x25_list_lock to synchronize.
Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2dd495a8d589371289981d5ed33e6873df94ecc ]
Do not reset IP_CT_TCP_FLAG_BE_LIBERAL flag in out-of-sync scenarios
coming before the TCP window tracking, otherwise such connections will
fail in the window check.
Update tcp_options() to leave this flag in place and add a new helper
function to reset the tcp window state.
Based on patch from Sven Auhagen.
Fixes: c4832c7bbc3f ("netfilter: nf_ct_tcp: improve out-of-sync situation in TCP tracking")
Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6a7d8cff4a3301087dd139293e9bddcf63827282 ]
In the timer callback function tipc_sk_timeout(), we're trying to
reschedule another timeout to retransmit a setup request if destination
link is congested. But we use the incorrect timeout value
(msecs_to_jiffies(100)) instead of (jiffies + msecs_to_jiffies(100)),
so that the timer expires immediately, it's irrelevant for original
description.
In this commit we correct the timeout value in sk_reset_timer()
Fixes: 6787927475e5 ("tipc: buffer overflow handling in listener socket")
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Link: https://lore.kernel.org/r/20220321042229.314288-1-hoang.h.le@dektech.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60b44ca6bd7518dd38fa2719bc9240378b6172c3 ]
During NAT, a tuple collision may occur. When this happens, openvswitch
will make a second pass through NAT which will perform additional packet
modification. This will update the skb data, but not the flow key that
OVS uses. This means that future flow lookups, and packet matches will
have incorrect data. This has been supported since
5d50aa83e2c8 ("openvswitch: support asymmetric conntrack").
That commit failed to properly update the sw_flow_key attributes, since
it only called the ovs_ct_nat_update_key once, rather than each time
ovs_ct_nat_execute was called. As these two operations are linked, the
ovs_ct_nat_execute() function should always make sure that the
sw_flow_key is updated after a successful call through NAT infrastructure.
Fixes: 5d50aa83e2c8 ("openvswitch: support asymmetric conntrack")
Cc: Dumitru Ceara <dceara@redhat.com>
Cc: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20220318124319.3056455-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed0c99dc0f499ff8b6e75b5ae6092ab42be1ad39 ]
tp->rx_opt.mss_clamp is not populated, yet, during TFO send so we
rise it to the local MSS. tp->mss_cache is not updated, however:
tcp_v6_connect():
tp->rx_opt.mss_clamp = IPV6_MIN_MTU - headers;
tcp_connect():
tcp_connect_init():
tp->mss_cache = min(mtu, tp->rx_opt.mss_clamp)
tcp_send_syn_data():
tp->rx_opt.mss_clamp = tp->advmss
After recent fixes to ICMPv6 PTB handling we started dropping
PMTU updates higher than tp->mss_cache. Because of the stale
tp->mss_cache value PMTU updates during TFO are always dropped.
Thanks to Wei for helping zero in on the problem and the fix!
Fixes: c7bb4b89033b ("ipv6: tcp: drop silly ICMPv6 packet too big messages")
Reported-by: Andre Nash <alnash@fb.com>
Reported-by: Neil Spring <ntspring@fb.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220321165957.1769954-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0caf6d9922192dd1afa8dc2131abfb4df1443b9f ]
When a netlink message is received, netlink_recvmsg() fills in the address
of the sender. One of the fields is the 32-bit bitfield nl_groups, which
carries the multicast group on which the message was received. The least
significant bit corresponds to group 1, and therefore the highest group
that the field can represent is 32. Above that, the UB sanitizer flags the
out-of-bounds shift attempts.
Which bits end up being set in such case is implementation defined, but
it's either going to be a wrong non-zero value, or zero, which is at least
not misleading. Make the latter choice deterministic by always setting to 0
for higher-numbered multicast groups.
To get information about membership in groups >= 32, userspace is expected
to use nl_pktinfo control messages[0], which are enabled by NETLINK_PKTINFO
socket option.
[0] https://lwn.net/Articles/147608/
The way to trigger this issue is e.g. through monitoring the BRVLAN group:
# bridge monitor vlan &
# ip link add name br type bridge
Which produces the following citation:
UBSAN: shift-out-of-bounds in net/netlink/af_netlink.c:162:19
shift exponent 32 is too large for 32-bit type 'int'
Fixes: f7fa9b10edbb ("[NETLINK]: Support dynamic number of multicast groups per netlink family")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/2bef6aabf201d1fc16cca139a744700cff9dcb04.1647527635.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 544b4dd568e3b09c1ab38a759d3187e7abda11a0 ]
The PMTU update and ICMP redirect helper functions initialise their fl4
variable with either __build_flow_key() or build_sk_flow_key(). These
initialisation functions always set ->flowi4_scope with
RT_SCOPE_UNIVERSE and might set the ECN bits of ->flowi4_tos. This is
not a problem when the route lookup is later done via
ip_route_output_key_hash(), which properly clears the ECN bits from
->flowi4_tos and initialises ->flowi4_scope based on the RTO_ONLINK
flag. However, some helpers call fib_lookup() directly, without
sanitising the tos and scope fields, so the route lookup can fail and,
as a result, the ICMP redirect or PMTU update aren't taken into
account.
Fix this by extracting the ->flowi4_tos and ->flowi4_scope sanitisation
code into ip_rt_fix_tos(), then use this function in handlers that call
fib_lookup() directly.
Note 1: We can't sanitise ->flowi4_tos and ->flowi4_scope in a central
place (like __build_flow_key() or flowi4_init_output()), because
ip_route_output_key_hash() expects non-sanitised values. When called
with sanitised values, it can erroneously overwrite RT_SCOPE_LINK with
RT_SCOPE_UNIVERSE in ->flowi4_scope. Therefore we have to be careful to
sanitise the values only for those paths that don't call
ip_route_output_key_hash().
Note 2: The problem is mostly about sanitising ->flowi4_tos. Having
->flowi4_scope initialised with RT_SCOPE_UNIVERSE instead of
RT_SCOPE_LINK probably wasn't really a problem: sockets with the
SOCK_LOCALROUTE flag set (those that'd result in RTO_ONLINK being set)
normally shouldn't receive ICMP redirects or PMTU updates.
Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fa6b4cda3b414e990f008f45f9bcecbcb54d4d1 ]
hci_le_conn_failed function's documentation says that the caller must
hold hdev->lock. The only callsite that does not hold that lock is
hci_le_conn_failed. The other 3 callsites hold the hdev->lock very
locally. The solution is to hold the lock during the call to
hci_le_conn_failed.
Fixes: 3c857757ef6e ("Bluetooth: Add directed advertising support through connect()")
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42bf50a1795a1854d48717b7361dbdbce496b16b ]
When providing the MSG_TRUNC flag via recvmsg() syscall the return value
provides the real length of the packet or datagram, even when it was longer
than the passed buffer.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://github.com/linux-can/can-utils/issues/347#issuecomment-1065932671
Link: https://lore.kernel.org/all/20220316164258.54155-3-socketcan@hartkopp.net
Suggested-by: Derek Will <derekrobertwill@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30ffd5332e06316bd69a654c06aa033872979b7c ]
When reading from an unbound can-isotp socket the syscall blocked
indefinitely. As unbound sockets (without given CAN address information)
do not make sense anyway we directly return -EADDRNOTAVAIL on read()
analogue to the known behavior from sendmsg().
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://github.com/linux-can/can-utils/issues/349
Link: https://lore.kernel.org/all/20220316164258.54155-2-socketcan@hartkopp.net
Suggested-by: Derek Will <derekrobertwill@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2486ab434b2c2a14e9237296db00b1e1b7ae3273 ]
If tcp_bpf_sendmsg is running during a tear down operation, psock may be
freed.
tcp_bpf_sendmsg()
tcp_bpf_send_verdict()
sk_msg_return()
tcp_bpf_sendmsg_redir()
unlikely(!psock))
sk_msg_free()
The mem of msg has been uncharged in tcp_bpf_send_verdict() by
sk_msg_return(), and would be uncharged by sk_msg_free() again. When psock
is null, we can simply returning an error code, this would then trigger
the sk_msg_free_nocharge in the error path of __SK_REDIRECT and would have
the side effect of throwing an error up to user space. This would be a
slight change in behavior from user side but would look the same as an
error if the redirect on the socket threw an error.
This issue can cause the following info:
WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
worker_thread+0x30/0x350
? process_one_work+0x3c0/0x3c0
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220304081145.2037182-5-wangyufen@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c34e38c4a870eb30b13f42f5b44f42e9d19ccb8 ]
If tcp_bpf_sendmsg() is running while sk msg is full. When sk_msg_alloc()
returns -ENOMEM error, tcp_bpf_sendmsg() goes to wait_for_memory. If partial
memory has been alloced by sk_msg_alloc(), that is, msg_tx->sg.size is
greater than osize after sk_msg_alloc(), memleak occurs. To fix we use
sk_msg_trim() to release the allocated memory, then goto wait for memory.
Other call paths of sk_msg_alloc() have the similar issue, such as
tls_sw_sendmsg(), so handle sk_msg_trim logic inside sk_msg_alloc(),
as Cong Wang suggested.
This issue can cause the following info:
WARNING: CPU: 3 PID: 7950 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
Call Trace:
<TASK>
inet_csk_destroy_sock+0x55/0x110
__tcp_close+0x279/0x470
tcp_close+0x1f/0x60
inet_release+0x3f/0x80
__sock_release+0x3d/0xb0
sock_close+0x11/0x20
__fput+0x92/0x250
task_work_run+0x6a/0xa0
do_exit+0x33b/0xb60
do_group_exit+0x2f/0xa0
get_signal+0xb6/0x950
arch_do_signal_or_restart+0xac/0x2a0
exit_to_user_mode_prepare+0xa9/0x200
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x46/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
WARNING: CPU: 3 PID: 2094 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
kthread+0xe6/0x110
ret_from_fork+0x22/0x30
</TASK>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220304081145.2037182-3-wangyufen@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a4a600dd301ccde6ea239804ec1f19364a39d643 upstream.
When enabling encap for a ipv6 socket without udp_encap_needed_key
increased, UDP GRO won't work for v4 mapped v6 address packets as
sk will be NULL in udp4_gro_receive().
This patch is to enable it by increasing udp_encap_needed_key for
v6 sockets in udp_tunnel_encap_enable(), and correspondingly
decrease udp_encap_needed_key in udpv6_destroy_sock().
v1->v2:
- add udp_encap_disable() and export it.
v2->v3:
- add the change for rxrpc and bareudp into one patch, as Alex
suggested.
v3->v4:
- move rxrpc part to another patch.
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Antonio Quartulli <antonio@openvpn.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ea566422cbde9610c2734980d1286ab681bb40e upstream.
Syzbot created an environment that lead to a state machine status that
can not be reached with a compliant CAN ID address configuration.
The provided address information consisted of CAN ID 0x6000001 and 0xC28001
which both boil down to 11 bit CAN IDs 0x001 in sending and receiving.
Sanitize the SFF/EFF CAN ID values before performing the address checks.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220316164258.54155-1-socketcan@hartkopp.net
Reported-by: syzbot+2339c27f5c66c652843e@syzkaller.appspotmail.com
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3848e96edf4788f772d83990022fa7023a233d83 upstream.
xprt_destory() claims XPRT_LOCKED and then calls del_timer_sync().
Both xprt_unlock_connect() and xprt_release() call
->release_xprt()
which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
which calls mod_timer().
This may result in mod_timer() being called *after* del_timer_sync().
When this happens, the timer may fire long after the xprt has been freed,
and run_timer_softirq() will probably crash.
The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is
always called under ->transport_lock. So if we take ->transport_lock to
call del_timer_sync(), we can be sure that mod_timer() will run first
(if it runs at all).
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9a564bccb78a76740ea9d75a259942df8143d02c ]
Add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
to initialize the buffer of supp_skb to fix a kernel-info-leak issue.
1) Function pfkey_register calls compose_sadb_supported to request
a sk_buff. 2) compose_sadb_supported calls alloc_sbk to allocate
a sk_buff, but it doesn't zero it. 3) If auth_len is greater 0, then
compose_sadb_supported treats the memory as a struct sadb_supported and
begins to initialize. But it just initializes the field sadb_supported_len
and field sadb_supported_exttype without field sadb_supported_reserved.
Reported-by: TCS Robot <tcs_robot@tencent.com>
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ff2980b6bd2aa6b4ded3ce3b7c0ccfab29980af ]
in tunnel mode, if outer interface(ipv4) is less, it is easily to let
inner IPV6 mtu be less than 1280. If so, a Packet Too Big ICMPV6 message
is received. When send again, packets are fragmentized with 1280, they
are still rejected with ICMPV6(Packet Too Big) by xfrmi_xmit2().
According to RFC4213 Section3.2.2:
if (IPv4 path MTU - 20) is less than 1280
if packet is larger than 1280 bytes
Send ICMPv6 "packet too big" with MTU=1280
Drop packet
else
Encapsulate but do not set the Don't Fragment
flag in the IPv4 header. The resulting IPv4
packet might be fragmented by the IPv4 layer
on the encapsulator or by some router along
the IPv4 path.
endif
else
if packet is larger than (IPv4 path MTU - 20)
Send ICMPv6 "packet too big" with
MTU = (IPv4 path MTU - 20).
Drop packet.
else
Encapsulate and set the Don't Fragment flag
in the IPv4 header.
endif
endif
Packets should be fragmentized with ipv4 outer interface, so change it.
After it is fragemtized with ipv4, there will be double fragmenation.
No.48 & No.51 are ipv6 fragment packets, No.48 is double fragmentized,
then tunneled with IPv4(No.49& No.50), which obey spec. And received peer
cannot decrypt it rightly.
48 2002::10 2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50)
49 0x0000 (0) 2002::10 2002::11 1304 IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44)
50 0x0000 (0) 2002::10 2002::11 200 ESP (SPI=0x00035000)
51 2002::10 2002::11 180 Echo (ping) request
52 0x56dc 2002::10 2002::11 248 IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50)
xfrm6_noneed_fragment has fixed above issues. Finally, it acted like below:
1 0x6206 192.168.1.138 192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2]
2 0x6206 2002::10 2002::11 88 IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50)
3 0x0000 2002::10 2002::11 248 ICMPv6 Echo (ping) request
Signed-off-by: Lina Wang <lina.wang@mediatek.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 4a2d4496e15ea5bb5c8e83b94ca8ca7fb045e7d3 upstream.
While commit 6a01afcf8468 ("mac80211: mesh: Free ie data when leaving
mesh") fixed a memory leak on mesh leave / teardown it introduced a
potential memory corruption caused by a double free when rejoining the
mesh:
ieee80211_leave_mesh()
-> kfree(sdata->u.mesh.ie);
...
ieee80211_join_mesh()
-> copy_mesh_setup()
-> old_ie = ifmsh->ie;
-> kfree(old_ie);
This double free / kernel panics can be reproduced by using wpa_supplicant
with an encrypted mesh (if set up without encryption via "iw" then
ifmsh->ie is always NULL, which avoids this issue). And then calling:
$ iw dev mesh0 mesh leave
$ iw dev mesh0 mesh join my-mesh
Note that typically these commands are not used / working when using
wpa_supplicant. And it seems that wpa_supplicant or wpa_cli are going
through a NETDEV_DOWN/NETDEV_UP cycle between a mesh leave and mesh join
where the NETDEV_UP resets the mesh.ie to NULL via a memcpy of
default_mesh_setup in cfg80211_netdev_notifier_call, which then avoids
the memory corruption, too.
The issue was first observed in an application which was not using
wpa_supplicant but "Senf" instead, which implements its own calls to
nl80211.
Fixing the issue by removing the kfree()'ing of the mesh IE in the mesh
join function and leaving it solely up to the mesh leave to free the
mesh IE.
Cc: stable@vger.kernel.org
Fixes: 6a01afcf8468 ("mac80211: mesh: Free ie data when leaving mesh")
Reported-by: Matthias Kretschmer <mathias.kretschmer@fit.fraunhofer.de>
Signed-off-by: Linus Lüssing <ll@simonwunderlich.de>
Tested-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de>
Link: https://lore.kernel.org/r/20220310183513.28589-1-linus.luessing@c0d3.blue
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 764f4eb6846f5475f1244767d24d25dd86528a4a upstream.
Whenever llc_ui_bind() and/or llc_ui_autobind()
took a reference on a netdevice but subsequently fail,
they must properly release their reference
or risk the infamous message from unregister_netdevice()
at device dismantle.
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: 赵子轩 <beraphin@gmail.com>
Reported-by: Stoyan Manolov <smanolov@suse.de>
Link: https://lore.kernel.org/r/20220323004147.1990845-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebe48d368e97d007bfeb76fcb065d6cfc4c96645 upstream.
The maximum message size that can be send is bigger than
the maximum site that skb_page_frag_refill can allocate.
So it is possible to write beyond the allocated buffer.
Fix this by doing a fallback to COW in that case.
v2:
Avoid get get_order() costs as suggested by Linus Torvalds.
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
Reported-by: valis <sec@valis.email>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cb0b430b4e3acc88c85e0ad2e25f2a25a5765262 ]
The device_node pointer is returned by of_parse_phandle() with refcount
incremented. We should use of_node_put() on it when done.
Fixes: 6d4e5c570c2d ("net: dsa: get port type at parse time")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220316082602.10785-1-linmq006@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4db4075f92af2b28f415fc979ab626e6b37d67b6 ]
Commit 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr")
introduced an incorrect check, which leads to all ESP packets over
either TCPv6 or UDPv6 encapsulation being dropped. In this particular
case, offset is negative, since skb->data points to the ESP header in
the following chain of headers, while skb->network_header points to
the IPv6 header:
IPv6 | ext | ... | ext | UDP | ESP | ...
That doesn't seem to be a problem, especially considering that if we
reach esp6_input_done2, we're guaranteed to have a full set of headers
available (otherwise the packet would have been dropped earlier in the
stack). However, it means that the return value will (intentionally)
be negative. We can make the test more specific, as the expected
return value of ipv6_skip_exthdr will be the (negated) size of either
a UDP header, or a TCP header with possible options.
In the future, we should probably either make ipv6_skip_exthdr
explicitly accept negative offsets (and adjust its return value for
error cases), or make ipv6_skip_exthdr only take non-negative
offsets (and audit all callers).
Fixes: 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>