IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit 487082fb7bd2a32b66927d2b22e3a81b072b44f0 ]
When user application calls read() with MSG_PEEK flag to read data
of bpf sockmap socket, kernel panic happens at
__tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
queue after read out under MSG_PEEK flag is set. Because it's not
judged whether sk_msg is the last msg of ingress_msg queue, the next
sk_msg may be the head of ingress_msg queue, whose memory address of
sg page is invalid. So it's necessary to add check codes to prevent
this problem.
[20759.125457] BUG: kernel NULL pointer dereference, address:
0000000000000008
[20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
5.4.32 #1
[20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
4.1.12 06/18/2017
[20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
[20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
[20759.276099] tcp_bpf_recvmsg+0x113/0x370
[20759.281137] inet_recvmsg+0x55/0xc0
[20759.285734] __sys_recvfrom+0xc8/0x130
[20759.290566] ? __audit_syscall_entry+0x103/0x130
[20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
[20759.301700] ? __audit_syscall_exit+0x1e4/0x290
[20759.307235] __x64_sys_recvfrom+0x24/0x30
[20759.312226] do_syscall_64+0x55/0x1b0
[20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: dihu <anny.hu@linux.alibaba.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200605084625.9783-1-anny.hu@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32f71aa497cfb23d37149c2ef16ad71fce2e45e2 ]
The user ID value isn't actually much use - and leaks a kernel pointer or a
userspace value - so replace it with the call debug ID, which appears in trace
points.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 118917d696dc59fd3e1741012c2f9db2294bed6f ]
Fix off-by-one issues in 'rpc_ntop6':
- 'snprintf' returns the number of characters which would have been
written if enough space had been available, excluding the terminating
null byte. Thus, a return value of 'sizeof(scopebuf)' means that the
last character was dropped.
- 'strcat' adds a terminating null byte to the string, thus if len ==
buflen, the null byte is written past the end of the buffer.
Signed-off-by: Fedor Tokarev <ftokarev@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33a7c831565c43a7ee2f38c7df4c4a40e1dfdfed ]
When sockhash gets destroyed while sockets are still linked to it, we will
walk the bucket lists and delete the links. However, we are not freeing the
list elements after processing them, leaking the memory.
The leak can be triggered by close()'ing a sockhash map when it still
contains sockets, and observed with kmemleak:
unreferenced object 0xffff888116e86f00 (size 64):
comm "race_sock_unlin", pid 223, jiffies 4294731063 (age 217.404s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
81 de e8 41 00 00 00 00 c0 69 2f 15 81 88 ff ff ...A.....i/.....
backtrace:
[<00000000dd089ebb>] sock_hash_update_common+0x4ca/0x760
[<00000000b8219bd5>] sock_hash_update_elem+0x1d2/0x200
[<000000005e2c23de>] __do_sys_bpf+0x2046/0x2990
[<00000000d0084618>] do_syscall_64+0xad/0x9a0
[<000000000d96f263>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
Fix it by freeing the list element when we're done with it.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200607205229.2389672-2-jakub@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 24c5efe41c29ee3e55bcf5a1c9f61ca8709622e8 upstream.
gss_mech_register() calls svcauth_gss_register_pseudoflavor() for each
flavour, but gss_mech_unregister() does not call auth_domain_put().
This is unbalanced and makes it impossible to reload the module.
Change svcauth_gss_register_pseudoflavor() to return the registered
auth_domain, and save it for later release.
Cc: stable@vger.kernel.org (v2.6.12+)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d47a5dc2888fd1b94adf1553068b8dad76cec96c upstream.
There is no valid case for supporting duplicate pseudoflavor
registrations.
Currently the silent acceptance of such registrations is hiding a bug.
The rpcsec_gss_krb5 module registers 2 flavours but does not unregister
them, so if you load, unload, reload the module, it will happily
continue to use the old registration which now has pointers to the
memory were the module was originally loaded. This could lead to
unexpected results.
So disallow duplicate registrations.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651
Cc: stable@vger.kernel.org (v2.6.12+)
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e91de6afa81c10e9f855c5695eb9a53168d96b73 ]
KTLS uses a stream parser to collect TLS messages and send them to
the upper layer tls receive handler. This ensures the tls receiver
has a full TLS header to parse when it is run. However, when a
socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS
is enabled we end up with two stream parsers running on the same
socket.
The result is both try to run on the same socket. First the KTLS
stream parser runs and calls read_sock() which will tcp_read_sock
which in turn calls tcp_rcv_skb(). This dequeues the skb from the
sk_receive_queue. When this is done KTLS code then data_ready()
callback which because we stacked KTLS on top of the bpf stream
verdict program has been replaced with sk_psock_start_strp(). This
will in turn kick the stream parser again and eventually do the
same thing KTLS did above calling into tcp_rcv_skb() and dequeuing
a skb from the sk_receive_queue.
At this point the data stream is broke. Part of the stream was
handled by the KTLS side some other bytes may have been handled
by the BPF side. Generally this results in either missing data
or more likely a "Bad Message" complaint from the kTLS receive
handler as the BPF program steals some bytes meant to be in a
TLS header and/or the TLS header length is no longer correct.
We've already broke the idealized model where we can stack ULPs
in any order with generic callbacks on the TX side to handle this.
So in this patch we do the same thing but for RX side. We add
a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict
program is running and add a tls_sw_has_ctx_rx() helper so BPF
side can learn there is a TLS ULP on the socket.
Then on BPF side we omit calling our stream parser to avoid
breaking the data stream for the KTLS receiver. Then on the
KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS
receiver is done with the packet but before it posts the
msg to userspace. This gives us symmetry between the TX and
RX halfs and IMO makes it usable again. On the TX side we
process packets in this order BPF -> TLS -> TCP and on
the receive side in the reverse order TCP -> TLS -> BPF.
Discovered while testing OpenSSL 3.0 Alpha2.0 release.
Fixes: d829e9c4112b5 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca2f5f21dbbd5e3a00cd3e97f728aa2ca0b2e011 ]
We will need this block of code called from tls context shortly
lets refactor the redirect logic so its easy to use. This also
cleans up the switch stmt so we have fewer fallthrough cases.
No logic changes are intended.
Fixes: d829e9c4112b5 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/159079360110.5745.7024009076049029819.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d7c83463fdf7841350f37960a7abadd3e650b41 ]
Instead of EINVAL which should be used for malformed netlink messages.
Fixes: eb31628e37a0 ("netfilter: nf_tables: Add support for IPv6 NAT")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ad346c90509ebd983f60da7d082f261ad329507 ]
The commit 8c46fcd78308 ("batman-adv: disable ethtool link speed detection
when auto negotiation off") disabled the usage of ethtool's link_ksetting
when auto negotation was enabled due to invalid values when used with
tun/tap virtual net_devices. According to the patch, automatic measurements
should be used for these kind of interfaces.
But there are major flaws with this argumentation:
* automatic measurements are not implemented
* auto negotiation has nothing to do with the validity of the retrieved
values
The first point has to be fixed by a longer patch series. The "validity"
part of the second point must be addressed in the same patch series by
dropping the usage of ethtool's link_ksetting (thus always doing automatic
measurements over ethernet).
Drop the patch again to have more default values for various net_device
types/configurations. The user can still overwrite them using the
batadv_hardif's BATADV_ATTR_THROUGHPUT_OVERRIDE.
Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56b5453a86203a44726f523b4133c1feca49ce7c ]
Bluetooth PTS test case HFP/AG/ACC/BI-12-I accepts SCO connection
with invalid parameter at the first SCO request expecting AG to
attempt another SCO request with the use of "safe settings" for
given codec, base on section 5.7.1.2 of HFP 1.7 specification.
This patch addresses it by adding "Invalid LMP Parameters" (0x1e)
to the SCO fallback case. Verified with below log:
< HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17
Handle: 256
Transmit bandwidth: 8000
Receive bandwidth: 8000
Max latency: 13
Setting: 0x0003
Input Coding: Linear
Input Data Format: 1's complement
Input Sample Size: 8-bit
# of bits padding at MSB: 0
Air Coding Format: Transparent Data
Retransmission effort: Optimize for link quality (0x02)
Packet type: 0x0380
3-EV3 may not be used
2-EV5 may not be used
3-EV5 may not be used
> HCI Event: Command Status (0x0f) plen 4
Setup Synchronous Connection (0x01|0x0028) ncmd 1
Status: Success (0x00)
> HCI Event: Number of Completed Packets (0x13) plen 5
Num handles: 1
Handle: 256
Count: 1
> HCI Event: Max Slots Change (0x1b) plen 3
Handle: 256
Max slots: 1
> HCI Event: Synchronous Connect Complete (0x2c) plen 17
Status: Invalid LMP Parameters / Invalid LL Parameters (0x1e)
Handle: 0
Address: 00:1B:DC:F2:21:59 (OUI 00-1B-DC)
Link type: eSCO (0x02)
Transmission interval: 0x00
Retransmission window: 0x02
RX packet length: 0
TX packet length: 0
Air mode: Transparent (0x03)
< HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17
Handle: 256
Transmit bandwidth: 8000
Receive bandwidth: 8000
Max latency: 8
Setting: 0x0003
Input Coding: Linear
Input Data Format: 1's complement
Input Sample Size: 8-bit
# of bits padding at MSB: 0
Air Coding Format: Transparent Data
Retransmission effort: Optimize for link quality (0x02)
Packet type: 0x03c8
EV3 may be used
2-EV3 may not be used
3-EV3 may not be used
2-EV5 may not be used
3-EV5 may not be used
> HCI Event: Command Status (0x0f) plen 4
Setup Synchronous Connection (0x01|0x0028) ncmd 1
Status: Success (0x00)
> HCI Event: Max Slots Change (0x1b) plen 3
Handle: 256
Max slots: 5
> HCI Event: Max Slots Change (0x1b) plen 3
Handle: 256
Max slots: 1
> HCI Event: Synchronous Connect Complete (0x2c) plen 17
Status: Success (0x00)
Handle: 257
Address: 00:1B:DC:F2:21:59 (OUI 00-1B-DC)
Link type: eSCO (0x02)
Transmission interval: 0x06
Retransmission window: 0x04
RX packet length: 30
TX packet length: 30
Air mode: Transparent (0x03)
Signed-off-by: Hsin-Yu Chao <hychao@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c96b6acc8f89a4a7f6258dfe1d077654c11415be ]
There are some memory leaks in dccp_init() and dccp_fini().
In dccp_fini() and the error handling path in dccp_init(), free lhash2
is missing. Add inet_hashinfo2_free_mod() to do it.
If inet_hashinfo2_init_mod() failed in dccp_init(),
percpu_counter_destroy() should be called to destroy dccp_orphan_count.
It need to goto out_free_percpu when inet_hashinfo2_init_mod() failed.
Fixes: c92c81df93df ("net: dccp: fix kernel crash on module load")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5c3e82fe159622e46e91458c1a6509c321a62820 ]
We should iterate over the datamsgs to move
all chunks(skbs) to newsk.
The following case cause the bug:
for the trouble SKB, it was in outq->transmitted list
sctp_outq_sack
sctp_check_transmitted
SKB was moved to outq->sacked list
then throw away the sack queue
SKB was deleted from outq->sacked
(but it was held by datamsg at sctp_datamsg_to_asoc
So, sctp_wfree was not called here)
then migrate happened
sctp_for_each_tx_datachunk(
sctp_clear_owner_w);
sctp_assoc_migrate();
sctp_for_each_tx_datachunk(
sctp_set_owner_w);
SKB was not in the outq, and was not changed to newsk
finally
__sctp_outq_teardown
sctp_chunk_put (for another skb)
sctp_datamsg_put
__kfree_skb(msg->frag_list)
sctp_wfree (for SKB)
SKB->sk was still oldsk (skb->sk != asoc->base.sk).
Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Acked-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 582eea230536a6f104097dd46205822005d5fe3a ]
Under certain circumstances, depending on the order of addresses on the
interfaces, it could be that sctp_v[46]_get_dst() would return a dst
with a mismatched struct flowi.
For example, if when walking through the bind addresses and the first
one is not a match, it saves the dst as a fallback (added in
410f03831c07), but not the flowi. Then if the next one is also not a
match, the previous dst will be returned but with the flowi information
for the 2nd address, which is wrong.
The fix is to use a locally stored flowi that can be used for such
attempts, and copy it to the parameter only in case it is a possible
match, together with the corresponding dst entry.
The patch updates IPv6 code mostly just to be in sync. Even though the issue
is also present there, it fallback is not expected to work with IPv6.
Fixes: 410f03831c07 ("sctp: add routing output fallback")
Reported-by: Jin Meng <meng.a.jin@nokia-sbell.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fbe4e0c1b298b4665ee6915266c9d6c5b934ef4a ]
fib_triestat_seq_show() calls hlist_for_each_entry_rcu(tb, head,
tb_hlist) without rcu_read_lock() will trigger a warning,
net/ipv4/fib_trie.c:2579 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by proc01/115277:
#0: c0000014507acf00 (&p->lock){+.+.}-{3:3}, at: seq_read+0x58/0x670
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
lockdep_rcu_suspicious+0x140/0x164
fib_triestat_seq_show+0x750/0x880
seq_read+0x1a0/0x670
proc_reg_read+0x10c/0x1b0
__vfs_read+0x3c/0x70
vfs_read+0xac/0x170
ksys_read+0x7c/0x140
system_call+0x5c/0x68
Fix it by adding a pair of rcu_read_lock/unlock() and use
cond_resched_rcu() to avoid the situation where walking of a large
number of items may prevent scheduling for a long time.
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53fc685243bd6fb90d90305cea54598b78d3cbfc ]
When neighbor suppression is enabled the bridge device might reply to
Neighbor Solicitation (NS) messages on behalf of remote hosts.
In case the NS message includes the "Source link-layer address" option
[1], the bridge device will use the specified address as the link-layer
destination address in its reply.
To avoid an infinite loop, break out of the options parsing loop when
encountering an option with length zero and disregard the NS message.
This is consistent with the IPv6 ndisc code and RFC 4886 which states
that "Nodes MUST silently discard an ND packet that contains an option
with length zero" [2].
[1] https://tools.ietf.org/html/rfc4861#section-4.3
[2] https://tools.ietf.org/html/rfc4861#section-4.6
Fixes: ed842faeb2bd ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alla Segal <allas@mellanox.com>
Tested-by: Alla Segal <allas@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 79a1f0ccdbb4ad700590f61b00525b390cb53905 ]
Socket option IPV6_ADDRFORM supports UDP/UDPLITE and TCP at present.
Previously the checking logic looks like:
if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE)
do_some_check;
else if (sk->sk_protocol != IPPROTO_TCP)
break;
After commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation"), TCP
was blocked as the logic changed to:
if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE)
do_some_check;
else if (sk->sk_protocol == IPPROTO_TCP)
do_some_check;
break;
else
break;
Then after commit 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation")
UDP/UDPLITE were blocked as the logic changed to:
if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE)
do_some_check;
if (sk->sk_protocol == IPPROTO_TCP)
do_some_check;
if (sk->sk_protocol != IPPROTO_TCP)
break;
Fix it by using Eric's code and simply remove the break in TCP check, which
looks like:
if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE)
do_some_check;
else if (sk->sk_protocol == IPPROTO_TCP)
do_some_check;
else
break;
Fixes: 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7e0afbdfd13d1e708fe96e31c46c4897101a6a43 ]
The accept(2) is an "input" socket interface, so we should use
SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout.
So this patch replace sock_sndtimeo() with sock_rcvtimeo() to
use the right timeout in the vsock_accept().
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d9a81a225277686eb629938986d97629ea102633 ]
syzbot was able to trigger a crash after using an ISDN socket
and fool l2tp.
Fix this by making sure the UDP socket is of the proper family.
BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x465/0x540 net/ipv4/udp_tunnel.c:78
Write of size 1 at addr ffff88808ed0c590 by task syz-executor.5/3018
CPU: 0 PID: 3018 Comm: syz-executor.5 Not tainted 5.7.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:382
__kasan_report.cold+0x20/0x38 mm/kasan/report.c:511
kasan_report+0x33/0x50 mm/kasan/common.c:625
setup_udp_tunnel_sock+0x465/0x540 net/ipv4/udp_tunnel.c:78
l2tp_tunnel_register+0xb15/0xdd0 net/l2tp/l2tp_core.c:1523
l2tp_nl_cmd_tunnel_create+0x4b2/0xa60 net/l2tp/l2tp_netlink.c:249
genl_family_rcv_msg_doit net/netlink/genetlink.c:673 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:718 [inline]
genl_rcv_msg+0x627/0xdf0 net/netlink/genetlink.c:735
netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469
genl_rcv+0x24/0x40 net/netlink/genetlink.c:746
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6e6/0x810 net/socket.c:2352
___sys_sendmsg+0x100/0x170 net/socket.c:2406
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x45ca29
Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007effe76edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004fe1c0 RCX: 000000000045ca29
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000094e R14: 00000000004d5d00 R15: 00007effe76ee6d4
Allocated by task 3018:
save_stack+0x1b/0x40 mm/kasan/common.c:49
set_track mm/kasan/common.c:57 [inline]
__kasan_kmalloc mm/kasan/common.c:495 [inline]
__kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:468
__do_kmalloc mm/slab.c:3656 [inline]
__kmalloc+0x161/0x7a0 mm/slab.c:3665
kmalloc include/linux/slab.h:560 [inline]
sk_prot_alloc+0x223/0x2f0 net/core/sock.c:1612
sk_alloc+0x36/0x1100 net/core/sock.c:1666
data_sock_create drivers/isdn/mISDN/socket.c:600 [inline]
mISDN_sock_create+0x272/0x400 drivers/isdn/mISDN/socket.c:796
__sock_create+0x3cb/0x730 net/socket.c:1428
sock_create net/socket.c:1479 [inline]
__sys_socket+0xef/0x200 net/socket.c:1521
__do_sys_socket net/socket.c:1530 [inline]
__se_sys_socket net/socket.c:1528 [inline]
__x64_sys_socket+0x6f/0xb0 net/socket.c:1528
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
Freed by task 2484:
save_stack+0x1b/0x40 mm/kasan/common.c:49
set_track mm/kasan/common.c:57 [inline]
kasan_set_free_info mm/kasan/common.c:317 [inline]
__kasan_slab_free+0xf7/0x140 mm/kasan/common.c:456
__cache_free mm/slab.c:3426 [inline]
kfree+0x109/0x2b0 mm/slab.c:3757
kvfree+0x42/0x50 mm/util.c:603
__free_fdtable+0x2d/0x70 fs/file.c:31
put_files_struct fs/file.c:420 [inline]
put_files_struct+0x248/0x2e0 fs/file.c:413
exit_files+0x7e/0xa0 fs/file.c:445
do_exit+0xb04/0x2dd0 kernel/exit.c:791
do_group_exit+0x125/0x340 kernel/exit.c:894
get_signal+0x47b/0x24e0 kernel/signal.c:2739
do_signal+0x81/0x2240 arch/x86/kernel/signal.c:784
exit_to_usermode_loop+0x26c/0x360 arch/x86/entry/common.c:161
prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
entry_SYSCALL_64_after_hwframe+0x49/0xb3
The buggy address belongs to the object at ffff88808ed0c000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1424 bytes inside of
2048-byte region [ffff88808ed0c000, ffff88808ed0c800)
The buggy address belongs to the page:
page:ffffea00023b4300 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea0002838208 ffffea00015ba288 ffff8880aa000e00
raw: 0000000000000000 ffff88808ed0c000 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88808ed0c480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88808ed0c500: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88808ed0c580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88808ed0c600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88808ed0c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Fixes: 6b9f34239b00 ("l2tp: fix races in tunnel creation")
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Cc: Guillaume Nault <gnault@redhat.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b49cd71b52403822731dc9f283185d1da355f97 ]
When devinet_sysctl_register() failed, the memory allocated
in neigh_parms_alloc() should be freed.
Fixes: 20e61da7ffcf ("ipv4: fail early when creating netdev named all or default")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46c1e0621a72e0469ec4edfdb6ed4d387ec34f8a upstream.
Clang warns:
net/netfilter/nf_conntrack_core.c:2068:21: warning: variable 'ctinfo' is
uninitialized when used here [-Wuninitialized]
nf_ct_set(skb, ct, ctinfo);
^~~~~~
net/netfilter/nf_conntrack_core.c:2024:2: note: variable 'ctinfo' is
declared here
enum ip_conntrack_info ctinfo;
^
1 warning generated.
nf_conntrack_update was split up into nf_conntrack_update and
__nf_conntrack_update, where the assignment of ctinfo is in
nf_conntrack_update but it is used in __nf_conntrack_update.
Pass the value of ctinfo from nf_conntrack_update to
__nf_conntrack_update so that uninitialized memory is not used
and everything works properly.
Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again")
Link: https://github.com/ClangBuiltLinux/linux/issues/1039
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b86cb8299765688c5119fd18d5f436716c81010 upstream.
Be there a platform with the following layout:
Regular NIC
|
+----> DSA master for switch port
|
+----> DSA master for another switch port
After changing DSA back to static lockdep class keys in commit
1a33e10e4a95 ("net: partially revert dynamic lockdep key changes"), this
kernel splat can be seen:
[ 13.361198] ============================================
[ 13.366524] WARNING: possible recursive locking detected
[ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
[ 13.377874] --------------------------------------------
[ 13.383201] swapper/0/0 is trying to acquire lock:
[ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.397879]
[ 13.397879] but task is already holding lock:
[ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.413593]
[ 13.413593] other info that might help us debug this:
[ 13.420140] Possible unsafe locking scenario:
[ 13.420140]
[ 13.426075] CPU0
[ 13.428523] ----
[ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key);
[ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key);
[ 13.440924]
[ 13.440924] *** DEADLOCK ***
[ 13.440924]
[ 13.446860] May be due to missing lock nesting notation
[ 13.446860]
[ 13.453668] 6 locks held by swapper/0/0:
[ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
[ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
[ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[ 13.512000]
[ 13.512000] stack backtrace:
[ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
[ 13.530421] Call trace:
[ 13.532871] dump_backtrace+0x0/0x1d8
[ 13.536539] show_stack+0x24/0x30
[ 13.539862] dump_stack+0xe8/0x150
[ 13.543271] __lock_acquire+0x1030/0x1678
[ 13.547290] lock_acquire+0xf8/0x458
[ 13.550873] _raw_spin_lock+0x44/0x58
[ 13.554543] __dev_queue_xmit+0x84c/0xbe0
[ 13.558562] dev_queue_xmit+0x24/0x30
[ 13.562232] dsa_slave_xmit+0xe0/0x128
[ 13.565988] dev_hard_start_xmit+0xf4/0x448
[ 13.570182] __dev_queue_xmit+0x808/0xbe0
[ 13.574200] dev_queue_xmit+0x24/0x30
[ 13.577869] neigh_resolve_output+0x15c/0x220
[ 13.582237] ip6_finish_output2+0x244/0xb10
[ 13.586430] __ip6_finish_output+0x1dc/0x298
[ 13.590709] ip6_output+0x84/0x358
[ 13.594116] mld_sendpack+0x2bc/0x560
[ 13.597786] mld_ifc_timer_expire+0x210/0x390
[ 13.602153] call_timer_fn+0xcc/0x400
[ 13.605822] run_timer_softirq+0x588/0x6e0
[ 13.609927] __do_softirq+0x118/0x590
[ 13.613597] irq_exit+0x13c/0x148
[ 13.616918] __handle_domain_irq+0x6c/0xc0
[ 13.621023] gic_handle_irq+0x6c/0x160
[ 13.624779] el1_irq+0xbc/0x180
[ 13.627927] cpuidle_enter_state+0xb4/0x4d0
[ 13.632120] cpuidle_enter+0x3c/0x50
[ 13.635703] call_cpuidle+0x44/0x78
[ 13.639199] do_idle+0x228/0x2c8
[ 13.642433] cpu_startup_entry+0x2c/0x48
[ 13.646363] rest_init+0x1ac/0x280
[ 13.649773] arch_call_rest_init+0x14/0x1c
[ 13.653878] start_kernel+0x490/0x4bc
Lockdep keys themselves were added in commit ab92d68fc22f ("net: core:
add generic lockdep keys"), and it's very likely that this splat existed
since then, but I have no real way to check, since this stacked platform
wasn't supported by mainline back then.
>From Taehee's own words:
This patch was considered that all stackable devices have LLTX flag.
But the dsa doesn't have LLTX, so this splat happened.
After this patch, dsa shares the same lockdep class key.
On the nested dsa interface architecture, which you illustrated,
the same lockdep class key will be used in __dev_queue_xmit() because
dsa doesn't have LLTX.
So that lockdep detects deadlock because the same lockdep class key is
used recursively although actually the different locks are used.
There are some ways to fix this problem.
1. using NETIF_F_LLTX flag.
If possible, using the LLTX flag is a very clear way for it.
But I'm so sorry I don't know whether the dsa could have LLTX or not.
2. using dynamic lockdep again.
It means that each interface uses a separate lockdep class key.
So, lockdep will not detect recursive locking.
But this way has a problem that it could consume lockdep class key
too many.
Currently, lockdep can have 8192 lockdep class keys.
- you can see this number with the following command.
cat /proc/lockdep_stats
lock-classes: 1251 [max: 8192]
...
The [max: 8192] means that the maximum number of lockdep class keys.
If too many lockdep class keys are registered, lockdep stops to work.
So, using a dynamic(separated) lockdep class key should be considered
carefully.
In addition, updating lockdep class key routine might have to be existing.
(lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
3. Using lockdep subclass.
A lockdep class key could have 8 subclasses.
The different subclass is considered different locks by lockdep
infrastructure.
But "lock-classes" is not counted by subclasses.
So, it could avoid stopping lockdep infrastructure by an overflow of
lockdep class keys.
This approach should also have an updating lockdep class key routine.
(lockdep_set_subclass())
4. Using nonvalidate lockdep class key.
The lockdep infrastructure supports nonvalidate lockdep class key type.
It means this lockdep is not validated by lockdep infrastructure.
So, the splat will not happen but lockdep couldn't detect real deadlock
case because lockdep really doesn't validate it.
I think this should be used for really special cases.
(lockdep_set_novalidate_class())
Further discussion here:
https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
There appears to be no negative side-effect to declaring lockless TX for
the DSA virtual interfaces, which means they handle their own locking.
So that's what we do to make the splat go away.
Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
Fixes: ab92d68fc22f ("net: core: add generic lockdep keys")
Suggested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fd1c768f3624a5e66766e7b4ddb9b607cd834a5 upstream.
Similar to the last path, need to fix fib_info_nh_uses_dev for
external nexthops to avoid referencing multiple nh_grp structs.
Move the device check in fib_info_nh_uses_dev to a helper and
create a nexthop version that is called if the fib_info uses an
external nexthop.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90f33bffa382598a32cc82abfeb20adc92d041b6 upstream.
We must avoid modifying published nexthop groups while they might be
in use, otherwise we might see NULL ptr dereferences. In order to do
that we allocate 2 nexthoup group structures upon nexthop creation
and swap between them when we have to delete an entry. The reason is
that we can't fail nexthop group removal, so we can't handle allocation
failure thus we move the extra allocation on creation where we can
safely fail and return ENOMEM.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac21753a5c2c9a6a2019997481a2ac12bbde48c8 upstream.
Move nh_grp dereference and check for removing nexthop group due to
all members gone into remove_nh_grp_entry.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b16a87d0aef7a6be766f6618976dc5ff2c689291 upstream.
The npgs member of struct xdp_umem is an u32 entity, and stores the
number of pages the UMEM consumes. The calculation of npgs
npgs = size / PAGE_SIZE
can overflow.
To avoid overflow scenarios, the division is now first stored in a
u64, and the result is verified to fit into 32b.
An alternative would be storing the npgs as a u64, however, this
wastes memory and is an unrealisticly large packet area.
Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt")
Reported-by: "Minh Bùi Quang" <minhquangbui99@gmail.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/CACtPs=GGvV-_Yj6rbpzTVnopgi5nhMoCcTkSkYrJHGQHJWFZMQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20200525080400.13195-1-bjorn.topel@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c96ec56828922e3fe5477f75eb3fc02f98f98b5 upstream.
For transport mode, when ipv6 nexthdr is set, the packet format might
be like:
----------------------------------------------------
| | dest | | | | ESP | ESP |
| IP6 hdr| opts.| ESP | TCP | Data | Trailer | ICV |
----------------------------------------------------
What it wants to get for x-proto in esp6_gso_encap() is the proto that
will be set in ESP nexthdr. So it should skip all ipv6 nexthdrs and
get the real transport protocol. Othersize, the wrong proto number
will be set into ESP nexthdr.
This patch is to skip all ipv6 nexthdrs by calling ipv6_skip_exthdr()
in esp6_gso_encap().
Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c559f15efcc43b996f4da528cd7f9483aaca36d upstream.
Dan Carpenter says: "Smatch complains that the value for "cmd" comes
from the network and can't be trusted."
Add pptp_msg_name() helper function that checks for the array boundary.
Fixes: f09943fefe6b ("[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 703acd70f2496537457186211c2f03e792409e68 upstream.
Restore helper data size initialization and fix memcopy of the helper
data size.
Fixes: 157ffffeb5dc ("netfilter: nfnetlink_cthelper: reject too large userspace allocation requests")
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee04805ff54a63ffd90bc6749ebfe73473734ddb upstream.
Florian Westphal says:
"Problem is that after the helper hook was merged back into the confirm
one, the queueing itself occurs from the confirm hook, i.e. we queue
from the last netfilter callback in the hook-list.
Therefore, on return, the packet bypasses the confirm action and the
connection is never committed to the main conntrack table.
To fix this there are several ways:
1. revert the 'Fixes' commit and have a extra helper hook again.
Works, but has the drawback of adding another indirect call for
everyone.
2. Special case this: split the hooks only when userspace helper
gets added, so queueing occurs at a lower priority again,
and normal enqueue reinject would eventually call the last hook.
3. Extend the existing nf_queue ct update hook to allow a forced
confirmation (plus run the seqadj code).
This goes for 3)."
Fixes: 827318feb69cb ("netfilter: conntrack: remove helper hook again")
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a164b95ad6055c50612795882f35e0efda1f1390 upstream.
If IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE is set, user requested to not
update counters in sub sets. Therefore IPSET_FLAG_SKIP_COUNTER_UPDATE
must be set, not unset.
Fixes: 6e01781d1c80e ("netfilter: ipset: set match: add support to match the counters")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9c284ec4b41c827f4369973d2792992849e4fa5 upstream.
Currently, using the bridge reject target with tagged packets
results in untagged packets being sent back.
Fix this by mirroring the vlan id as well.
Fixes: 85f5b3086a04 ("netfilter: bridge: add reject support")
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 976eba8ab596bab94b9714cd46d38d5c6a2c660d upstream.
In Commit dd9ee3444014 ("vti4: Fix a ipip packet processing bug in
'IPCOMP' virtual tunnel"), it tries to receive IPIP packets in vti
by calling xfrm_input(). This case happens when a small packet or
frag sent by peer is too small to get compressed.
However, xfrm_input() will still get to the IPCOMP path where skb
sec_path is set, but never dropped while it should have been done
in vti_ipcomp4_protocol.cb_handler(vti_rcv_cb), as it's not an
ipcomp4 packet. This will cause that the packet can never pass
xfrm4_policy_check() in the upper protocol rcv functions.
So this patch is to call ip_tunnel_rcv() to process IPIP packets
instead.
Fixes: dd9ee3444014 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6a23d85d078c2ffde79c66ca81d0a1dde451649 upstream.
This patch is to fix a crash:
[ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ ] general protection fault: 0000 [#1] SMP KASAN PTI
[ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
[ ] Call Trace:
[ ] xfrm6_local_error+0x1eb/0x300
[ ] xfrm_local_error+0x95/0x130
[ ] __xfrm6_output+0x65f/0xb50
[ ] xfrm6_output+0x106/0x46f
[ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
[ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
[ ] vxlan_xmit+0x6a0/0x4276 [vxlan]
[ ] dev_hard_start_xmit+0x165/0x820
[ ] __dev_queue_xmit+0x1ff0/0x2b90
[ ] ip_finish_output2+0xd3e/0x1480
[ ] ip_do_fragment+0x182d/0x2210
[ ] ip_output+0x1d0/0x510
[ ] ip_send_skb+0x37/0xa0
[ ] raw_sendmsg+0x1b4c/0x2b80
[ ] sock_sendmsg+0xc0/0x110
This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries
to get ipv6 info from a ipv4 sk.
This issue was actually fixed by Commit 628e341f319f ("xfrm: make local
error reporting more robust"), but brought back by Commit 844d48746e4b
("xfrm: choose protocol family by skb protocol").
So to fix it, we should call xfrm6_local_error() only when skb->protocol
is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.
Fixes: 844d48746e4b ("xfrm: choose protocol family by skb protocol")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed17b8d377eaf6b4a01d46942b4c647378a79bdd upstream.
This waring can be triggered simply by:
# ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
priority 1 mark 0 mask 0x10 #[1]
# ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
priority 2 mark 0 mask 0x1 #[2]
# ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
priority 2 mark 0 mask 0x10 #[3]
Then dmesg shows:
[ ] WARNING: CPU: 1 PID: 7265 at net/xfrm/xfrm_policy.c:1548
[ ] RIP: 0010:xfrm_policy_insert_list+0x2f2/0x1030
[ ] Call Trace:
[ ] xfrm_policy_inexact_insert+0x85/0xe50
[ ] xfrm_policy_insert+0x4ba/0x680
[ ] xfrm_add_policy+0x246/0x4d0
[ ] xfrm_user_rcv_msg+0x331/0x5c0
[ ] netlink_rcv_skb+0x121/0x350
[ ] xfrm_netlink_rcv+0x66/0x80
[ ] netlink_unicast+0x439/0x630
[ ] netlink_sendmsg+0x714/0xbf0
[ ] sock_sendmsg+0xe2/0x110
The issue was introduced by Commit 7cb8a93968e3 ("xfrm: Allow inserting
policies with matching mark and different priorities"). After that, the
policies [1] and [2] would be able to be added with different priorities.
However, policy [3] will actually match both [1] and [2]. Policy [1]
was matched due to the 1st 'return true' in xfrm_policy_mark_match(),
and policy [2] was matched due to the 2nd 'return true' in there. It
caused WARN_ON() in xfrm_policy_insert_list().
This patch is to fix it by only (the same value and priority) as the
same policy in xfrm_policy_mark_match().
Thanks to Yuehaibing, we could make this fix better.
v1->v2:
- check policy->mark.v == pol->mark.v only without mask.
Fixes: 7cb8a93968e3 ("xfrm: Allow inserting policies with matching mark and different priorities")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a204aef9fd77dce1efd9066ca4e44eede99cd858 upstream.
An use-after-free crash can be triggered when sending big packets over
vxlan over esp with esp offload enabled:
[] BUG: KASAN: use-after-free in ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
[] Call Trace:
[] dump_stack+0x75/0xa0
[] kasan_report+0x37/0x50
[] ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
[] ipv6_gso_segment+0x2c8/0x13c0
[] skb_mac_gso_segment+0x1cb/0x420
[] skb_udp_tunnel_segment+0x6b5/0x1c90
[] inet_gso_segment+0x440/0x1380
[] skb_mac_gso_segment+0x1cb/0x420
[] esp4_gso_segment+0xae8/0x1709 [esp4_offload]
[] inet_gso_segment+0x440/0x1380
[] skb_mac_gso_segment+0x1cb/0x420
[] __skb_gso_segment+0x2d7/0x5f0
[] validate_xmit_skb+0x527/0xb10
[] __dev_queue_xmit+0x10f8/0x2320 <---
[] ip_finish_output2+0xa2e/0x1b50
[] ip_output+0x1a8/0x2f0
[] xfrm_output_resume+0x110e/0x15f0
[] __xfrm4_output+0xe1/0x1b0
[] xfrm4_output+0xa0/0x200
[] iptunnel_xmit+0x5a7/0x920
[] vxlan_xmit_one+0x1658/0x37a0 [vxlan]
[] vxlan_xmit+0x5e4/0x3ec8 [vxlan]
[] dev_hard_start_xmit+0x125/0x540
[] __dev_queue_xmit+0x17bd/0x2320 <---
[] ip6_finish_output2+0xb20/0x1b80
[] ip6_output+0x1b3/0x390
[] ip6_xmit+0xb82/0x17e0
[] inet6_csk_xmit+0x225/0x3d0
[] __tcp_transmit_skb+0x1763/0x3520
[] tcp_write_xmit+0xd64/0x5fe0
[] __tcp_push_pending_frames+0x8c/0x320
[] tcp_sendmsg_locked+0x2245/0x3500
[] tcp_sendmsg+0x27/0x40
As on the tx path of vxlan over esp, skb->inner_network_header would be
set on vxlan_xmit() and xfrm4_tunnel_encap_add(), and the later one can
overwrite the former one. It causes skb_udp_tunnel_segment() to use a
wrong skb->inner_network_header, then the issue occurs.
This patch is to fix it by calling xfrm_output_gso() instead when the
inner_protocol is set, in which gso_segment of inner_protocol will be
done first.
While at it, also improve some code around.
Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db87668ad1e4917cfe04e217307ba6ed9390716e upstream.
This xfrm_state_put call in esp4/6_gro_receive() will cause
double put for state, as in out_reset path secpath_reset()
will put all states set in skb sec_path.
So fix it by simply remove the xfrm_state_put call.
Fixes: 6ed69184ed9c ("xfrm: Reset secpath in xfrm failure")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06a0afcfe2f551ff755849ea2549b0d8409fd9a0 upstream.
For transport mode, when ipv6 nexthdr is set, the packet format might
be like:
----------------------------------------------------
| | dest | | | | ESP | ESP |
| IP6 hdr| opts.| ESP | TCP | Data | Trailer | ICV |
----------------------------------------------------
and in __xfrm_transport_prep():
pskb_pull(skb, skb->mac_len + sizeof(ip6hdr) + x->props.header_len);
it will pull the data pointer to the wrong position, as it missed the
nexthdrs/dest opts.
This patch is to fix it by using:
pskb_pull(skb, skb_transport_offset(skb) + x->props.header_len);
as we can be sure transport_header points to ESP header at that moment.
It also fixes a panic when packets with ipv6 nexthdr are sent over
esp6 transport mode:
[ 100.473845] kernel BUG at net/core/skbuff.c:4325!
[ 100.478517] RIP: 0010:__skb_to_sgvec+0x252/0x260
[ 100.494355] Call Trace:
[ 100.494829] skb_to_sgvec+0x11/0x40
[ 100.495492] esp6_output_tail+0x12e/0x550 [esp6]
[ 100.496358] esp6_xmit+0x1d5/0x260 [esp6_offload]
[ 100.498029] validate_xmit_xfrm+0x22f/0x2e0
[ 100.499604] __dev_queue_xmit+0x589/0x910
[ 100.502928] ip6_finish_output2+0x2a5/0x5a0
[ 100.503718] ip6_output+0x6c/0x120
[ 100.505198] xfrm_output_resume+0x4bf/0x530
[ 100.508683] xfrm6_output+0x3a/0xc0
[ 100.513446] inet6_csk_xmit+0xa1/0xf0
[ 100.517335] tcp_sendmsg+0x27/0x40
[ 100.517977] sock_sendmsg+0x3e/0x60
[ 100.518648] __sys_sendto+0xee/0x160
Fixes: c35fe4106b92 ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit afcaf61be9d1dbdee5ec186d1dcc67b6b692180f upstream.
For beet mode, when it's ipv6 inner address with nexthdrs set,
the packet format might be:
----------------------------------------------------
| outer | | dest | | | ESP | ESP |
| IP hdr | ESP | opts.| TCP | Data | Trailer | ICV |
----------------------------------------------------
The nexthdr from ESP could be NEXTHDR_HOP(0), so it should
continue processing the packet when nexthdr returns 0 in
xfrm_input(). Otherwise, when ipv6 nexthdr is set, the
packet will be dropped.
I don't see any error cases that nexthdr may return 0. So
fix it by removing the check for nexthdr == 0.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0bbab5f0301587cad4e923ccc49bb910db86162c upstream.
Removing the "if (IS_ERR(dir)) dir = NULL;" check only works
if we adjust the remaining code to not rely on it being NULL.
Check IS_ERR_OR_NULL() before attempting to dereference it.
I'm not actually entirely sure this fixes the syzbot crash as
the kernel config indicates that they do have DEBUG_FS in the
kernel, but this is what I found when looking there.
Cc: stable@vger.kernel.org
Fixes: d82574a8e5a4 ("cfg80211: no need to check return value of debugfs_create functions")
Reported-by: syzbot+fd5332e429401bf42d18@syzkaller.appspotmail.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200525113816.fc4da3ec3d4b.Ica63a110679819eaa9fb3bc1b7437d96b1fd187d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 890bd0f8997ae6ac0a367dd5146154a3963306dd ]
OSD client should ignore cache/overlay flag if got redirect reply.
Otherwise, the client hangs when the cache tier is in forward mode.
[ idryomov: Redirects are effectively deprecated and no longer
used or tested. The original tiering modes based on redirects
are inherently flawed because redirects can race and reorder,
potentially resulting in data corruption. The new proxy and
readproxy tiering modes should be used instead of forward and
readforward. Still marking for stable as obviously correct,
though. ]
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/23296
URL: https://tracker.ceph.com/issues/36406
Signed-off-by: Jerry Lee <leisurelysw24@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 635d9398178659d8ddba79dd061f9451cec0b4d1 upstream.
We cannot free record on any transient error because it leads to
losing previos data. Check socket error to know whether record must
be freed or not.
Fixes: d10523d0b3d7 ("net/tls: free the record on encryption error")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7bff11f6f9afa87c25711db8050c9b5324db0e2 upstream.
bpf_exec_tx_verdict() can return negative value for copied
variable. In that case this value will be pushed back to caller
and the real error code will be lost. Fix it using signed type and
checking for positive value.
Fixes: d10523d0b3d7 ("net/tls: free the record on encryption error")
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d3e8e4c11870413789f029a71e72ae6e971fe678 ]
Commit bdf6fa52f01b ("sctp: handle association restarts when the
socket is closed.") starts shutdown when an association is restarted,
if in SHUTDOWN-PENDING state and the socket is closed. However, the
rationale stated in that commit applies also when in SHUTDOWN-SENT
state - we don't want to move an association to ESTABLISHED state when
the socket has been closed, because that results in an association
that is unreachable from user space.
The problem scenario:
1. Client crashes and/or restarts.
2. Server (using one-to-one socket) calls close(). SHUTDOWN is lost.
3. Client reconnects using the same addresses and ports.
4. Server's association is restarted. The association and the socket
move to ESTABLISHED state, even though the server process has
closed its descriptor.
Also, after step 4 when the server process exits, some resources are
leaked in an attempt to release the underlying inet sock structure in
ESTABLISHED state:
IPv4: Attempt to release TCP socket in state 1 00000000377288c7
Fix by acting the same way as in SHUTDOWN-PENDING state. That is, if
an association is restarted in SHUTDOWN-SENT state and the socket is
closed, then start shutdown and don't move the association or the
socket to ESTABLISHED state.
Fixes: bdf6fa52f01b ("sctp: handle association restarts when the socket is closed.")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>