IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit 35306eb23814444bd4021f8a1c3047d3cb0c8b2b upstream.
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
In order to fix this issue, this patch adds a new spinlock that needs
to be used whenever these fields are read or written.
Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
reading sk->sk_peer_pid which makes no sense, as this field
is only possibly set by AF_UNIX sockets.
We will have to clean this in a separate patch.
This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
or implementing what was truly expected.
Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[backport note: 4.4 and 4.9 don't have SO_PEERGROUPS, only SO_PEERCRED]
[backport note: got rid of sk_get_peer_cred(), no users in 4.4/4.9]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c5dc070ff3d6246d22ddd931f23a6266249e3db upstream.
Ilja reported that, simply putting it, nothing was validating that
from_addr_param functions were operating on initialized memory. That is,
the parameter itself was being validated by sctp_walk_params, but it
doesn't check for types and their specific sizes and it could be a 0-length
one, causing from_addr_param to potentially work over the next parameter or
even uninitialized memory.
The fix here is to, in all calls to from_addr_param, check if enough space
is there for the wanted IP address type.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e04480920d1eec9c061841399aa6f35b6f987d8b ]
syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
calling lock_sock() with rw spinlock held [1].
It seems that history of this locking problem is a trial and error.
Commit b40df5743ee8 ("[PATCH] bluetooth: fix socket locking in
hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to
lock_sock() as an attempt to fix lockdep warning.
Then, commit 4ce61d1c7a8e ("[BLUETOOTH]: Fix locking in
hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to
local_bh_disable() + bh_lock_sock_nested() as an attempt to fix the
sleep in atomic context warning.
Then, commit 4b5dd696f81b ("Bluetooth: Remove local_bh_disable() from
hci_sock.c") in 3.3-rc1 removed local_bh_disable().
Then, commit e305509e678b ("Bluetooth: use correct lock to prevent UAF
of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to
lock_sock() as an attempt to fix CVE-2021-3573.
This difficulty comes from current implementation that
hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all
references from sockets because hci_unregister_dev() immediately
reclaims resources as soon as returning from
hci_sock_dev_event(HCI_DEV_UNREG).
But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not
doing what it should do.
Therefore, instead of trying to detach sockets from device, let's accept
not detaching sockets from device at hci_sock_dev_event(HCI_DEV_UNREG),
by moving actual cleanup of resources from hci_unregister_dev() to
hci_cleanup_dev() which is called by bt_host_release() when all
references to this unregistered device (which is a kobject) are gone.
Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets
hci_pi(sk)->hdev, we need to check whether this device was unregistered
and return an error based on HCI_UNREGISTER flag. There might be subtle
behavioral difference in "monitor the hdev" functionality; please report
if you found something went wrong due to this patch.
Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object")
Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7c9d2102c9c098916ab9e0ab248006107d00d6c ]
Syzbot reported skb_over_panic() in llc_pdu_init_as_xid_cmd(). The
problem was in wrong LCC header manipulations.
Syzbot's reproducer tries to send XID packet. llc_ui_sendmsg() is
doing following steps:
1. skb allocation with size = len + header size
len is passed from userpace and header size
is 3 since addr->sllc_xid is set.
2. skb_reserve() for header_len = 3
3. filling all other space with memcpy_from_msg()
Ok, at this moment we have fully loaded skb, only headers needs to be
filled.
Then code comes to llc_sap_action_send_xid_c(). This function pushes 3
bytes for LLC PDU header and initializes it. Then comes
llc_pdu_init_as_xid_cmd(). It initalizes next 3 bytes *AFTER* LLC PDU
header and call skb_push(skb, 3). This looks wrong for 2 reasons:
1. Bytes rigth after LLC header are user data, so this function
was overwriting payload.
2. skb_push(skb, 3) call can cause skb_over_panic() since
all free space was filled in llc_ui_sendmsg(). (This can
happen is user passed 686 len: 686 + 14 (eth header) + 3 (LLC
header) = 703. SKB_DATA_ALIGN(703) = 704)
So, in this patch I added 2 new private constansts: LLC_PDU_TYPE_U_XID
and LLC_PDU_LEN_U_XID. LLC_PDU_LEN_U_XID is used to correctly reserve
header size to handle LLC + XID case. LLC_PDU_TYPE_U_XID is used by
llc_pdu_header_init() function to push 6 bytes instead of 3. And finally
I removed skb_push() call from llc_pdu_init_as_xid_cmd().
This changes should not affect other parts of LLC, since after
all steps we just transmit buffer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+5e5a981ad7cc54c4b2b4@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d11fa231cabeae09a95cb3e4cf1d9dd34e00f08 ]
The doc draft-stewart-tsvwg-sctp-ipv4-00 that restricts 198 addresses
was never published. These addresses as private addresses should be
allowed to use in SCTP.
As Michael Tuexen suggested, this patch is to move 198 addresses from
unusable to private scope.
Reported-by: Sérgio <surkamp@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f4e65870e5cede5ca1ec0006b6c9803994e5f7b8 upstream.
We need this functionality for the io_uring file registration, but
we cannot rely on it since CONFIG_UNIX can be modular. Move the helpers
to a separate file, that's always builtin to the kernel if CONFIG_UNIX is
m/y.
No functional changes in this patch, just moving code around.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[ backported to older kernels to get access to unix_gc_lock - gregkh ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67a9c94317402b826fc3db32afc8f39336803d97 upstream.
skb_tunnel_info() returns pointer of lwtstate->data as ip_tunnel_info
type without validation. lwtstate->data can have various types such as
mpls_iptunnel_encap, etc and these are not compatible.
So skb_tunnel_info() should validate before returning that pointer.
Splat looks like:
BUG: KASAN: slab-out-of-bounds in vxlan_get_route+0x418/0x4b0 [vxlan]
Read of size 2 at addr ffff888106ec2698 by task ping/811
CPU: 1 PID: 811 Comm: ping Not tainted 5.13.0+ #1195
Call Trace:
dump_stack_lvl+0x56/0x7b
print_address_description.constprop.8.cold.13+0x13/0x2ee
? vxlan_get_route+0x418/0x4b0 [vxlan]
? vxlan_get_route+0x418/0x4b0 [vxlan]
kasan_report.cold.14+0x83/0xdf
? vxlan_get_route+0x418/0x4b0 [vxlan]
vxlan_get_route+0x418/0x4b0 [vxlan]
[ ... ]
vxlan_xmit_one+0x148b/0x32b0 [vxlan]
[ ... ]
vxlan_xmit+0x25c5/0x4780 [vxlan]
[ ... ]
dev_hard_start_xmit+0x1ae/0x6e0
__dev_queue_xmit+0x1f39/0x31a0
[ ... ]
neigh_xmit+0x2f9/0x940
mpls_xmit+0x911/0x1600 [mpls_iptunnel]
lwtunnel_xmit+0x18f/0x450
ip_finish_output2+0x867/0x2040
[ ... ]
Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40fc3054b45820c28ea3c65e2c86d041dc244a8a upstream.
Commit 628a5c561890 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE") introduced
ip6_skb_dst_mtu with return value of signed int which is inconsistent
with actually returned values. Also 2 users of this function actually
assign its value to unsigned int variable and only __xfrm6_output
assigns result of this function to signed variable but actually uses
as unsigned in further comparisons and calls. Change this function
to return unsigned int value.
Fixes: 628a5c561890 ("[INET]: Add IP(V6)_PMTUDISC_RPOBE")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b71eaed8c04f72a919a9c44e83e4ee254e69e7f3 ]
UDP sendmsg() path can be lockless, it is possible for another
thread to re-connect an change sk->sk_txhash under us.
There is no serious impact, but we can use READ_ONCE()/WRITE_ONCE()
pair to document the race.
BUG: KCSAN: data-race in __ip4_datagram_connect / skb_set_owner_w
write to 0xffff88813397920c of 4 bytes by task 30997 on cpu 1:
sk_set_txhash include/net/sock.h:1937 [inline]
__ip4_datagram_connect+0x69e/0x710 net/ipv4/datagram.c:75
__ip6_datagram_connect+0x551/0x840 net/ipv6/datagram.c:189
ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272
inet_dgram_connect+0xfd/0x180 net/ipv4/af_inet.c:580
__sys_connect_file net/socket.c:1837 [inline]
__sys_connect+0x245/0x280 net/socket.c:1854
__do_sys_connect net/socket.c:1864 [inline]
__se_sys_connect net/socket.c:1861 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:1861
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88813397920c of 4 bytes by task 31039 on cpu 0:
skb_set_hash_from_sk include/net/sock.h:2211 [inline]
skb_set_owner_w+0x118/0x220 net/core/sock.c:2101
sock_alloc_send_pskb+0x452/0x4e0 net/core/sock.c:2359
sock_alloc_send_skb+0x2d/0x40 net/core/sock.c:2373
__ip6_append_data+0x1743/0x21a0 net/ipv6/ip6_output.c:1621
ip6_make_skb+0x258/0x420 net/ipv6/ip6_output.c:1983
udpv6_sendmsg+0x160a/0x16b0 net/ipv6/udp.c:1527
inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
__do_sys_sendmmsg net/socket.c:2519 [inline]
__se_sys_sendmmsg net/socket.c:2516 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xbca3c43d -> 0xfdb309e0
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 31039 Comm: syz-executor.2 Not tainted 5.13.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a2805dca5107d5603f4bbc027e81e20d93476e96 upstream.
caif_enroll_dev() can fail in some cases. Ingnoring
these cases can lead to memory leak due to not assigning
link_support pointer to anywhere.
Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1d5ff5651ea592c67054233b14b30bf4452999c upstream.
Properly parse A-MSDUs whose first 6 bytes happen to equal a rfc1042
header. This can occur in practice when the destination MAC address
equals AA:AA:03:00:00:00. More importantly, this simplifies the next
patch to mitigate A-MSDU injection attacks.
Cc: stable@vger.kernel.org
Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20210511200110.0b2b886492f0.I23dd5d685fe16d3b0ec8106e8f01b59f499dffed@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c4c8c9544099bb9043a10a5318130a943e32fc3 upstream.
hci_chan can be created in 2 places: hci_loglink_complete_evt() if
it is an AMP hci_chan, or l2cap_conn_add() otherwise. In theory,
Only AMP hci_chan should be removed by a call to
hci_disconn_loglink_complete_evt(). However, the controller might mess
up, call that function, and destroy an hci_chan which is not initiated
by hci_loglink_complete_evt().
This patch adds a verification that the destroyed hci_chan must have
been init'd by hci_loglink_complete_evt().
Example crash call trace:
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xe3/0x144 lib/dump_stack.c:118
print_address_description+0x67/0x22a mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report mm/kasan/report.c:412 [inline]
kasan_report+0x251/0x28f mm/kasan/report.c:396
hci_send_acl+0x3b/0x56e net/bluetooth/hci_core.c:4072
l2cap_send_cmd+0x5af/0x5c2 net/bluetooth/l2cap_core.c:877
l2cap_send_move_chan_cfm_icid+0x8e/0xb1 net/bluetooth/l2cap_core.c:4661
l2cap_move_fail net/bluetooth/l2cap_core.c:5146 [inline]
l2cap_move_channel_rsp net/bluetooth/l2cap_core.c:5185 [inline]
l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:5464 [inline]
l2cap_sig_channel net/bluetooth/l2cap_core.c:5799 [inline]
l2cap_recv_frame+0x1d12/0x51aa net/bluetooth/l2cap_core.c:7023
l2cap_recv_acldata+0x2ea/0x693 net/bluetooth/l2cap_core.c:7596
hci_acldata_packet net/bluetooth/hci_core.c:4606 [inline]
hci_rx_work+0x2bd/0x45e net/bluetooth/hci_core.c:4796
process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175
worker_thread+0x4fc/0x670 kernel/workqueue.c:2321
kthread+0x2f0/0x304 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
Allocated by task 38:
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0x8d/0x9a mm/kasan/kasan.c:553
kmem_cache_alloc_trace+0x102/0x129 mm/slub.c:2787
kmalloc include/linux/slab.h:515 [inline]
kzalloc include/linux/slab.h:709 [inline]
hci_chan_create+0x86/0x26d net/bluetooth/hci_conn.c:1674
l2cap_conn_add.part.0+0x1c/0x814 net/bluetooth/l2cap_core.c:7062
l2cap_conn_add net/bluetooth/l2cap_core.c:7059 [inline]
l2cap_connect_cfm+0x134/0x852 net/bluetooth/l2cap_core.c:7381
hci_connect_cfm+0x9d/0x122 include/net/bluetooth/hci_core.h:1404
hci_remote_ext_features_evt net/bluetooth/hci_event.c:4161 [inline]
hci_event_packet+0x463f/0x72fa net/bluetooth/hci_event.c:5981
hci_rx_work+0x197/0x45e net/bluetooth/hci_core.c:4791
process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175
worker_thread+0x4fc/0x670 kernel/workqueue.c:2321
kthread+0x2f0/0x304 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
Freed by task 1732:
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free mm/kasan/kasan.c:521 [inline]
__kasan_slab_free+0x106/0x128 mm/kasan/kasan.c:493
slab_free_hook mm/slub.c:1409 [inline]
slab_free_freelist_hook+0xaa/0xf6 mm/slub.c:1436
slab_free mm/slub.c:3009 [inline]
kfree+0x182/0x21e mm/slub.c:3972
hci_disconn_loglink_complete_evt net/bluetooth/hci_event.c:4891 [inline]
hci_event_packet+0x6a1c/0x72fa net/bluetooth/hci_event.c:6050
hci_rx_work+0x197/0x45e net/bluetooth/hci_core.c:4791
process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175
worker_thread+0x4fc/0x670 kernel/workqueue.c:2321
kthread+0x2f0/0x304 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
The buggy address belongs to the object at ffff8881d7af9180
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 24 bytes inside of
128-byte region [ffff8881d7af9180, ffff8881d7af9200)
The buggy address belongs to the page:
page:ffffea00075ebe40 count:1 mapcount:0 mapping:ffff8881da403200 index:0x0
flags: 0x8000000000000200(slab)
raw: 8000000000000200 dead000000000100 dead000000000200 ffff8881da403200
raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8881d7af9080: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8881d7af9100: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8881d7af9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8881d7af9200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8881d7af9280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reported-by: syzbot+98228e7407314d2d4ba2@syzkaller.appspotmail.com
Reviewed-by: Alain Michaud <alainm@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream.
When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.
CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:
ip netns add foo
ip link set can0 netns foo
ip netns delete foo
WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
[<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
[<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
[<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
[<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
[<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
[<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
[<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
[<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
[<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.
The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.
Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee576c47db60432c37e54b1e2b43a8ca6d3a8dca upstream.
The icmp{,v6}_send functions make all sorts of use of skb->cb, casting
it with IPCB or IP6CB, assuming the skb to have come directly from the
inet layer. But when the packet comes from the ndo layer, especially
when forwarded, there's no telling what might be in skb->cb at that
point. As a result, the icmp sending code risks reading bogus memory
contents, which can result in nasty stack overflows such as this one
reported by a user:
panic+0x108/0x2ea
__stack_chk_fail+0x14/0x20
__icmp_send+0x5bd/0x5c0
icmp_ndo_send+0x148/0x160
In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read
from it. The optlen parameter there is of particular note, as it can
induce writes beyond bounds. There are quite a few ways that can happen
in __ip_options_echo. For example:
// sptr/skb are attacker-controlled skb bytes
sptr = skb_network_header(skb);
// dptr/dopt points to stack memory allocated by __icmp_send
dptr = dopt->__data;
// sopt is the corrupt skb->cb in question
if (sopt->rr) {
optlen = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data
soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data
// this now writes potentially attacker-controlled data, over
// flowing the stack:
memcpy(dptr, sptr+sopt->rr, optlen);
}
In the icmpv6_send case, the story is similar, but not as dire, as only
IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is
worse than the iif case, but it is passed to ipv6_find_tlv, which does
a bit of bounds checking on the value.
This is easy to simulate by doing a `memset(skb->cb, 0x41,
sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by
good fortune and the rarity of icmp sending from that context that we've
avoided reports like this until now. For example, in KASAN:
BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0
Write of size 38 at addr ffff888006f1f80e by task ping/89
CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5
Call Trace:
dump_stack+0x9a/0xcc
print_address_description.constprop.0+0x1a/0x160
__kasan_report.cold+0x20/0x38
kasan_report+0x32/0x40
check_memory_region+0x145/0x1a0
memcpy+0x39/0x60
__ip_options_echo+0xa0e/0x12b0
__icmp_send+0x744/0x1700
Actually, out of the 4 drivers that do this, only gtp zeroed the cb for
the v4 case, while the rest did not. So this commit actually removes the
gtp-specific zeroing, while putting the code where it belongs in the
shared infrastructure of icmp{,v6}_ndo_send.
This commit fixes the issue by passing an empty IPCB or IP6CB along to
the functions that actually do the work. For the icmp_send, this was
already trivial, thanks to __icmp_send providing the plumbing function.
For icmpv6_send, this required a tiny bit of refactoring to make it
behave like the v4 case, after which it was straight forward.
Fixes: a2b78e9b2cac ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs")
Reported-by: SinYu <liuxyon@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b41713b606694257b90d61ba7e2712d8457648b upstream.
This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. We don't want to pollute the non-driver path, though,
so we introduce this as a helper to be called by places that actually
make use of this, as suggested by Florian.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd1248f1ddbc48b0c30565fce897a3b6423313b8 ]
Check Scell_log shift size in red_check_params() and modify all callers
of red_check_params() to pass Scell_log.
This prevents a shift out-of-bounds as detected by UBSAN:
UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22
shift exponent 72 is too large for 32-bit type 'int'
Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com
Cc: Nogah Frankel <nogahf@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 469aceddfa3ed16e17ee30533fae45e90f62efd8 ]
Toshiaki pointed out that we now have two very similar functions to extract
the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
that the unbounded parsing loop makes it possible for maliciously crafted
packets to loop through potentially hundreds of tags.
Fix both of these issues by consolidating the two parsing functions and
limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
switch over __vlan_get_protocol() to use skb_header_pointer() instead of
pskb_may_pull(), to avoid the possible side effects of the latter and keep
the skb pointer 'const' through all the parsing functions.
v2:
- Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)
Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the presence of VLANs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02a1b175b0e92d9e0fa5df3957ade8d733ceb6a0 ]
Documentation/networking/ip-sysctl.txt:46 says:
ip_forward_use_pmtu - BOOLEAN
By default we don't trust protocol path MTUs while forwarding
because they could be easily forged and can lead to unwanted
fragmentation by the router.
You only need to enable this if you have user-space software
which tries to discover path mtus by itself and depends on the
kernel honoring this information. This is normally not the case.
Default: 0 (disabled)
Possible values:
0 - disabled
1 - enabled
Which makes it pretty clear that setting it to 1 is a potential
security/safety/DoS issue, and yet it is entirely reasonable to want
forwarded traffic to honour explicitly administrator configured
route mtus (instead of defaulting to device mtu).
Indeed, I can't think of a single reason why you wouldn't want to.
Since you configured a route mtu you probably know better...
It is pretty common to have a higher device mtu to allow receiving
large (jumbo) frames, while having some routes via that interface
(potentially including the default route to the internet) specify
a lower mtu.
Note that ipv6 forwarding uses device mtu unless the route is locked
(in which case it will use the route mtu).
This approach is not usable for IPv4 where an 'mtu lock' on a route
also has the side effect of disabling TCP path mtu discovery via
disabling the IPv4 DF (don't frag) bit on all outgoing frames.
I'm not aware of a way to lock a route from an IPv6 RA, so that also
potentially seems wrong.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
Cc: Tyler Wear <twear@quicinc.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 339ddaa626995bc6218972ca241471f3717cc5f4 upstream.
Starting with the upgrade to v5.8-rc3, I've noticed I wasn't able to
connect to my Bluetooth headset properly anymore. While connecting to
the device would eventually succeed, bluetoothd seemed to be confused
about the current connection state where the state was flapping hence
and forth. Bisecting this issue led to commit 3ca44c16b0dc (Bluetooth:
Consolidate encryption handling in hci_encrypt_cfm, 2020-05-19), which
refactored `hci_encrypt_cfm` to also handle updating the connection
state.
The commit in question changed the code to call `hci_connect_cfm` inside
`hci_encrypt_cfm` and to change the connection state. But with the
conversion, we now only update the connection state if a status was set
already. In fact, the reverse should be true: the status should be
updated if no status is yet set. So let's fix the isuse by reversing the
condition.
Fixes: 3ca44c16b0dc ("Bluetooth: Consolidate encryption handling in hci_encrypt_cfm")
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ca44c16b0dcc764b641ee4ac226909f5c421aa3 upstream.
This makes hci_encrypt_cfm calls hci_connect_cfm in case the connection
state is BT_CONFIG so callers don't have to check the state.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Hans-Christian Noren Egtvedt <hegtvedt@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f19425641cb2572a33cb074d5e30283720bd4d22 upstream.
Only sockets will have the chan->data set to an actual sk, channels
like A2MP would have its own data which would likely cause a crash when
calling sk_filter, in order to fix this a new callback has been
introduced so channels can implement their own filtering if necessary.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 91a46c6d1b4fcbfa4773df9421b8ad3e58088101 ]
XFRMA_REPLAY_ESN_VAL was not cloned completely from the old to the new.
Migrate this attribute during XFRMA_MSG_MIGRATE
v1->v2:
- move curleft cloning to a separate patch
Fixes: af2f464e326e ("xfrm: Assign esn pointers when cloning a state")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 62ffc589abb176821662efc4525ee4ac0b9c3894 upstream.
Refactor the fastreuse update code in inet_csk_get_port into a small
helper function that can be called from other places.
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1e105e6afa6c3d32bfb52c00ffa393894a525c27 ]
Following bug was reported via irc:
nft list ruleset
set knock_candidates_ipv4 {
type ipv4_addr . inet_service
size 65535
elements = { 127.0.0.1 . 123,
127.0.0.1 . 123 }
}
..
udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 }
udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport }
It should not have been possible to add a duplicate set entry.
After some debugging it turned out that the problem is the immediate
value (123) in the second-to-last rule.
Concatenations use 32bit registers, i.e. the elements are 8 bytes each,
not 6 and it turns out the kernel inserted
inet firewall @knock_candidates_ipv4
element 0100007f ffff7b00 : 0 [end]
element 0100007f 00007b00 : 0 [end]
Note the non-zero upper bits of the first element. It turns out that
nft_immediate doesn't zero the destination register, but this is needed
when the length isn't a multiple of 4.
Furthermore, the zeroing in nft_payload is broken. We can't use
[len / 4] = 0 -- if len is a multiple of 4, index is off by one.
Skip zeroing in this case and use a conditional instead of (len -1) / 4.
Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d9539752d23283db4692384a634034f451261e29 upstream.
Add missed sock updates to compat path via a new helper, which will be
used more in coming patches. (The net/core/scm.c code is left as-is here
to assist with -stable backports for the compat path.)
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c0de6e96c9794cb523a516c465991a70245da1c ]
IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket
to IPv4, particularly struct ipv6_ac_socklist. Similar to
struct ipv6_mc_socklist, we should just close it on this path.
This bug can be easily reproduced with the following C program:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
int main()
{
int s, value;
struct sockaddr_in6 addr;
struct ipv6_mreq m6;
s = socket(AF_INET6, SOCK_DGRAM, 0);
addr.sin6_family = AF_INET6;
addr.sin6_port = htons(5000);
inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr);
connect(s, (struct sockaddr *)&addr, sizeof(addr));
inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr);
m6.ipv6mr_interface = 5;
setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6));
value = AF_INET;
setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value));
close(s);
return 0;
}
Reported-by: ch3332xr@gmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1e82a62fec613844da9e558f3493540a5b7a7b67 ]
A potential deadlock can occur during registering or unregistering a
new generic netlink family between the main nl_table_lock and the
cb_lock where each thread wants the lock held by the other, as
demonstrated below.
1) Thread 1 is performing a netlink_bind() operation on a socket. As part
of this call, it will call netlink_lock_table(), incrementing the
nl_table_users count to 1.
2) Thread 2 is registering (or unregistering) a genl_family via the
genl_(un)register_family() API. The cb_lock semaphore will be taken for
writing.
3) Thread 1 will call genl_bind() as part of the bind operation to handle
subscribing to GENL multicast groups at the request of the user. It will
attempt to take the cb_lock semaphore for reading, but it will fail and
be scheduled away, waiting for Thread 2 to finish the write.
4) Thread 2 will call netlink_table_grab() during the (un)registration
call. However, as Thread 1 has incremented nl_table_users, it will not
be able to proceed, and both threads will be stuck waiting for the
other.
genl_bind() is a noop, unless a genl_family implements the mcast_bind()
function to handle setting up family-specific multicast operations. Since
no one in-tree uses this functionality as Cong pointed out, simply removing
the genl_bind() function will remove the possibility for deadlock, as there
is no attempt by Thread 1 above to take the cb_lock semaphore.
Fixes: c380d9a7afff ("genetlink: pass multicast bind/unbind to families")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 41b14fb8724d5a4b382a63cb4a1a61880347ccb8 ]
Clearing the sock TX queue in sk_set_socket() might cause unexpected
out-of-order transmit when called from sock_orphan(), as outstanding
packets can pick a different TX queue and bypass the ones already queued.
This is undesired in general. More specifically, it breaks the in-order
scheduling property guarantee for device-offloaded TLS sockets.
Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
explicitly only where needed.
Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 471e39df96b9a4c4ba88a2da9e25a126624d7a9c ]
If a socket is set ipv6only, it will still send IPv4 addresses in the
INIT and INIT_ACK packets. This potentially misleads the peer into using
them, which then would cause association termination.
The fix is to not add IPv4 addresses to ipv6only sockets.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b15e62631c5f19fea9895f7632dae9c1b27fe0cd ]
When a new action is installed, firstuse field of 'tcf_t' is explicitly set
to 0. Value of zero means "new action, not yet used"; as a packet hits the
action, 'firstuse' is stamped with the current jiffies value.
tcf_tm_dump() should return 0 for firstuse if action has not yet been hit.
Fixes: 48d8ee1694dd ("net sched actions: aggregate dumping of actions timeinfo")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c407aca64977ede9b9f35158e919773cae2082f ]
gcc-10 warns around a suspicious access to an empty struct member:
net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc':
net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds]
1522 | memset(&ct->__nfct_init_offset[0], 0,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from net/netfilter/nf_conntrack_core.c:37:
include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset'
90 | u8 __nfct_init_offset[0];
| ^~~~~~~~~~~~~~~~~~
The code is correct but a bit unusual. Rework it slightly in a way that
does not trigger the warning, using an empty struct instead of an empty
array. There are probably more elegant ways to do this, but this is the
smallest change.
Fixes: c41884ce0562 ("netfilter: conntrack: avoid zeroing timer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 upstream.
ipv6_stub uses the ip6_dst_lookup function to allow other modules to
perform IPv6 lookups. However, this function skips the XFRM layer
entirely.
All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
which calls xfrm_lookup_route(). This patch fixes this inconsistent
behavior by switching the stub to ip6_dst_lookup_flow, which also calls
xfrm_lookup_route().
This requires some changes in all the callers, as these two functions
take different arguments and have different return types.
Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.9:
- Drop changes in lwt_bpf.c and mlx5
- Initialise "dst" in drivers/infiniband/core/addr.c:addr_resolve()
to avoid introducing a spurious "may be used uninitialised" warning
- Adjust filename, context, indentation]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e upstream.
This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow,
as some modules currently pass a net argument without a socket to
ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change
ipv6_stub_impl.ipv6_dst_lookup to take net argument").
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9bacd256f1354883d3c1402655153367982bba49 ]
TCP stack is dumb in how it cooks its output packets.
Depending on MAX_HEADER value, we might chose a bad ending point
for the headers.
If we align the end of TCP headers to cache line boundary, we
make sure to always use the smallest number of cache lines,
which always help.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 03e2a984b6165621f287fadf5f4b5cd8b58dcaba ]
The behaviour for what is considered an anycast address changed in
commit 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after
encountering pmtu exception"). This now considers the first
address in a subnet where there is a route via a gateway
to be an anycast address.
This breaks path MTU discovery and traceroutes when a host in a
remote network uses the address at the start of a prefix
(eg 2600:: advertised as 2600::/48 in the DFZ) as ICMP errors
will not be sent to anycast addresses.
This patch excludes any routes with a gateway, or via point to
point links, like the behaviour previously from
rt6_is_gw_or_nonexthop in net/ipv6/route.c.
This can be tested with:
ip link add v1 type veth peer name v2
ip netns add test
ip netns exec test ip link set lo up
ip link set v2 netns test
ip link set v1 up
ip netns exec test ip link set v2 up
ip addr add 2001:db8::1/64 dev v1 nodad
ip addr add 2001:db8:100:: dev lo nodad
ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad
ip netns exec test ip route add unreachable 2001:db8:1::1
ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1
ip netns exec test sysctl net.ipv6.conf.all.forwarding=1
ip route add 2001:db8:1::1 via 2001:db8::2
ping -I 2001:db8::1 2001:db8:1::1 -c1
ping -I 2001:db8:100:: 2001:db8:1::1 -c1
ip addr delete 2001:db8:100:: dev lo
ip netns delete test
Currently the first ping will get back a destination unreachable ICMP
error, but the second will never get a response, with "icmp6_send:
acast source" logged. After this patch, both get destination
unreachable ICMP replies.
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Tim Stallard <code@timstallard.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c16d64ea04056f1b1b324ab6916019f6a064114 ]
Add missing netlink policy entry for FRA_TUN_ID.
Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a9093c79863b58cc2f9874d7ae788f0d622a596 ]
tc flower rules that are based on src or dst port blocking are sometimes
ineffective due to uninitialized stack data. __skb_flow_dissect() extracts
ports from the skb for tc flower to match against. However, the port
dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in
key_control->flags. All callers of __skb_flow_dissect(), zero-out the
key_control field except for fl_classify() as used by the flower
classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to
__skb_flow_dissect(), since key_control is allocated on the stack
and may not be initialized.
Since key_basic and key_control are present for all flow keys, let's
make sure they are initialized.
Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments")
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30ca1aa536211f5ac3de0173513a7a99a98a97f3 upstream.
Make ieee80211_send_layer2_update() a common function so other drivers
can re-use it.
Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 4.9 as dependency of commit 3e493173b784
"mac80211: Do not send Layer 2 Update frame before authorization":
- Retain type-casting of skb_put() return value
- Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8dbd76e79a16b45b2ccb01d2f2e08dbf64e71e40 upstream.
Michal Kubecek and Firo Yang did a very nice analysis of crashes
happening in __inet_lookup_established().
Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN
(via a close()/socket()/listen() cycle) without a RCU grace period,
I should not have changed listeners linkage in their hash table.
They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
so that a lookup can detect a socket in a hash list was moved in
another one.
Since we added code in commit d296ba60d8e2 ("soreuseport: Resolve
merge conflict for v4/v6 ordering fix"), we have to add
hlist_nulls_add_tail_rcu() helper.
Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Firo Yang <firo.yang@suse.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
[stable-4.9: we also need to update code in __inet_lookup_listener() and
inet6_lookup_listener() which has been removed in 5.0-rc1.]
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 258a980d1ec23e2c786e9536a7dd260bea74bae6 ]
When storing a pointer to a dst_metrics structure in dst_entry._metrics,
two flags are added in the least significant bits of the pointer value.
Hence this assumes all pointers to dst_metrics structures have at least
4-byte alignment.
However, on m68k, the minimum alignment of 32-bit values is 2 bytes, not
4 bytes. Hence in some kernel builds, dst_default_metrics may be only
2-byte aligned, leading to obscure boot warnings like:
WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x44/0x9a
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: G W 5.5.0-rc2-atari-01448-g114a1a1038af891d-dirty #261
Stack from 10835e6c:
10835e6c 0038134f 00023fa6 00394b0f 0000001c 00000009 00321560 00023fea
00394b0f 0000001c 001a70f8 00000009 00000000 10835eb4 00000001 00000000
04208040 0000000a 00394b4a 10835ed4 00043aa8 001a70f8 00394b0f 0000001c
00000009 00394b4a 0026aba8 003215a4 00000003 00000000 0026d5a8 00000001
003215a4 003a4361 003238d6 000001f0 00000000 003215a4 10aa3b00 00025e84
003ddb00 10834000 002416a8 10aa3b00 00000000 00000080 000aa038 0004854a
Call Trace: [<00023fa6>] __warn+0xb2/0xb4
[<00023fea>] warn_slowpath_fmt+0x42/0x64
[<001a70f8>] refcount_warn_saturate+0x44/0x9a
[<00043aa8>] printk+0x0/0x18
[<001a70f8>] refcount_warn_saturate+0x44/0x9a
[<0026aba8>] refcount_sub_and_test.constprop.73+0x38/0x3e
[<0026d5a8>] ipv4_dst_destroy+0x5e/0x7e
[<00025e84>] __local_bh_enable_ip+0x0/0x8e
[<002416a8>] dst_destroy+0x40/0xae
Fix this by forcing 4-byte alignment of all dst_metrics structures.
Fixes: e5fd387ad5b30ca3 ("ipv6: do not overwrite inetpeer metrics prematurely")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 721c8dafad26ccfa90ff659ee19755e3377b829d ]
Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
timestamp of the last synflood. Protect them with READ_ONCE() and
WRITE_ONCE() since reads and writes aren't serialised.
Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from
struct tcp_sock"). But unprotected accesses were already there when
timestamp was stored in .last_synq_overflow.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cb44a08f8647fd2e8db5cc9ac27cd8355fa392d8 ]
When no synflood occurs, the synflood timestamp isn't updated.
Therefore it can be so old that time_after32() can consider it to be
in the future.
That's a problem for tcp_synq_no_recent_overflow() as it may report
that a recent overflow occurred while, in fact, it's just that jiffies
has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
Spurious detection of recent overflows lead to extra syncookie
verification in cookie_v[46]_check(). At that point, the verification
should fail and the packet dropped. But we should have dropped the
packet earlier as we didn't even send a syncookie.
Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
only if jiffies is within the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
way, no spurious recent overflow is reported when jiffies wraps and
'last_overflow' becomes in the future from the point of view of
time_after32().
However, if jiffies wraps and enters the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
'last_overflow' being a stale synflood timestamp), then
tcp_synq_no_recent_overflow() still erroneously reports an
overflow. In such cases, we have to rely on syncookie verification
to drop the packet. We unfortunately have no way to differentiate
between a fresh and a stale syncookie timestamp.
In practice, using last_overflow as lower bound is problematic.
If the synflood timestamp is concurrently updated between the time
we read jiffies and the moment we store the timestamp in
'last_overflow', then 'now' becomes smaller than 'last_overflow' and
tcp_synq_no_recent_overflow() returns true, potentially dropping a
valid syncookie.
Reading jiffies after loading the timestamp could fix the problem,
but that'd require a memory barrier. Let's just accommodate for
potential timestamp growth instead and extend the interval using
'last_overflow - HZ' as lower bound.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>