63493 Commits

Author SHA1 Message Date
David S. Miller
b7e4ba9a91 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Switch to RCU in x_tables to fix possible NULL pointer dereference,
   from Subash Abhinov Kasiviswanathan.

2) Fix netlink dump of dynset timeouts later than 23 days.

3) Add comment for the indirect serialization of the nft commit mutex
   with rtnl_mutex.

4) Remove bogus check for confirmed conntrack when matching on the
   conntrack ID, from Brett Mastbergen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09 18:55:46 -08:00
Zheng Yongjun
a319aedde4 net: rxrpc: convert comma to semicolon
Replace a comma between expression statements by a semicolon.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09 16:23:07 -08:00
Neal Cardwell
299bcb55ec tcp: fix cwnd-limited bug for TSO deferral where we send nothing
When cwnd is not a multiple of the TSO skb size of N*MSS, we can get
into persistent scenarios where we have the following sequence:

(1) ACK for full-sized skb of N*MSS arrives
  -> tcp_write_xmit() transmit full-sized skb with N*MSS
  -> move pacing release time forward
  -> exit tcp_write_xmit() because pacing time is in the future

(2) TSQ callback or TCP internal pacing timer fires
  -> try to transmit next skb, but TSO deferral finds remainder of
     available cwnd is not big enough to trigger an immediate send
     now, so we defer sending until the next ACK.

(3) repeat...

So we can get into a case where we never mark ourselves as
cwnd-limited for many seconds at a time, even with
bulk/infinite-backlog senders, because:

o In case (1) above, every time in tcp_write_xmit() we have enough
cwnd to send a full-sized skb, we are not fully using the cwnd
(because cwnd is not a multiple of the TSO skb size). So every time we
send data, we are not cwnd limited, and so in the cwnd-limited
tracking code in tcp_cwnd_validate() we mark ourselves as not
cwnd-limited.

o In case (2) above, every time in tcp_write_xmit() that we try to
transmit the "remainder" of the cwnd but defer, we set the local
variable is_cwnd_limited to true, but we do not send any packets, so
sent_pkts is zero, so we don't call the cwnd-limited logic to update
tp->is_cwnd_limited.

Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Reported-by: Ingemar Johansson <ingemar.s.johansson@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201209035759.1225145-1-ncardwell.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09 16:15:54 -08:00
Chris Mi
5137d30365 net: flow_offload: Fix memory leak for indirect flow block
The offending commit introduces a cleanup callback that is invoked
when the driver module is removed to clean up the tunnel device
flow block. But it returns on the first iteration of the for loop.
The remaining indirect flow blocks will never be freed.

Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
2020-12-09 16:08:33 -08:00
Wei Wang
8ef44b6fe4 tcp: Retain ECT bits for tos reflection
For DCTCP, we have to retain the ECT bits set by the congestion control
algorithm on the socket when reflecting syn TOS in syn-ack, in order to
make ECN work properly.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Reported-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09 16:08:23 -08:00
Michal Kubecek
a770bf5156 ethtool: fix stack overflow in ethnl_parse_bitset()
Syzbot reported a stack overflow in bitmap_from_arr32() called from
ethnl_parse_bitset() when bitset from netlink message is longer than
target bitmap length. While ethnl_compact_sanity_checks() makes sure that
trailing part is all zeros (i.e. the request does not try to touch bits
kernel does not recognize), we also need to cap change_bits to nbits so
that we don't try to write past the prepared bitmaps.

Fixes: 88db6d1e4f62 ("ethtool: add ethnl_parse_bitset() helper")
Reported-by: syzbot+9d39fa49d4df294aab93@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/3487ee3a98e14cd526f55b6caaa959d2dcbcad9f.1607465316.git.mkubecek@suse.cz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09 15:50:38 -08:00
Pablo Neira Ayuso
102e2c0723 net: sched: incorrect Kconfig dependencies on Netfilter modules
- NET_ACT_CONNMARK and NET_ACT_CTINFO only require conntrack support.
- NET_ACT_IPT only requires NETFILTER_XTABLES symbols, not
  IP_NF_IPTABLES. After this patch, NET_ACT_IPT becomes consistent
  with NET_EMATCH_IPT. NET_ACT_IPT dependency on IP_NF_IPTABLES predates
  Linux-2.6.12-rc2 (initial git repository build).

Fixes: 22a5dc0e5e3e ("net: sched: Introduce connmark action")
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20201208204707.11268-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09 15:49:29 -08:00
Oliver Hartkopp
323a391a22 can: isotp: isotp_setsockopt(): block setsockopt on bound sockets
The isotp socket can be widely configured in its behaviour regarding addressing
types, fill-ups, receive pattern tests and link layer length. Usually all
these settings need to be fixed before bind() and can not be changed
afterwards.

This patch adds a check to enforce the common usage pattern.

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Thomas Wagner <thwa1@web.de>
Link: https://lore.kernel.org/r/20201203140604.25488-2-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20201204133508.742120-3-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09 08:44:15 -08:00
Toke Høiland-Jørgensen
998f172962 xdp: Remove the xdp_attachment_flags_ok() callback
Since commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF
programs in net_device"), the XDP program attachment info is now maintained
in the core code. This interacts badly with the xdp_attachment_flags_ok()
check that prevents unloading an XDP program with different load flags than
it was loaded with. In practice, two kinds of failures are seen:

- An XDP program loaded without specifying a mode (and which then ends up
  in driver mode) cannot be unloaded if the program mode is specified on
  unload.

- The dev_xdp_uninstall() hook always calls the driver callback with the
  mode set to the type of the program but an empty flags argument, which
  means the flags_ok() check prevents the program from being removed,
  leading to bpf prog reference leaks.

The original reason this check was added was to avoid ambiguity when
multiple programs were loaded. With the way the checks are done in the core
now, this is quite simple to enforce in the core code, so let's add a check
there and get rid of the xdp_attachment_flags_ok() callback entirely.

Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
2020-12-09 16:27:42 +01:00
Chuck Lever
5e54dafbe0 SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall
There's no need to defer allocation of pages for the receive buffer.

- This upcall is quite infrequent
- gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL,
  unlike the transport
- gssp_alloc_receive_pages() knows exactly how many pages are needed

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
2020-12-09 09:38:34 -05:00
Roberto Bergantinos Corpas
4b5cff7ed8 sunrpc: clean-up cache downcall
We can simplify code around cache_downcall unifying memory
allocations using kvmalloc. This has the benefit of getting rid of
cache_slow_downcall (and queue_io_mutex), and also matches userland
allocation size and limits.

Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09 09:38:34 -05:00
Brett Mastbergen
2d94b20b95 netfilter: nft_ct: Remove confirmation check for NFT_CT_ID
Since commit 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id
hash calculation") the ct id will not change from initialization to
confirmation.  Removing the confirmation check allows for things like
adding an element to a 'typeof ct id' set in prerouting upon reception
of the first packet of a new connection, and then being able to
reference that set consistently both before and after the connection
is confirmed.

Fixes: 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation")
Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-09 10:31:58 +01:00
Florent Revest
b60da4955f bpf: Only provide bpf_sock_from_file with CONFIG_NET
This moves the bpf_sock_from_file definition into net/core/filter.c
which only gets compiled with CONFIG_NET and also moves the helper proto
usage next to other tracing helpers that are conditional on CONFIG_NET.

This avoids
  ld: kernel/trace/bpf_trace.o: in function `bpf_sock_from_file':
  bpf_trace.c:(.text+0xe23): undefined reference to `sock_from_file'
When compiling a kernel with BPF and without NET.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20201208173623.1136863-1-revest@chromium.org
2020-12-08 18:23:36 -08:00
Eric Dumazet
72d05c00d7 tcp: select sane initial rcvq_space.space for big MSS
Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.

This is no longer the case, and Hazem Mohamed Abuelfotoh reported
that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.

Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
of rcvq_space.space computation, while it can select later a smaller
value for tp->rcv_ssthresh and tp->window_clamp.

ss -temoi on receiver would show :

skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596

This means that TCP can not increase its window in tcp_grow_window(),
and that DRS can never kick.

Fix this by making sure that rcvq_space.space is not bigger than number of bytes
that can be held in TCP receive queue.

People unable/unwilling to change their kernel can work around this issue by
selecting a bigger tcp_rmem[1] value as in :

echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem

Based on an initial report and patch from Hazem Mohamed Abuelfotoh
 https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/

Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 16:27:48 -08:00
Zheng Yongjun
5e359044c1 net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
Simplify the return expression.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 16:22:54 -08:00
Zheng Yongjun
8daa76a52d net: core: devlink: simplify the return expression of devlink_nl_cmd_trap_set_doit()
Simplify the return expression.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 16:22:54 -08:00
Zheng Yongjun
9faad250ce net: ipv6: rpl_iptunnel: simplify the return expression of rpl_do_srh()
Simplify the return expression.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 16:22:54 -08:00
Zheng Yongjun
57b0637d00 net/sched: cls_u32: simplify the return expression of u32_reoffload_knode()
Simplify the return expression.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 16:22:53 -08:00
Colin Ian King
8354bcbebd net: sched: fix spelling mistake in Kconfig "trys" -> "tries"
There is a spelling mistake in the Kconfig help text. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 16:01:56 -08:00
David S. Miller
e1be4b5990 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2020-12-07

Here's the main bluetooth-next pull request for the 5.11 kernel.

 - Updated Bluetooth entries in MAINTAINERS to include Luiz von Dentz
 - Added support for Realtek 8822CE and 8852A devices
 - Added support for MediaTek MT7615E device
 - Improved workarounds for fake CSR devices
 - Fix Bluetooth qualification test case L2CAP/COS/CFD/BV-14-C
 - Fixes for LL Privacy support
 - Enforce 16 byte encryption key size for FIPS security level
 - Added new mgmt commands for extended advertising support
 - Multiple other smaller fixes & improvements

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 15:58:49 -08:00
Julian Wiedmann
97f8841e04 net/af_iucv: use DECLARE_SOCKADDR to cast from sockaddr
This gets us compile-time size checking.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 15:56:53 -08:00
Cengiz Can
0398ba9e5a net: tipc: prevent possible null deref of link
`tipc_node_apply_property` does a null check on a `tipc_link_entry`
pointer but also accesses the same pointer out of the null check block.

This triggers a warning on Coverity Static Analyzer because we're
implying that `e->link` can BE null.

Move "Update MTU for node link entry" line into if block to make sure
that we're not in a state that `e->link` is null.

Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 15:53:41 -08:00
Pablo Neira Ayuso
42f1c27120 netfilter: nftables: comment indirect serialization of commit_mutex with rtnl_mutex
Add an explicit comment in the code to describe the indirect
serialization of the holders of the commit_mutex with the rtnl_mutex.
Commit 90d2723c6d4c ("netfilter: nf_tables: do not hold reference on
netdevice from preparation phase") already describes this, but a comment
in this case is better for reference.

Reported-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08 21:53:48 +01:00
Pablo Neira Ayuso
917d80d376 netfilter: nft_dynset: fix timeouts later than 23 days
Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by
8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23
days"), otherwise ruleset listing breaks.

Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08 20:42:11 +01:00
Rasmus Villemoes
bdc40a3f4b net: dsa: print the MTU value that could not be set
These warnings become somewhat more informative when they include the
MTU value that could not be set and not just the errno.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20201205133944.10182-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-08 11:24:07 -08:00
Björn Töpel
3546b9b8ec xsk: Validate socket state in xsk_recvmsg, prior touching socket members
In AF_XDP the socket state needs to be checked, prior touching the
members of the socket. This was not the case for the recvmsg
implementation. Fix that by moving the xsk_is_bound() call.

Fixes: 45a86681844e ("xsk: Add support for recvmsg()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20201207082008.132263-1-bjorn.topel@gmail.com
2020-12-08 17:11:58 +01:00
Subash Abhinov Kasiviswanathan
cc00bcaa58 netfilter: x_tables: Switch synchronization to RCU
When running concurrent iptables rules replacement with data, the per CPU
sequence count is checked after the assignment of the new information.
The sequence count is used to synchronize with the packet path without the
use of any explicit locking. If there are any packets in the packet path using
the table information, the sequence count is incremented to an odd value and
is incremented to an even after the packet process completion.

The new table value assignment is followed by a write memory barrier so every
CPU should see the latest value. If the packet path has started with the old
table information, the sequence counter will be odd and the iptables
replacement will wait till the sequence count is even prior to freeing the
old table info.

However, this assumes that the new table information assignment and the memory
barrier is actually executed prior to the counter check in the replacement
thread. If CPU decides to execute the assignment later as there is no user of
the table information prior to the sequence check, the packet path in another
CPU may use the old table information. The replacement thread would then free
the table information under it leading to a use after free in the packet
processing context-

Unable to handle kernel NULL pointer dereference at virtual
address 000000000000008e
pc : ip6t_do_table+0x5d0/0x89c
lr : ip6t_do_table+0x5b8/0x89c
ip6t_do_table+0x5d0/0x89c
ip6table_filter_hook+0x24/0x30
nf_hook_slow+0x84/0x120
ip6_input+0x74/0xe0
ip6_rcv_finish+0x7c/0x128
ipv6_rcv+0xac/0xe4
__netif_receive_skb+0x84/0x17c
process_backlog+0x15c/0x1b8
napi_poll+0x88/0x284
net_rx_action+0xbc/0x23c
__do_softirq+0x20c/0x48c

This could be fixed by forcing instruction order after the new table
information assignment or by switching to RCU for the synchronization.

Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
Reported-by: Sean Tranchetti <stranche@codeaurora.org>
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08 12:57:39 +01:00
Jakub Kicinski
819f56bad1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2020-12-07

1) Sysbot reported fixes for the new 64/32 bit compat layer.
   From Dmitry Safonov.

2) Fix a memory leak in xfrm_user_policy that was introduced
   by adding the 64/32 bit compat layer. From Yu Kuai.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  net: xfrm: fix memory leak in xfrm_user_policy()
  xfrm/compat: Don't allocate memory with __GFP_ZERO
  xfrm/compat: memset(0) 64-bit padding at right place
  xfrm/compat: Translate by copying XFRMA_UNSPEC attribute
====================

Link: https://lore.kernel.org/r/20201207093937.2874932-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07 18:29:54 -08:00
Jianguo Wu
f55628b3e7 mptcp: print new line in mptcp_seq_show() if mptcp isn't in use
When do cat /proc/net/netstat, the output isn't append with a new line, it looks like this:
[root@localhost ~]# cat /proc/net/netstat
...
MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0[root@localhost ~]#

This is because in mptcp_seq_show(), if mptcp isn't in use, net->mib.mptcp_statistics is NULL,
so it just puts all 0 after "MPTcpExt:", and return, forgot the '\n'.

After this patch:

[root@localhost ~]# cat /proc/net/netstat
...
MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0
[root@localhost ~]#

Fixes: fc518953bc9c8d7d ("mptcp: add and use MIB counter infrastructure")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/142e2fd9-58d9-bb13-fb75-951cccc2331e@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07 17:45:29 -08:00
Joseph Huang
851d0a73c9 bridge: Fix a deadlock when enabling multicast snooping
When enabling multicast snooping, bridge module deadlocks on multicast_lock
if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
network.

The deadlock was caused by the following sequence: While holding the lock,
br_multicast_open calls br_multicast_join_snoopers, which eventually causes
IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
Since the destination Ethernet address is a multicast address, br_dev_xmit
feeds the packet back to the bridge via br_multicast_rcv, which in turn
calls br_multicast_add_group, which then deadlocks on multicast_lock.

The fix is to move the call br_multicast_join_snoopers outside of the
critical section. This works since br_multicast_join_snoopers only deals
with IP and does not modify any multicast data structures of the bridge,
so there's no need to hold the lock.

Steps to reproduce:
1. sysctl net.ipv6.conf.all.force_mld_version=1
2. have another querier
3. ip link set dev bridge type bridge mcast_snooping 0 && \
   ip link set dev bridge type bridge mcast_snooping 1 < deadlock >

A typical call trace looks like the following:

[  936.251495]  _raw_spin_lock+0x5c/0x68
[  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
[  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
[  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
[  936.269689]  dev_hard_start_xmit+0x94/0x158
[  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
[  936.277890]  dev_queue_xmit+0x10/0x18
[  936.281563]  neigh_resolve_output+0xec/0x198
[  936.285845]  ip6_finish_output2+0x240/0x710
[  936.290039]  __ip6_finish_output+0x130/0x170
[  936.294318]  ip6_output+0x6c/0x1c8
[  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
[  936.301834]  igmp6_send+0x358/0x558
[  936.305326]  igmp6_join_group.part.0+0x30/0xf0
[  936.309774]  igmp6_group_added+0xfc/0x110
[  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
[  936.317885]  ipv6_dev_mc_inc+0x10/0x18
[  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
[  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]

Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07 17:14:43 -08:00
Cong Wang
e3366884b3 lwt_bpf: Replace preempt_disable() with migrate_disable()
migrate_disable() is just a wrapper for preempt_disable() in
non-RT kernel. It is safe to replace it, and RT kernel will
benefit.

Note that it is introduced since Feb 2020.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201205075946.497763-2-xiyou.wangcong@gmail.com
2020-12-07 11:53:40 -08:00
Dongdong Wang
d9054a1ff5 lwt: Disable BH too in run_lwt_bpf()
The per-cpu bpf_redirect_info is shared among all skb_do_redirect()
and BPF redirect helpers. Callers on RX path are all in BH context,
disabling preemption is not sufficient to prevent BH interruption.

In production, we observed strange packet drops because of the race
condition between LWT xmit and TC ingress, and we verified this issue
is fixed after we disable BH.

Although this bug was technically introduced from the beginning, that
is commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure"),
at that time call_rcu() had to be call_rcu_bh() to match the RCU context.
So this patch may not work well before RCU flavor consolidation has been
completed around v5.0.

Update the comments above the code too, as call_rcu() is now BH friendly.

Signed-off-by: Dongdong Wang <wangdongdong.6@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/bpf/20201205075946.497763-1-xiyou.wangcong@gmail.com
2020-12-07 11:53:39 -08:00
Marcel Holtmann
e6ed8b78ea Bluetooth: Increment management interface revision
Increment the mgmt revision due to the recently added new commands.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:02:00 +02:00
Abhishek Pandit-Subedi
dce0a4be80 Bluetooth: Set missing suspend task bits
When suspending, mark SUSPEND_SCAN_ENABLE and SUSPEND_SCAN_DISABLE tasks
correctly when either classic or le scanning is modified.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Howard Chung <howardchung@google.com>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:01:46 +02:00
Daniel Winkler
4d9b952857 Bluetooth: Change MGMT security info CMD to be more generic
For advertising, we wish to know the LE tx power capabilities of the
controller in userspace, so this patch edits the Security Info MGMT
command to be more generic, such that other various controller
capabilities can be included in the EIR data. This change also includes
the LE min and max tx power into this newly-named command.

The change was tested by manually verifying that the MGMT command
returns the tx power range as expected in userspace.

Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org>
Signed-off-by: Daniel Winkler <danielwinkler@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:01:42 +02:00
Daniel Winkler
7c395ea521 Bluetooth: Query LE tx power on startup
Queries tx power via HCI_LE_Read_Transmit_Power command when the hci
device is initialized, and stores resulting min/max LE power in hdev
struct. If command isn't available (< BT5 support), min/max values
both default to HCI_TX_POWER_INVALID.

This patch is manually verified by ensuring BT5 devices correctly query
and receive controller tx power range.

Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org>
Signed-off-by: Daniel Winkler <danielwinkler@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:01:38 +02:00
Daniel Winkler
9bf9f4b630 Bluetooth: Use intervals and tx power from mgmt cmds
This patch takes the min/max intervals and tx power optionally provided
in mgmt interface, stores them in the advertisement struct, and uses
them when configuring the hci requests. While tx power is not used if
extended advertising is unavailable, software rotation will use the min
and max advertising intervals specified by the client.

This change is validated manually by ensuring the min/max intervals are
propagated to the controller on both hatch (extended advertising) and
kukui (no extended advertising) chromebooks, and that tx power is
propagated correctly on hatch. These tests are performed with multiple
advertisements simultaneously.

Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org>
Signed-off-by: Daniel Winkler <danielwinkler@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:01:33 +02:00
Daniel Winkler
1241057283 Bluetooth: Break add adv into two mgmt commands
This patch adds support for the new advertising add interface, with the
first command setting advertising parameters and the second to set
advertising data. The set parameters command allows the caller to leave
some fields "unset", with a params bitfield defining which params were
purposefully set. Unset parameters will be given defaults when calling
hci_add_adv_instance. The data passed to the param mgmt command is
allowed to be flexible, so in the future if bluetoothd passes a larger
structure with new params, the mgmt command will ignore the unknown
members at the end.

This change has been validated on both hatch (extended advertising) and
kukui (no extended advertising) chromebooks running bluetoothd that
support this new interface. I ran the following manual tests:
- Set several (3) advertisements using modified test_advertisement.py
- For each, validate correct data and parameters in btmon trace
- Verified both for software rotation and extended adv

Automatic test suite also run, testing many (25) scenarios of single and
multi-advertising for data/parameter correctness.

Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org>
Signed-off-by: Daniel Winkler <danielwinkler@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:01:28 +02:00
Daniel Winkler
31aab5c22e Bluetooth: Add helper to set adv data
We wish to handle advertising data separately from advertising
parameters in our new MGMT requests. This change adds a helper that
allows the advertising data and scan response to be updated for an
existing advertising instance.

Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org>
Signed-off-by: Daniel Winkler <danielwinkler@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:01:25 +02:00
Howard Chung
80af16a3e4 Bluetooth: Add toggle to switch off interleave scan
This patch add a configurable parameter to switch off the interleave
scan feature.

Signed-off-by: Howard Chung <howardchung@google.com>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:01:00 +02:00
Howard Chung
3bc615fa93 Bluetooth: Refactor read default sys config for various types
Refactor read default system configuration function so that it's capable
of returning different types than u16

Signed-off-by: Howard Chung <howardchung@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:56 +02:00
Howard Chung
422bb17f8a Bluetooth: Handle active scan case
This patch adds code to handle the active scan during interleave
scan. The interleave scan will be canceled when users start active scan,
and it will be restarted after active scan stopped.

Signed-off-by: Howard Chung <howardchung@google.com>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Reviewed-by: Manish Mandlik <mmandlik@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:52 +02:00
Howard Chung
36afe87ac1 Bluetooth: Handle system suspend resume case
This patch adds code to handle the system suspension during interleave
scan. The interleave scan will be canceled when the system is going to
sleep, and will be restarted after waking up.

Signed-off-by: Howard Chung <howardchung@google.com>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Reviewed-by: Manish Mandlik <mmandlik@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:47 +02:00
Howard Chung
c4f1f40816 Bluetooth: Interleave with allowlist scan
This patch implements the interleaving between allowlist scan and
no-filter scan. It'll be used to save power when at least one monitor is
registered and at least one pending connection or one device to be
scanned for.

The durations of the allowlist scan and the no-filter scan are
controlled by MGMT command: Set Default System Configuration. The
default values are set randomly for now.

Signed-off-by: Howard Chung <howardchung@google.com>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Reviewed-by: Manish Mandlik <mmandlik@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:43 +02:00
Edward Vear
a31489d2a3 Bluetooth: Fix attempting to set RPA timeout when unsupported
During controller initialization, an LE Set RPA Timeout command is sent
to the controller if supported. However, the value checked to determine
if the command is supported is incorrect. Page 1921 of the Bluetooth
Core Spec v5.2 shows that bit 2 of octet 35 of the Supported_Commands
field corresponds to the LE Set RPA Timeout command, but currently
bit 6 of octet 35 is checked. This patch checks the correct value
instead.

This issue led to the error seen in the following btmon output during
initialization of an adapter (rtl8761b) and prevented initialization
from completing.

< HCI Command: LE Set Resolvable Private Address Timeout (0x08|0x002e) plen 2
        Timeout: 900 seconds
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Resolvable Private Address Timeout (0x08|0x002e) ncmd 2
        Status: Unsupported Remote Feature / Unsupported LMP Feature (0x1a)
= Close Index: 00:E0:4C:6B:E5:03

The error did not appear when running with this patch.

Signed-off-by: Edward Vear <edwardvear@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:38 +02:00
Luiz Augusto von Dentz
aeeae47d34 Bluetooth: Rename get_adv_instance_scan_rsp
This renames get_adv_instance_scan_rsp to adv_instance_is_scannable and
make it return a bool since it was not actually properly return the size
of the scan response as one could expect.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:33 +02:00
Luiz Augusto von Dentz
a76a0d3650 Bluetooth: Fix not sending Set Extended Scan Response
Current code is actually failing on the following tests of mgmt-tester
because get_adv_instance_scan_rsp_len did not account for flags that
cause scan response data to be included resulting in non-scannable
instance when in fact it should be scannable.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:29 +02:00
Jimmy Wahlberg
5b8ec15d02 Bluetooth: Fix for Bluetooth SIG test L2CAP/COS/CFD/BV-14-C
This test case is meant to verify that multiple
unknown options is included in the response.

BLUETOOTH CORE SPECIFICATION Version 5.2 | Vol 3, Part A
page 1057

'On an unknown option failure (Result=0x0003),
the option(s) that contain anoption type field that is not
understood by the recipient of the L2CAP_CONFIGURATION_REQ
packet shall be included in the L2CAP_CONFIGURATION_RSP
packet unless they are hints.'

Before this patch:

> ACL Data RX: Handle 11 flags 0x02 dlen 24
      L2CAP: Configure Request (0x04) ident 18 len 16
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [mandatory]
        10 00 11 02 11 00 12 02 12 00
< ACL Data TX: Handle 11 flags 0x00 dlen 17
      L2CAP: Configure Response (0x05) ident 18 len 9
        Source CID: 64
        Flags: 0x0000
        Result: Failure - unknown options (0x0003)
        Option: Unknown (0x10) [mandatory]
        12

After this patch:

> ACL Data RX: Handle 11 flags 0x02 dlen 24
      L2CAP: Configure Request (0x04) ident 5 len 16
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [mandatory]
        10 00 11 02 11 00 12 02 12 00
< ACL Data TX: Handle 11 flags 0x00 dlen 23
      L2CAP: Configure Response (0x05) ident 5 len 15
        Source CID: 64
        Flags: 0x0000
        Result: Failure - unknown options (0x0003)
        Option: Unknown (0x10) [mandatory]
        10 11 01 11 12 01 12

Signed-off-by: Jimmy Wahlberg <jimmywa@spotify.com>
Reviewed-by: Luiz Augusto Von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:23 +02:00
Wei Yongjun
f6b8c6b554 Bluetooth: sco: Fix crash when using BT_SNDMTU/BT_RCVMTU option
This commit add the invalid check for connected socket, without it will
causes the following crash due to sco_pi(sk)->conn being NULL:

KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
CPU: 3 PID: 4284 Comm: test_sco Not tainted 5.10.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:sco_sock_getsockopt+0x45d/0x8e0
Code: 48 c1 ea 03 80 3c 02 00 0f 85 ca 03 00 00 49 8b 9d f8 04 00 00 48 b8 00
      00 00 00 00 fc ff df 48 8d 7b 50 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e b5 03 00 00 8b 43 50 48 8b 0c
RSP: 0018:ffff88801bb17d88 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff83a4ecdf
RDX: 000000000000000a RSI: ffffc90002fce000 RDI: 0000000000000050
RBP: 1ffff11003762fb4 R08: 0000000000000001 R09: ffff88810e1008c0
R10: ffffffffbd695dcf R11: fffffbfff7ad2bb9 R12: 0000000000000000
R13: ffff888018ff1000 R14: dffffc0000000000 R15: 000000000000000d
FS:  00007fb4f76c1700(0000) GS:ffff88811af80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555e3b7a938 CR3: 00000001117be001 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 ? sco_skb_put_cmsg+0x80/0x80
 ? sco_skb_put_cmsg+0x80/0x80
 __sys_getsockopt+0x12a/0x220
 ? __ia32_sys_setsockopt+0x150/0x150
 ? syscall_enter_from_user_mode+0x18/0x50
 ? rcu_read_lock_bh_held+0xb0/0xb0
 __x64_sys_getsockopt+0xba/0x150
 ? syscall_enter_from_user_mode+0x1d/0x50
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 0fc1a726f897 ("Bluetooth: sco: new getsockopt options BT_SNDMTU/BT_RCVMTU")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Luiz Augusto Von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:15 +02:00
Reo Shiseki
353021588c Bluetooth: fix typo in struct name
Signed-off-by: Reo Shiseki <reoshiseki@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 16:51:22 +02:00