68744 Commits

Author SHA1 Message Date
Luiz Augusto von Dentz
008ee9eb8a Bluetooth: hci_sync: Fix not processing all entries on cmd_sync_work
hci_cmd_sync_queue can be called multiple times, each adding a
hci_cmd_sync_work_entry, before hci_cmd_sync_work is run so this makes
sure they are all dequeued properly otherwise it creates a backlog of
entries that are never run.

Link: https://lore.kernel.org/all/CAJCQCtSeUtHCgsHXLGrSTWKmyjaQDbDNpP4rb0i+RE+L2FTXSA@mail.gmail.com/T/
Fixes: 6a98e3836fa20 ("Bluetooth: Add helper for serialized HCI command execution")
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-03-03 13:30:03 +01:00
Krzysztof Kozlowski
44cd576549 nfc: llcp: Revert "NFC: Keep socket alive until the DISC PDU is actually sent"
This reverts commit 17f7ae16aef1f58bc4af4c7a16b8778a91a30255.

The commit brought a new socket state LLCP_DISCONNECTING, which was
never set, only read, so socket could never set to such state.

Remove the dead code.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:43:37 +00:00
Krzysztof Kozlowski
a06b804416 nfc: llcp: protect nfc_llcp_sock_unlink() calls
nfc_llcp_sock_link() is called in all paths (bind/connect) as a last
action, still protected with lock_sock().  When cleaning up in
llcp_sock_release(), call nfc_llcp_sock_unlink() in a mirrored way:
earlier and still under the lock_sock().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:43:37 +00:00
Krzysztof Kozlowski
a736491239 nfc: llcp: use test_bit()
Use test_bit() instead of open-coding it, just like in other places
touching the bitmap.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:43:37 +00:00
Krzysztof Kozlowski
4dbbf673f7 nfc: llcp: use centralized exiting of bind on errors
Coding style encourages centralized exiting of functions, so rewrite
llcp_sock_bind() error paths to use such pattern.  This reduces the
duplicated cleanup code, make success path visually shorter and also
cleans up the errors in proper order (in reversed way from
initialization).

No functional impact expected.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:43:37 +00:00
Krzysztof Kozlowski
ec10fd154d nfc: llcp: simplify llcp_sock_connect() error paths
The llcp_sock_connect() error paths were using a mixed way of central
exit (goto) and cleanup

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:43:37 +00:00
Krzysztof Kozlowski
13a3585b26 nfc: llcp: nullify llcp_sock->dev on connect() error paths
Nullify the llcp_sock->dev on llcp_sock_connect() error paths,
symmetrically to the code llcp_sock_bind().  The non-NULL value of
llcp_sock->dev is used in a few places to check whether the socket is
still valid.

There was no particular issue observed with missing NULL assignment in
connect() error path, however a similar case - in the bind() error path
- was triggereable.  That one was fixed in commit 4ac06a1e013c ("nfc:
fix NULL ptr dereference in llcp_sock_getname() after failed connect"),
so the change here seems logical as well.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:43:37 +00:00
Petr Machata
5fd0b838ef net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS
The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS,
which should be carried by the RTM_SETSTATS message, and expresses a desire
to toggle L3 offload xstats on or off.

As part of the above, add an exported function rtnl_offload_xstats_notify()
that drivers can use when they have installed or deinstalled the counters
backing the HW stats.

At this point, it is possible to enable, disable and query L3 offload
xstats on netdevices. (However there is no driver actually implementing
these.)

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:23 +00:00
Petr Machata
03ba356670 net: rtnetlink: Add RTM_SETSTATS
The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. These stats are only accessible through RTM_GETSTATS, and
therefore should be toggled by a RTM_SETSTATS message. Add it, and the
necessary skeleton handler.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:23 +00:00
Petr Machata
0e7788fd76 net: rtnetlink: Add UAPI for obtaining L3 offload xstats
Add a new IFLA_STATS_LINK_OFFLOAD_XSTATS child attribute,
IFLA_OFFLOAD_XSTATS_L3_STATS, to carry statistics for traffic that takes
place in a HW router.

The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. Additionally, as a netdevice is configured, it may become or
cease being suitable for binding of a HW counter. Both of these aspects
need to be communicated to the userspace. To that end, add another child
attribute, IFLA_OFFLOAD_XSTATS_HW_S_INFO:

    - attr nest IFLA_OFFLOAD_XSTATS_HW_S_INFO
	- attr nest IFLA_OFFLOAD_XSTATS_L3_STATS
 	    - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST
	      - {0,1} as u8
 	    - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED
	      - {0,1} as u8

Thus this one attribute is a nest that can be used to carry information
about various types of HW statistics, and indexing is very simply done by
wrapping the information for a given statistics suite into the attribute
that carries the suite is the RTM_GETSTATS query. At the same time, because
_HW_S_INFO is nested directly below IFLA_STATS_LINK_OFFLOAD_XSTATS, it is
possible through filtering to request only the metadata about individual
statistics suites, without having to hit the HW to get the actual counters.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:23 +00:00
Petr Machata
9309f97aef net: dev: Add hardware stats support
Offloading switch device drivers may be able to collect statistics of the
traffic taking place in the HW datapath that pertains to a certain soft
netdevice, such as VLAN. Add the necessary infrastructure to allow exposing
these statistics to the offloaded netdevice in question. The API was shaped
by the following considerations:

- Collection of HW statistics is not free: there may be a finite number of
  counters, and the act of counting may have a performance impact. It is
  therefore necessary to allow toggling whether HW counting should be done
  for any particular SW netdevice.

- As the drivers are loaded and removed, a particular device may get
  offloaded and unoffloaded again. At the same time, the statistics values
  need to stay monotonic (modulo the eventual 64-bit wraparound),
  increasing only to reflect traffic measured in the device.

  To that end, the netdevice keeps around a lazily-allocated copy of struct
  rtnl_link_stats64. Device drivers then contribute to the values kept
  therein at various points. Even as the driver goes away, the struct stays
  around to maintain the statistics values.

- Different HW devices may be able to count different things. The
  motivation behind this patch in particular is exposure of HW counters on
  Nvidia Spectrum switches, where the only practical approach to counting
  traffic on offloaded soft netdevices currently is to use router interface
  counters, and count L3 traffic. Correspondingly that is the statistics
  suite added in this patch.

  Other devices may be able to measure different kinds of traffic, and for
  that reason, the APIs are built to allow uniform access to different
  statistics suites.

- Because soft netdevices and offloading drivers are only loosely bound, a
  netdevice uses a notifier chain to communicate with the drivers. Several
  new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages
  to the offloading drivers.

- Devices can have various conditions for when a particular counter is
  available. As the device is configured and reconfigured, the device
  offload may become or cease being suitable for counter binding. A
  netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to
  ping offloading drivers and determine whether anyone currently implements
  a given statistics suite. This information can then be propagated to user
  space.

  When the driver decides to unoffload a netdevice, it can use a
  newly-added function, netdev_offload_xstats_report_delta(), to record
  outstanding collected statistics, before destroying the HW counter.

This patch adds a helper, call_netdevice_notifiers_info_robust(), for
dispatching a notifier with the possibility of unwind when one of the
consumers bails. Given the wish to eventually get rid of the global
notifier block altogether, this helper only invokes the per-netns notifier
block.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:23 +00:00
Petr Machata
216e690631 net: rtnetlink: rtnl_fill_statsinfo(): Permit non-EMSGSIZE error returns
Obtaining stats for the IFLA_STATS_LINK_OFFLOAD_XSTATS nest involves a HW
access, and can fail for more reasons than just netlink message size
exhaustion. Therefore do not always return -EMSGSIZE on the failure path,
but respect the error code provided by the callee. Set the error explicitly
where it is reasonable to assume -EMSGSIZE as the failure reason.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:22 +00:00
Petr Machata
05415bccbb net: rtnetlink: Propagate extack to rtnl_offload_xstats_fill()
Later patches add handlers for more HW-backed statistics. An extack will be
useful when communicating HW / driver errors to the client. Add the
arguments as appropriate.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:22 +00:00
Petr Machata
46efc97b73 net: rtnetlink: RTM_GETSTATS: Allow filtering inside nests
The filter_mask field of RTM_GETSTATS header determines which top-level
attributes should be included in the netlink response. This saves
processing time by only including the bits that the user cares about
instead of always dumping everything. This is doubly important for
HW-backed statistics that would typically require a trip to the device to
fetch the stats.

So far there was only one HW-backed stat suite per attribute. However,
IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in
the following patches. It would therefore be advantageous to be able to
filter within that nest, and select just one or the other HW-backed
statistics suite.

Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload.
The scheme is as follows:

    - RTM_GETSTATS
	- struct if_stats_msg
	- attr nest IFLA_STATS_GET_FILTERS
	    - attr IFLA_STATS_LINK_OFFLOAD_XSTATS
		- u32 filter_mask

This scheme reuses the existing enumerators by nesting them in a dedicated
context attribute. This is covered by policies as usual, therefore a
gradual opt-in is possible. Currently only IFLA_STATS_LINK_OFFLOAD_XSTATS
nest has filtering enabled, because for the SW counters the issue does not
seem to be that important.

rtnl_offload_xstats_get_size() and _fill() are extended to observe the
requested filters.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:22 +00:00
Petr Machata
f6e0fb8129 net: rtnetlink: Stop assuming that IFLA_OFFLOAD_XSTATS_* are dev-backed
The IFLA_STATS_LINK_OFFLOAD_XSTATS attribute is a nest whose child
attributes carry various special hardware statistics. The code that handles
this nest was written with the idea that all these statistics would be
exposed by the device driver of a physical netdevice.

In the following patches, a new attribute is added to the abovementioned
nest, which however can be defined for some soft netdevices. The NDO-based
approach to querying these does not work, because it is not the soft
netdevice driver that exposes these statistics, but an offloading NIC
driver that does so.

The current code does not scale well to this usage. Simply rewrite it back
to the pattern seen in other fill-like and get_size-like functions
elsewhere.

Extract to helpers the code that is concerned with handling specifically
NDO-backed statistics so that it can be easily reused should more such
statistics be added.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:22 +00:00
Petr Machata
6b524a1d01 net: rtnetlink: Namespace functions related to IFLA_OFFLOAD_XSTATS_*
The currently used names rtnl_get_offload_stats() and
rtnl_get_offload_stats_size() do not clearly show the namespace. The former
function additionally seems to have been named this way in accordance with
the NDO name, as opposed to the naming used in the rtnetlink.c file (and
indeed elsewhere in the netlink handling code). As more and
differently-flavored attributes are introduced, a common clear prefix is
needed for all related functions.

Rename the functions to follow the rtnl_offload_xstats_* naming scheme.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:37:22 +00:00
Hans de Goede
815d512192 Bluetooth: hci_core: Fix unbalanced unlock in set_device_flags()
There is only one "goto done;" in set_device_flags() and this happens
*before* hci_dev_lock() is called, move the done label to after the
hci_dev_unlock() to fix the following unlock balance:

[   31.493567] =====================================
[   31.493571] WARNING: bad unlock balance detected!
[   31.493576] 5.17.0-rc2+ #13 Tainted: G         C  E
[   31.493581] -------------------------------------
[   31.493584] bluetoothd/685 is trying to release lock (&hdev->lock) at:
[   31.493594] [<ffffffffc07603f5>] set_device_flags+0x65/0x1f0 [bluetooth]
[   31.493684] but there are no more locks to release!

Note this bug has been around for a couple of years, but before
commit fe92ee6425a2 ("Bluetooth: hci_core: Rework hci_conn_params flags")
supported_flags was hardcoded to "((1U << HCI_CONN_FLAG_MAX) - 1)" so
the check for unsupported flags which does the "goto done;" never
triggered.

Fixes: fe92ee6425a2 ("Bluetooth: hci_core: Rework hci_conn_params flags")
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-03-03 11:35:10 +01:00
D. Wythe
4940a1fdf3 net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
The problem of SMC_CLC_DECL_ERR_REGRMB on the server is very clear.
Based on the fact that whether a new SMC connection can be accepted or
not depends on not only the limit of conn nums, but also the available
entries of rtoken. Since the rtoken release is trigger by peer, while
the conn nums is decrease by local, tons of thing can happen in this
time difference.

This only thing that needs to be mentioned is that now all connection
creations are completely protected by smc_server_lgr_pending lock, it's
enough to check only the available entries in rtokens_used_mask.

Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:34:18 +00:00
D. Wythe
0537f0a215 net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB in client
dues to following execution sequence:

Server Conn A:           Server Conn B:			Client Conn B:

smc_lgr_unregister_conn
                        smc_lgr_register_conn
                        smc_clc_send_accept     ->
                                                        smc_rtoken_add
smcr_buf_unuse
		->		Client Conn A:
				smc_rtoken_delete

smc_lgr_unregister_conn() makes current link available to assigned to new
incoming connection, while smcr_buf_unuse() has not executed yet, which
means that smc_rtoken_add may fail because of insufficient rtoken_entry,
reversing their execution order will avoid this problem.

Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:34:18 +00:00
Joe Damato
6b95e3388b page_pool: Add function to batch and return stats
Adds a function page_pool_get_stats which can be used by drivers to obtain
stats for a specified page_pool.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 09:55:28 +00:00
Joe Damato
ad6fa1e1ab page_pool: Add recycle stats
Add per-cpu stats tracking page pool recycling events:
	- cached: recycling placed page in the page pool cache
	- cache_full: page pool cache was full
	- ring: page placed into the ptr ring
	- ring_full: page released from page pool because the ptr ring was full
	- released_refcnt: page released (and not recycled) because refcnt > 1

Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 09:55:28 +00:00
Joe Damato
8610037e81 page_pool: Add allocation stats
Add per-pool statistics counters for the allocation path of a page pool.
These stats are incremented in softirq context, so no locking or per-cpu
variables are needed.

This code is disabled by default and a kernel config option is provided for
users who wish to enable them.

The statistics added are:
	- fast: successful fast path allocations
	- slow: slow path order-0 allocations
	- slow_high_order: slow path high order allocations
	- empty: ptr ring is empty, so a slow path allocation was forced.
	- refill: an allocation which triggered a refill of the cache
	- waive: pages obtained from the ptr ring that cannot be added to
	  the cache due to a NUMA mismatch.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 09:55:28 +00:00
Eric Dumazet
e3d5ea2c01 tcp: make tcp_read_sock() more robust
If recv_actor() returns an incorrect value, tcp_read_sock()
might loop forever.

Instead, issue a one time warning and make sure to make progress.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220302161723.3910001-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:49:03 -08:00
Eric Dumazet
60ce37b039 bpf, sockmap: Do not ignore orig_len parameter
Currently, sk_psock_verdict_recv() returns skb->len

This is problematic because tcp_read_sock() might have
passed orig_len < skb->len, due to the presence of TCP urgent data.

This causes an infinite loop from tcp_read_sock()

Followup patch will make tcp_read_sock() more robust vs bad actors.

Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220302161723.3910001-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:49:03 -08:00
Kurt Kanzenbach
bf08824a0f flow_dissector: Add support for HSR
Network drivers such as igb or igc call eth_get_headlen() to determine the
header length for their to be constructed skbs in receive path.

When running HSR on top of these drivers, it results in triggering BUG_ON() in
skb_pull(). The reason is the skb headlen is not sufficient for HSR to work
correctly. skb_pull() notices that.

For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR which
is not correct. The problem is, the flow dissection code does not take HSR into
account. Therefore, add support for it.

Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:44:49 -08:00
Yang Li
cb1d8fba91 net: openvswitch: remove unneeded semicolon
Eliminate the following coccicheck warning:
./net/openvswitch/flow.c:379:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220227132208.24658-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:22:18 -08:00
Baowen Zheng
d922a99b96 flow_offload: improve extack msg for user when adding invalid filter
Add extack message to return exact message to user when adding invalid
filter with conflict flags for TC action.

In previous implement we just return EINVAL which is confusing for user.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Link: https://lore.kernel.org/r/1646191769-17761-1-git-send-email-baowen.zheng@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:16:10 -08:00
lena wang
224102de2f net: fix up skbs delta_truesize in UDP GRO frag_list
The truesize for a UDP GRO packet is added by main skb and skbs in main
skb's frag_list:
skb_gro_receive_list
        p->truesize += skb->truesize;

The commit 53475c5dd856 ("net: fix use-after-free when UDP GRO with
shared fraglist") introduced a truesize increase for frag_list skbs.
When uncloning skb, it will call pskb_expand_head and trusesize for
frag_list skbs may increase. This can occur when allocators uses
__netdev_alloc_skb and not jump into __alloc_skb. This flow does not
use ksize(len) to calculate truesize while pskb_expand_head uses.
skb_segment_list
err = skb_unclone(nskb, GFP_ATOMIC);
pskb_expand_head
        if (!skb->sk || skb->destructor == sock_edemux)
                skb->truesize += size - osize;

If we uses increased truesize adding as delta_truesize, it will be
larger than before and even larger than previous total truesize value
if skbs in frag_list are abundant. The main skb truesize will become
smaller and even a minus value or a huge value for an unsigned int
parameter. Then the following memory check will drop this abnormal skb.

To avoid this error we should use the original truesize to segment the
main skb.

Fixes: 53475c5dd856 ("net: fix use-after-free when UDP GRO with shared fraglist")
Signed-off-by: lena wang <lena.wang@mediatek.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1646133431-8948-1-git-send-email-lena.wang@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:04:10 -08:00
Jakub Kicinski
fa452e0a60 This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - Remove redundant 'flush_workqueue()' calls, by Christophe JAILLET
 
  - Migrate to linux/container_of.h, by Sven Eckelmann
 
  - Demote batadv-on-batadv skip error message, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmIfnFgWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeofBZEACtLe1VvUbNi00KMFWE7N32/C6v
 X7snbt9HeoWJUAGQ8C89Eu80sAa0Jpig99qnQNNFRT6UR0T/DkFUYtUVkd5HV1TV
 OwiZag6PvROck4FyN2YYde5NA96PvMm6/70NlVWL4dXB1IVWoQvGBtoWNmuom/hA
 EkCIXt7IE1T1Y+OrAyeRM5KXcxK8nNYQbL2fKvampELAu8SRcq/cF7vfUQYq9OTz
 7PNxTRqbZ2EOzp57A0EyYqYSzNpoKgQxyJsMjRGBZ6mooJB/GHNhj6B7qxyva/70
 O942Twq9HY0F+XPhUxVDD5W2W8g2Mai1FFYpXlMpHOhiQQuVHqp9g6SLNOxGjEhC
 O1UrPRHdC4KKQoEqqJdYwdyFBE7yNvkJkgF1dUIpoAAjn6xcYo9uWUq+hxItbW2k
 OxmhNA9xLkiEtffT1sEJxf0rAyUj6WK88PsBVaVwxMSnSgRq87s3b926EnaxOnkx
 Te7V8ZnNFk+kvJQHtAmW1ZylAeAMAOvJ7m8f3+RzS4h7C5hiYYFP4B3QRJ8uIVAO
 klKohohPvGIuann1fyu3qRB2tm4Op+PurakGzusryVDkrPm70Gvtdy34M3s68vH+
 y41pWZuSwz5HjBHvXrVDgXPK8Jo7KfxUM3Xrt7sd7mJJ6ik6GMbe5PkM04cjxVks
 4kZKB4oCk72u8DtknA==
 =OG/5
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-pullrequest-20220302' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Remove redundant 'flush_workqueue()' calls, by Christophe JAILLET

 - Migrate to linux/container_of.h, by Sven Eckelmann

 - Demote batadv-on-batadv skip error message, by Sven Eckelmann

* tag 'batadv-next-pullrequest-20220302' of git://git.open-mesh.org/linux-merge:
  batman-adv: Demote batadv-on-batadv skip error message
  batman-adv: Migrate to linux/container_of.h
  batman-adv: Remove redundant 'flush_workqueue()' calls
  batman-adv: Start new development cycle
====================

Link: https://lore.kernel.org/r/20220302163522.102842-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 21:58:03 -08:00
Jakub Kicinski
ea97ab9889 Here are some batman-adv bugfixes:
- Remove redundant iflink requests, by Sven Eckelmann (2 patches)
 
  - Don't expect inter-netns unique iflink indices, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmIfmmUWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoSy0D/9XfRgiSQH9DuyzXjlRMV1qMgNO
 9O0HOVqZagRhtydGNsBhu1mSFcBIFUhyRD20BvyKrulsgb9Cpkojk9zv31kV0W9S
 PIK81oNVplAiDnz3aZmGWqC1GEZHziJOezvr8eBIvdRay+F/M5oqWqqtQmDyZFXc
 JEnQcJyNZTGAmfYKM8Nq5iAnqoIkPIxChvipR7P12g/cjd0owbzl/QShdlLw0nDe
 zK8Pj/zXSA+hXdIsWrh9n7yU7m7sTWWMOo0ElkOhwb6DyDxAhEvIe59jcv0SE+HL
 7zHFf1hJluO7mswK9pJSIm80+P5WtyEBskUjCL0rv7fdAzvuLsv5YLoZvEmYf39h
 /5Wo3rM5iXwNuT9hB9+teUZaPBF1dXAYOusN1Y/vuvNdG+YCq8BJn5UDmiXyYEWF
 23N1R1LQrYs+JoitIS01cpTi69rjZFN5xqtFs+gN84jgfb4vkP2JxjHyz2eBB47Q
 UTYaUh60uYtMLYXZdVFwUP7DbJnN5fGlPg9zIPAU+yHHvyqmrGjY80vn+voJ0Y5w
 wNOzTpyg7+KFpM1CpCccqivWxuJeVbgVS5jL+xegpkf7PgOy5oepkMyThZ24yy9o
 K3uTHCl0HH+YrvVFWZVrqEVQoMZ3ErQmihHXwZZy/p71MeQ/m8gnT2iEMaOlxX3k
 Efrh9mM6JI+wLCtZLA==
 =kOlX
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - Remove redundant iflink requests, by Sven Eckelmann (2 patches)

 - Don't expect inter-netns unique iflink indices, by Sven Eckelmann

* tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge:
  batman-adv: Don't expect inter-netns unique iflink indices
  batman-adv: Request iflink once in batadv_get_real_netdevice
  batman-adv: Request iflink once in batadv-on-batadv check
====================

Link: https://lore.kernel.org/r/20220302163049.101957-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 21:53:35 -08:00
Sreeramya Soratkal
e50b88c4f0 nl80211: Update bss channel on channel switch for P2P_CLIENT
The wdev channel information is updated post channel switch only for
the station mode and not for the other modes. Due to this, the P2P client
still points to the old value though it moved to the new channel
when the channel change is induced from the P2P GO.

Update the bss channel after CSA channel switch completion for P2P client
interface as well.

Signed-off-by: Sreeramya Soratkal <quic_ssramya@quicinc.com>
Link: https://lore.kernel.org/r/1646114600-31479-1-git-send-email-quic_ssramya@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-02 22:37:05 +01:00
Sven Eckelmann
6c1f41afc1 batman-adv: Don't expect inter-netns unique iflink indices
The ifindex doesn't have to be unique for multiple network namespaces on
the same machine.

  $ ip netns add test1
  $ ip -net test1 link add dummy1 type dummy
  $ ip netns add test2
  $ ip -net test2 link add dummy2 type dummy

  $ ip -net test1 link show dev dummy1
  6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
  $ ip -net test2 link show dev dummy2
  6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff

But the batman-adv code to walk through the various layers of virtual
interfaces uses this assumption because dev_get_iflink handles it
internally and doesn't return the actual netns of the iflink. And
dev_get_iflink only documents the situation where ifindex == iflink for
physical devices.

But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
because ipoib_get_iflink implements it even when it sometimes returns an
iflink != ifindex and sometimes iflink == ifindex. The caller must
therefore make sure itself to check both netns and iflink + ifindex for
equality. Only when they are equal, a "physical" interface was detected
which should stop the traversal. On the other hand, vxcan_get_iflink can
also return 0 in case there was currently no valid peer. In this case, it
is still necessary to stop.

Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:24:55 +01:00
Sven Eckelmann
6116ba0942 batman-adv: Request iflink once in batadv_get_real_netdevice
There is no need to call dev_get_iflink multiple times for the same
net_device in batadv_get_real_netdevice. And since some of the
ndo_get_iflink callbacks are dynamic (for example via RCUs like in
vxcan_get_iflink), it could easily happen that the returned values are not
stable. The pre-checks before __dev_get_by_index are then of course bogus.

Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:01:28 +01:00
Sven Eckelmann
690bb6fb64 batman-adv: Request iflink once in batadv-on-batadv check
There is no need to call dev_get_iflink multiple times for the same
net_device in batadv_is_on_batman_iface. And since some of the
.ndo_get_iflink callbacks are dynamic (for example via RCUs like in
vxcan_get_iflink), it could easily happen that the returned values are not
stable. The pre-checks before __dev_get_by_index are then of course bogus.

Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:01:25 +01:00
Sven Eckelmann
6ee3c393ee batman-adv: Demote batadv-on-batadv skip error message
The error message "Cannot find parent device" was shown for users of
macvtap (on batadv devices) whenever the macvtap was moved to a different
netns. This happens because macvtap doesn't provide an implementation for
rtnl_link_ops->get_link_net.

The situation for which this message is printed is actually not an error
but just a warning that the optional sanity check was skipped. So demote
the message from error to warning and adjust the text to better explain
what happened.

Reported-by: Leonardo Mörlein <freifunk@irrelefant.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:00:17 +01:00
Sven Eckelmann
eb7da4f17d batman-adv: Migrate to linux/container_of.h
The commit d2a8ebbf8192 ("kernel.h: split out container_of() and
typeof_member() macros")  introduced a new header for the container_of
related macros from (previously) linux/kernel.h.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:00:13 +01:00
Vladimir Oltean
0b0e2ff103 net: dsa: restore error path of dsa_tree_change_tag_proto
When the DSA_NOTIFIER_TAG_PROTO returns an error, the user space process
which initiated the protocol change exits the kernel processing while
still holding the rtnl_mutex. So any other process attempting to lock
the rtnl_mutex would deadlock after such event.

The error handling of DSA_NOTIFIER_TAG_PROTO was inadvertently changed
by the blamed commit, introducing this regression. We must still call
rtnl_unlock(), and we must still call DSA_NOTIFIER_TAG_PROTO for the old
protocol. The latter is due to the limiting design of notifier chains
for cross-chip operations, which don't have a built-in error recovery
mechanism - we should look into using notifier_call_chain_robust for that.

Fixes: dc452a471dba ("net: dsa: introduce tagger-owned storage for private and shared data")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220228141715.146485-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01 18:26:21 -08:00
Jakub Kicinski
2e77551c61 bluetooth pull request for net:
- Fix regression with scanning not working in some systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmIevJQZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKTWBD/0ZN7d1sClyVmH/ymc5l+77
 8HXYPCbhDez1Tix4m2cS5demyAJj+T1YQE5tAmtiTZT2NyWphWNgmLOKC4VLTjmu
 YVSjyvJfn7lWgBFPuv2G8XatqELnsYNaq86urPmxHpwohbNFsT4IVPMWgUicPEjj
 iFLU6MY2Pskkinh72FaRHmOGpDN8v7KIL4RnFbR1DZq2fNLVYTVexQcmAsqOIoP5
 2GLuTkmXnYlTplbqbvJMxFbUBOQO8MBaQlR2+n7nEctZ9NhhmyrayrCoCQ8PxLEQ
 9zhgzJFtgpJ4Nr45TOCBPZ+hBk3hNaqYM42stJ+3nydYKupSNu4ccCiJ181FxNuy
 R2YweVz+/1MkTbhFAuhxOLh0QQ3JWlKCRPqduT6q/VxyTR19hOrGP2eIiPxHr8k6
 b1u0i4ExgyjdJJo29Mw4UqFs2mGXDYo8FvVz0FV5xabfyto2QuFi4dV1V7NxKsCI
 XSrpgPBntOR/WRV36bO+64NknEGIwyfg/BBTAu7kQvVfGH9mDKX26C3UxfBr2/s+
 pK7bRALyYpVceI1rZA9kHFVCrX4iLSkUyTNF9voj0khifnRaRR7xfWumdG8PEJpf
 2kE9Bn4Vi4czz3kwLK4AXsoLABHI3aiyt30CMw7xCqoDD1FSzebbfzaJDKdelrgJ
 BUnmOMQgkj6OhMlMz/THvg==
 =qugn
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix regression with scanning not working in some systems.

* tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: Fix not checking MGMT cmd pending queue
====================

Link: https://lore.kernel.org/r/20220302004330.125536-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01 17:16:46 -08:00
Jakub Kicinski
ef739f1dd3 net: smc: fix different types in min()
Fix build:

 include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp’
   45 | #define min(x, y)       __careful_cmp(x, y, <)
      |                         ^~~~~~~~~~~~~
 net/smc/smc_tx.c:150:24: note: in expansion of macro ‘min’
  150 |         corking_size = min(sock_net(&smc->sk)->smc.sysctl_autocorking_size,
      |                        ^~~

Fixes: 12bbb0d163a9 ("net/smc: add sysctl for autocorking")
Link: https://lore.kernel.org/r/20220301222446.1271127-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01 16:43:27 -08:00
Brian Gix
275f3f6487 Bluetooth: Fix not checking MGMT cmd pending queue
A number of places in the MGMT handlers we examine the command queue for
other commands (in progress but not yet complete) that will interact
with the process being performed. However, not all commands go into the
queue if one of:

1. There is no negative side effect of consecutive or redundent commands
2. The command is entirely perform "inline".

This change examines each "pending command" check, and if it is not
needed, deletes the check. Of the remaining pending command checks, we
make sure that the command is in the pending queue by using the
mgmt_pending_add/mgmt_pending_remove pair rather than the
mgmt_pending_new/mgmt_pending_free pair.

Link: https://lore.kernel.org/linux-bluetooth/f648f2e11bb3c2974c32e605a85ac3a9fac944f1.camel@redhat.com/T/
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-03-01 16:10:58 -08:00
Stanislav Fomichev
530e214c5b bpf, test_run: Fix overflow in XDP frags bpf_test_finish
Syzkaller reports another issue:

WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230
check_copy_size include/linux/thread_info.h:230 [inline]
WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230
copy_to_user include/linux/uaccess.h:199 [inline]
WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230
bpf_test_finish.isra.0+0x4b2/0x680 net/bpf/test_run.c:171

This can happen when the userspace buffer is smaller than head + frags.
Return ENOSPC in this case.

Fixes: 7855e0db150a ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
Reported-by: syzbot+5f81df6205ecbbc56ab5@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/bpf/20220228232332.458871-1-sdf@google.com
2022-03-02 01:09:15 +01:00
Jakub Kicinski
4761df52f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Use kfree_rcu(ptr, rcu) variant, using kfree_rcu(ptr) was not
   intentional. From Eric Dumazet.

2) Use-after-free in netfilter hook core, from Eric Dumazet.

3) Missing rcu read lock side for netfilter egress hook,
   from Florian Westphal.

4) nf_queue assume state->sk is full socket while it might not be.
   Invoke sock_gen_put(), from Florian Westphal.

5) Add selftest to exercise the reported KASAN splat in 4)

6) Fix possible use-after-free in nf_queue in case sk_refcnt is 0.
   Also from Florian.

7) Use input interface index only for hardware offload, not for
   the software plane. This breaks tc ct action. Patch from Paul Blakey.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
  netfilter: nf_queue: handle socket prefetch
  netfilter: nf_queue: fix possible use-after-free
  selftests: netfilter: add nfqueue TCP_NEW_SYN_RECV socket race test
  netfilter: nf_queue: don't assume sk is full socket
  netfilter: egress: silence egress hook lockdep splats
  netfilter: fix use-after-free in __nf_register_net_hook()
  netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant
====================

Link: https://lore.kernel.org/r/20220301215337.378405-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01 15:13:47 -08:00
Paul Blakey
db6140e5e3 net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
After cited commit optimizted hw insertion, flow table entries are
populated with ifindex information which was intended to only be used
for HW offload. This tuple ifindex is hashed in the flow table key, so
it must be filled for lookup to be successful. But tuple ifindex is only
relevant for the netfilter flowtables (nft), so it's not filled in
act_ct flow table lookup, resulting in lookup failure, and no SW
offload and no offload teardown for TCP connection FIN/RST packets.

To fix this, add new tc ifindex field to tuple, which will
only be used for offloading, not for lookup, as it will not be
part of the tuple hash.

Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tuple iifidx")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-01 22:08:31 +01:00
Jeff Layton
27884f4bce libceph: drop else branches in prepare_read_data{,_cont}
Just call set_in_bvec in the non-conditional part.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01 18:26:36 +01:00
David S. Miller
b8d06ce712 Some last-minute fixes:
* rfkill
    - add missing rfill_soft_blocked() when disabled
 
  * cfg80211
    - handle a nla_memdup() failure correctly
    - fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo in
      Makefile
 
  * mac80211
    - fix EAPOL handling in 802.3 RX path
    - reject setting up aggregation sessions before
      connection is authorized to avoid timeouts or
      similar
    - handle some SAE authentication steps correctly
    - fix AC selection in mesh forwarding
 
  * iwlwifi
    - remove TWT support as it causes firmware crashes
      when the AP isn't behaving correctly
    - check debugfs pointer before dereferncing it
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmIeHPcACgkQB8qZga/f
 l8QwBQ//QAXCzemdYF6PpeIvrjOdNU+lJ+ajX/bYyk+pzpW6BRJRUM/MocN+vhUH
 scDCE4Ve8I7Xqx+H6zFOm0Wr2M3qqnzJwMni/4qeQw7mV8msFw4SY2XqaE9nMXkV
 dhVYgrbrmluevBCXCm/rCu9JpWe08A5nH1IycVGJXHbxdMgvifPPHm0/gHBiEvJh
 16itDwJZcqUWZj3DswMe011HMrJubfL6wSfbGdmMgeOdRAkWJHu/bKBLrOM/sveL
 QfPx5RL6MIHcWRLwtLDdDYRTuI1DmhcGWKXOK+BlYtL1vj/zsp8EXCPzTN3uxQw0
 ld58G5pMU16o3iLpwuRlJAUWfQKE6qV1c4obiYZPLzkWpQCWJRrtjd+U4eR0Oewz
 IoQr1NYd6kFB8MFqa8xKY5JMiuEYsABWWho9udkODoaLS4Ege1J4bI7sub33ifER
 qnBE7TB+XO01a+Ys5GOWwEgO6d3t1lEW/mVVLsxdjq3qV1PpWE3ExYnXJEKd6guj
 oU4nDdtaV0AII6ByoB/uxPobqpyAEky8TDd4c2i9Z7qCs8z0O+J9kvTD5jtrGv//
 g4F/6KZ2aQAKYba9CuAoP91VLiiAC4bhagitDFx5mtVaCSj1wdcaz+PVfzYtPxAb
 Ll7HDBqCjC8jfoJx6FoVbaa8xk1rCM9sjr/EGun7iNW9Y4N9Ocg=
 =Q7qF
 -----END PGP SIGNATURE-----

Merge tag 'wireless-for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

johannes Berg says:

====================

Some last-minute fixes:
 * rfkill
   - add missing rfill_soft_blocked() when disabled

 * cfg80211
   - handle a nla_memdup() failure correctly
   - fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo in
     Makefile

 * mac80211
   - fix EAPOL handling in 802.3 RX path
   - reject setting up aggregation sessions before
     connection is authorized to avoid timeouts or
     similar
   - handle some SAE authentication steps correctly
   - fix AC selection in mesh forwarding

 * iwlwifi
   - remove TWT support as it causes firmware crashes
     when the AP isn't behaving correctly
   - check debugfs pointer before dereferncing it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01 14:45:55 +00:00
Dust Li
6b88af839d net/smc: don't send in the BH context if sock_owned_by_user
Send data all the way down to the RDMA device is a time
consuming operation(get a new slot, maybe do RDMA Write
and send a CDC, etc). Moving those operations from BH
to user context is good for performance.

If the sock_lock is hold by user, we don't try to send
data out in the BH context, but just mark we should
send. Since the user will release the sock_lock soon, we
can do the sending there.

Add smc_release_cb() which will be called in release_sock()
and try send in the callback if needed.

This patch moves the sending part out from BH if sock lock
is hold by user. In my testing environment, this saves about
20% softirq in the qperf 4K tcp_bw test in the sender side
with no noticeable throughput drop.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01 14:25:12 +00:00
Dust Li
a505cce6f7 net/smc: don't req_notify until all CQEs drained
When we are handling softirq workload, enable hardirq may
again interrupt the current routine of softirq, and then
try to raise softirq again. This only wastes CPU cycles
and won't have any real gain.

Since IB_CQ_REPORT_MISSED_EVENTS already make sure if
ib_req_notify_cq() returns 0, it is safe to wait for the
next event, with no need to poll the CQ again in this case.

This patch disables hardirq during the processing of softirq,
and re-arm the CQ after softirq is done. Somehow like NAPI.

Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01 14:25:12 +00:00
Dust Li
6bf536eb5c net/smc: correct settings of RMB window update limit
rmbe_update_limit is used to limit announcing receive
window updating too frequently. RFC7609 request a minimal
increase in the window size of 10% of the receive buffer
space. But current implementation used:

  min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2)

and SOCK_MIN_SNDBUF / 2 == 2304 Bytes, which is almost
always less then 10% of the receive buffer space.

This causes the receiver always sending CDC message to
update its consumer cursor when it consumes more then 2K
of data. And as a result, we may encounter something like
"TCP silly window syndrome" when sending 2.5~8K message.

This patch fixes this using max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2).

With this patch and SMC autocorking enabled, qperf 2K/4K/8K
tcp_bw test shows 45%/75%/40% increase in throughput respectively.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01 14:25:12 +00:00
Dust Li
b70a5cc045 net/smc: send directly on setting TCP_NODELAY
In commit ea785a1a573b("net/smc: Send directly when
TCP_CORK is cleared"), we don't use delayed work
to implement cork.

This patch use the same algorithm, removes the
delayed work when setting TCP_NODELAY and send
directly in setsockopt(). This also makes the
TCP_NODELAY the same as TCP.

Cc: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01 14:25:12 +00:00
Dust Li
12bbb0d163 net/smc: add sysctl for autocorking
This add a new sysctl: net.smc.autocorking_size

We can dynamically change the behaviour of autocorking
by change the value of autocorking_size.
Setting to 0 disables autocorking in SMC

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01 14:25:12 +00:00