1091377 Commits

Author SHA1 Message Date
David S. Miller
c908565eec This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - remove unnecessary type castings, by Yu Zhe
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmJ3xIgWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoa6GD/wPif4zkO407J0jU9xnf8F5w8Mz
 5flPXyFu/YlecXAZjLb2BTl+6xI3KLf6lbDIRvcJ3rga2P69IL2JQvqNf9r8HtpP
 zl95frZaxKT3S3DoMBdPyU9HgtrmYVnUXtrQGuL0Sm9CukW0p08SlhqlRgp1NVRy
 XqNhmYhPJhhHk38rwOnNvMdUPsjzOCECV2Lbe4e1lX/503Q17DcYEL+Y+0GpQZwW
 tDSwAUYT0Us0HJSEMwouag+x7uom/BlZml2iLuSTNKNYxoPb3JIOZiVyJ+2fWN0C
 qlxy8+mOVJoKPRBlzRvX/Dr5GFJbav+Sg1tTR3gDYQvPB4jIPaZcAnI0OuPuRER+
 FdJPk9VqEHTVuaMljpRJJVDTHX084IjgFRuF7xhrZf/oB076GoGzXG6XPJuOWcoN
 BbX5ai4Vtj2aHq5RdbAaxHdrUoGM5+XKHvSXJmp4MgTC2dcEWlKqZMI39t/6BYBH
 n9UHUMDgy1NL/C1olVpfTN9OUYwWec971/cQmU4mmuHpybVL1gACEO+vFx9jabO1
 w5lmCbvMcl6SiN/pxi/+tsOOU5TB5PH8M5oAvbinBU9kCqi6g1ldj0jc5TUBttsj
 YFYDpqnn/TnEfVjnqMsKEX9E30WevledW8Nms5jzxIzncjWGEse2l7CEwjEcNTtI
 qkukIJYrbzvwR4OGDg==
 =Cg7C
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-pullrequest-20220508' of git://git.open-mesh.org/linux-merge

This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - remove unnecessary type castings, by Yu Zhe

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 17:10:59 +01:00
David S. Miller
eb60020411 Merge branch 'mlxsw-dedicated-router-notification-block'
Ido Schimmel says:

====================
mlxsw: A dedicated notifier block for router code

Petr says:

Currently all netdevice events are handled in the centralized notifier
handler maintained by spectrum.c. Since a number of events are involving
router code, spectrum.c needs to dispatch them to spectrum_router.c. The
spectrum module therefore needs to know more about the router code than it
should have, and there is are several API points through which the two
modules communicate.

In this patchset, move bulk of the router-related event handling to the
router code. Some of the knowledge has to stay: spectrum.c cannot veto
events that the router supports, and vice versa. But beyond that, the two
can ignore each other's details, which leads to more focused and simpler
code.

As a side effect, this fixes L3 HW stats support on tunnel netdevices.

The patch set progresses as follows:

- In patch #1, change spectrum code to not bounce L3 enslavement, which the
  router code supports.

- In patch #2, add a new do-nothing notifier block to the router code.

- In patches #3-#6, move router-specific event handling to the router
  module. In patch #7, clean up a comment.

- In patch #8, use the advantage that all router event handling is in the
  router code and clean up taking router lock.

- mlxsw supports L3 HW stats on tunnels as of this patchset. Patches #9 and
  #10 therefore add a selftest for L3 HW stats support on tunnels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:21 +01:00
Petr Machata
813f97a268 selftests: forwarding: Add a tunnel-based test for L3 HW stats
Add a selftest that uses an IPIP topology and tests that L3 HW stats
reflect the traffic in the tunnel.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:21 +01:00
Petr Machata
32fb67a3e7 selftests: lib: Add a generic helper for obtaining HW stats
The function get_l3_stats() from the test hw_stats_l3.sh will be useful for
any test that wishes to work with L3 stats. Furthermore, it is easy to
generalize to other HW stats suites (for when such are added). Therefore,
move the code to lib.sh, rewrite it to have the same interface as the other
stats-collecting functions, and generalize to take the name of the HW stats
suite to collect as an argument.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
Petr Machata
c353fb0d4c mlxsw: spectrum_router: Take router lock in router notifier handler
For notifications that the router needs to handle, router lock is taken.
Further, at least to determine whether an event is related to a tunnel
underlay, router lock also needs to be taken. Due to this, the router lock
is always taken for each unhandled event, and also for some handled events,
even if they are not related to underlay. Thus each event implies at least
one router lock, sometimes two.

Instead of deferring the locking to the leaf handlers, take the lock in the
router notifier handler always. This simplifies thinking about the locking
state, and in some cases saves one lock cycle.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
Petr Machata
05a8d7d4fa mlxsw: spectrum: Update a comment
The position of netdevice notifier registration no longer depends on the
router initialization, because the event handler no longer dispatches to
the router code. Update the comment at the registration to that effect.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
Petr Machata
75ef434228 mlxsw: spectrum: Move handling of tunnel events to router code
The events related to IPIP tunnels are handled by the router code. Move the
handling from the central dispatcher in spectrum.c to the new notifier
handler in the router module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
Petr Machata
ba81954cd5 mlxsw: spectrum: Move handling of router events to router code
The events NETDEV_PRE_CHANGEADDR, NETDEV_CHANGEADDR and NETDEV_CHANGEMTU
have implications for in-ASIC router interface objects, and as such are
handled in the router module. Move the handling from the central dispatcher
in spectrum.c to the new notifier handler in the router module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
Petr Machata
f40e600b36 mlxsw: spectrum: Move handling of HW stats events to router code
L3 HW stats are implemented in mlxsw as RIF counters, and therefore the
code resides in spectrum_router. Exclude the offload xstats events from the
mlxsw_sp_netdevice_event_is_router() predicate, and instead recreate the
glue code in the router module.

Previously, the order of dispatch was that for events on tunnels, a
dedicated handler was called, which however did not handle HW stats events.
But there is nothing special about tunnel devices as far as HW stats: there
is a RIF associated with the tunnel netdevice, and that RIF is where the
counter should be installed. Therefore now, HW stats events are tested
first, independent of netdevice type. The upshot is that as of this commit,
mlxsw supports L3 HW stats work on GRE tunnels.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
Petr Machata
4f8afb680f mlxsw: spectrum: Move handling of VRF events to router code
Events involving VRF, as L3 concern, are handled in the router code, by the
helper mlxsw_sp_netdevice_vrf_event(). The handler is currently invoked
from the centralized dispatcher in spectrum.c. Instead, move the call to
the newly-introduced router-specific notifier handler.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
Petr Machata
0a27cb1692 mlxsw: spectrum_router: Add a dedicated notifier block
Currently all netdevice events are handled in the centralized notifier
handler maintained by spectrum.c. Since a number of events are involving
router code, spectrum.c needs to dispatch them to spectrum_router.c. The
spectrum module therefore needs to know more about the router code than it
should have, and there is are several API points through which the two
modules communicate.

To simplify the notifier handlers, introduce a new notifier into the router
module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
Petr Machata
7cf0f96df1 mlxsw: spectrum: Tolerate enslaving of various devices to VRF
Enslaving netdevices to VRF is currently handled through an
mlxsw_sp_is_vrf_event() conditional in mlxsw_sp_netdevice_event(). In the
following patch sets, VRF enslavement will be handled purely in the router
code. Therefore make handlers of NETDEV_PRECHANGEUPPER tolerant of
enslaving to VRF, so that they do not bounce the change.

For NETDEV_CHANGEUPPER, drop the WARN_ON(1) and bounce from
mlxsw_sp_netdevice_port_vlan_event(). This is the only handler that warned
and bounces even in the CHANGEUPPER code, other handler quietly do nothing
when they encounter an unfamiliar upper.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:46:20 +01:00
David S. Miller
9f88af2252 Merge branch 'switch-drivers-to-netif_napi_add_weight'
Jakub Kicinski says:

====================
net: switch drivers to netif_napi_add_weight()

The minority of drivers pass a custom weight to netif_napi_add().
Switch those away to the new netif_napi_add_weight(). All drivers
(which can go thru net-next) calling netif_napi_add() will now
be calling it with NAPI_POLL_WEIGHT or 64.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:33:57 +01:00
Jakub Kicinski
6f83cb8cbf net: wan: switch to netif_napi_add_weight()
A handful of WAN drivers use custom napi weights,
switch them to the new API.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:33:57 +01:00
Jakub Kicinski
d484735dcf net: virtio: switch to netif_napi_add_weight()
virtio netdev driver uses a custom napi weight, switch to the new
API for setting custom weight.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:33:57 +01:00
Jakub Kicinski
8ded532cd1 r8152: switch to netif_napi_add_weight()
r8152 uses a custom napi weight, switch to the new
API for setting custom weight.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:33:57 +01:00
Jakub Kicinski
b707b89f7b eth: switch to netif_napi_add_weight()
Switch all Ethernet drivers which use custom napi weights
to the new API.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:33:57 +01:00
Jakub Kicinski
be8af67fab caif_virtio: switch to netif_napi_add_weight()
caif_virtio uses a custom napi weight, switch to the new
API for setting custom weights.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:33:57 +01:00
Jakub Kicinski
4d92c62755 um: vector: switch to netif_napi_add_weight()
UM's netdev driver uses a custom napi weight, switch to the new
API for setting custom weight.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:33:56 +01:00
Jakub Kicinski
8fc0b6992a Merge branch 'simplify-migration-of-host-filtered-addresses-in-felix-driver'
Vladimir Oltean says:

====================
Simplify migration of host filtered addresses in Felix driver

The purpose of this patch set is to remove the functions
dsa_port_walk_fdbs() and dsa_port_walk_mdbs() from the DSA core, which
were introduced when the Felix driver gained support for unicast
filtering on standalone ports. They get called when changing the tagging
protocol back and forth between "ocelot" and "ocelot-8021q".
I did not realize we could get away without having them.

The patch set was regression-tested using the local_termination.sh
selftest using both tagging protocols.
====================

Link: https://lore.kernel.org/r/20220505162213.307684-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06 21:00:15 -07:00
Vladimir Oltean
fe5233b0ba net: dsa: delete dsa_port_walk_{fdbs,mdbs}
All the users of these functions are gone, delete them before they gain
new ones.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06 21:00:12 -07:00
Vladimir Oltean
28de0f9fec net: dsa: felix: perform MDB migration based on ocelot->multicast list
The felix driver is the only user of dsa_port_walk_mdbs(), and there
isn't even a good reason for it, considering that the host MDB entries
are already saved by the ocelot switch lib in the ocelot->multicast list.

Rewrite the multicast entry migration procedure around the
ocelot->multicast list so we can delete dsa_port_walk_mdbs().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06 21:00:12 -07:00
Vladimir Oltean
a51c1c3f32 net: dsa: felix: stop migrating FDBs back and forth on tag proto change
I just realized we don't need to migrate the host-filtered FDB entries
when the tagging protocol changes from "ocelot" to "ocelot-8021q".

Host-filtered addresses are learned towards the PGID_CPU "multicast"
port group, reserved by software, which contains BIT(ocelot->num_phys_ports).
That is the "special" port entry in the analyzer block for the CPU port
module.

In "ocelot" mode, the CPU port module's packets are redirected to the
NPI port.

In "ocelot-8021q" mode, felix_8021q_cpu_port_init() does something funny
anyway, and changes PGID_CPU to stop pointing at the CPU port module and
start pointing at the physical port where the DSA master is attached.

The fact that we can alter the destination of packets learned towards
PGID_CPU without altering the MAC table entries themselves means that it
is pointless to walk through the FDB entries, forget that they were
learned towards PGID_CPU, and re-learn them towards the "unicast" PGID
associated with the physical port connected to the DSA master. We can
let the PGID_CPU value change simply alter the destination of the
host-filtered unicast packets in one fell swoop.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06 21:00:12 -07:00
Vladimir Oltean
2c110abc46 net: dsa: felix: use PGID_CPU for FDB entry migration on NPI port
ocelot_fdb_add() redirects FDB entries installed on the NPI port towards
the special reserved PGID_CPU used for host-filtered addresses. PGID_CPU
contains BIT(ocelot->num_phys_ports) in the destination port mask, which
is code name for the CPU port module.

Whereas felix_migrate_fdbs_to_*_port() uses the ocelot->num_phys_ports
PGID directly, and it appears that this works too. Even if this PGID is
set to zero, apparently its number is special and packets still reach
the CPU port module.

Nonetheless, in the end, these addresses end up in the same place
regardless of whether they go through an extra indirection layer or not.
Use PGID_CPU across to have more uniformity.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06 21:00:11 -07:00
David Thompson
0a02e282ba mlxbf_gige: increase MDIO polling rate to 5us
This patch increases the polling rate used by the
mlxbf_gige driver on the MDIO bus.  The previous
polling rate was every 100us, and the new rate is
every 5us.  With this change the amount of time
spent waiting for the MDIO BUSY signal to de-assert
drops from ~100us to ~27us for each operation.

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Link: https://lore.kernel.org/r/20220505162309.20050-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06 15:54:20 -07:00
Jakub Kicinski
53e2cb3b2a Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2022-05-05

This series contains updates to ixgbe and igb drivers.

Jeff Daly adjusts type for 'allow_unsupported_sfp' to match the
associated struct value for ixgbe.

Alaa Mohamed converts, deprecated, kmap() call to kmap_local_page() for
igb.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igb: Convert kmap() to kmap_local_page()
  ixgbe: Fix module_param allow_unsupported_sfp type
====================

Link: https://lore.kernel.org/r/20220505155651.2606195-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06 15:39:28 -07:00
David S. Miller
95730d6570 Merge branch 'tso-gso-limit-split'
Jakub Kicinski says:

====================
net: disambiguate the TSO and GSO limits

This series separates the device-reported TSO limitations
from the user space-controlled GSO limits. It used to be that
we only had the former (HW limits) but they were named GSO.
This probably lead to confusion and letting user override them.

The problem came up in the BIG TCP discussion between Eric and
Alex, and seems like something we should address.

Targeting net-next because (a) nobody is reporting problems;
and (b) there is a tiny but non-zero chance that some actually
wants to lift the HW limitations.
====================

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 12:07:56 +01:00
Jakub Kicinski
744d49daf8 net: move netif_set_gso_max helpers
These are now internal to the core, no need to expose them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 12:07:56 +01:00
Jakub Kicinski
ee8b7a1156 net: make drivers set the TSO limit not the GSO limit
Drivers should call the TSO setting helper, GSO is controllable
by user space.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 12:07:56 +01:00
Jakub Kicinski
14d7b8122f net: don't allow user space to lift the device limits
Up until commit 46e6b992c250 ("rtnetlink: allow GSO maximums to
be set on device creation") the gso_max_segs and gso_max_size
of a device were not controlled from user space.

The quoted commit added the ability to control them because of
the following setup:

 netns A  |  netns B
     veth<->veth   eth0

If eth0 has TSO limitations and user wants to efficiently forward
traffic between eth0 and the veths they should copy the TSO
limitations of eth0 onto the veths. This would happen automatically
for macvlans or ipvlan but veth users are not so lucky (given the
loose coupling).

Unfortunately the commit in question allowed users to also override
the limits on real HW devices.

It may be useful to control the max GSO size and someone may be using
that ability (not that I know of any user), so create a separate set
of knobs to reliably record the TSO limitations. Validate the user
requests.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 12:07:56 +01:00
Jakub Kicinski
6df6398f7c net: add netif_inherit_tso_max()
To make later patches smaller create a helper for inheriting
the TSO limitations of a lower device. The TSO in the name
is not an accident, subsequent patches will replace GSO
with TSO in more names.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 12:07:56 +01:00
David S. Miller
beb21e3e8e Merge branch 'nfp-flower-rework'
Simon Horman says:

====================
nfp: flower: decap neighbour table rework

Louis Peens says:

This patch series reworks the way in which flow rules that outputs to
OVS internal ports gets handled by the nfp driver.

Previously this made use of a small pre_tun_table, but this only used
destination MAC addresses, and made the implicit assumption that there is
only a single source MAC":"destination MAC" mapping per tunnel. In
hindsight this seems to be a pretty obvious oversight, but this was hidden
in plain sight for quite some time.

This series changes the implementation to make use of the same Neighbour
table for decap that is in use for the tunnel encap solution. It stores
any new Neighbour updates in this table. Previously this path was only
triggered for encapsulation candidates, and the entries were send and
forget, not saved on the host as it is after this series. It also keeps
track of any flow rule that outputs to OVS internal ports (and some
other criteria not worth mentioning here), very similar to how it was
done previously, except now these flows are kept track of in a list.

When a new Neighbour entry gets added this list gets iterated for
potential matches, in which case the table gets updated with a reference
to the flow, and the Neighbour entry on the card gets updated with the
relevant host_ctx. The same happens when a new qualifying flow gets
added - the Neighbour table gets iterated for applicable matches, and
once again the firmware gets updated with the host_ctx when any matches
are found.

Since this also requires a firmware change we add a new capability bit,
and keep the old behaviour in case of older firmware without this bit
set.

This series starts by doing some preparation, then adding the new list
and table entries. Next the functionality to link/unlink these entries
are added, and finally this new functionality is enabled by adding the
DECAP_V2 bit to the driver feature list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:34 +01:00
Louis Peens
a7da2a864a nfp: flower: enable decap_v2 bit
Finally enable the decap_v2 feature bit now that all the
other bits are in place to configure it correctly.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:34 +01:00
Louis Peens
c83a0fbe97 nfp: flower: remove unused neighbour cache
With the neighbour entries now stored in a dedicated table there
is no use to make use of the tunnel route cache anymore, so remove
this.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:34 +01:00
Louis Peens
591c90a1d0 nfp: flower: link pre_tun flow rules with neigh entries
Add helper functions that can create links between flow rules
and cached neighbour entries. Also add the relevant calls to
these functions.

* When a new neighbour entry gets added cycle through the saved
  pre_tun flow list and link any relevant matches. Update the
  neighbour table on the nfp with this new information.
* When a new pre_tun flow rule gets added iterate through the
  save neighbour entries and link any relevant matches. Once
  again update the nfp neighbour table with any new links.
* Do the inverse when deleting - remove any created links and
  also inform the nfp of this.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:34 +01:00
Louis Peens
f1df7956c1 nfp: flower: rework tunnel neighbour configuration
This patch updates the way in which the tunnel neighbour entries
are handled. Previously they were mostly send-and-forget, with
just the destination IP's cached in a list. This update changes
to a scheme where the neighbour entry information is stored in
a hash table.

The reason for this is that the neighbour table will now also
be used on the decapsulation path, whereas previously it was
only used for encapsulation. We need to save more of the neighbour
information in order to link them with flower flows in follow
up patches.

Updating of the neighbour table is now also handled by the same
function, instead of separate  *_write_neigh_vX functions.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:34 +01:00
Louis Peens
9ee7c42183 nfp: flower: update nfp_tun_neigh structs
Prepare for more rework in following patches by updating
the existing nfp_neigh_structs. The update allows for
the same headers to be used for both old and new firmware,
with a slight length adjustment when sending the control message
to the firmware.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:34 +01:00
Louis Peens
9d5447ed44 nfp: flower: fixup ipv6/ipv4 route lookup for neigh events
When a callback is received to invalidate a neighbour entry
there is no need to try and populate any other flow information.
Only the flowX->daddr information is needed as lookup key to delete
an entry from the NFP neighbour table. Fix this by only doing the
lookup if the callback is for a new entry.

As part of this cleanup remove the setting of flow6.flowi6_proto, as
this is not needed either, it looks to be a possible leftover from a
previous implementation.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:34 +01:00
Louis Peens
38fc158e17 nfp: flower: enforce more strict pre_tun checks
Make sure that the rule also matches on source MAC address. On top
of that also now save the src and dst MAC addresses similar to how
vlan_tci is saved - this will be used in later comparisons with
neighbour entries. Indicate if the flow matched on ipv4 or ipv6.
Populate the vlan_tpid field that got added to the pre_run_rule
struct as well.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:34 +01:00
Louis Peens
e30b2b68c1 nfp: flower: add/remove predt_list entries
Add calls to add and remove flows to the predt_table. This very simply
just allocates and add a new pretun entry if detected as such, and
removes it when encountered on a delete flow.

Compatibility for older firmware is kept in place through the
DECAP_V2 feature bit.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:33 +01:00
Louis Peens
29c691347e nfp: flower: add infrastructure for pre_tun rework
The previous implementation of using a pre_tun_table for decap has
some limitations, causing flows to end up unoffloaded when in fact
we are able to offload them. This is because the pre_tun_table does
not have enough matching resolution. The next step is to instead make
use of the neighbour table which already exists for the encap direction.
This patch prepares for this by:

- Moving nfp_tun_neigh/_v6 to main.h.
- Creating two new "wrapping" structures, one to keep track of neighbour
  entries (previously they were send-and-forget), and another to keep
  track of pre_tun flows.
- Create a new list in nfp_flower_priv to keep track of pre_tunnel flows
- Create a new table in nfp_flower_priv to keep track of next neighbour
  entries
- Initialising and destroying these new list/tables
- Extending nfp_fl_payload->pre_tun_rule to save more information for
  future use.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 11:21:33 +01:00
David S. Miller
76a8426959 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-05-05
This series contains updates to ice driver only.

Wan Jiabing converts an open coded min selection to min_t().

Maciej commonizes on a single find VSI function and removes the
duplicated implementation.

Wojciech adjusts the return value when exceeding ICE_MAX_CHAIN_WORDS to,
a more appropriate, -ENOSPC and allows for the error to be propagated.

Michal adds support for ndo_get_devlink_port().

Jake does some cleanup related to virtualization code. Mainly involving
function header comments and wording changes. NULL checks are added to
ice_get_vf_vsi() calls in order to prevent static analysis tools from
complaining that a NULL value could be dereferenced.
---
v2: Dropped patch 1: "ice: Add support for classid based queue selection"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 10:50:05 +01:00
Jakub Kicinski
949dfdcf34 Merge branch 'mptcp-improve-mptcp-level-window-tracking'
Mat Martineau says:

====================
mptcp: Improve MPTCP-level window tracking

This series improves MPTCP receive window compliance with RFC 8684 and
helps increase throughput on high-speed links. Note that patch 3 makes a
change in tcp_output.c

For the details, Paolo says:

I've been chasing bad/unstable performance with multiple subflows
on very high speed links.

It looks like the root cause is due to the current mptcp-level
congestion window handling. There are apparently a few different
sub-issues:

- the rcv_wnd is not effectively shared on the tx side, as each
  subflow takes in account only the value received by the underlaying
  TCP connection. This is addressed in patch 1/5

- The mptcp-level offered wnd right edge is currently allowed to shrink.
  Reading section 3.3.4.:

"""
   The receive window is relative to the DATA_ACK.  As in TCP, a
   receiver MUST NOT shrink the right edge of the receive window (i.e.,
   DATA_ACK + receive window).  The receiver will use the data sequence
   number to tell if a packet should be accepted at the connection
   level.
"""

I read the above as we need to reflect window right-edge tracking
on the wire, see patch 4/5.

- The offered window right edge tracking can happen concurrently on
  multiple subflows, but there is no mutex protection. We need an
  additional atomic operation - still patch 4/5

This series additionally bumps a few new MIBs to track all the above
(ensure/observe that the suspected races actually take place).

I could not access again the host where the issue was so
noticeable, still in the current setup the tput changes from
[6-18] Gbps to 19Gbps very stable.
====================

Link: https://lore.kernel.org/r/20220504215408.349318-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 19:00:20 -07:00
Paolo Abeni
38acb6260f mptcp: add more offered MIBs counter
Track the exceptional handling of MPTCP-level offered window
with a few more counters for observability.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 19:00:16 -07:00
Paolo Abeni
f3589be0c4 mptcp: never shrink offered window
As per RFC, the offered MPTCP-level window should never shrink.
While we currently track the right edge, we don't enforce the
above constraint on the wire.
Additionally, concurrent xmit on different subflows can end-up in
erroneous right edge update.
Address the above explicitly updating the announced window and
protecting the update with an additional atomic operation (sic)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 19:00:15 -07:00
Paolo Abeni
ea66758c17 tcp: allow MPTCP to update the announced window
The MPTCP RFC requires that the MPTCP-level receive window's
right edge never moves backward. Currently the MPTCP code
enforces such constraint while tracking the right edge, but it
does not reflects it on the wire, as MPTCP lacks a suitable hook
to update accordingly the TCP header.

This change modifies the existing mptcp_write_options() hook,
providing the current packet's TCP header to the MPTCP protocol,
so that the next patch could implement the above mentioned
constraint.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 19:00:15 -07:00
Paolo Abeni
92be2f5227 mptcp: add mib for xmit window sharing
Bump a counter for counter when snd_wnd is shared among subflow,
for observability's sake.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 19:00:15 -07:00
Paolo Abeni
b713d00675 mptcp: really share subflow snd_wnd
As per RFC, mptcp subflows use a "shared" snd_wnd: the effective
window is the maximum among the current values received on all
subflows. Without such feature a data transfer using multiple
subflows could block.

Window sharing is currently implemented in the RX side:
__tcp_select_window uses the mptcp-level receive buffer to compute
the announced window.

That is not enough: the TCP stack will stick to the window size
received on the given subflow; we need to propagate the msk window
value on each subflow at xmit time.

Change the packet scheduler to ignore the subflow level window
and use instead the msk level one

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 19:00:14 -07:00
Andy Shevchenko
10b4a11fe7 firmware: tee_bnxt: Use UUID API for exporting the UUID
There is export_uuid() function which exports uuid_t to the u8 array.
Use it instead of open coding variant.

This allows to hide the uuid_t internals.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220504091407.70661-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 18:14:29 -07:00
David Ahern
c67b627e99 net: Make msg_zerocopy_alloc static
msg_zerocopy_alloc is only used by msg_zerocopy_realloc; remove the
export and make static in skbuff.c

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220504170947.18773-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 17:02:50 -07:00