1156206 Commits

Author SHA1 Message Date
Jiawen Wu
850b971110 net: libwx: Allocate Rx and Tx resources
Set up Rx and Tx descriptors for specific rings.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 09:22:48 +00:00
Jiawen Wu
18b5b8a9f1 net: libwx: Configure Rx and Tx unit on hardware
Configure the hardware to prepare it for processing packets, including
configuring the receive and transmit units of the MAC layer and setting up
the specific rings.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 09:22:48 +00:00
Jiawen Wu
5d3ac705c2 net: txgbe: Add interrupt support
Determine the proper interrupt scheme, then enable and handle interrupts.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 09:22:48 +00:00
Mengyuan Lou
e7956139a6 net: ngbe: Add irqs request flow
Add request_irq() for the Tx/Rx rings and other miscellaneous events.
If the requests succeed, configure the vectors for interrupts.
Enable some base interrupt masks in ngbe_irq_enable().

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 09:22:48 +00:00
Mengyuan Lou
3f70318611 net: libwx: Add irq flow functions
Add IRQ flow functions for ngbe and txgbe.
Allocate PCIe MSI-X IRQs for the drivers; otherwise, fall back to MSI/legacy.
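
A minimal userspace model of the fallback order described above, assuming
the driver simply prefers MSI-X, then MSI, then legacy INTx. The names
(irq_caps, pick_irq_mode(), irq_vectors_needed()) are hypothetical, as is
the vector layout of one vector per queue pair plus one misc vector;
in-kernel code would go through the PCI core (e.g. pci_alloc_irq_vectors())
rather than a capability struct.

```c
#include <stdbool.h>

enum irq_mode { IRQ_MODE_MSIX, IRQ_MODE_MSI, IRQ_MODE_LEGACY };

/* Hypothetical summary of what the device and platform can provide. */
struct irq_caps {
	bool msix_ok;  /* enough MSI-X vectors could be allocated */
	bool msi_ok;   /* an MSI vector could be allocated */
};

/* Prefer MSI-X, fall back to MSI, and finally to legacy INTx. */
static enum irq_mode pick_irq_mode(const struct irq_caps *caps)
{
	if (caps->msix_ok)
		return IRQ_MODE_MSIX;
	if (caps->msi_ok)
		return IRQ_MODE_MSI;
	return IRQ_MODE_LEGACY;
}

/*
 * Assumed vector layout: with MSI-X, one vector per Rx/Tx queue pair plus
 * one for misc events; with MSI or legacy, everything shares one vector.
 */
static unsigned int irq_vectors_needed(enum irq_mode mode, unsigned int nqueues)
{
	return mode == IRQ_MODE_MSIX ? nqueues + 1 : 1;
}
```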

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 09:22:48 +00:00
Qingfang DENG
542bcea4be net: page_pool: use in_softirq() instead
We use BH context only for synchronization, so we don't care if it's
actually serving softirq or not.

As a side note, in the case of threaded NAPI, in_serving_softirq() will
return false because it's in process context with BH off, making
page_pool_recycle_in_cache() unreachable.

Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
Tested-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 09:15:22 +00:00
David S. Miller
637bc8f09b mlx5-updates-2023-02-04
Merge tag 'mlx5-updates-2023-02-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-02-04

This series provides misc updates to mlx5 driver:

1) Trivial LAG code cleanup patches from Roi

2) Rahul improves mlx5's documentation structure
Separates the documentation into multiple pages related to different
components in the device driver. Adds Kconfig parameters, devlink
parameters, and tracepoints that were previously introduced but not added
to the documentation. Introduces a new page on ethtool statistics counters
with information about counters previously implemented in the mlx5_core
driver but not documented in the kernel tree.

3) From Raed, policy/state selector support for IPsec.

4) From Dragos, add support for XDR speed in IPoIB mlx5 netdev

5) A few more misc cleanups and trivial changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 09:09:23 +00:00
Parav Pandit
27369c9c2b virtio-net: Maintain reverse cleanup order
To make the code easier to audit, keep the device stop() sequence a
mirror image of the device open() sequence.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:55:06 +00:00
David S. Miller
cb3086cee6 Merge branch 'bridge-mdb-limit'
Petr Machata says:

====================
bridge: Limit number of MDB entries per port, port-vlan

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
similar limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN number of MDB entries that the port is a member of.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

Per-port number of entries keeps track of the total number of MDB entries
configured on a given port. The per-port-VLAN value then keeps track of the
subset of MDB entries configured specifically for the given VLAN, on that
port. The number is adjusted as port_groups are created and deleted, and
therefore under the multicast lock.

A maximum value, if non-zero, then places a limit on the number of entries
that can be configured in a given context. Attempts to add entries above
the maximum are rejected.
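
A minimal sketch of the per-context accounting, assuming a plain counter
plus a maximum where 0 means "no limit". Following the v2 notes, the check
bounces when n >= max rather than n == max, since max may be set below the
current n. The struct, the function names and the -E2BIG return value are
illustrative assumptions rather than the bridge code, and the locking and
READ_ONCE/WRITE_ONCE details are omitted; the message text mirrors the CLI
error shown in the example.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-port or per-port-VLAN multicast context. */
struct mcast_ctx {
	unsigned int n_entries;   /* current number of MDB entries */
	unsigned int max_entries; /* 0 means "no limit" */
};

/* Account one more MDB entry; bounce if a limit is set and n >= max. */
static int ngroups_inc_one(struct mcast_ctx *ctx, const char *what,
			   char *err, size_t errlen)
{
	if (ctx->max_entries && ctx->n_entries >= ctx->max_entries) {
		snprintf(err, errlen,
			 "%s is already in %u groups, and mcast_max_groups=%u",
			 what, ctx->n_entries, ctx->max_entries);
		return -E2BIG;
	}
	ctx->n_entries++;
	return 0;
}

/* Account removal of an MDB entry. */
static void ngroups_dec_one(struct mcast_ctx *ctx)
{
	if (ctx->n_entries)
		ctx->n_entries--;
}
```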

The rejection reason for netlink-based requests to add MDB entries is
communicated through extack. This channel is unavailable for rejections
triggered from the control path. To address this lack of visibility, the
patchset adds a tracepoint, bridge:br_mdb_full:

	# perf record -e bridge:br_mdb_full &
	# [...]
	# perf script | cut -d: -f4-
	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0
	 dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0
	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10
	 dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10
	 dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10

Another option is to consume the tracepoint through, e.g., the bpftrace tool:

	# bpftrace -e ' tracepoint:bridge:br_mdb_full /args->af != 0/ {
			    printf("dev %s src %s grp %s vid %u\n",
				   str(args->dev), ntop(args->src),
				   ntop(args->grp), args->vid);
			}
			tracepoint:bridge:br_mdb_full /args->af == 0/ {
			    printf("dev %s grp %s vid %u\n",
				   str(args->dev),
				   macaddr(args->grpmac), args->vid);
			}'

This tracepoint is triggered for mcast_hash_max exhaustions as well.

The following is an example of how the feature is used. A more extensive
example is available in patch #8:

	# bridge vlan set dev v1 vid 1 mcast_max_groups 1
	# bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
	# bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
	Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1.

The patchset progresses as follows:

- In patch #1, set strict_start_type at two bridge-related policies. The
  reason is we are adding a new attribute to one of these, and want the new
  attribute to be parsed strictly. The other was adjusted for completeness'
  sake.

- In patches #2 to #5, br_mdb and br_multicast code is adjusted to make the
  following additions smoother.

- In patch #6, add the tracepoint.

- In patch #7, the code to maintain the number of MDB entries is added as
  struct net_bridge_mcast_port::mdb_n_entries. The maximum is added, too,
  as struct net_bridge_mcast_port::mdb_max_entries; however, at this point
  there is no way to set the value yet, and since 0 is treated as "no
  limit", the functionality doesn't change. Note, however, that
  mcast_hash_max violations already do trigger at this point.

- In patch #8, netlink plumbing is added: reading the number of entries,
  and reading and writing the maximum.

  The per-port values are passed through RTM_NEWLINK / RTM_GETLINK messages
  in IFLA_BRPORT_MCAST_N_GROUPS and _MAX_GROUPS, inside the IFLA_PROTINFO
  nest.

  The per-port-VLAN values are passed through RTM_GETVLAN / RTM_NEWVLAN
  messages in BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS and _MAX_GROUPS, inside
  BRIDGE_VLANDB_ENTRY.

The following patches deal with the selftest:

- Patches #9 and #10 clean up and move around some selftest code.

- Patches #11 to #14 add helpers and generalize the existing IGMP / MLD
  support to allow generating packets with configurable group addresses and
  varying source lists for (S,G) memberships.

- Patch #15 adds code to generate IGMP leave and MLD done packets.

- Patch #16 finally adds the selftest itself.

v3:
- Patch #7:
    - Access mdb_max_/_n_entries through READ_/WRITE_ONCE
    - Move extack setting to br_multicast_port_ngroups_inc_one().
      Since we use NL_SET_ERR_MSG_FMT_MOD, the correct context
      (port / port-vlan) can be passed through an argument.
      This also removes the need for more READ/WRITE_ONCE's
      at the extack-setting site.
- Patch #8:
    - Move the br_multicast_port_ctx_vlan_disabled() check
      out to the _vlan_ helpers callers. Thus these helpers
      cannot fail, which makes them very similar to the
      _port_ helpers. Have them take the MC context directly
      and unify them.

v2:
- Cover letter:
    - Add an example of a bpftrace-based probe script
- Patch #6:
    - Report IPv4 as an IPv6-mapped address through the IPv6 buffer
      as well, to save ring buffer space.
- Patch #7:
    - In br_multicast_port_ngroups_inc_one(), bounce
      if n>=max, not if n==max
    - Adjust extack messages to mention ngroups, now
      that the bounces appear when n>=max, not n==max
    - In __br_multicast_enable_port_ctx(), do not reset
      max to 0. Also do not count number of entries by
      going through _inc, as that would end up incorrectly
      bouncing the entries.
- Patch #8:
    - Drop locks around accesses in
      br_multicast_{port,vlan}_ngroups_{get,set_max}(),
    - Drop bounces due to max<n in
      br_multicast_{port,vlan}_ngroups_set_max().
- Patch #12:
    - In the comment at payload_template_calc_checksum(),
      s/%#02x/%02x/, that's the mausezahn payload format.
- Patch #16:
    - Adjust the tests that check setting max below n and
      reset of max on VLAN snooping enablement
    - Make test naming uniform
    - Enable testing of control path (IGMP/MLD) in
      mcast_vlan_snooping bridge
    - Reorganize the code so that test instances (per bridge
      type and configuration type) always come right after
      the test, in order of {d,q,qvs}{4,6}{cfg,ctl}.
      Then groups of selftests are at the end of the file.
      Similarly adjust invocation order of the tests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:27 +00:00
Petr Machata
3446dcd7df selftests: forwarding: bridge_mdb_max: Add a new selftest
Add a suite covering mcast_n_groups and mcast_max_groups bridge features.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:27 +00:00
Petr Machata
9ae8546973 selftests: forwarding: lib: Add helpers to build IGMP/MLD leave packets
The testsuite that checks for mcast_max_groups functionality will need to
wipe the added groups as well. Add helpers to build IGMP or MLD packets
announcing that a host is leaving a given group.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:27 +00:00
Petr Machata
705d4bc7b6 selftests: forwarding: lib: Allow list of IPs for IGMPv3/MLDv2
The testsuite that checks for mcast_max_groups functionality will need
to generate IGMP and MLD packets with a configurable number of (S,G)
addresses. To that end, further extend igmpv3_is_in_get() and
mldv2_is_in_get() to allow a list of IP addresses instead of one
address.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:27 +00:00
Petr Machata
506a1ac9d3 selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation
In order to generate IGMPv3 and MLDv2 packets on the fly, the
functions that generate these packets need to be able to generate
packets for different groups and different sources. Generating MLDv2
packets further needs the source address of the packet for purposes of
checksum calculation. Add the necessary parameters, and generate the
payload accordingly by dispatching to helpers added in the previous
patches.

Adjust the sole client, bridge_mdb.sh, as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:26 +00:00
Petr Machata
952e0ee38c selftests: forwarding: lib: Add helpers for checksum handling
In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
helpers to calculate the packet checksum.

The approach presented in this patch revolves around payload templates
for mausezahn. These are mausezahn-like payload strings (01:23:45:...)
with possibly one 2-byte sequence replaced with the word CHECKSUM. The
main function is payload_template_calc_checksum(), which calculates the
RFC 1071 checksum of the message. Further helpers then convert the
checksum to the payload format and expand it.

For IPv6, the MLDv2 message checksum is computed using a pseudoheader
that differs from the header used in the payload itself. Because the two
messages differ, the checksum needs to be returned as a separate
quantity instead of being expanded in-place in the payload itself.
Furthermore, the pseudoheader includes the length of the message, which,
much like the checksum, needs to be expanded in mausezahn format. The
same applies to the number of addresses for (S,G) entries. Thus we have
several places where a computed quantity needs to be presented in the
payload format. Add a helper, u16_to_bytes(), which will be used in all
these cases.
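
For illustration, the same arithmetic in C rather than in the shell
library: an RFC 1071 one's-complement checksum over a byte buffer, plus a
C counterpart of u16_to_bytes() that renders a 16-bit value in mausezahn
notation. This is a sketch, not a transliteration of the selftest code;
odd-length input is padded with a zero byte, as RFC 1071 specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* RFC 1071 checksum: one's-complement sum of big-endian 16-bit words. */
static uint16_t rfc1071_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;  /* pad odd byte with 0x00 */

	while (sum >> 16)                            /* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Render a 16-bit value in mausezahn notation, e.g. 0x220d -> "22:0d". */
static void u16_to_bytes(uint16_t v, char out[6])
{
	snprintf(out, 6, "%02x:%02x", (unsigned int)(v >> 8),
		 (unsigned int)(v & 0xff));
}
```

Appending the computed checksum to the message and summing again yields 0,
which makes a convenient self-check.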

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:26 +00:00
Petr Machata
fcf4927632 selftests: forwarding: lib: Add helpers for IP address handling
In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
helpers to expand IPv4 and IPv6 addresses given as parameters in
mausezahn payload notation. Add helpers that do it.
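
A hypothetical C equivalent of such a helper, using inet_pton() to parse
the address and printing each byte as two hex digits separated by colons,
the notation used by mausezahn payloads. The function name and error
convention are made up for this sketch.

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: expand a textual IPv4 or IPv6 address into
 * colon-separated hex bytes, e.g. "192.0.2.1" -> "c0:00:02:01".
 * Returns 0 on success, -1 on a bad family, bad address or short buffer.
 */
static int ip_to_payload(int af, const char *addr, char *out, size_t outlen)
{
	unsigned char buf[16];  /* large enough for an IPv6 address */
	size_t n, i;
	char *p = out;

	if (af == AF_INET)
		n = 4;
	else if (af == AF_INET6)
		n = 16;
	else
		return -1;

	if (inet_pton(af, addr, buf) != 1)
		return -1;
	/* n "xx" pairs, n - 1 colons and a terminating NUL. */
	if (outlen < 3 * n)
		return -1;

	for (i = 0; i < n; i++)
		p += sprintf(p, "%s%02x", i ? ":" : "", (unsigned int)buf[i]);
	return 0;
}
```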

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:26 +00:00
Petr Machata
f7ccf60c4a selftests: forwarding: bridge_mdb: Fix a typo
Add the letter missing from the word "INCLUDE".

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:26 +00:00
Petr Machata
344dd2c9e7 selftests: forwarding: Move IGMP- and MLD-related functions to lib
These functions will be helpful for other testsuites as well. Extract them
to a common place.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:26 +00:00
Petr Machata
a1aee20d5d net: bridge: Add netlink knobs for number / maximum MDB entries
The previous patch added accounting for number of MDB entries per port and
per port-VLAN, and the logic to verify that these values stay within
configured bounds. However it didn't provide means to actually configure
those bounds or read the occupancy. This patch does that.

Two new netlink attributes are added for the MDB occupancy:
IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and
BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy.
And another two for the maximum number of MDB entries:
IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and
BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one.

Note that the two new IFLA_BRPORT_ attributes prompt bumping of
RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough.

The new attributes are used like this:

 # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \
                                      mcast_vlan_snooping 1 mcast_querier 1
 # ip link set dev v1 master br
 # bridge vlan add dev v1 vid 2

 # bridge vlan set dev v1 vid 1 mcast_max_groups 1
 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
 # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
 Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1.

 # bridge link set dev v1 mcast_max_groups 1
 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2
 Error: bridge: Port is already in 1 groups, and mcast_max_groups=1.

 # bridge -d link show
 5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...]
     [...] mcast_n_groups 1 mcast_max_groups 1

 # bridge -d vlan show
 port              vlan-id
 br                1 PVID Egress Untagged
                     state forwarding mcast_router 1
 v1                1 PVID Egress Untagged
                     [...] mcast_n_groups 1 mcast_max_groups 1
                   2
                     [...] mcast_n_groups 0 mcast_max_groups 0

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:26 +00:00
Petr Machata
b57e8d870d net: bridge: Maintain number of MDB entries in net_bridge_mcast_port
The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
similar limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and per-port-VLAN number of MDB entries that the port
  is a member of.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

The per-port multicast context is used for tracking of MDB entries for the
port as a whole. This is available for all bridges.

The per-port-VLAN multicast context is then only available on
VLAN-filtering bridges on VLANs that have multicast snooping on.

With these changes in place, it will be possible to configure an MDB limit
for the bridge as a whole, for any one port as a whole, or for any single
port-VLAN.

Note that unlike the global limit, exhaustion of the per-port and
per-port-VLAN maximums does not cause disablement of multicast snooping.
It is also permitted to configure the local limit larger than hash_max,
even though that is not useful.

In this patch, introduce only the accounting for number of entries, and the
max field itself, but not the means to toggle the max. The next patch
introduces the netlink APIs to toggle and read the values.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:26 +00:00
Petr Machata
d47230a348 net: bridge: Add a tracepoint for MDB overflows
The following patch will add two more maximum MDB allowances to the global
one, mcast_hash_max, that exists today. In all these cases, attempts to add
MDB entries above the configured maximums through netlink fail noisily and
obviously. Such visibility is missing when entries are added by control
plane traffic, i.e. by IGMP or MLD packets.

To improve visibility in those cases, add a tracepoint that reports the
violation, including the relevant netdevice (be it a slave or the bridge
itself), and the MDB entry parameters:

	# perf record -e bridge:br_mdb_full &
	# [...]
	# perf script | cut -d: -f4-
	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0
	 dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0
	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10
	 dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10
	 dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10
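
The af 2 lines above show IPv4 addresses in IPv4-mapped form
(::ffff:a.b.c.d), since IPv4 addresses are reported through the IPv6
buffer. A minimal userspace sketch of that mapping follows; the
ipv4_to_mapped() name is hypothetical, and the kernel has its own helper,
ipv6_addr_set_v4mapped(), for the same conversion.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Store an IPv4 address as an IPv4-mapped IPv6 address (::ffff:a.b.c.d),
 * so a single 16-byte field can carry either address family.
 */
static void ipv4_to_mapped(struct in_addr v4, struct in6_addr *v6)
{
	memset(v6, 0, sizeof(*v6));
	v6->s6_addr[10] = 0xff;
	v6->s6_addr[11] = 0xff;
	/* s_addr is already in network byte order; copy it verbatim. */
	memcpy(&v6->s6_addr[12], &v4.s_addr, 4);
}
```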

CC: Steven Rostedt <rostedt@goodmis.org>
CC: linux-trace-kernel@vger.kernel.org
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:25 +00:00
Petr Machata
eceb30854f net: bridge: Change a cleanup in br_multicast_new_port_group() to goto
More cleanup will be added to this function in the following patches.
Structuring the cleanups in one labeled block will allow reusing the same
cleanup from several places.
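
The pattern in question, sketched generically (this is not
br_multicast_new_port_group() itself): each setup step that can fail jumps
to a label that undoes only the steps completed so far, so teardown runs in
the reverse order of setup. The struct, its fields and the fail_stats test
knob are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical object with two resources that are set up in order. */
struct group {
	int *timer;  /* stand-in for a first resource */
	int *stats;  /* stand-in for a second resource */
};

static int fail_stats;  /* test knob: force the second allocation to fail */

static struct group *group_new(void)
{
	struct group *g;

	g = calloc(1, sizeof(*g));
	if (!g)
		goto err_out;

	g->timer = malloc(sizeof(*g->timer));
	if (!g->timer)
		goto err_free_group;

	g->stats = fail_stats ? NULL : malloc(sizeof(*g->stats));
	if (!g->stats)
		goto err_free_timer;

	return g;

	/* Unwind in reverse order of setup; each label undoes one step. */
err_free_timer:
	free(g->timer);
err_free_group:
	free(g);
err_out:
	return NULL;
}

static void group_free(struct group *g)
{
	free(g->stats);
	free(g->timer);
	free(g);
}
```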

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:25 +00:00
Petr Machata
976b3858dd net: bridge: Add br_multicast_del_port_group()
Since cleaning up the effects of br_multicast_new_port_group() just
consists of delisting and freeing the memory, the function
br_mdb_add_group_star_g() inlines the corresponding code. In the following
patches, the number of per-port and per-port-VLAN MDB entries is going to
be maintained, and that counter will have to be updated. Because that
logic is going to be hidden in the br_multicast module, introduce a new
hook intended to remove a newly created group again.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:25 +00:00
Petr Machata
1c85b80b20 net: bridge: Move extack-setting to br_multicast_new_port_group()
Now that br_multicast_new_port_group() takes an extack argument, move
setting the extack there. The downside is that the error messages end
up being less specific (the function cannot distinguish between (S,G)
and (*,G) groups). However, the alternative would be to check in the
caller whether the callee set the extack and, if it didn't, set it. But
that is only done when the callee is not exactly known (e.g. in the case
of a notifier invocation).

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:25 +00:00
Petr Machata
60977a0c63 net: bridge: Add extack to br_multicast_new_port_group()
Make it possible to set an extack in br_multicast_new_port_group().
Eventually, this function will check for per-port and per-port-VLAN
MDB maximums, and will use the extack to communicate the reason for
the bounce.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:25 +00:00
Petr Machata
c00041cf1c net: bridge: Set strict_start_type at two policies
Make any attributes newly-added to br_port_policy or vlan_tunnel_policy
parsed strictly, to prevent userspace from passing garbage. Note that this
patchset only touches the former policy. The latter was adjusted for
completeness' sake. There do not appear to be other _deprecated calls
with non-NULL policies.

Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:48:25 +00:00
David S. Miller
8b7018fa71 Merge branch 'sparx5-PSFP-support'
Daniel Machon says:

====================
net: Add support for PSFP in Sparx5

================================================================================
Add support for Per-Stream Filtering and Policing (802.1Q-2018, 8.6.5.1).
================================================================================

The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
identified by ISDX (Ingress Service Index, frame metadata), and maps
ISDX to streams.

Flow meters are also classified by ISDX, and implemented using service
policers (Service Dual Leaky Buckets, SDLB). Leaky buckets are linked
together in a leak chain of a leak group. Leak groups are preconfigured to
serve buckets within a certain rate interval.

Stream gates are time-based policers used by PSFP. Frames are dropped
based on the gate state (OPEN/CLOSE), which is altered based on the Gate
Control List (GCL) and the current PTP time. Apart from time-based
policing, stream gates can alter egress queue selection for the frames
that pass through the gate. This is done through the Internal Priority
Selector (IPS). Stream gates are mapped from stream filters.

Support for the tc actions gate and police has been added to the set of
actions supported by VCAP IS0.

Examples:

// tc filter with gate action
$ tc filter add dev eth1 ingress chain 1100000 prio 1 handle 1001 protocol \
802.1q flower skip_sw vlan_id 100 action gate base-time 0 sched-entry open \
700000 7 8m sched-entry close 300000 action goto chain 1200000

// tc filter with police action
$ tc filter add dev eth1 ingress chain 1100000 prio 1 handle 1002 protocol \
802.1q flower skip_sw vlan_id 100 action police rate 1gbit burst 8096      \
conform-exceed drop action goto chain 1200000

================================================================================
Patches
================================================================================
Patch #1:  Adds new register needed for PSFP.
Patch #2:  Adds resource pools to manage the chip resources needed by PSFP.
Patch #3:  Adds support for SDLBs, needed for flow-meters.
Patch #4:  Adds support for service policers.
Patch #5:  Adds support for PSFP flow-meters, using service policers.
Patch #6:  Adds a new function to calculate basetime, required by flow-meters.
Patch #7:  Adds support for PSFP stream gates.
Patch #8:  Adds support for PSFP stream filters.
Patch #9:  Adds a function to initialize flow-meters, stream gates and stream
           filters.
Patch #10: Adds the required flower code to configure PSFP using the tc command.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:26 +00:00
Daniel Machon
6ebf182bfd sparx5: add support for configuring PSFP via tc
Add support for tc actions gate and police, in order to implement
support for configuring PSFP through tc.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:26 +00:00
Daniel Machon
e116b19db2 net: microchip: sparx5: initialize PSFP
Initialize the SDLBs, stream gates and stream filters.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:26 +00:00
Daniel Machon
ae3e691f34 net: microchip: sparx5: add support for PSFP stream filters
Add support for configuring PSFP stream filters (IEEE 802.1Q-2018,
8.6.5.1.1).

The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
identified by ISDX (Ingress Service Index, frame metadata), and maps
ISDX to streams.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:26 +00:00
Daniel Machon
c70a5e2c3d net: microchip: sparx5: add support for PSFP stream gates
Add support for configuring PSFP stream gates (IEEE 802.1Q-2018,
8.6.5.1.2).

Stream gates are time-based policers used by PSFP. Frames are dropped
based on the gate state (OPEN/CLOSE), which is altered based on the Gate
Control List (GCL) and the current PTP time. Apart from time-based
policing, stream gates can alter egress queue selection for the frames
that pass through the gate. This is done through the Internal Priority
Selector (IPS). Stream gates are mapped from stream filters.
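
A minimal model of how a gate control list determines the gate state,
assuming the list repeats with a cycle time equal to the sum of its
intervals, starting at the base time. The types and names are
hypothetical, internal priority selection and octet limits are omitted,
and the hardware evaluates this itself; the sketch only illustrates the
semantics.

```c
#include <stddef.h>
#include <stdint.h>

enum gate_state { GATE_CLOSED = 0, GATE_OPEN = 1 };

/* One Gate Control List entry: a state held for interval_ns nanoseconds. */
struct gcl_entry {
	enum gate_state state;
	uint64_t interval_ns;
};

/*
 * Return the gate state at now_ns for a GCL that starts at base_ns and
 * repeats every cycle (the sum of all intervals). Times before base_ns
 * are treated as closed, which is an assumption of this sketch.
 */
static enum gate_state gate_state_at(const struct gcl_entry *gcl, size_t n,
				     uint64_t base_ns, uint64_t now_ns)
{
	uint64_t cycle = 0, offset;
	size_t i;

	for (i = 0; i < n; i++)
		cycle += gcl[i].interval_ns;
	if (cycle == 0 || now_ns < base_ns)
		return GATE_CLOSED;

	offset = (now_ns - base_ns) % cycle;
	for (i = 0; i < n; i++) {
		if (offset < gcl[i].interval_ns)
			return gcl[i].state;
		offset -= gcl[i].interval_ns;
	}
	return GATE_CLOSED;  /* not reached, since offset < cycle */
}
```

With the open 700000 ns / close 300000 ns schedule from the tc gate
example, the gate is open for the first 700 us of every 1 ms cycle.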

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:26 +00:00
Daniel Machon
9e02131ec2 net: microchip: sparx5: add function for calculating PTP basetime
Add a new function for calculating PTP basetime, required by the stream
gate scheduler to calculate gate state (open / close).
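
One common way to compute such a base time, sketched under the assumption
that a base time already in the past is moved forward by a whole number of
cycles so the schedule keeps its phase. This is an illustration, not the
sparx5 function; real code may also add a safety margin so the hardware
has time to be programmed.

```c
#include <stdint.h>

/*
 * Hypothetical helper: if base_ns is already in the past, advance it by a
 * whole number of cycles so it lands at or after now_ns. Overflow for
 * extreme values is not handled in this sketch.
 */
static uint64_t next_base_time(uint64_t base_ns, uint64_t cycle_ns,
			       uint64_t now_ns)
{
	uint64_t n;

	if (base_ns >= now_ns || cycle_ns == 0)
		return base_ns;

	/* Number of whole cycles needed, rounded up. */
	n = (now_ns - base_ns + cycle_ns - 1) / cycle_ns;
	return base_ns + n * cycle_ns;
}
```

For example, with base-time 0 (as in the tc example) and a 1 ms cycle, a
current time of 2.5 ms yields a base time of 3 ms.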

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:26 +00:00
Daniel Machon
d2185e79ba net: microchip: sparx5: add support for PSFP flow-meters
Add support for configuring PSFP flow-meters (IEEE 802.1Q-2018,
8.6.5.1.3).

The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
identified by ISDX (Ingress Service Index, frame metadata), and maps
ISDX to flow-meters. SDLBs provide the flow-meter parameters.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:25 +00:00
Daniel Machon
1db82abf19 net: microchip: sparx5: add support for service policers
Add an initial API for configuring policers. This patch adds support for
service policers.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:25 +00:00
Daniel Machon
9bf5088989 net: microchip: sparx5: add support for Service Dual Leacky Buckets
Add support for Service Dual Leaky Buckets (SDLB), used to implement
PSFP flow-meters. Buckets are linked together in a leak chain of a leak
group. Leak groups are preconfigured to serve buckets within a certain
rate interval.
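
SDLBs are dual leaky buckets implemented in hardware. As a simplified
illustration of the policing idea only, the sketch below models a
single-rate token bucket with a rate and a burst size, the two parameters
given to tc police in the cover letter's example. It is not the SDLB
algorithm; the names are hypothetical, and tokens are kept in
byte-nanoseconds so that refills are exact in integer arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical single-rate token bucket; levels are in byte-nanoseconds. */
struct bucket {
	uint64_t bytes_per_sec;  /* fill rate */
	uint64_t depth;          /* burst size * NS_PER_SEC */
	uint64_t level;          /* current tokens * NS_PER_SEC */
	uint64_t last_ns;        /* time of the last refill */
};

static void bucket_init(struct bucket *b, uint64_t rate_bps,
			uint64_t burst_bytes, uint64_t now_ns)
{
	b->bytes_per_sec = rate_bps / 8;
	b->depth = burst_bytes * NS_PER_SEC;
	b->level = b->depth;  /* start with a full bucket */
	b->last_ns = now_ns;
}

/* Add tokens for the time elapsed since the last refill, capped at depth. */
static void bucket_refill(struct bucket *b, uint64_t now_ns)
{
	uint64_t elapsed = now_ns - b->last_ns;  /* assumes monotonic time */
	uint64_t room = b->depth - b->level;

	b->last_ns = now_ns;
	if (b->bytes_per_sec && elapsed > room / b->bytes_per_sec)
		b->level = b->depth;  /* would overfill (or overflow): cap */
	else
		b->level += elapsed * b->bytes_per_sec;
}

/* Conform: consume tokens and pass. Exceed: keep tokens, caller drops. */
static bool bucket_conform(struct bucket *b, uint64_t len, uint64_t now_ns)
{
	uint64_t need = len * NS_PER_SEC;

	bucket_refill(b, now_ns);
	if (b->level < need)
		return false;
	b->level -= need;
	return true;
}
```

At a 1 Gbit/s rate, tokens for a 1500-byte frame take 12 us to accumulate.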

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:25 +00:00
Daniel Machon
bb535c0dbb net: microchip: sparx5: add resource pools
Add resource pools and accessor functions. These pools can be queried by
the driver whenever a finite resource is required. Some resources can
be reused, in which case an index and a reference count are used to keep
track of users.
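
A minimal sketch of such a pool, assuming a fixed number of entries and a
key that identifies what an entry is configured for: a request for an
already-configured key reuses that index and bumps its reference count,
otherwise a free entry is claimed. The struct and function names are
hypothetical.

```c
#define POOL_SIZE 4

/* Hypothetical pool of a finite hardware resource (e.g. policer indexes). */
struct res_pool {
	unsigned int refcnt[POOL_SIZE];  /* 0 means the entry is free */
	unsigned long key[POOL_SIZE];    /* what the entry is configured for */
};

/*
 * Get an index for key: reuse an entry already configured for the same
 * key (bumping its reference count), else claim a free one.
 * Returns -1 if the pool is exhausted.
 */
static int pool_get(struct res_pool *p, unsigned long key)
{
	int i, free_idx = -1;

	for (i = 0; i < POOL_SIZE; i++) {
		if (p->refcnt[i] && p->key[i] == key) {
			p->refcnt[i]++;
			return i;
		}
		if (!p->refcnt[i] && free_idx < 0)
			free_idx = i;
	}
	if (free_idx >= 0) {
		p->refcnt[free_idx] = 1;
		p->key[free_idx] = key;
	}
	return free_idx;
}

/* Drop one reference; returns the remaining count (0 = entry is free). */
static int pool_put(struct res_pool *p, int idx)
{
	if (idx < 0 || idx >= POOL_SIZE || !p->refcnt[idx])
		return -1;
	return (int)--p->refcnt[idx];
}
```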

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:25 +00:00
Daniel Machon
edad83e2ba net: microchip: add registers needed for PSFP
Add registers needed for PSFP. This patch also renames a single
register, shortening its name (SYS_CLK_PER_100PS). Uses of it have been
updated accordingly.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06 08:26:25 +00:00
Maxim Mikityanskiy
79efecb41f net/mlx5e: Trigger NAPI after activating an SQ
If an SQ is deactivated and then reactivated, some packets could be
sent after MLX5E_SQ_STATE_ENABLED is cleared but before
netif_tx_stop_queue, meaning that NAPI might miss some completions. To
handle them, make sure to trigger NAPI after SQ activation in all cases
where it can be relevant. Regular SQs, XDP SQs and XSK SQs already do
this. The missing cases added here are: after recovery, after activating
HTB SQs, and after activating PTP SQs.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:04 -08:00
Raed Salem
a7385187a3 net/mlx5e: IPsec, support upper protocol selector field offload
Add support for offloading the upper protocol selector field of
policies and states. This makes it possible to select traffic for IPsec
operation based on the L4 protocol (TCP/UDP) with a specific
source/destination port.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:04 -08:00
Dragos Tatulea
ce231772da net/mlx5e: IPoIB, Add support for XDR speed
Add the XDR IB PTYS encoding and the XDR speed of 200Gbps.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:04 -08:00
Jack Morgenstein
7eef93003e net/mlx5: Enhance debug print in page allocation failure
Provide more details to aid debugging.

Fixes: bf0bf77f6519 ("mlx5: Support communicating arbitrary host page size to firmware")
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Majd Dibbiny <majd@nvidia.com>
Signed-off-by: Jack Morgenstein <jackm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:03 -08:00
Rahul Rameshbabu
b63636b6c1 net/mlx5: Add firmware support for MTUTC scaled_ppm frequency adjustments
When the device is capable of handling scaled ppm values for frequency
adjustment, the driver no longer converts them to ppb. Instead, the
scaled ppm value is passed directly to the device for the frequency
adjustment operation.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:03 -08:00
Rahul Rameshbabu
04937a0f68 net/mlx5: Document support for RoCE HCA disablement capability
Some mlx5 devices are capable of disabling RoCE themselves. In this
situation, disablement does not need to be handled at the driver level.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:03 -08:00
Rahul Rameshbabu
8ce3b586fa net/mlx5: Add counter information to mlx5 driver documentation
Update the rst file with general information about statistics counters
for the mlx5 driver. Add details about individual counters in list
tables.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:03 -08:00
Rahul Rameshbabu
e12ebbf0cc net/mlx5: Document previously implemented mlx5 tracepoints
Tracepoints were previously implemented but not documented until this
patch series.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:03 -08:00
Rahul Rameshbabu
a12ba19269 net/mlx5: Update Kconfig parameter documentation
Document the mlx5 driver Kconfig flags that were defined but left
undocumented until this patch series.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:03 -08:00
Rahul Rameshbabu
f2d51e5793 net/mlx5: Separate mlx5 driver documentation into multiple pages
The mlx5 device driver documentation page has grown in size and should be
split into multiple subpages. This change also adds a table of contents
for the new subpages.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:02 -08:00
Roi Dayan
199abf33f4 net/mlx5: Lag, Move mpesw related definitions to mpesw.h
mpesw definitions belong in mpesw.h, not in lag.h.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:02 -08:00
Mark Bloch
6a80313d24 net/mlx5: Lag, Use flag to check for shared FDB mode
It's redundant and incorrect to also check that lag is in sriov mode.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:02 -08:00
Roi Dayan
b399b066e2 net/mlx5: Lag, Remove redundant bool allocation on the stack
There is no need to allocate a bool variable on the stack; the value can be returned directly.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:02 -08:00
Roi Dayan
9a49a64ea7 net/mlx5: Lag, Use mlx5_lag_dev() instead of derefering pointers
Use the existing wrapper mlx5_lag_dev() to access the lag object from
dev for better maintainability and consistent code.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:02 -08:00