linux/include
David S. Miller 2027e13f62 mlx5-updates-2021-06-09
Introduce steering header insert/remove and switchdev bridge offloads
 
 1) From Yevgeny, Steering header insert/remove support
 
 ConnectX supports offloading of various encapsulations and decapsulations
 (e.g. VXLAN), which are performed by 'Packet Reformat' action.
 Starting with ConnectX-6 DX, a new reformat type is supported - INSERT_HEADER.
 This reformat allows inserting an arbitrary size buffer at a selected location
 in the packet on RX flows.
 
 The insert/remove header support are needed as a prerequisite for the
 bridge offloads vlan pop/push supprt, see below.
 
 2) From Vlad, Support for bridge offloads for switchdev mode
 
 This change implements bridge offloads with VLAN-support that works on top
 of mlx5 representors in switchdev mode.
 
 HIGH-LEVEL OVERVIEW
 
 Hardware supported by mlx5 driver doesn't provide dynamic learning or aging
 functionality and requires the driver to emulate all switch-like behavior
 in software. As such, all packets by default go through miss path, appear
 on representor and get to software bridge, if it is the upper device of the
 representor. This causes bridge to process packet in software, learn the
 MAC address to FDB and send SWITCHDEV_FDB_ADD_TO_DEVICE event to all
 subscribers. Upon reception of SWITCHDEV_FDB_ADD_TO_DEVICE notification
 mlx5 bridge offloads the FDB to hardware and sends back
 SWITCHDEV_FDB_ADD_TO_BRIDGE notification to prevent such entries from being
 aged out by kernel bridge. Leaving aging to kernel bridge would result
 deletion of offloaded dynamic FDB entries every aging_time period due to
 packets being processed by hardware and, consecutively, 'used' timestamp
 for FDB entry not being updated. Hardware aging is emulated in driver by
 running periodic workqueue task that manually updates the rules according
 to their hardware counter:
 
 - If hardware counter has changed since last update, the handler updates
 'used' timestamp in kernel bridge dynamic entry by sending
 SWITCHDEV_FDB_ADD_TO_BRIDGE notification for the entry.
 
 - If FDB entry wasn't updated for user-controllable aging_time period,
 then the FDB entry is unoffloaded from hardware and corresponding
 SWITCHDEV_FDB_DEL_TO_BRIDGE notification is sent to kernel bridge.
 
 The mlx5 bridge offload implementation fully supports port VLAN objects,
 including PVID (vlan push) and "Egress Untagged" (vlan pop).
 
 SOFTWARE ARCHITECTURE
 
 Mlx5_eswitch is extended with pointer to new mlx5_esw_bridge_offloads
 structure which has a linked list of mlx5_esw_bridge objects. Struct
 mlx5_esw_bridge is the main switch object in mlx5 that holds all data for
 offloaded FDB entries and metadata for bridge ports and their vlans. The
 mlx5_esw_bridge object is created when first representor of eswitch vport
 is added to bridge and deleted when the last representor is detached from
 it. Bridge FDB entries are saved in linked list (to iterate over all FDB
 entries in aging workqueue task) and also in hashtable for quick lookup by
 MAC+VLAN tuple. Bridge FDB entries are saved in linked list (to iterate
 over all FDB entries in aging workqueue task) and in hashtable for quick
 lookup by MAC+VLAN tuple. Port metadata is stored in struct
 mlx5_esw_bridge_port that is saved in xarray to allow quick lookup by vport
 number. Part of the port metadata is the set of port vlans that are
 represented by mlx5_esw_bridge_vlan structure. The vlan structure points to
 all FDBs on vlan/port via fdb_list linked list.
 
 Simplified diagram of mlx5 bridge objects:
 
                       +------------------+
                       |  mxl5_eswitch    |
                       |                  |
                       |  br_offloads     |
                       +--------+---------+
                                |
                       +--------v-------------------+
                       |  mlx5_esw_bridge_offloads  |
                       |                            |
                    +-->  bridges                   |
                    |  +-------+--------------------+
                    |          |
                    |          |
                    |      +---v---------------+
                    |      | mlx5_esw_bridge   |
                    |      |                   |
                    |      | vports            |
                    |      |                   |
                    |      | fdb_ht            |
                    |      +---+---------------+
                    |          |
                    |      +---v---------------+
                    +------+ mlx5_esw_bridge   |
                           |                   |
 +-------------------------+ vports            |
 |                         |                   |
 |                         | fdb_ht            +------------------------------------------+
 |                         +-------------------+                                          |
 |                                                                                        |
 |                                                                                        |
 | +----------------------+                                 +---------------------------+ |
 +-> mlx5_esw_bridge_port |                              +--> mlx5_esw_bridge_fdb_entry <-+
 | |                      |    +----------------------+  |  +--+------------------------+ |
 | | vlans                +--+-> mlx5_esw_bridge_vlan |  |     |                          |
 | |                      |  | |                      |  |  +--v------------------------+ |
 | +----------------------+  | | fdb_list             +--+  | mlx5_esw_bridge_fdb_entry <-+
 |                           | +-------^--------------+     +--+------------------------+ |
 | +----------------------+  |         |                       |                          |
 +-> mlx5_esw_bridge_port |  |         +-----------------------+                          |
   |                      |  |                                                            |
   | vlans                |  | -----------------------+                                   |
   |                      |  +-> mlx5_esw_bridge_vlan |                                   |
   +----------------------+    |                      |     +---------------------------+ |
                               | fdb_list             +-----> mlx5_esw_bridge_fdb_entry <-+
                               +-------^--------------+     +--+------------------------+
                                       |                       |
                                       +-----------------------+
 
 HARDWARE REPRESENTATION
 
 In order to adhere to kernel software datapath model bridge offloads must
 come after TC and NF FDBs. However, since netfilter offload in mlx5 is
 implemented with unmanaged tables, its miss path is not automatically
 connected to next priority and requires the code to manually connect with
 slow table. To keep bridge offloads encapsulated and not mix it with
 eswitch offloads new FDB_TC_MISS priority is created between FDB_FT_OFFLOAD
 and FDB_SLOW_PATH which allows bridge offloads to be created without
 exposing its internal tables to any other modules since miss path of
 managed TC-miss table is automatically wired to next priority.
 
 The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB
 namespace. The new priority is between tc-miss and slow path priorities.
 Priority consist of two levels: the ingress table that is global per
 eswitch and matches incoming packets by src_mac/vid and redirects them to
 next level (egress table) that is chosen according to ingress port bridge
 membership and matches on dst_mac/vid in order to redirect packet to vport
 according to the following diagram:
 
                 +
                 |
       +---------v----------+
       |                    |
       |   FDB_TC_OFFLOAD   |
       |                    |
       +---------+----------+
                 |
                 |
       +---------v----------+
       |                    |
       |   FDB_FT_OFFLOAD   |
       |                    |
       +---------+----------+
                 |
                 |
       +---------v----------+
       |                    |
       |    FDB_TC_MISS     |
       |                    |
       +---------+----------+
                 |
 +--------------------------------------+
 |               |                      |
 |        +------+                      |
 |        |                             |
 | +------v--------+   FDB_BR_OFFLOAD   |
 | | INGRESS_TABLE |                    |
 | +------+---+----+                    |
 |        |   |      match              |
 |        |   +---------+               |
 |        |             |               |    +-------+
 |        |     +-------v-------+ match |    |       |
 |        |     | EGRESS_TABLE  +------------> vport |
 |        |     +-------+-------+       |    |       |
 |        |             |               |    +-------+
 |        |    miss     |               |
 |        +------+------+               |
 |               |                      |
 +--------------------------------------+
                 |
                 |
       +---------v----------+
       |                    |
       |   FDB_SLOW_PATH    |
       |                    |
       +---------+----------+
                 |
                 v
 
 PATCHES OVERVIEW
 
 1-3 - Miscellaneous refactorings and infrastructure changes.
 
 4 - Mlx5 bridge offload infrastructure and dedicated fs_core
 namespace/tables implementation.
 
 5 - FDB entry offload.
 
 6 - Dynamic FDB entry aging.
 
 7-10 - VLAN filtering offload.
 
 11 - Tracepoints for main mlx5 bridge offload events (FDB entry
 offload/unoffload, VLAN add/delete, etc.)
 
 --
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmDBbIwACgkQSD+KveBX
 +j6BzwgAs7zTxCwsqYC+Zw77p0C+UwEpoq9e8aARkZXY9PExQi7SHG2LswN1JX3C
 MPf1nczNnos9D+P9VgbUWJP/3agtdYFbTu03toOl1W6pPRY7MVqrV14twT1zP7zA
 xDqSZvYJ1jZKNVsITzdwWh0u7PDrxKpYefaKYe7b3ghNbAOqCEReF61zMTg4pu4c
 LUkLx2f+diaQHY6TyQnUAMMH5O3j0bDF8JUbQK0ZX1+a1guP99t1zZKY35aBB1uQ
 GcwUSGEaThU71O8whOx4kaIjLyk2kNM4rP1WxZo8V9gFu81/FJ5XNISdd7XWjOsI
 z2Qf2Zu8xqXyjRF8cA5n0OIcK2UDrQ==
 =oe1K
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-06-09

Introduce steering header insert/remove and switchdev bridge offloads

1) From Yevgeny, Steering header insert/remove support

ConnectX supports offloading of various encapsulations and decapsulations
(e.g. VXLAN), which are performed by 'Packet Reformat' action.
Starting with ConnectX-6 DX, a new reformat type is supported - INSERT_HEADER.
This reformat allows inserting an arbitrary size buffer at a selected location
in the packet on RX flows.

The insert/remove header support are needed as a prerequisite for the
bridge offloads vlan pop/push supprt, see below.

2) From Vlad, Support for bridge offloads for switchdev mode

This change implements bridge offloads with VLAN-support that works on top
of mlx5 representors in switchdev mode.

HIGH-LEVEL OVERVIEW

Hardware supported by mlx5 driver doesn't provide dynamic learning or aging
functionality and requires the driver to emulate all switch-like behavior
in software. As such, all packets by default go through miss path, appear
on representor and get to software bridge, if it is the upper device of the
representor. This causes bridge to process packet in software, learn the
MAC address to FDB and send SWITCHDEV_FDB_ADD_TO_DEVICE event to all
subscribers. Upon reception of SWITCHDEV_FDB_ADD_TO_DEVICE notification
mlx5 bridge offloads the FDB to hardware and sends back
SWITCHDEV_FDB_ADD_TO_BRIDGE notification to prevent such entries from being
aged out by kernel bridge. Leaving aging to kernel bridge would result
deletion of offloaded dynamic FDB entries every aging_time period due to
packets being processed by hardware and, consecutively, 'used' timestamp
for FDB entry not being updated. Hardware aging is emulated in driver by
running periodic workqueue task that manually updates the rules according
to their hardware counter:

- If hardware counter has changed since last update, the handler updates
'used' timestamp in kernel bridge dynamic entry by sending
SWITCHDEV_FDB_ADD_TO_BRIDGE notification for the entry.

- If FDB entry wasn't updated for user-controllable aging_time period,
then the FDB entry is unoffloaded from hardware and corresponding
SWITCHDEV_FDB_DEL_TO_BRIDGE notification is sent to kernel bridge.

The mlx5 bridge offload implementation fully supports port VLAN objects,
including PVID (vlan push) and "Egress Untagged" (vlan pop).

SOFTWARE ARCHITECTURE

Mlx5_eswitch is extended with pointer to new mlx5_esw_bridge_offloads
structure which has a linked list of mlx5_esw_bridge objects. Struct
mlx5_esw_bridge is the main switch object in mlx5 that holds all data for
offloaded FDB entries and metadata for bridge ports and their vlans. The
mlx5_esw_bridge object is created when first representor of eswitch vport
is added to bridge and deleted when the last representor is detached from
it. Bridge FDB entries are saved in linked list (to iterate over all FDB
entries in aging workqueue task) and also in hashtable for quick lookup by
MAC+VLAN tuple. Bridge FDB entries are saved in linked list (to iterate
over all FDB entries in aging workqueue task) and in hashtable for quick
lookup by MAC+VLAN tuple. Port metadata is stored in struct
mlx5_esw_bridge_port that is saved in xarray to allow quick lookup by vport
number. Part of the port metadata is the set of port vlans that are
represented by mlx5_esw_bridge_vlan structure. The vlan structure points to
all FDBs on vlan/port via fdb_list linked list.

Simplified diagram of mlx5 bridge objects:

                      +------------------+
                      |  mxl5_eswitch    |
                      |                  |
                      |  br_offloads     |
                      +--------+---------+
                               |
                      +--------v-------------------+
                      |  mlx5_esw_bridge_offloads  |
                      |                            |
                   +-->  bridges                   |
                   |  +-------+--------------------+
                   |          |
                   |          |
                   |      +---v---------------+
                   |      | mlx5_esw_bridge   |
                   |      |                   |
                   |      | vports            |
                   |      |                   |
                   |      | fdb_ht            |
                   |      +---+---------------+
                   |          |
                   |      +---v---------------+
                   +------+ mlx5_esw_bridge   |
                          |                   |
+-------------------------+ vports            |
|                         |                   |
|                         | fdb_ht            +------------------------------------------+
|                         +-------------------+                                          |
|                                                                                        |
|                                                                                        |
| +----------------------+                                 +---------------------------+ |
+-> mlx5_esw_bridge_port |                              +--> mlx5_esw_bridge_fdb_entry <-+
| |                      |    +----------------------+  |  +--+------------------------+ |
| | vlans                +--+-> mlx5_esw_bridge_vlan |  |     |                          |
| |                      |  | |                      |  |  +--v------------------------+ |
| +----------------------+  | | fdb_list             +--+  | mlx5_esw_bridge_fdb_entry <-+
|                           | +-------^--------------+     +--+------------------------+ |
| +----------------------+  |         |                       |                          |
+-> mlx5_esw_bridge_port |  |         +-----------------------+                          |
  |                      |  |                                                            |
  | vlans                |  | -----------------------+                                   |
  |                      |  +-> mlx5_esw_bridge_vlan |                                   |
  +----------------------+    |                      |     +---------------------------+ |
                              | fdb_list             +-----> mlx5_esw_bridge_fdb_entry <-+
                              +-------^--------------+     +--+------------------------+
                                      |                       |
                                      +-----------------------+

HARDWARE REPRESENTATION

In order to adhere to kernel software datapath model bridge offloads must
come after TC and NF FDBs. However, since netfilter offload in mlx5 is
implemented with unmanaged tables, its miss path is not automatically
connected to next priority and requires the code to manually connect with
slow table. To keep bridge offloads encapsulated and not mix it with
eswitch offloads new FDB_TC_MISS priority is created between FDB_FT_OFFLOAD
and FDB_SLOW_PATH which allows bridge offloads to be created without
exposing its internal tables to any other modules since miss path of
managed TC-miss table is automatically wired to next priority.

The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB
namespace. The new priority is between tc-miss and slow path priorities.
Priority consist of two levels: the ingress table that is global per
eswitch and matches incoming packets by src_mac/vid and redirects them to
next level (egress table) that is chosen according to ingress port bridge
membership and matches on dst_mac/vid in order to redirect packet to vport
according to the following diagram:

                +
                |
      +---------v----------+
      |                    |
      |   FDB_TC_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_FT_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |    FDB_TC_MISS     |
      |                    |
      +---------+----------+
                |
+--------------------------------------+
|               |                      |
|        +------+                      |
|        |                             |
| +------v--------+   FDB_BR_OFFLOAD   |
| | INGRESS_TABLE |                    |
| +------+---+----+                    |
|        |   |      match              |
|        |   +---------+               |
|        |             |               |    +-------+
|        |     +-------v-------+ match |    |       |
|        |     | EGRESS_TABLE  +------------> vport |
|        |     +-------+-------+       |    |       |
|        |             |               |    +-------+
|        |    miss     |               |
|        +------+------+               |
|               |                      |
+--------------------------------------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_SLOW_PATH    |
      |                    |
      +---------+----------+
                |
                v

PATCHES OVERVIEW

1-3 - Miscellaneous refactorings and infrastructure changes.

4 - Mlx5 bridge offload infrastructure and dedicated fs_core
namespace/tables implementation.

5 - FDB entry offload.

6 - Dynamic FDB entry aging.

7-10 - VLAN filtering offload.

11 - Tracepoints for main mlx5 bridge offload events (FDB entry
offload/unoffload, VLAN add/delete, etc.)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

--
2021-06-10 13:36:37 -07:00
..
acpi Merge branches 'acpi-cppc', 'acpi-video' and 'acpi-utils' 2021-04-26 17:04:27 +02:00
asm-generic mm: remove xlate_dev_kmem_ptr() 2021-05-07 00:26:34 -07:00
clocksource ARM: platform support for Apple M1 2021-04-26 12:30:36 -07:00
crypto
drm
dt-bindings Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-05-06 23:37:55 -07:00
keys integrity-v5.13 2021-05-01 15:32:18 -07:00
kunit
kvm Merge branch 'kvm-arm64/kill_oprofile_dependency' into kvmarm-master/next 2021-04-22 13:41:49 +01:00
linux net/mlx5: Bridge, add offload infrastructure 2021-06-09 18:36:09 -07:00
math-emu
media media updates for v5.13-rc1 2021-04-28 09:24:36 -07:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2021-06-09 14:50:35 -07:00
pcmcia
ras
rdma RDMA/restrack: Add support to get resource tracking for SRQ 2021-04-22 10:30:27 -03:00
scsi SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
soc Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
sound ASoC: Updates for v5.13 2021-04-26 16:59:21 +02:00
target
trace tcp: add tracepoint for checksum errors 2021-05-14 15:26:03 -07:00
uapi netlink: simplify NLMSG_DATA with NLMSG_HDRLEN 2021-06-10 12:51:33 -07:00
vdso
video
xen xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.h 2021-05-14 15:52:05 +02:00