linux

iv/linux

Author	SHA1	Message	Date
Leon Romanovsky	6721239672	net/mlx5e: Skip IPsec encryption for TX path without matching policy Software implementation of IPsec skips encryption of packets in TX path if no matching policy is found. So align HW implementation to this behavior, by requiring matching reqid for offloaded policy and SA. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-08 10:36:06 +01:00
Raed Salem	81f8fba5ec	net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows Add the following statistics: RX successfully IPsec flows: ipsec_rx_pkts : Number of packets passed Rx IPsec flow ipsec_rx_bytes : Number of bytes passed Rx IPsec flow Rx dropped IPsec policy packets: ipsec_rx_drop_pkts: Number of packets dropped in Rx datapath due to IPsec drop policy ipsec_rx_drop_bytes: Number of bytes dropped in Rx datapath due to IPsec drop policy TX successfully encrypted and encapsulated IPsec packets: ipsec_tx_pkts : Number of packets encrypted and encapsulated successfully ipsec_tx_bytes : Number of bytes encrypted and encapsulated successfully Tx dropped IPsec policy packets: ipsec_tx_drop_pkts: Number of packets dropped in Tx datapath due to IPsec drop policy ipsec_tx_drop_bytes: Number of bytes dropped in Tx datapath due to IPsec drop policy The above can be seen using: ethtool -S <ifc> \|grep ipsec Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-08 10:36:06 +01:00
Leon Romanovsky	18f38fd267	net/mlx5e: Improve IPsec flow steering autogroup Flow steering API separates newly created rules based on their match criteria. Right now, all IPsec tables are created with one group and suffers from not-optimal FS performance. Count number of different match criteria for relevant tables, and set proper value at the table creation. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-08 10:36:05 +01:00
Leon Romanovsky	6b5c45e16e	net/mlx5e: Configure IPsec packet offload flow steering In packet offload mode, the HW is responsible to handle ESP headers, SPI numbers and trailers (ICV) together with different logic for RX and TX paths. In order to support packet offload mode, special logic is added to flow steering rules. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-08 10:36:05 +01:00
Leon Romanovsky	9af594d8a9	net/mlx5e: Use same coding pattern for Rx and Tx flows Remove intermediate variable in favor of having similar coding style for Rx and Tx add rule functions. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-08 10:36:04 +01:00
Leon Romanovsky	a5b8ca9471	net/mlx5e: Add XFRM policy offload logic Implement mlx5 flow steering logic and mlx5 IPsec code support XFRM policy offload. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-08 10:36:03 +01:00
Leon Romanovsky	8c17295bd4	net/mlx5e: Create IPsec policy offload tables Add empty table to be used for IPsec policy offload. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-08 10:36:02 +01:00
Steffen Klassert	e8a292d6a7	Merge branch 'mlx5 IPsec packet offload support (Part I)' Leon Romanovsky says: ============ This series follows previously sent "Extend XFRM core to allow packet offload configuration" series [1]. It is first part with refactoring to mlx5 allow us natively extend mlx5 IPsec logic to support both crypto and packet offloads. ============ Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-08 10:25:30 +01:00
Leon Romanovsky	a8e052932a	net/mlx5e: Generalize creation of default IPsec miss group and rule Create general function that sets miss group and rule to forward all not-matched traffic to the next table. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:07:11 +01:00
Leon Romanovsky	d7ec2b7602	net/mlx5e: Group IPsec miss handles into separate struct Move miss handles into dedicated struct, so we can reuse it in next patch when creating IPsec policy flow table. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:06:26 +01:00
Leon Romanovsky	27ebe531d2	net/mlx5e: Make clear what IPsec rx_err does Reuse existing struct what holds all information about modify header pointer and rule. This helps to reduce ambiguity from the name _err_ that doesn't describe the real purpose of that flow table, rule and function - to copy status result from HW to the stack. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:05:56 +01:00
Leon Romanovsky	384298c28a	net/mlx5e: Flatten the IPsec RX add rule path Rewrote the IPsec RX add rule path to be less convoluted and don't rely on pre-initialized variables. The code now has clean linear flow with clean separation between error and success paths. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:04:55 +01:00
Leon Romanovsky	35324bbb96	net/mlx5e: Refactor FTE setup code to be more clear The policy offload logic needs to set flow steering rule that match on saddr and daddr too, so factor out this code to separate functions, together with code alignment to netdev coding pattern of relying on family type. As part of this change, let's separate more logic from setup_fte_common to make sure that the function names describe that is done in the function better than general common name. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:03:56 +01:00
Leon Romanovsky	42ba0f9d4b	net/mlx5e: Move IPsec flow table creation to separate function Even now, to support IPsec crypto, the RX and TX paths use same logic to create flow tables. In the following patches, we will add more tables to support IPsec packet offload. So reuse existing code and rewrite it to support IPsec packet offload from the beginning. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:03:27 +01:00
Leon Romanovsky	8d15f364d5	net/mlx5e: Create hardware IPsec packet offload objects Create initial hardware IPsec packet offload object and connect it to advanced steering operation (ASO) context and queue, so the data path can communicate with the stack. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:02:24 +01:00
Leon Romanovsky	8518d05b8f	net/mlx5e: Create Advanced Steering Operation object for IPsec Setup the ASO (Advanced Steering Operation) object that is needed for IPsec to interact with SW stack about various fast changing events: replay window, lifetime limits, e.t.c Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:01:25 +01:00
Leon Romanovsky	c7049ca621	net/mlx5e: Remove accesses to priv for low level IPsec FS code mlx5 priv structure is driver main structure that holds high level data. That information is not needed for IPsec flow steering logic and the pointer to mlx5e_priv was not supposed to be passed in the first place. This change "cleans" the logic to rely on internal to IPsec structures without touching global mlx5e_priv. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 14:00:26 +01:00
Leon Romanovsky	fb2caa711f	net/mlx5e: Use mlx5 print routines for low level IPsec code Low level mlx5 code needs to use mlx5_core print routines and not netdev ones, as the failures are relevant to the HW itself and not to its netdev. This change allows us to remove access to mlx5 priv structure, which holds high level driver data that isn't needed for mlx5 IPsec code. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 13:59:22 +01:00
Leon Romanovsky	9e5286dcbb	net/mlx5e: Create symmetric IPsec RX and TX flow steering structs Remove AF family obfuscation by creating symmetric structs for RX and TX IPsec flow steering chains. This simplifies to us low level IPsec FS creation logic without need to dig into multiple levels of structs. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 13:58:00 +01:00
Leon Romanovsky	e3840530b4	net/mlx5e: Remove extra layers of defines Instead of performing redefinition of XFRM core defines to same values but with MLX5_* prefix, cache the input values as is by making sure that the proper storage objects are used. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 13:57:31 +01:00
Leon Romanovsky	cded6d8012	net/mlx5e: Store replay window in XFRM attributes As a preparation for future extension of IPsec hardware object to allow configuration of packet offload mode, extend the XFRM validator to check replay window values. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 13:55:59 +01:00
Leon Romanovsky	59592cfdf8	net/mlx5e: Advertise IPsec packet offload support Add needed capabilities check to determine if device supports IPsec packet offload mode. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 13:55:13 +01:00
Leon Romanovsky	3afee4ed33	net/mlx5: Add HW definitions for IPsec packet offload Add all needed bits to support IPsec packet offload mode. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 13:54:04 +01:00
Leon Romanovsky	e77bbde73e	net/mlx5: Return ready to use ASO WQE There is no need in hiding returned ASO WQE type by providing void*, use the real type instead. Do it together with zeroing that memory, so ASO WQE will be ready to use immediately. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 13:49:44 +01:00
Steffen Klassert	89ae65734a	Merge branch 'Extend XFRM core to allow packet offload configuration' Leon Romanovsky says: ============ The following series extends XFRM core code to handle a new type of IPsec offload - packet offload. In this mode, the HW is going to be responsible for the whole data path, so both policy and state should be offloaded. IPsec packet offload is an improved version of IPsec crypto mode, In packet mode, HW is responsible to trim/add headers in addition to decrypt/encrypt. In this mode, the packet arrives to the stack as already decrypted and vice versa for TX (exits to HW as not-encrypted). Devices that implement IPsec packet offload mode offload policies too. In the RX path, it causes the situation that HW can't effectively handle mixed SW and HW priorities unless users make sure that HW offloaded policies have higher priorities. It means that we don't need to perform any search of inexact policies and/or priority checks if HW policy was discovered. In such situation, the HW will catch the packets anyway and HW can still implement inexact lookups. In case specific policy is not found, we will continue with packet lookup and check for existence of HW policies in inexact list. HW policies are added to the head of SPD to ensure fast lookup, as XFRM iterates over all policies in the loop. This simple solution allows us to achieve same benefits of separate HW/SW policies databases without over-engineering the code to iterate and manage two databases at the same path. To not over-engineer the code, HW policies are treated as SW ones and don't take into account netdev to allow reuse of the same priorities for policies databases without over-engineering the code to iterate and manage two databases at the same path. To not over-engineer the code, HW policies are treated as SW ones and don't take into account netdev to allow reuse of the same priorities for different devices. * No software fallback * Fragments are dropped, both in RX and TX * No sockets policies * Only IPsec transport mode is implemented ================================================================================ Rekeying: In order to support rekeying, as XFRM core is skipped, the HW/driver should do the following: * Count the handled packets * Raise event that limits are reached * Drop packets once hard limit is occurred. The XFRM core calls to newly introduced xfrm_dev_state_update_curlft() function in order to perform sync between device statistics and internal structures. On HW limit event, driver calls to xfrm_state_check_expire() to allow XFRM core take relevant decisions. This separation between control logic (in XFRM) and data plane allows us to packet reuse SW stack. ================================================================================ Configuration: iproute2: https://lore.kernel.org/netdev/cover.1652179360.git.leonro@nvidia.com/ Packet offload mode: ip xfrm state offload packet dev <if-name> dir <in\|out> ip xfrm policy .... offload packet dev <if-name> Crypto offload mode: ip xfrm state offload crypto dev <if-name> dir <in\|out> or (backward compatibility) ip xfrm state offload dev <if-name> dir <in\|out> ================================================================================ Performance results: TCP multi-stream, using iperf3 instance per-CPU. +----------------------+--------+--------+--------+--------+---------+---------+ \| \| 1 CPU \| 2 CPUs \| 4 CPUs \| 8 CPUs \| 16 CPUs \| 32 CPUs \| \| +--------+--------+--------+--------+---------+---------+ \| \| BW (Gbps) \| +----------------------+--------+--------+-------+---------+---------+---------+ \| Baseline \| 27.9 \| 59 \| 93.1 \| 92.8 \| 93.7 \| 94.4 \| +----------------------+--------+--------+-------+---------+---------+---------+ \| Software IPsec \| 6 \| 11.9 \| 23.3 \| 45.9 \| 83.8 \| 91.8 \| +----------------------+--------+--------+-------+---------+---------+---------+ \| IPsec crypto offload \| 15 \| 29.7 \| 58.5 \| 89.6 \| 90.4 \| 90.8 \| +----------------------+--------+--------+-------+---------+---------+---------+ \| IPsec packet offload \| 28 \| 57 \| 90.7 \| 91 \| 91.3 \| 91.9 \| +----------------------+--------+--------+-------+---------+---------+---------+ IPsec packet offload mode behaves as baseline and reaches linerate with same amount of CPUs. Setups details (similar for both sides): * NIC: ConnectX6-DX dual port, 100 Gbps each. Single port used in the tests. * CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz ================================================================================ Series together with mlx5 part: https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=xfrm-next ================================================================================ Changelog: v10: * Added forgotten xdo_dev_state_del. Patch #4. * Moved changelog in cover letter to the end. * Added "if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {" line to newly added netronome IPsec support. Patch #2. v9: https://lore.kernel.org/all/cover.1669547603.git.leonro@nvidia.com * Added acquire support v8: https://lore.kernel.org/all/cover.1668753030.git.leonro@nvidia.com * Removed not-related blank line * Fixed typos in documentation v7: https://lore.kernel.org/all/cover.1667997522.git.leonro@nvidia.com As was discussed in IPsec workshop: * Renamed "full offload" to be "packet offload". * Added check that offloaded SA and policy have same device while sending packet * Added to SAD same optimization as was done for SPD to speed-up lookups. v6: https://lore.kernel.org/all/cover.1666692948.git.leonro@nvidia.com * Fixed misplaced "!" in sixth patch. v5: https://lore.kernel.org/all/cover.1666525321.git.leonro@nvidia.com * Rebased to latest ipsec-next. * Replaced HW priority patch with solution which mimics separated SPDs for SW and HW. See more description in this cover letter. * Dropped RFC tag, usecase, API and implementation are clear. v4: https://lore.kernel.org/all/cover.1662295929.git.leonro@nvidia.com * Changed title from "PATCH" to "PATCH RFC" per-request. * Added two new patches: one to update hard/soft limits and another initial take on documentation. * Added more info about lifetime/rekeying flow to cover letter, see relevant section. * perf traces for crypto mode will come later. v3: https://lore.kernel.org/all/cover.1661260787.git.leonro@nvidia.com * I didn't hear any suggestion what term to use instead of "packet offload", so left it as is. It is used in commit messages and documentation only and easy to rename. * Added performance data and background info to cover letter * Reused xfrm_output_resume() function to support multiple XFRM transformations * Add PMTU check in addition to driver .xdo_dev_offload_ok validation * Documentation is in progress, but not part of this series yet. v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com * Rebased to latest 6.0-rc1 * Add an extra check in TX datapath patch to validate packets before forwarding to HW. * Added policy cleanup logic in case of netdev down event v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com * Moved comment to be before if (...) in third patch. v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com ----------------------------------------------------------------------- ============ Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-06 13:41:23 +01:00
Leon Romanovsky	2b7c72e0e5	xfrm: document IPsec packet offload mode Extend XFRM device offload API description with newly added packet offload mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-05 10:40:29 +01:00
Leon Romanovsky	f3da86dc2c	xfrm: add support to HW update soft and hard limits Both in RX and TX, the traffic that performs IPsec packet offload transformation is accounted by HW. It is needed to properly handle hard limits that require to drop the packet. It means that XFRM core needs to update internal counters with the one that accounted by the HW, so new callbacks are introduced in this patch. In case of soft or hard limit is occurred, the driver should call to xfrm_state_check_expire() that will perform key rekeying exactly as done by XFRM core. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-05 10:38:31 +01:00
Leon Romanovsky	3c611d40c6	xfrm: speed-up lookup of HW policies Devices that implement IPsec packet offload mode should offload SA and policies too. In RX path, it causes to the situation that HW will always have higher priority over any SW policies. It means that we don't need to perform any search of inexact policies and/or priority checks if HW policy was discovered. In such situation, the HW will catch the packets anyway and HW can still implement inexact lookups. In case specific policy is not found, we will continue with packet lookup and check for existence of HW policies in inexact list. HW policies are added to the head of SPD to ensure fast lookup, as XFRM iterates over all policies in the loop. The same solution of adding HW SAs at the begging of the list is applied to SA database too. However, we don't need to change lookups as they are sorted by insertion order and not priority. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-05 10:37:33 +01:00
Leon Romanovsky	5958372ddf	xfrm: add RX datapath protection for IPsec packet offload mode Traffic received by device with enabled IPsec packet offload should be forwarded to the stack only after decryption, packet headers and trailers removed. Such packets are expected to be seen as normal (non-XFRM) ones, while not-supported packets should be dropped by the HW. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-05 10:36:16 +01:00
Leon Romanovsky	f8a70afafc	xfrm: add TX datapath support for IPsec packet offload mode In IPsec packet mode, the device is going to encrypt and encapsulate packets that are associated with offloaded policy. After successful policy lookup to indicate if packets should be offloaded or not, the stack forwards packets to the device to do the magic. Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Huy Nguyen <huyn@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-05 10:34:49 +01:00
Leon Romanovsky	919e43fad5	xfrm: add an interface to offload policy Extend netlink interface to add and delete XFRM policy from the device. This functionality is a first step to implement packet IPsec offload solution. Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-05 10:33:13 +01:00
Leon Romanovsky	62f6eca5de	xfrm: allow state packet offload mode Allow users to configure xfrm states with packet offload mode. The packet mode must be requested both for policy and state, and such requires us to do not implement fallback. We explicitly return an error if requested packet mode can't be configured. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-05 10:32:44 +01:00
Leon Romanovsky	d14f28b8c1	xfrm: add new packet offload flag In the next patches, the xfrm core code will be extended to support new type of offload - packet offload. In that mode, both policy and state should be specially configured in order to perform whole offloaded data path. Full offload takes care of encryption, decryption, encapsulation and other operations with headers. As this mode is new for XFRM policy flow, we can "start fresh" with flag bits and release first and second bit for future use. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-12-05 10:30:47 +01:00
Lorenzo Bianconi	65e6af6ceb	net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill In order to fix the following sleep while atomic bug always alloc pages with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in spin_lock critical section. [ 9.049719] Hardware name: MediaTek MT7986a RFB (DT) [ 9.054665] Call trace: [ 9.057096] dump_backtrace+0x0/0x154 [ 9.060751] show_stack+0x14/0x1c [ 9.064052] dump_stack_lvl+0x64/0x7c [ 9.067702] dump_stack+0x14/0x2c [ 9.071001] ___might_sleep+0xec/0x120 [ 9.074736] __might_sleep+0x4c/0x9c [ 9.078296] __alloc_pages+0x184/0x2e4 [ 9.082030] page_frag_alloc_align+0x98/0x1ac [ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234 [ 9.090974] mtk_wed_wo_init+0x174/0x2c0 [ 9.094881] mtk_wed_attach+0x7c8/0x7e0 [ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e] [ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e] [ 9.108727] pci_device_probe+0xac/0x13c [ 9.112638] really_probe.part.0+0x98/0x2f4 [ 9.116807] __driver_probe_device+0x94/0x13c [ 9.121147] driver_probe_device+0x40/0x114 [ 9.125314] __driver_attach+0x7c/0x180 [ 9.129133] bus_for_each_dev+0x5c/0x90 [ 9.132953] driver_attach+0x20/0x2c [ 9.136513] bus_add_driver+0x104/0x1fc [ 9.140333] driver_register+0x74/0x120 [ 9.144153] __pci_register_driver+0x40/0x50 [ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e] [ 9.152848] do_one_initcall+0x40/0x25c [ 9.156669] do_init_module+0x44/0x230 [ 9.160403] load_module+0x1f30/0x2750 [ 9.164135] __do_sys_init_module+0x150/0x200 [ 9.168475] __arm64_sys_init_module+0x18/0x20 [ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0 [ 9.177589] do_el0_svc+0x48/0xe0 [ 9.180889] el0_svc+0x14/0x50 [ 9.183929] el0t_64_sync_handler+0x9c/0x120 [ 9.188183] el0t_64_sync+0x158/0x15c Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-02 21:23:02 -08:00
Eric Dumazet	55fb80d518	tcp: use 2-arg optimal variant of kfree_rcu() kfree_rcu(1-arg) should be avoided as much as possible, since this is only possible from sleepable contexts, and incurr extra rcu barriers. I wish the 1-arg variant of kfree_rcu() would get a distinct name, like kfree_rcu_slow() to avoid it being abused. Fixes: 459837b522f7 ("net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Dmitry Safonov <dima@arista.com> Link: https://lore.kernel.org/r/20221202052847.2623997-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-02 20:44:45 -08:00
Jakub Kicinski	edd4e25a23	wireless-next patches for v6.2 Third set of patches for v6.2. mt76 has a new driver for mt7996 Wi-Fi 7 devices and iwlwifi also got initial Wi-Fi 7 support. Otherwise smaller features and fixes. Major changes: ath10k * store WLAN firmware version in SMEM image table mt76 * mt7996: new driver for MediaTek Wi-Fi 7 (802.11be) devices * mt7986, mt7915: enable Wireless Ethernet Dispatch (WED) offload support * mt7915: add ack signal support * mt7915: enable coredump support * mt7921: remain_on_channel support * mt7921: channel context support iwlwifi * enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities * 320 MHz channels support -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmOKcMARHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZv3cgf+KjlbxtCZvEIfK+jsd2/VK635ucUdC1d5 QZB5SCHyVCqTMEsBBw0WCmFdfnqQRQUE9Qe5s0hlwhyrjLP4FQ6/jGTarFvRV43E xO8jJd7e4mnVVoQySeKIRfvtYPFKT5GpaDVs4ytfdSs+KYoCE7akMBcvHVO8Fr2M MepdqyoJakhRybFUJZMts8W8IsBikv9hdnb2Mr/E32JFLeP6ggs9tKCZKBbpxyXk BzfYkDMXffFl95prlmy4rXP223FjvgUuRNWaatseR7S6A/Ik9Xk3B1qv3mtocPZF LiTlFtmn3qkgyX5bfm6NRe/2FqgRUYfIrN0XtVw6Sy8WUe1GCf3opA== =pkqE -----END PGP SIGNATURE----- Merge tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.2 Third set of patches for v6.2. mt76 has a new driver for mt7996 Wi-Fi 7 devices and iwlwifi also got initial Wi-Fi 7 support. Otherwise smaller features and fixes. Major changes: ath10k - store WLAN firmware version in SMEM image table mt76 - mt7996: new driver for MediaTek Wi-Fi 7 (802.11be) devices - mt7986, mt7915: enable Wireless Ethernet Dispatch (WED) offload support - mt7915: add ack signal support - mt7915: enable coredump support - mt7921: remain_on_channel support - mt7921: channel context support iwlwifi - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities - 320 MHz channels support * tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (144 commits) wifi: ath10k: fix QCOM_SMEM dependency wifi: mt76: mt7921e: add pci .shutdown() support wifi: mt76: mt7915: mmio: fix naming convention wifi: mt76: mt7996: add support to configure spatial reuse parameter set wifi: mt76: mt7996: enable ack signal support wifi: mt76: mt7996: enable use_cts_prot support wifi: mt76: mt7915: rely on band_idx of mt76_phy wifi: mt76: mt7915: enable per bandwidth power limit support wifi: mt76: mt7915: introduce mt7915_get_power_bound() mt76: mt7915: Fix PCI device refcount leak in mt7915_pci_init_hif2() wifi: mt76: do not send firmware FW_FEATURE_NON_DL region wifi: mt76: mt7921: Add missing __packed annotation of struct mt7921_clc wifi: mt76: fix coverity overrun-call in mt76_get_txpower() wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices wifi: mt76: mt76x0: remove dead code in mt76x0_phy_get_target_power wifi: mt76: mt7915: fix band_idx usage wifi: mt76: mt7915: enable .sta_set_txpwr support wifi: mt76: mt7915: add basedband Txpower info into debugfs wifi: mt76: mt7915: add support to configure spatial reuse parameter set wifi: mt76: mt7915: add missing MODULE_PARM_DESC ... ==================== Link: https://lore.kernel.org/r/20221202214254.D0D3DC433C1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-02 20:33:30 -08:00
Kalle Valo	d03407183d	wifi: ath10k: fix QCOM_SMEM dependency Nathan noticed that when HWSPINLOCK is disabled there's a Kconfig warning: WARNING: unmet direct dependencies detected for QCOM_SMEM Depends on [n]: (ARCH_QCOM [=y] \|\| COMPILE_TEST [=n]) && HWSPINLOCK [=n] Selected by [m]: - ATH10K_SNOC [=m] && NETDEVICES [=y] && WLAN [=y] && WLAN_VENDOR_ATH [=y] && ATH10K [=m] && (ARCH_QCOM [=y] \|\| COMPILE_TEST [=n]) The problem here is that QCOM_SMEM depends on HWSPINLOCK so we cannot select QCOM_SMEM and instead we neeed to use 'depends on'. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/all/Y4YsyaIW+CPdHWv3@dev-arch.thelio-3990X/ Fixes: 4d79f6f34bbb ("wifi: ath10k: Store WLAN firmware version in SMEM image table") Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221202103027.25974-1-kvalo@kernel.org	2022-12-02 20:24:06 +02:00
Gerhard Engleder	dbadae9272	tsnep: Rework RX buffer allocation Refill RX queue in batches of descriptors to improve performance. Refill is allowed to fail as long as a minimum number of descriptors is active. Thus, a limited number of failed RX buffer allocations is now allowed for normal operation. Previously every failed allocation resulted in a dropped frame. If the minimum number of active descriptors is reached, then RX buffers are still reused and frames are dropped. This ensures that the RX queue never runs empty and always continues to operate. Prework for future XDP support. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:47:49 +00:00
Gerhard Engleder	d3dfe8d6c0	tsnep: Throttle interrupts Without interrupt throttling, iperf server mode generates a CPU load of 100% (A53 1.2GHz). Also the throughput suffers with less than 900Mbit/s on a 1Gbit/s link. The reason is a high interrupt load with interrupts every ~20us. Reduce interrupt load by throttling of interrupts. Interrupt delay default is 64us. For iperf server mode the CPU load is significantly reduced to ~20% and the throughput reaches the maximum of 941MBit/s. Interrupts are generated every ~140us. RX and TX coalesce can be configured with ethtool. RX coalesce has priority over TX coalesce if the same interrupt is used. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:47:49 +00:00
Gerhard Engleder	4f661ccfca	tsnep: Add ethtool::get_channels support Allow user space to read number of TX and RX queue. This is useful for device dependent qdisc configurations like TAPRIO with hardware offload. Also ethtool::get_per_queue_coalesce / set_per_queue_coalesce requires that interface. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:47:49 +00:00
Gerhard Engleder	91644df1ba	tsnep: Consistent naming of struct net_device Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:47:49 +00:00
Jonathan Toppins	95cce3fae4	Documentation: bonding: correct xmit hash steps Correct xmit hash steps for layer3+4 as introduced by commit 49aefd131739 ("bonding: do not discard lowest hash bit for non layer3+4 hashing"). Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:46:45 +00:00
Jonathan Toppins	f036b97da6	Documentation: bonding: update miimon default to 100 With commit c1f897ce186a ("bonding: set default miimon value for non-arp modes if not set") the miimon default was changed from zero to 100 if arp_interval is also zero. Document this fact in bonding.rst. Fixes: c1f897ce186a ("bonding: set default miimon value for non-arp modes if not set") Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:46:45 +00:00
Andy Shevchenko	a479f9264b	net: thunderbolt: Use bitwise types in the struct thunderbolt_ip_frame_header The main usage of the struct thunderbolt_ip_frame_header is to handle the packets on the media layer. The header is bound to the protocol in which the byte ordering is crucial. However the data type definition doesn't use that and sparse is unhappy, for example (17 altogether): .../thunderbolt.c:718:23: warning: cast to restricted __le32 .../thunderbolt.c:966:42: warning: incorrect type in assignment (different base types) .../thunderbolt.c:966:42: expected unsigned int [usertype] frame_count .../thunderbolt.c:966:42: got restricted __le32 [usertype] Switch to the bitwise types in the struct thunderbolt_ip_frame_header to reduce this, but not completely solving (9 left), because the same data type is used for Rx header handled locally (in CPU byte order). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:42:26 +00:00
Andy Shevchenko	0bbe50f3e8	net: thunderbolt: Switch from __maybe_unused to pm_sleep_ptr() etc Letting the compiler remove these functions when the kernel is built without CONFIG_PM_SLEEP support is simpler and less heavier for builds than the use of __maybe_unused attributes. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:42:26 +00:00
Jiri Pirko	47b438cc27	net: devlink: convert port_list into xarray Some devlink instances may contain thousands of ports. Storing them in linked list and looking them up is not scalable. Convert the linked list into xarray. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-12-02 10:37:03 +00:00
Jakub Kicinski	3f5a4aa1c3	Merge branch 'hsr' Sebastian Andrzej Siewior says: ==================== I started playing with HSR and run into a problem. Tested latest upstream -rc and noticed more problems. Now it appears to work. For testing I have a small three node setup with iperf and ping. While iperf doesn't complain ping reports missing packets and duplicates. ==================== Link: https://lore.kernel.org/r/20221129164815.128922-1-bigeasy@linutronix.de/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-01 20:26:25 -08:00
Sebastian Andrzej Siewior	7d0455e970	selftests: Add a basic HSR test. This test adds a basic HSRv0 network with 3 nodes. In its current shape it sends and forwards packets, announcements and so merges nodes based on MAC A/B information. It is able to detect duplicate packets and packetloss should any occur. Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-01 20:26:22 -08:00
Sebastian Andrzej Siewior	20d3c1e9b8	hsr: Use a single struct for self_node. self_node_db is a list_head with one entry of struct hsr_node. The purpose is to hold the two MAC addresses of the node itself. It is convenient to recycle the structure. However having a list_head and fetching always the first entry is not really optimal. Created a new data strucure contaning the two MAC addresses named hsr_self_node. Access that structure like an RCU protected pointer so it can be replaced on the fly without blocking the reader. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-01 20:26:22 -08:00
Sebastian Andrzej Siewior	5c7aa13210	hsr: Synchronize sequence number updates. hsr_register_frame_out() compares new sequence_nr vs the old one recorded in hsr_node::seq_out and if the new sequence_nr is higher then it will be written to hsr_node::seq_out as the new value. This operation isn't locked so it is possible that two frames with the same sequence number arrive (via the two slave devices) and are fed to hsr_register_frame_out() at the same time. Both will pass the check and update the sequence counter later to the same value. As a result the content of the same packet is fed into the stack twice. This was noticed by running ping and observing DUP being reported from time to time. Instead of using the hsr_priv::seqnr_lock for the whole receive path (as it is for sending in the master node) add an additional lock that is only used for sequence number checks and updates. Add a per-node lock that is used during sequence number reads and updates. Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-01 20:26:21 -08:00

1 2 3 4 5 ...

1140211 Commits