iv/linux

Author	SHA1	Message	Date
Vladimir Oltean	7bf4a5b071	net: mscc: ocelot: optimize ocelot_mm_irq() The MAC Merge IRQ of all ports is shared with the PTP TX timestamp IRQ of all ports, which means that currently, when a PTP TX timestamp is generated, felix_irq_handler() also polls for the MAC Merge layer status of all ports, looking for changes. This makes the kernel do more work, and under certain circumstances may make ptp4l require a tx_timestamp_timeout argument higher than before. Changes to the MAC Merge layer status are only to be expected under certain conditions - its TX direction needs to be enabled - so we can check early if that is the case, and omit register access otherwise. Make ocelot_mm_update_port_status() skip register access if mm->tx_enabled is unset, and also call it once more, outside IRQ context, from ocelot_port_set_mm(), when mm->tx_enabled transitions from true to false, because an IRQ is also expected in that case. Also, a port may have its MAC Merge layer enabled but it may not have generated the interrupt. In that case, there's no point in writing to DEV_MM_STATUS to acknowledge that IRQ. We can reduce the number of register writes per port with MM enabled by keeping an "ack" variable which writes the "write-one-to-clear" bits. Those are 3 in number: PRMPT_ACTIVE_STICKY, UNEXP_RX_PFRM_STICKY and UNEXP_TX_PFRM_STICKY. The other fields in DEV_MM_STATUS are read-only and it doesn't matter what is written to them, so writing zero is just fine. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 19:01:18 -07:00
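The acknowledge logic described in the ocelot_mm_irq() commit above can be modelled in plain C. This is a minimal user-space sketch, not the driver's code: the register is simulated by a struct field, and the bit positions, `struct fake_port`, `fake_read()`, `fake_write()` and `mm_update_port_status()` are all hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real DEV_MM_STATUS layout is not shown here. */
#define PRMPT_ACTIVE_STICKY   (1u << 0)
#define UNEXP_RX_PFRM_STICKY  (1u << 1)
#define UNEXP_TX_PFRM_STICKY  (1u << 2)
#define W1C_MASK (PRMPT_ACTIVE_STICKY | UNEXP_RX_PFRM_STICKY | UNEXP_TX_PFRM_STICKY)

/* Simulated port: one status register plus counters for register accesses. */
struct fake_port {
	bool tx_enabled;
	uint32_t mm_status;
	unsigned int reads;
	unsigned int writes;
};

static uint32_t fake_read(struct fake_port *p)
{
	p->reads++;
	return p->mm_status;
}

/* Write-one-to-clear: every 1 written to a sticky bit clears that bit. */
static void fake_write(struct fake_port *p, uint32_t val)
{
	p->writes++;
	p->mm_status &= ~(val & W1C_MASK);
}

/* Returns the ack value that was written, or 0 if no write was needed. */
uint32_t mm_update_port_status(struct fake_port *p)
{
	uint32_t val, ack = 0;

	/* No status change is expected while TX is disabled: skip all access. */
	if (!p->tx_enabled)
		return 0;

	val = fake_read(p);

	/* Collect only the sticky bits that are actually set. */
	if (val & PRMPT_ACTIVE_STICKY)
		ack |= PRMPT_ACTIVE_STICKY;
	if (val & UNEXP_RX_PFRM_STICKY)
		ack |= UNEXP_RX_PFRM_STICKY;
	if (val & UNEXP_TX_PFRM_STICKY)
		ack |= UNEXP_TX_PFRM_STICKY;

	/* Read-only fields are written as zero; skip the write if nothing fired. */
	if (ack)
		fake_write(p, ack);

	return ack;
}
```

In this model a port with TX disabled costs no register access, and a port with TX enabled but no pending sticky bit costs one read and no write.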
Vladimir Oltean	3ff468ef98	net: mscc: ocelot: remove struct ocelot_mm_state :: lock Unfortunately, the workarounds for the hardware bugs make it pointless to keep fine-grained locking for the MAC Merge state of each port. Our vsc9959_cut_through_fwd() implementation requires ocelot->fwd_domain_lock to be held, in order to serialize with changes to the bridging domains and to port speed changes (which affect which ports can be cut-through). Simultaneously, the traffic classes which can be cut-through cannot be preemptible at the same time, and this will depend on the MAC Merge layer state (which changes from threaded interrupt context). Since vsc9959_cut_through_fwd() would have to hold the mm->lock of all ports for a correct and race-free implementation with respect to ocelot_mm_irq(), in practice it means that any time a port's mm->lock is held, it would potentially block holders of ocelot->fwd_domain_lock. In the interest of simple locking rules, make all MAC Merge layer state changes (and preemptible traffic class changes) be serialized by the ocelot->fwd_domain_lock. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 19:01:18 -07:00
Vladimir Oltean	15f93f46f3	net: mscc: ocelot: export a single ocelot_mm_irq() When the switch emits an IRQ, we don't know what caused it, and we iterate through all ports to check the MAC Merge status. Move that iteration inside the ocelot lib; we will change the locking in a future change and it would be good to encapsulate that lock completely within the ocelot lib. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 19:01:18 -07:00
Jakub Kicinski	3b53ada514	Merge branch 'xdp-rx-hwts-metadata-for-stmmac-driver' Song Yoong Siang says: ==================== XDP Rx HWTS metadata for stmmac driver Implemented XDP receive hardware timestamp metadata for stmmac driver. This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata. Below are the test steps and results. Command on DUT: sudo ./xdp_hw_metadata <interface name> Command on Link Partner: echo -n xdp | nc -u -q1 <destination IPv4 addr> 9091 echo -n skb | nc -u -q1 <destination IPv4 addr> 9092 Result for port 9091: poll: 1 (0) skip=1 fail=0 redir=1 xsk_ring_cons__peek: 1 0x55f69f65f6d0: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_timestamp: 1677762069053692631 No rx_hash err=-95 0x55f69f65f6d0: complete idx=8 addr=8000 Result for port 9092: poll: 1 (0) skip=2 fail=0 redir=1 found skb hwtstamp = 1677762071.937207680 ==================== Link: https://lore.kernel.org/r/20230415064503.3225835-1-yoong.siang.song@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:57:28 -07:00
Song Yoong Siang	9570df3533	net: stmmac: add Rx HWTS metadata to XDP ZC receive pkt Add receive hardware timestamp metadata support via kfunc to XDP Zero Copy receive packets. Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:57:26 -07:00
Song Yoong Siang	e3f9c3e348	net: stmmac: add Rx HWTS metadata to XDP receive pkt Add receive hardware timestamp metadata support via kfunc to XDP receive packets. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:57:26 -07:00
Song Yoong Siang	5b24324a90	net: stmmac: introduce wrapper for struct xdp_buff Introduce struct stmmac_xdp_buff as a preparation to support XDP Rx metadata via kfuncs. Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:57:26 -07:00
Jakub Kicinski	6c829efed5	Merge branch 'support-tunnel-mode-in-mlx5-ipsec-packet-offload' Leon Romanovsky says: ==================== Support tunnel mode in mlx5 IPsec packet offload This series extends mlx5 to support tunnel mode in its IPsec packet offload implementation. v0: https://lore.kernel.org/all/cover.1681106636.git.leonro@nvidia.com ==================== Link: https://lore.kernel.org/r/cover.1681388425.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:27 -07:00
Leon Romanovsky	c941da23aa	net/mlx5e: Accept tunnel mode for IPsec packet offload Open mlx5 driver to accept IPsec tunnel mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	146c196b60	net/mlx5e: Create IPsec table with tunnel support only when encap is disabled Current hardware doesn't support double encapsulation which is happening when IPsec packet offload tunnel mode is configured together with eswitch encap option. Any user attempt to add new SA/policy after he/she sets encap mode, will generate the following FW syndrome: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 1904): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xa43321), err(-22) Make sure that we block encap changes before creating flow steering tables. This is applicable only for packet offload in tunnel mode, while packet offload in transport mode and crypto offload, don't have such limitation as they don't perform encapsulation. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	acc109291a	net/mlx5: Allow blocking encap changes in eswitch Existing eswitch encap option enables header encapsulation. Unfortunately currently available hardware isn't able to perform double encapsulation, which can happen once IPsec packet offload tunnel mode is used together with encap mode set to BASIC. So as a solution for misconfiguration, provide an option to block encap changes, which will be used for IPsec packet offload. Reviewed-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	4c24272b4e	net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode In IPsec packet offload mode all header manipulations are performed by hardware, which is responsible to add/remove L2 header with source and destinations MACs. CX-7 devices don't support offload of in-kernel routing functionality, as such HW needs external help to fill other side MAC as it isn't available for HW. As a solution, let's listen to neigh ARP updates and reconfigure IPsec rules on the fly once new MAC data information arrives. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	efbd31c4d8	net/mlx5e: Support IPsec TX packet offload in tunnel mode Extend mlx5 driver with logic to support IPsec TX packet offload in tunnel mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	37a417ca91	net/mlx5e: Support IPsec RX packet offload in tunnel mode Extend mlx5 driver with logic to support IPsec RX packet offload in tunnel mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	6480a3b6c9	net/mlx5e: Prepare IPsec packet reformat code for tunnel mode Refactor setup_pkt_reformat() function to accommodate future extension to support tunnel mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	006adbc6de	net/mlx5e: Configure IPsec SA tables to support tunnel mode Create SA flow steering tables both for RX and TX with tunnel reformat property. This allows to add and delete extra headers needed for tunnel mode. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	1c80e94929	net/mlx5e: Check IPsec packet offload tunnel capabilities Validate tunnel mode support for IPsec packet offload. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Leon Romanovsky	1210af3b99	net/mlx5e: Add IPsec packet offload tunnel bits Extend packet reformat types and flow table capabilities with IPsec packet offload tunnel bits. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-17 18:55:25 -07:00
Horatiu Vultur	99676a5766	net: lan966x: Fix lan966x_ifh_get From time to time, it was observed that the nanosecond part of the received timestamp, which is extracted from the IFH, was actually bigger than 1 second. So when calculating the full received timestamp, based on the nanosecond part from the IFH and the second part which is read from HW, the result was wrong. The issue seems to be inside the function lan966x_ifh_get, which extracts information from an IFH (which is a byte array) and returns the value in a u64. When extracting the timestamp value from the IFH, which starts at bit 192 and has a size of 32 bits, if the most significant bit was set in the timestamp, this bit was sign-extended and the return value became 0xffffffff... . The reason is that integer constants without any suffix are treated as signed ints, which is why '1 << 31' becomes 0xffffffff80000000 once widened to 64 bits. This is fixed by adding the suffix 'ULL' to 1. Fixes: fd7627833ddf ("net: lan966x: Stop using packing library") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 09:56:49 +01:00
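The sign-extension effect behind the lan966x_ifh_get fix above can be reproduced in a few lines of standalone C. This is a sketch under stated assumptions, not the driver function: `widen_signed_bit31()`, `widen_unsigned_bit31()` and `bits_get()` are hypothetical helpers, and the bit order used by `bits_get()` (bit 0 is the least significant bit of byte 0) is chosen for simplicity and is not claimed to match the real IFH layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * A 32-bit signed value with only bit 31 set is negative. Converting it to
 * a 64-bit unsigned type fills the upper 32 bits with ones.
 */
uint64_t widen_signed_bit31(void)
{
	int32_t mask = INT32_MIN;	/* bit pattern 0x80000000 */

	return (uint64_t)mask;
}

/* With an unsigned 64-bit constant the upper bits stay zero. */
uint64_t widen_unsigned_bit31(void)
{
	return 1ULL << 31;
}

/*
 * Hypothetical extractor: gather `length` bits starting at bit `pos` of a
 * byte array into a 64-bit value, using a 64-bit mask for the result.
 */
uint64_t bits_get(const uint8_t *buf, size_t pos, size_t length)
{
	uint64_t val = 0;

	for (size_t i = 0; i < length; i++) {
		size_t j = pos + i;

		if (buf[j / 8] & (1u << (j % 8)))
			val |= 1ULL << i;	/* ULL keeps bit 31 from sign-extending */
	}

	return val;
}
```

Writing the result mask as `1 << i` instead would give it type `int`, so the iteration with `i == 31` would OR a negative value into `val` and set all upper bits.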
David S. Miller	0af03871b6	Merge branch 'sctp-info-dump' Xin Long says: ==================== sctp: add some missing peer_capables in sctp info dump The 1st patch removes the unused and obsolete hostname_address from sctp_association peer and also the bit from sctp_info peer_capables, and then reuses its bit for reconf_capable and use the higher available bit for intl_capable in the 2nd patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:28:21 +01:00
Xin Long	ab4f1e28c9	sctp: add intl_capable and reconf_capable in ss peer_capable Two new peer capables have been added since sctp_diag was introduced into SCTP. When dumping the peer capables, these two new peer capables should also be included. To not break the old capables, reconf_capable takes the old hostname_address bit, and intl_capable uses the higher available bit in sctpi_peer_capable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:28:21 +01:00
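The bit reuse described in the sctp commit above can be illustrated with a small shift-and-OR packer. This is a sketch only: `struct peer_caps`, `pack_peer_capable()` and the flag order are assumptions made for illustration and are not taken from the kernel source. The point is that a new flag can take over a retired slot, and another can be added above the highest existing bit, without moving any of the older bits.

```c
#include <stdint.h>

/* Hypothetical set of one-bit peer flags. */
struct peer_caps {
	unsigned int intl_capable:1;	/* new: placed above all older bits */
	unsigned int ecn_capable:1;
	unsigned int ipv4_address:1;
	unsigned int ipv6_address:1;
	unsigned int reconf_capable:1;	/* new: reuses a retired slot */
	unsigned int asconf_capable:1;
	unsigned int prsctp_capable:1;
	unsigned int auth_capable:1;
};

/* Pack the flags into one mask; the last flag ORed in lands in bit 0. */
uint32_t pack_peer_capable(const struct peer_caps *p)
{
	uint32_t mask;

	mask = p->intl_capable << 1;
	mask = (mask | p->ecn_capable) << 1;
	mask = (mask | p->ipv4_address) << 1;
	mask = (mask | p->ipv6_address) << 1;
	mask = (mask | p->reconf_capable) << 1;
	mask = (mask | p->asconf_capable) << 1;
	mask = (mask | p->prsctp_capable) << 1;
	mask = (mask | p->auth_capable);

	return mask;
}
```

In this layout the older flags keep their positions (for example `auth_capable` in bit 0 and `ecn_capable` in bit 6), so a consumer that only knows the old bits still decodes them correctly.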
Xin Long	bd4b281894	sctp: delete the obsolete code for the host name address param In the latest RFC9260, the Host Name Address param has been deprecated. For INIT chunk: Note 3: An INIT chunk MUST NOT contain the Host Name Address parameter. The receiver of an INIT chunk containing a Host Name Address parameter MUST send an ABORT chunk and MAY include an "Unresolvable Address" error cause. For Supported Address Types: The value indicating the Host Name Address parameter MUST NOT be used when sending this parameter and MUST be ignored when receiving this parameter. Currently Linux SCTP doesn't really support the Host Name Address param; it only saves a flag and prints debug info, which actually won't even be triggered due to the verification in sctp_verify_param(). This patch deletes that dead code. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:28:20 +01:00
David S. Miller	9bf55bd442	Merge branch 'mptcp-cleanups' Matthieu Baerts says: ==================== mptcp: various small cleanups Patch 1 makes a function static because it is only used in one file. Patch 2 adds info about the git trees we use to help occasional devs. Patch 3 removes an unused variable. Patch 4 removes duplicated entries from the help menu of a tool used in MPTCP selftests. Patch 5 removes some ShellCheck warnings in mptcp_join.sh selftest. Only very minor improvements then. ==================== Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:25:34 +01:00
Matthieu Baerts	0fcd72df88	selftests: mptcp: join: fix ShellCheck warnings Most of the code had an issue according to ShellCheck. That's mainly due to the fact it incorrectly believes most of the code was unreachable because it's invoked by variable name, see how the "tests" array is used. Once SC2317 has been ignored, three small warnings were still visible: - SC2155: Declare and assign separately to avoid masking return values. - SC2046: Quote this to prevent word splitting: can be ignored because "ip netns pids" can display more than one pid. - SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined. This probably didn't fix any actual issues but it might help spotting new interesting warnings reported by ShellCheck as just before, ShellCheck was reporting issues for most lines making it a bit useless. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:25:33 +01:00
Matthieu Baerts	0a85264e48	selftests: mptcp: remove duplicated entries in usage mptcp_connect tool was printing some duplicated entries when showing how to use it: -j -l -r While at it, I also: - moved the very few entries that were not sorted, - added -R that was missing since commit 8a4b910d005d ("mptcp: selftests: add rcvbuf set option"), - removed the -u parameter that has been removed in commit f730b65c9d85 ("selftests: mptcp: try to set mptcp ulp mode in different sk states"). No need to backport this, it is just an internal tool used by our selftests. The help menu is mainly useful for MPTCP kernel devs. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:25:33 +01:00
Matthieu Baerts	ce395d0e3a	mptcp: remove unused 'remaining' variable In some functions, 'remaining' variable was given in argument and/or set but never read. net/mptcp/options.c:779:3: warning: Value stored to 'remaining' is never read [clang-analyzer-deadcode.DeadStores]. net/mptcp/options.c:547:3: warning: Value stored to 'remaining' is never read [clang-analyzer-deadcode.DeadStores]. The issue has been reported internally by Alibaba CI. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Suggested-by: Mat Martineau <martineau@kernel.org> Co-developed-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:25:33 +01:00
Matthieu Baerts	c3d713409b	MAINTAINERS: add git trees for MPTCP This will help occasional developers to find our git repo without having to look at our wiki. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:25:33 +01:00
Geliang Tang	aa5887dca2	mptcp: make userspace_pm_append_new_local_addr static mptcp_userspace_pm_append_new_local_addr() has always exclusively been used in pm_userspace.c since its introduction in commit 4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs"). So make it static. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:25:33 +01:00
David S. Miller	28f610d086	Merge branch 'mptcp-subflow-init' Matthieu Baerts says: ==================== mptcp: refactor first subflow init This series refactors the initialisation of the first subflow of a listen socket. The first subflow allocation is no longer done at the initialisation of the socket but later, when the connection request is received or when requested by the userspace. This is needed not just because Paolo likes to refactor things but because this simplifies the code and makes the behaviour more consistent with the rest. Also, this is a prerequisite for future patches adding proper support of SELinux/LSM labels with MPTCP and accept(2). In [1], Ondrej Mosnacek explained they discovered the (userspace-facing) sockets returned by accept(2) when using MPTCP always end up with the label representing the kernel (typically system_u:system_r:kernel_t:s0), while it would make more sense to inherit the context from the parent socket (the one that is passed to accept(2)). Before being able to properly support that on SELinux/LSM side, patches 2-3/5 prepare the code to simplify the patch 4/5 moving the allocation. Patch 1/5 is a small clean-up seen while working on the series and patch 5/5 is a small improvement when closing unaccepted sockets. [1] https://lore.kernel.org/netdev/CAFqZXNs2LF-OoQBUiiSEyranJUXkPLcCfBkMkwFeM6qEwMKCTw@mail.gmail.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:18:34 +01:00
Paolo Abeni	8d547809a5	mptcp: fastclose msk when cleaning unaccepted sockets When cleaning up unaccepted mptcp sockets still lying inside the listener queue at listener close time, such sockets will go through a regular close, waiting for a timeout before shutting down the subflows. There is no need to keep the kernel resources in use for such a possibly long time: short-circuit to fast-close. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:18:34 +01:00
Paolo Abeni	ddb1a072f8	mptcp: move first subflow allocation at mpc access time In the long run this will simplify the mptcp code and will allow for more consistent behavior. Move the first subflow allocation out of the sock->init ops into the __mptcp_nmpc_socket() helper. Since the first subflow creation can now happen after the first setsockopt() we additionally need to invoke mptcp_sockopt_sync() on it. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:18:34 +01:00
Paolo Abeni	a2702a076e	mptcp: move fastopen subflow check inside mptcp_sendmsg_fastopen() So that we can avoid a bunch of checks in the fast path. Additionally we can specialize such checks according to the specific fastopen method - defer_connect vs MSG_FASTOPEN. The latter bits will simplify the next patches. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:18:34 +01:00
Paolo Abeni	6176123169	mptcp: avoid unneeded __mptcp_nmpc_socket() usage In a few spots, the mptcp code invokes the __mptcp_nmpc_socket() helper multiple times under the same socket lock scope. Additionally, in such places, the socket status ensures that there is no MP capable handshake running. Under the above condition we can replace the later __mptcp_nmpc_socket() helper invocation with direct access to the msk->subflow pointer and better document such access is not supposed to fail with WARN(). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:18:34 +01:00
Paolo Abeni	7a486c443c	mptcp: drop unneeded argument After commit 3a236aef280e ("mptcp: refactor passive socket initialization"), every mptcp_pm_fully_established() call is always invoked with a GFP_ATOMIC argument. We can then drop it. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:18:34 +01:00
David S. Miller	0475135f8c	Merge tag 'mlx5-updates-2023-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2023-04-14 Yevgeny Kliteynik Says: ======================= SW Steering: Support pattern/args modify_header actions The following patch series adds support for a new pattern/arguments type of modify_header actions. Starting with ConnectX-6 DX, we use a new design of modify_header FW object. The current modify_header object allows for having only limited number of these FW objects, which means that we are limited in the number of offloaded flows that require modify_header action. The new approach comprises of two types of objects: pattern and argument. Pattern holds header modification templates, later used with corresponding argument object to create complete header modification actions. The pattern indicates which headers are modified, while the arguments provide the specific values. Therefore a single pattern can be used with different arguments in different flows, enabling offloading of large number of modify_header flows. - Patch 1, 2: Add ICM pool for modify-header-pattern objects and implement patterns cache, allowing patterns reuse for different flows - Patch 3: Allow for chunk allocation separately for STEv0 and STEv1 - Patch 4: Read related device capabilities - Patch 5: Add create/destroy functions for the new general object type - Patch 6: Add support for writing modify header argument to ICM - Patch 7, 8: Some required fixes to support pattern/arg - separate read buffer from the write buffer and fix QP continuous allocation - Patch 9: Add pool for modify header arg objects - Patch 10, 11, 12: Implement MODIFY_HEADER and TNL_L3_TO_L2 actions with the new patterns/args design - Patch 13: Optimization - set modify header action of size 1 directly on the STE instead of separate pattern/args combination - Patch 14: Adjust debug dump for patterns/args - Patch 15: Enable patterns and arguments for supporting devices =======================	2023-04-17 08:14:21 +01:00
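The pattern/argument split described in the mlx5 series above can be pictured with a small reference-counted cache. This is a conceptual sketch in plain C, not the mlx5 SW steering code: `struct pattern`, `struct pattern_cache`, `struct flow_action`, `pattern_get()` and the fixed array sizes are all hypothetical and exist only to show how many flows can share one pattern while each keeps its own argument values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PATTERNS 8
#define MAX_ACTIONS  4

/* A pattern records which header fields are modified, not the values. */
struct pattern {
	uint16_t field_ids[MAX_ACTIONS];
	size_t num_fields;
	unsigned int refcount;
};

struct pattern_cache {
	struct pattern entries[MAX_PATTERNS];
	size_t used;
};

/* A flow pairs a shared pattern with its own per-flow argument values. */
struct flow_action {
	int pattern_idx;
	uint32_t args[MAX_ACTIONS];
};

/*
 * Return the index of a pattern matching `fields`, taking a reference.
 * On a cache miss a new entry is created. Returns -1 if the request is
 * too large or the cache is full.
 */
int pattern_get(struct pattern_cache *c, const uint16_t *fields, size_t n)
{
	struct pattern *p;

	if (n > MAX_ACTIONS)
		return -1;

	for (size_t i = 0; i < c->used; i++) {
		p = &c->entries[i];
		if (p->num_fields == n &&
		    memcmp(p->field_ids, fields, n * sizeof(*fields)) == 0) {
			p->refcount++;	/* reuse: no new pattern object */
			return (int)i;
		}
	}

	if (c->used == MAX_PATTERNS)
		return -1;

	p = &c->entries[c->used];
	memcpy(p->field_ids, fields, n * sizeof(*fields));
	p->num_fields = n;
	p->refcount = 1;

	return (int)c->used++;
}
```

Two flows that rewrite the same set of fields with different values would end up with the same `pattern_idx` and different `args`, which is the reuse the cover letter describes.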
David S. Miller	e2174b0355	Merge branch 'ovs-selftests' Aaron Conole says: ==================== selftests: openvswitch: add support for testing upcall interface The existing selftest suite for openvswitch will work for regression testing the datapath feature bits, but won't test things like adding interfaces, or the upcall interface. Here, we add some additional test facilities. First, extend the ovs-dpctl.py python module to support the OVS_FLOW and OVS_PACKET netlink families, with some associated messages. These can be extended over time, but the initial support is for more well known cases (output, userspace, and CT). Next, extend the test suite to test upcalls by adding a datapath, monitoring the upcall socket associated with the datapath, and then dumping any upcalls that are received. Compare with expected ARP upcall via arping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:12:33 +01:00
Aaron Conole	9feac87b67	selftests: openvswitch: add support for upcall testing The upcall socket interface can be exercised now to make sure that future feature adjustments to the field can maintain backwards compatibility. Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:12:33 +01:00
Aaron Conole	e52b07aa1a	selftests: openvswitch: add flow dump support Add a basic set of fields to print in a 'dpflow' format. This will be used by future commits to check for flow fields after parsing, as well as verifying the flow fields pushed into the kernel from userspace. Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:12:33 +01:00
Aaron Conole	74cc26f416	selftests: openvswitch: add interface support Includes an associated test to generate netns and connect interfaces, with the option to include packet tracing. This will be used in the future when flow support is added for additional test cases. Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:12:33 +01:00
Horatiu Vultur	c6d6ef3ee3	net: phy: micrel: Fix PTP_PF_PEROUT for lan8841 If the 1PPS output was enabled and then lan8841 was configured to be a follower, then the target clock which is used to generate the 1PPS was not configured correctly. The problem was that for each adjustment of the time, the nanosecond part of the target clock was also changed. Therefore the initial nanosecond part of the target clock was changed. The issue can be observed if both the leader and the follower are generating 1PPS: their PPS are not aligned even if the time is aligned. The fix consists of not modifying the nanosecond part of the target clock when adjusting the time. In this way the 1PPS also gets aligned. Fixes: e4ed8ba08e3f ("net: phy: micrel: Add support for PTP_PF_PEROUT for lan8841") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-17 08:10:00 +01:00
Jakub Kicinski	e61caf04b9	Merge branch 'page_pool-allow-caching-from-safely-localized-napi' Jakub Kicinski says: ==================== page_pool: allow caching from safely localized NAPI I went back to the explicit "are we in NAPI method", mostly because I don't like having both around :( (even tho I maintain that in_softirq() && !in_hardirq() is as safe, as softirqs do not nest). Still returning the skbs to a CPU, tho, not to the NAPI instance. I reckon we could create a small refcounted struct per NAPI instance which would allow sockets and other users to hold a persistent and safe reference. But that's a bigger change, and I get 90+% recycling thru the cache with just these patches (for RR and streaming tests with 100% CPU use it's almost 100%). Some numbers for streaming test with 100% CPU use (from previous version, but really they perform the same): HW-GRO page=page before after before after recycle: cached: 0 138669686 0 150197505 cache_full: 0 223391 0 74582 ring: 138551933 9997191 149299454 0 ring_full: 0 488 3154 127590 released_refcnt: 0 0 0 0 alloc: fast: 136491361 148615710 146969587 150322859 slow: 1772 1799 144 105 slow_high_order: 0 0 0 0 empty: 1772 1799 144 105 refill: 2165245 156302 2332880 2128 waive: 0 0 0 0 v1: https://lore.kernel.org/all/20230411201800.596103-1-kuba@kernel.org/ rfcv2: https://lore.kernel.org/all/20230405232100.103392-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230413042605.895677-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-14 18:56:14 -07:00
Jakub Kicinski	294e39e0d0	bnxt: hook NAPIs to page pools bnxt has 1:1 mapping of page pools and NAPIs, so it's safe to hook them up together. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-14 18:56:12 -07:00
Jakub Kicinski	8c48eea3ad	page_pool: allow caching from safely localized NAPI Recent patches to mlx5 mentioned a regression when moving from driver local page pool to only using the generic page pool code. Page pool has two recycling paths (1) direct one, which runs in safe NAPI context (basically consumer context, so producing can be lockless); and (2) via a ptr_ring, which takes a spin lock because the freeing can happen from any CPU; producer and consumer may run concurrently. Since the page pool code was added, Eric introduced a revised version of deferred skb freeing. TCP skbs are now usually returned to the CPU which allocated them, and freed in softirq context. This places the freeing (producing of pages back to the pool) enticingly close to the allocation (consumer). If we can prove that we're freeing in the same softirq context in which the consumer NAPI will run - lockless use of the cache is perfectly fine, no need for the lock. Let drivers link the page pool to a NAPI instance. If the NAPI instance is scheduled on the same CPU on which we're freeing - place the pages in the direct cache. With that and patched bnxt (XDP enabled to engage the page pool, sigh, bnxt really needs page pool work :() I see a 2.6% perf boost with a TCP stream test (app on a different physical core than softirq). The CPU use of relevant functions decreases as expected: page_pool_refill_alloc_cache 1.17% -> 0% _raw_spin_lock 2.41% -> 0.98% Only consider lockless path to be safe when NAPI is scheduled - in practice this should cover majority if not all of steady state workloads. It's usually the NAPI kicking in that causes the skb flush. The main case we'll miss out on is when application runs on the same CPU as NAPI. In that case we don't use the deferred skb free path. 
Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-14 18:56:12 -07:00
Jakub Kicinski	b07a2d97ba	net: skb: plumb napi state thru skb freeing paths We maintain a NAPI-local cache of skbs which is fed by napi_consume_skb(). Going forward we will also try to cache head and data pages. Plumb the "are we in a normal NAPI context" information thru deeper into the freeing path, up to skb_release_data() and skb_free_head()/skb_pp_recycle(). The "not normal NAPI context" comes from netpoll which passes budget of 0 to try to reap the Tx completions but not perform any Rx. Use "bool napi_safe" rather than bare "int budget", the further we get from NAPI the more confusing the budget argument may seem (particularly whether 0 or MAX is the correct value to pass in when not in NAPI). Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-14 18:56:12 -07:00
Yevgeny Kliteynik	220ae98783	net/mlx5: DR, Enable patterns and arguments for supporting devices Check if patterns and arguments for modify header action are supported and enable them accordingly. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	a21e52bb8f	net/mlx5: DR, Add support for the pattern/arg parameters in debug dump Support the pattern/args-based MODIFY_HDR and TNL_L3_TO_L2 actions in dbg dump Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	40ff097f25	net/mlx5: DR, Modify header action of size 1 optimization Set modify header action of size 1 directly on the STE for supporting devices, thus reducing number of hops and cache misses. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	947e258537	net/mlx5: DR, Support decap L3 action using pattern / arg mechanism Use the new accelerated action for decap L3 on RX side: use the mechanism of pattern and argument same as in modify-header action. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	62e40c8568	net/mlx5: DR, Apply new accelerated modify action and decapl3 If there is support for pattern/args, use the new accelerated modify header action for modify header and decap L3 actions. Otherwise fall back to the old modify-header implementation. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	0caebadda5	net/mlx5: DR, Add modify header argument pointer to actions attributes While building the actions, add the pointer of the arguments for accelerated modify list action into the action's attributes. This will be used later on while building the specific STE for this action. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
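
The recycling rule described in commits b07a2d97ba and 8c48eea3ad above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: every name here (napi_model, pool_model, owner_cpu, pool_model_recycle, MODEL_CACHE_SIZE) is hypothetical, the "ring" is reduced to a counter, and falling back to the ring when the direct cache is full is a simplification. It shows the two conditions the commit messages name: the caller must be in a normal NAPI context (a budget of 0, as passed by netpoll, is not), and the NAPI instance linked to the pool must be scheduled on the CPU that is freeing the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CACHE_SIZE 2   /* hypothetical; tiny so the overflow path is easy to hit */
#define MODEL_NO_CPU     (-1) /* hypothetical marker: NAPI is not scheduled anywhere */

/* Hypothetical stand-in for a NAPI instance: only tracks where it is scheduled. */
struct napi_model {
	int owner_cpu;
};

/* Hypothetical stand-in for a page pool with a lockless cache and a locked ring. */
struct pool_model {
	const struct napi_model *napi;   /* NAPI linked by the driver, may be NULL */
	void *cache[MODEL_CACHE_SIZE];   /* direct (lockless) recycling cache */
	size_t cache_count;
	size_t ring_count;               /* pages sent down the locked ring path */
};

enum recycle_path { RECYCLE_DIRECT_CACHE, RECYCLE_RING };

/* A budget of 0 means "not a normal NAPI context" (e.g. netpoll reaping Tx). */
static bool napi_safe_from_budget(int budget)
{
	return budget != 0;
}

/* Decide which recycling path a freed page takes in this model. */
static enum recycle_path pool_model_recycle(struct pool_model *pool, void *page,
					    bool napi_safe, int this_cpu)
{
	assert(pool != NULL);

	/* Lockless use is only considered safe when the linked NAPI is
	 * scheduled on the same CPU that is doing the freeing. */
	bool allow_direct = napi_safe && pool->napi != NULL &&
			    pool->napi->owner_cpu == this_cpu;

	if (allow_direct && pool->cache_count < MODEL_CACHE_SIZE) {
		pool->cache[pool->cache_count++] = page;
		return RECYCLE_DIRECT_CACHE;
	}

	/* Otherwise take the ring path, which needs a lock in the real code. */
	pool->ring_count++;
	return RECYCLE_RING;
}
```

In this model, a page freed on CPU 3 while the linked NAPI is scheduled on CPU 3 lands in the direct cache; the same free from a different CPU, from a context where napi_safe is false, or while the NAPI is not scheduled goes through the ring.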


1172569 Commits