linux

iv/linux

Author	SHA1	Message	Date
Jacob Keller	d40fd60093	ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp In the event of a PTP clock time change due to .adjtime or .settime, the ice driver needs to update the cached copy of the PHC time and also discard any outstanding Tx timestamps. This is required because otherwise the wrong copy of the PHC time will be used when extending the Tx timestamp. This could result in reporting incorrect timestamps to the stack. The current approach taken to handle this is to call ice_ptp_flush_tx_tracker, which will discard any timestamps which are not yet complete. This is problematic for two reasons: 1) it could lead to a potential race condition where the wrong timestamp is associated with a future packet. This can occur with the following flow: 1. Thread A gets request to transmit a timestamped packet, and picks an index and transmits the packet 2. Thread B calls ice_ptp_flush_tx_tracker and sees the index in use, marking is as disarded. No timestamp read occurs because the status bit is not set, but the index is released for re-use 3. Thread A gets a new request to transmit another timestamped packet, picks the same (now unused) index and transmits that packet. 4. The PHY transmits the first packet and updates the timestamp slot and generates an interrupt. 5. The ice_ptp_tx_tstamp thread executes and sees the interrupt and a valid timestamp but associates it with the new Tx SKB and not the one that actual timestamp for the packet as expected. This could result in the previous timestamp being assigned to a new packet producing incorrect timestamps and leading to incorrect behavior in PTP applications. This is most likely to occur when the packet rate for Tx timestamp requests is very high. 2) on E822 hardware, we must avoid reading a timestamp index more than once each time its status bit is set and an interrupt is generated by hardware. We do have some extensive checks for the unread flag to ensure that only one of either the ice_ptp_flush_tx_tracker or ice_ptp_tx_tstamp threads read the timestamp. However, even with this we can still have cases where we "flush" a timestamp that was actually completed in hardware. This can lead to cases where we don't read the timestamp index as appropriate. To fix both of these issues, we must avoid calling ice_ptp_flush_tx_tracker outside of the teardown path. Rather than using ice_ptp_flush_tx_tracker, introduce a new state bitmap, the stale bitmap. Start this as cleared when we begin a new timestamp request. When we're about to extend a timestamp and send it up to the stack, first check to see if that stale bit was set. If so, drop the timestamp without sending it to the stack. When we need to update the cached PHC timestamp out of band, just mark all currently outstanding timestamps as stale. This will ensure that once hardware completes the timestamp we'll ignore it correctly and avoid reporting bogus timestamps to userspace. With this change, we fix potential issues caused by calling ice_ptp_flush_tx_tracker during normal operation. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:15:03 -08:00
Jacob Keller	c1f3414df2	ice: cleanup allocations in ice_ptp_alloc_tx_tracker The ice_ptp_alloc_tx_tracker function must allocate the timestamp array and the bitmap for tracking the currently in use indexes. A future change is going to add yet another allocation to this function. If these allocations fail we need to ensure that we properly cleanup and ensure that the pointers in the ice_ptp_tx structure are NULL. Simplify this logic by allocating to local variables first. If any allocation fails, then free everything and exit. Only update the ice_ptp_tx structure if all allocations succeed. This ensures that we have no side effects on the Tx structure unless all allocations have succeeded. Thus, no code will see an invalid pointer and we don't need to re-assign NULL on cleanup. This is safe because kernel "free" functions are designed to be NULL safe and perform no action if passed a NULL pointer. Thus its safe to simply always call kfree or bitmap_free even if one of those pointers was NULL. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:15:03 -08:00
Jacob Keller	3ad5c10bf2	ice: protect init and calibrating check in ice_ptp_request_ts When requesting a new timestamp, the ice_ptp_request_ts function does not hold the Tx tracker lock while checking init and calibrating. This means that we might issue a new timestamp request just after the Tx timestamp tracker starts being deinitialized. This could lead to incorrect access of the timestamp structures. Correct this by moving the init and calibrating checks under the lock, and updating the flows which modify these fields to use the lock. Note that we do not need to hold the lock while checking for tx->init in ice_ptp_tx_tstamp. This is because the teardown function will use synchronize_irq after clearing the flag to ensure that the threaded interrupt completes. Either a) the tx->init flag will be cleared before the ice_ptp_tx_tstamp function starts, thus it will exit immediately, or b) the threaded interrupt will be executing and the synchronize_irq will wait until the threaded interrupt has completed at which point we know the init field has definitely been set and new interrupts will not execute the Tx timestamp thread function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 13:12:54 -08:00
Jacob Keller	f0ae124019	ice: synchronize the misc IRQ when tearing down Tx tracker Since commit 1229b33973c7 ("ice: Add low latency Tx timestamp read") the ice driver has used a threaded IRQ for handling Tx timestamps. This change did not add a call to synchronize_irq during ice_ptp_release_tx_tracker. Thus it is possible that an interrupt could occur just as the tracker is being removed. This could lead to a use-after-free of the Tx tracker structure data. Fix this by calling sychronize_irq in ice_ptp_release_tx_tracker after we've cleared the init flag. In addition, make sure that we re-check the init flag at the end of ice_ptp_tx_tstamp before we exit ensuring that we will stop polling for new timestamps once the tracker de-initialization has begun. Refactor the ts_handled variable into "more_timestamps" so that we can simply directly assign this boolean instead of relying on an initialized value of true. This makes the new combined check easier to read. With this change, the ice_ptp_release_tx_tracker function will now wait for the threaded interrupt to complete if it was executing while the init flag was cleared. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Jacob Keller	10e4b4a3a3	ice: check Tx timestamp memory register for ready timestamps The PHY for E822 based hardware has a register which indicates which timestamps are valid in the PHY timestamp memory block. Each bit in the register indicates whether the associated index in the timestamp memory is valid. Hardware sets this bit when the timestamp is captured, and clears the bit when the timestamp is read. Use of this register is important as reading timestamp registers can impact the way that hardware generates timestamp interrupts. This occurs because the PHY has an internal value which is incremented when hardware captures a timestamp and decremented when software reads a timestamp. Reading timestamps which are not marked as valid still decrement the internal value and can result in the Tx timestamp interrupt not triggering in the future. To prevent this, use the timestamp memory value to determine which timestamps are ready to be read. The ice_get_phy_tx_tstamp_ready function reads this value. For E810 devices, this just always returns with all bits set. Skip any timestamp which is not set in this bitmap, avoiding reading extra timestamps on E822 devices. The stale check against a cached timestamp value is no longer necessary for PHYs which support the timestamp ready bitmap properly. E810 devices still need this. Introduce a new verify_cached flag to the ice_ptp_tx structure. Use this to determine if we need to perform the verification against the cached timestamp value. Set this to 1 for the E810 Tx tracker init function. Notice that many of the fields in ice_ptp_tx are simple 1 bit flags. Save some structure space by using bitfields of length 1 for these values. Modify the ICE_PTP_TS_VALID check to simply drop the timestamp immediately so that in an event of getting such an invalid timestamp the driver does not attempt to re-read the timestamp again in a future poll of the register. With these changes, the driver now reads each timestamp register exactly once, and does not attempt any re-reads. This ensures the interrupt tracking logic in the PHY will not get stuck. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Jacob Keller	0dd9286263	ice: handle discarding old Tx requests in ice_ptp_tx_tstamp Currently the driver uses the PTP kthread to process handling and discarding of stale Tx timestamp requests. The function ice_ptp_tx_tstamp_cleanup is used for this. A separate thread creates complications for the driver as we now have both the main Tx timestamp processing IRQ checking timestamps as well as the kthread. Rather than using the kthread to handle this, simply check for stale timestamps within the ice_ptp_tx_tstamp function. This function must already process the timestamps anyways. If a Tx timestamp has been waiting for 2 seconds we simply clear the bit and discard the SKB. This avoids the complication of having separate threads polling, reducing overall CPU work. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Jacob Keller	6b1ff5d392	ice: always call ice_ptp_link_change and make it void The ice_ptp_link_change function is currently only called for E822 based hardware. Future changes are going to extend this function to perform additional tasks on link change. Always call this function, moving the E810 check from the callers down to just before we call the E822-specific function required to restart the PHY. This function also returns an error value, but none of the callers actually check it. In general, the errors it produces are more likely systemic problems such as invalid or corrupt port numbers. No caller checks these, and so no warning is logged. Re-order the flag checks so that ICE_FLAG_PTP is checked first. Drop the unnecessary check for ICE_FLAG_PTP_SUPPORTED, as ICE_FLAG_PTP will not be set except when ICE_FLAG_PTP_SUPPORTED is set. Convert the port checks to WARN_ON_ONCE, in order to generate a kernel stack trace when they are hit. Convert the function to void since no caller actually checks these return values. Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Jacob Keller	11722c39c8	ice: fix misuse of "link err" with "link status" The ice_ptp_link_change function has a comment which mentions "link err" when referring to the current link status. We are storing the status of whether link is up or down, which is not an error. It is appears that this use of err accidentally got included due to an overzealous search and replace when removing the ice_status enum and local status variable. Fix the wording to use the correct term. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Karol Kolacinski	407b66c07e	ice: Reset TS memory for all quads In E822 products, the owner PF should reset memory for all quads, not only for the one where assigned lport is. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Milena Olech	0357d5cab8	ice: Remove the E822 vernier "bypass" logic The E822 devices support an extended "vernier" calibration which enables higher precision timestamps by accounting for delays in the PHY, and compensating for them. These delays are measured by hardware as part of its vernier calibration logic. The driver currently starts the PHY in "bypass" mode which skips the compensation. Then it later attempts to switch from bypass to vernier. This unfortunately does not work as expected. Instead of properly compensating for the delays, the hardware continues operating in bypass without the improved precision expected. Because we cannot dynamically switch between bypass and vernier mode, refactor the driver to always operate in vernier mode. This has a slight downside: Tx timestamp and Rx timestamp requests that occur as the very first packet set after link up will not complete properly and may be reported to applications as missing timestamps. This occurs frequently in test environments where traffic is light or targeted specifically at testing PTP. However, in practice most environments will have transmitted or received some data over the network before such initial requests are made. Signed-off-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Sergey Temerkhanov	6b5cbc8c4e	ice: Use more generic names for ice_ptp_tx fields Some supported devices have per-port timestamp memory blocks while others have shared ones within quads. Rename the struct ice_ptp_tx fields to reflect the block entities it works with Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-12-08 11:13:21 -08:00
Oleksij Rempel	bde55dd9cc	net: dsa: microchip: add stats64 support for ksz8 series of switches Add stats64 support for ksz8xxx series of switches. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20221205052904.2834962-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-08 08:27:27 -08:00
Jakub Kicinski	d8b879c00f	Merge branch 'net-ethernet-ti-am65-cpsw-fix-set-channel-operation' Roger Quadros says: ==================== net: ethernet: ti: am65-cpsw: Fix set channel operation This contains a critical bug fix for the recently merged suspend/resume support [1] that broke set channel operation. (ethtool -L eth0 tx <n>) As there were 2 dependent patches on top of the offending commit [1] first revert them and then apply them back after the correct fix. [1] fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") ==================== Link: https://lore.kernel.org/r/20221206094419.19478-1-rogerq@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:17:35 -08:00
Roger Quadros	020b232f79	net: ethernet: ti: am65-cpsw: Fix hardware switch mode on suspend/resume On low power during system suspend the ALE table context is lost. Save the ALE context before suspend and restore it after resume. Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:17:31 -08:00
Roger Quadros	1581cd8b11	net: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resume During suspend resume the context of PORT_VLAN_REG is lost so save it during suspend and restore it during resume for host port and slave ports. Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:17:31 -08:00
Roger Quadros	24bc19b05f	net: ethernet: ti: am65-cpsw: Add suspend/resume support Add PM handlers for System suspend/resume. As DMA driver doesn't yet support suspend/resume we free up the DMA channels at suspend and acquire and initialize them at resume. In this revised approach we do not free the TX/RX IRQs at am65_cpsw_nuss_common_stop() as it causes problems. We will now free them only on .suspend() as we need to release the DMA channels (as DMA looses context) and re-acquiring them on .resume() may not necessarily give us the same IRQs. To make this easier: - introduce am65_cpsw_nuss_remove_rx_chns() which is similar to am65_cpsw_nuss_remove_tx_chns(). These will be invoked in pm.suspend() to release the DMA channels and free up the IRQs. - move napi_add() and request_irq() calls to am65_cpsw_nuss_init_rx/tx_chns() so we can invoke them in pm.resume() to acquire the DMA channels and IRQs. As CPTS looses contect during suspend/resume, invoke the necessary CPTS suspend/resume helpers. ALE_CLEAR command is issued in cpsw_ale_start() so no need to issue it before the call to cpsw_ale_start(). Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:17:31 -08:00
Roger Quadros	1a014663e7	Revert "net: ethernet: ti: am65-cpsw: Add suspend/resume support" This reverts commit fd23df72f2be317d38d9fde0a8996b8e7454fd2a. This commit broke set channel operation. Revert this and implement it with a different approach in a separate patch. Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:17:31 -08:00
Roger Quadros	1bae8fa8c4	Revert "net: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resume" This reverts commit 643cf0e3ab5ccee37b3c53c018bd476c45c4b70e. This is to make it easier to revert the offending commit fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:17:31 -08:00
Roger Quadros	1a35259672	Revert "net: ethernet: ti: am65-cpsw: Fix hardware switch mode on suspend/resume" This reverts commit 1af3cb3702d02167926a2bd18580cecb2d64fd94. This is to make it easier to revert the offending commit fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:17:31 -08:00
Jakub Kicinski	e1228581b3	Merge branch 'devlink-add-port-function-attribute-to-enable-disable-roce-and-migratable' Shay Drory says: ==================== devlink: Add port function attribute to enable/disable Roce and migratable This series is a complete rewrite of the series "devlink: Add port function attribute to enable/disable roce" link: https://lore.kernel.org/netdev/20221102163954.279266-1-danielj@nvidia.com/ Currently mlx5 PCI VF and SF are enabled by default for RoCE functionality. And mlx5 PCI VF is disable by dafault for migratable functionality. Currently a user does not have the ability to disable RoCE for a PCI VF/SF device before such device is enumerated by the driver. User is also incapable to do such setting from smartnic scenario for a VF from the smartnic. Current 'enable_roce' device knob is limited to do setting only at driverinit time. By this time device is already created and firmware has already allocated necessary system memory for supporting RoCE. Also, Currently a user does not have the ability to enable migratable for a PCI VF. The above are a hyper visor level control, to set the functionality of devices passed through to guests. This is achieved by extending existing 'port function' object to control capabilities of a function. This enables users to control capability of the device before enumeration. Examples when user prefers to disable RoCE for a VF when using switchdev mode: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0 pfnum 0 vfnum 0 external false splittable false function: hw_addr 00:00:00:00:00:00 roce enable $ devlink port function set pci/0000:06:00.0/1 roce disable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0 pfnum 0 vfnum 0 external false splittable false function: hw_addr 00:00:00:00:00:00 roce disable FAQs: ----- 1. What does roce enable/disable do? Ans: It disables RoCE capability of the function before its enumerated, so when driver reads the capability from the device firmware, it is disabled. At this point RDMA stack will not be able to create UD, QP1, RC, XRC type of QPs. When RoCE is disabled, the GID table of all ports of the device is disabled in the device and software stack. 2. How is the roce 'port function' option different from existing devlink param? Ans: RoCE attribute at the port function level disables the RoCE capability at the specific function level; while enable_roce only does at the software level. 3. Why is this option for disabling only RoCE and not the whole RDMA device? Ans: Because user still wants to use the RDMA device for non RoCE commands in more memory efficient way. ==================== Link: https://lore.kernel.org/r/20221206185119.380138-1-shayd@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:21 -08:00
Shay Drory	e5b9642a33	net/mlx5: E-Switch, Implement devlink port function cmds to control migratable Implement devlink port function commands to enable / disable migratable. This is used to control the migratable capability of the device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:18 -08:00
Shay Drory	a8ce7b26a5	devlink: Expose port function commands to control migratable Expose port function commands to enable / disable migratable capability, this is used to set the port function as migratable. Live migration is the process of transferring a live virtual machine from one physical host to another without disrupting its normal operation. In order for a VM to be able to perform LM, all the VM components must be able to perform migration. e.g.: to be migratable. In order for VF to be migratable, VF must be bound to VFIO driver with migration support. When migratable capability is enabled for a function of the port, the device is making the necessary preparations for the function to be migratable, which might include disabling features which cannot be migrated. Example of LM with migratable function configuration: Set migratable of the VF's port function. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 migratable disable $ devlink port function set pci/0000:06:00.0/2 migratable enable $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 migratable enable Bind VF to VFIO driver with migration support: $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind $ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind Attach VF to the VM. Start the VM. Perform LM. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:18 -08:00
Yishai Hadas	7db98396ef	net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE Implement devlink port function commands to enable / disable RoCE. This is used to control the RoCE device capabilities. This patch implement infrastructure which will be used by downstream patches that will add additional capabilities. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:18 -08:00
Shay Drory	47d0c500d7	net/mlx5: Add generic getters for other functions caps Downstream patch requires to get other function GENERAL2 caps while mlx5_vport_get_other_func_cap() gets only one type of caps (general). Rename it to represent this and introduce a generic implementation of mlx5_vport_get_other_func_cap(). Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:18 -08:00
Shay Drory	da65e9ff3b	devlink: Expose port function commands to control RoCE Expose port function commands to enable / disable RoCE, this is used to control the port RoCE device capabilities. When RoCE is disabled for a function of the port, function cannot create any RoCE specific resources (e.g GID table). It also saves system memory utilization. For example disabling RoCE enable a VF/SF saves 1 Mbytes of system memory per function. Example of a PCI VF port which supports function configuration: Set RoCE of the VF's port function. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 roce enable $ devlink port function set pci/0000:06:00.0/2 roce disable $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 roce disable Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:18 -08:00
Shay Drory	875cd5eeba	devlink: Move devlink port function hw_addr attr documentation devlink port function hw_addr attr documentation is in mlx5 specific file while there is nothing mlx5 specific about it. Move it to devlink-port.rst. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:18 -08:00
Shay Drory	c0bea69d1c	devlink: Validate port function request In order to avoid partial request processing, validate the request before processing it. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:18 -08:00
Yishai Hadas	df268f6ca7	net/mlx5: Introduce IFC bits for migratable Introduce IFC related capabilities to enable setting VF to be able to perform live migration. e.g.: to be migratable. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:09:17 -08:00
Jakub Kicinski	5955a948ac	Merge branch 'bridge-mcast-preparations-for-evpn-extensions' Ido Schimmel says: ==================== bridge: mcast: Preparations for EVPN extensions This patchset was split from [1] and includes non-functional changes aimed at making it easier to add additional netlink attributes later on. Future extensions are available here [2]. The idea behind these patches is to create an MDB configuration structure into which netlink messages are parsed into. The structure is then passed in the entry creation / deletion call chain instead of passing the netlink attributes themselves. The same pattern is used by other rtnetlink objects such as routes and nexthops. I initially tried to extend the current code, but it proved to be too difficult, which is why I decided to refactor it to the extensible and familiar pattern used by other rtnetlink objects. Tested using existing selftests and using a new selftest that will be submitted together with the planned extensions. [1] https://lore.kernel.org/netdev/20221018120420.561846-1-idosch@nvidia.com/ [2] https://github.com/idosch/linux/commits/submit/mdb_v1 ==================== Link: https://lore.kernel.org/r/20221206105809.363767-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:54 -08:00
Ido Schimmel	f86c3e2c1b	bridge: mcast: Constify 'group' argument in br_multicast_new_port_group() The 'group' argument is not modified, so mark it as 'const'. It will allow us to constify arguments of the callers of this function in future patches. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:52 -08:00
Ido Schimmel	090149eaf3	bridge: mcast: Remove redundant function arguments Drop the first three arguments and instead extract them from the MDB configuration structure. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:52 -08:00
Ido Schimmel	4c1ebc6c1f	bridge: mcast: Move checks out of critical section The checks only require information parsed from the RTM_NEWMDB netlink message and do not rely on any state stored in the bridge driver. Therefore, there is no need to perform the checks in the critical section under the multicast lock. Move the checks out of the critical section. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:52 -08:00
Ido Schimmel	3ee5662345	bridge: mcast: Remove br_mdb_parse() The parsing of the netlink messages and the validity checks are now performed in br_mdb_config_init() so we can remove br_mdb_parse(). This finally allows us to stop passing netlink attributes deep in the MDB control path and only use the MDB configuration structure. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:51 -08:00
Ido Schimmel	9f52a51429	bridge: mcast: Use MDB group key from configuration structure The MDB group key (i.e., {source, destination, protocol, VID}) is currently determined under the multicast lock from the netlink attributes. Instead, use the group key from the MDB configuration structure that was prepared before acquiring the lock. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:51 -08:00
Ido Schimmel	8bd9c08e32	bridge: mcast: Propagate MDB configuration structure further As an intermediate step towards only using the new MDB configuration structure, pass it further in the control path instead of passing individual attributes. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:51 -08:00
Ido Schimmel	f2b5aac681	bridge: mcast: Use MDB configuration structure where possible The MDB configuration structure (i.e., struct br_mdb_config) now includes all the necessary information from the parsed RTM_{NEW,DEL}MDB netlink messages, so use it. This will later allow us to delete the calls to br_mdb_parse() from br_mdb_add() and br_mdb_del(). No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:51 -08:00
Ido Schimmel	3866116815	bridge: mcast: Remove redundant checks These checks are now redundant as they are performed by br_mdb_config_init() while parsing the RTM_{NEW,DEL}MDB messages. Remove them. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:51 -08:00
Ido Schimmel	cb45392686	bridge: mcast: Centralize netlink attribute parsing Netlink attributes are currently passed deep in the MDB creation call chain, making it difficult to add new attributes. In addition, some validity checks are performed under the multicast lock although they can be performed before it is ever acquired. As a first step towards solving these issues, parse the RTM_{NEW,DEL}MDB messages into a configuration structure, relieving other functions from the need to handle raw netlink attributes. Subsequent patches will convert the MDB code to use this configuration structure. This is consistent with how other rtnetlink objects are handled, such as routes and nexthops. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:05:51 -08:00
ye xingchen	16dc16d9f0	net: ethernet: use sysfs_emit() to instead of scnprintf() Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/202212051918564721658@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 20:02:44 -08:00
Jakub Kicinski	cfbf877a33	Merge tag 'ieee802154-for-net-next-2022-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== ieee802154-next 2022-12-05 Miquel continued his work towards full scanning support. For this, we now allow the creation of dedicated coordinator interfaces to allow a PAN coordinator to serve in the network and set the needed address filters with the hardware. On top of this we have the first part to allow scanning for available 15.4 networks. A new netlink scan group, within the existing nl802154 API, was added. In addition Miquel fixed two issues that have been introduced in the former patches to free an skb correctly and clarifying an expression in the stack. From David Girault we got tracing support when registering new PANs. * tag 'ieee802154-for-net-next-2022-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next: mac802154: Trace the registration of new PANs ieee802154: Advertize coordinators discovery mac802154: Allow the creation of coordinator interfaces mac802154: Clarify an expression mac802154: Move an skb free within the rx path ==================== Link: https://lore.kernel.org/r/20221205131909.1871790-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 17:33:26 -08:00
Oleksij Rempel	5608e0a817	net: asix: add support for the Linux Automation GmbH USB 10Base-T1L Add ASIX based USB 10Base-T1L adapter support: https://linux-automation.com/en/products/usb-t1l.html Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20221205132102.2941732-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-12-07 17:12:09 -08:00
Paolo Abeni	a2220b5458	Merge branch 'cn10kb-mac-block-support' Hariprasad Kelam says: ==================== CN10KB MAC block support OcteonTx2's next gen platform the CN10KB has RPM_USX MAC which has a different serdes when compared to RPM MAC. Though the underlying HW is different, the CSR interface has been designed largely inline with RPM MAC, with few exceptions though. So we are using the same CGX driver for RPM_USX MAC as well and will have a different set of APIs for RPM_USX where ever necessary. The RPM and RPM_USX blocks support a different number of LMACS. RPM_USX support 8 LMACS per MAC block whereas legacy RPM supports only 4 LMACS per MAC. with this RPM_USX support double the number of DMAC filters and fifo size. This patchset adds initial support for CN10KB's RPM_USX MAC i.e registering the driver and defining MAC operations (mac_ops). With these changes PF and VF netdev packet path will work and PF and VF netdev drivers are able to configure MAC features like pause frames,PFC and loopback etc. Also implements FEC stats for CN10K Mac block RPM and CN10KB Mac block RPM_USX and extends ethtool support for PF and VF drivers by defining get_fec_stats API to display FEC stats. ==================== Link: https://lore.kernel.org/r/20221205070521.21860-1-hkelam@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 12:28:06 +01:00
Hariprasad Kelam	84ad364211	octeontx2-af: Add FEC stats for RPM/RPM_USX block CN10K silicon MAC block RPM and CN10KB silicon MAC block RPM_USX both support BASER and RSFEC modes. Also MAC (CGX) on OcteonTx2 silicon variants and MAC (RPM) on OcteonTx3 CN10K are different and FEC stats need to be read differently. CN10KB MAC block (RPM_USX) fec csr offsets are same as CN10K MAC block (RPM) mac_ops points to same fn(). Upper layer interface between RVU AF and PF netdev is kept same. Based on silicon variant appropriate fn() pointer is called to read FEC stats Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 12:24:29 +01:00
Hariprasad Kelam	b441c4ac5e	octeontx2-pf: ethtool: Implement get_fec_stats This patch registers a callback for get_fec_stats such that FEC stats can be queried from the below command "ethtool -I --show-fec eth0" Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 12:24:29 +01:00
Hariprasad Kelam	b9d0fedc62	octeontx2-af: cn10kb: Add RPM_USX MAC support OcteonTx2's next gen platform the CN10KB has RPM_USX MAC which has a different serdes when compared to RPM MAC. Though the underlying HW is different, the CSR interface has been designed largely inline with RPM MAC, with few exceptions though. So we are using the same CGX driver for RPM_USX MAC as well and will have a different set of APIs for RPM_USX where ever necessary. The RPM and RPM_USX blocks support a different number of LMACS. RPM_USX support 8 LMACS per MAC block whereas legacy RPM supports only 4 LMACS per MAC. with this RPM_USX support double the number of DMAC filters and fifo size. This patch adds initial support for CN10KB's RPM_USX MAC i.e registering the driver and defining MAC operations (mac_ops). Adds the logic to configure internal loopback and pause frames and assign FIFO length to LMACS. Kernel reads lmac features like lmac type, autoneg, etc from shared firmware data this structure only supports 4 lmacs per MAC, this patch extends this structure to accommodate 8 lmacs. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 12:24:29 +01:00
Rakesh Babu Saladi	f2e664ad50	octeontx2-af: Support variable number of lmacs Most of the code in CGX/RPM driver assumes that max lmacs per given MAC as always, 4 and the number of MAC blocks also as 4. With this assumption, the max number of interfaces supported is hardcoded to 16. This creates a problem as next gen CN10KB silicon MAC supports 8 lmacs per MAC block. This patch solves the problem by using "max lmac per MAC block" value from constant csrs and uses cgx_cnt_max value which is populated based number of MAC blocks supported by silicon. Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 12:24:29 +01:00
Paolo Abeni	f82389eecd	Merge branch 'net-dsa-microchip-add-mtu-support-for-ksz8-series' Oleksij Rempel says: ==================== net: dsa: microchip: add MTU support for KSZ8 series ==================== Link: https://lore.kernel.org/r/20221205052232.2834166-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 11:58:00 +01:00
Oleksij Rempel	55a952eef7	net: dsa: microchip: ksz8: move all DSA configurations to one location To make the code more comparable to KSZ9477 code, move DSA configurations to the same location. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 11:57:58 +01:00
Oleksij Rempel	6b30cfa86e	net: dsa: microchip: enable MTU normalization for KSZ8795 and KSZ9477 compatible switches KSZ8795 and KSZ9477 compatible series of switches use global max frame size configuration register. So, enable MTU normalization for this reason. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 11:57:58 +01:00
Oleksij Rempel	29d1e85f45	net: dsa: microchip: ksz8: add MTU configuration support Make MTU configurable on KSZ87xx and KSZ88xx series of switches. Before this patch, pre-configured behavior was different on different switch series, due to opposite meaning of the same bit: - KSZ87xx: Reg 4, Bit 1 - if 1, max frame size is 1532; if 0 - 1514 - KSZ88xx: Reg 4, Bit 1 - if 1, max frame size is 1514; if 0 - 1532 Since the code was telling "... SW_LEGAL_PACKET_DISABLE, true)", I assume, the idea was to set max frame size to 1532. With this patch, by setting MTU size 1500, both switch series will be configured to the 1532 frame limit. This patch was tested on KSZ8873. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-12-07 11:57:58 +01:00

1 2 3 4 5 ...

1140294 Commits