118425 Commits

Author SHA1 Message Date
Russell King (Oracle)
5767c6a8d9 net: dsa: ocelot: use lynx_pcs_create_mdiodev()
Use the newly introduced lynx_pcs_create_mdiodev() which simplifies the
creation and destruction of the lynx PCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 21:46:53 -07:00
Russell King (Oracle)
86b5f2d8cd net: pcs: lynx: add lynx_pcs_create_mdiodev()
Add lynx_pcs_create_mdiodev() to simplify the creation of the mdio
device associated with lynx PCS. In order to allow lynx_pcs_destroy()
to clean this up, we need to arrange for lynx_pcs_create() to take a
refcount on the mdiodev, and lynx_pcs_destroy() to put it.

Adding the refcounting to lynx_pcs_create()..lynx_pcs_destroy() will
be transparent to existing users of these interfaces.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 21:46:53 -07:00
Russell King (Oracle)
727e373f89 net: stmmac: use xpcs_create_mdiodev()
Use the new xpcs_create_mdiodev() creator, which simplifies the
creation and destruction of the mdio device associated with xpcs.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 21:46:53 -07:00
Russell King (Oracle)
9a5d500cff net: pcs: xpcs: add xpcs_create_mdiodev()
Add xpcs_create_mdiodev() to simplify the creation of the mdio device
associated with the XPCS. In order to allow xpcs_destroy() to clean
this up, we need to arrange for xpcs_create() to take a refcount on
the mdiodev, and xpcs_destroy() to put it.

Adding the refcounting to xpcs_create()..xpcs_destroy() will be
transparent to existing users of these interfaces.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29 21:46:53 -07:00
Florian Fainelli
4781e965e6 net: phy: broadcom: Register dummy IRQ handler
In order to have our interrupt descriptor fully setup and in particular
the action, ensure that we register a full fledged interrupt handler.
This also allow us to set the interrupt polarity and flow through the
same call.

This is specifically necessary for kernel/irq/pm.c::suspend_device_irq
to set the interrupt descriptor to the IRQD_WAKEUP_ARMED state and
enable the interrupt for wake-up since it was still in a disabled state.

Without an interrupt descriptor we would have ran into cases where the
wake-up interrupt is not capable of waking up the system, specifically
if we resumed the system ACPI S5 using the Ethernet PHY. In that case
the Ethernet PHY interrupt would be pending by the time the kernel
booted, which it would acknowledge but then we could never use it as
a wake-up source again.

Fixes: 8baddaa9d4ba ("net: phy: broadcom: Add support for Wake-on-LAN")
Suggested-by: Doug Berger <doug.berger@broadcom.com>
Debugged-by: Doug Berger <doug.berger@broadcom.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-26 12:17:06 +01:00
Russell King (Oracle)
267d7692f6 net: dsa: mv88e6xxx: move link forcing to mac_prepare/mac_finish
Move the link forcing out of mac_config() and into the mac_prepare()
and mac_finish() methods. This results in no change to the order in
which these operations are performed, but does mean when we convert
mv88e6xxx to phylink_pcs support, we will continue to preserve this
ordering.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-26 10:39:40 +01:00
Pieter Jansen van Vuuren
ca7d05007d sfc: handle VI shortage on ef100 by readjusting the channels
When fewer VIs are allocated than what is allowed we can readjust
the channels by calling efx_mcdi_alloc_vis() again.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-26 10:15:30 +01:00
Jakub Kicinski
d4031ec844 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/raw.c
  3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
  c85be08fc4fa ("raw: Stop using RTO_ONLINK.")
https://lore.kernel.org/all/20230525110037.2b532b83@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/freescale/fec_main.c
  9025944fddfe ("net: fec: add dma_wmb to ensure correct descriptor values")
  144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25 19:57:39 -07:00
Christophe JAILLET
623a713853 net/mlx4: Use bitmap_weight_and()
Use bitmap_weight_and() instead of hand writing it.

This saves a few LoC and is slightly faster, should it mater.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/a29c2348a062408bec45cee2601b2417310e5ea7.1684865809.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-25 13:35:08 +02:00
Russell King (Oracle)
ae4899bb48 net: phylink: provide phylink_pcs_config() and phylink_pcs_link_up()
Add two helper functions for calling PCS methods. phylink_pcs_config()
allows us to handle PCS configuration specifics in one location, rather
than the two call sites. phylink_pcs_link_up() gives us consistency.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q1TzK-007Exd-Rs@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 22:23:29 -07:00
David Epping
71460c9ec5 net: phy: mscc: enable VSC8501/2 RGMII RX clock
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is
disabled. To allow packet forwarding towards the MAC it needs to be
enabled.

For other PHYs supported by this driver the clock output is enabled
by default.

Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502")
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 22:14:23 -07:00
David Epping
7df0b33d79 net: phy: mscc: remove unnecessary phydev locking
Holding the struct phy_device (phydev) lock is unnecessary when
accessing phydev->interface in the PHY driver .config_init method,
which is the only place that vsc85xx_rgmii_set_skews() is called from.

The phy_modify_paged() function implements required MDIO bus level
locking, which can not be achieved by a phydev lock.

Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 22:14:10 -07:00
David Epping
fb055ce4a9 net: phy: mscc: add support for VSC8501
The VSC8501 PHY can use the same driver implementation as the VSC8502.
Adding the PHY ID and copying the handler functions of VSC8502 is
sufficient to operate it.

Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 22:14:10 -07:00
David Epping
57fb54ab9f net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE
The mscc driver implements support for VSC8502, so its ID should be in
the MODULE_DEVICE_TABLE for automatic loading.

Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502")
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 22:14:10 -07:00
Russell King (Oracle)
883a98ede4 net: pcs: xpcs: avoid reading STAT1 more than once
Avoid reading the STAT1 registers more than once while getting the PCS
state, as this register contains latching-low bits that are lost after
the first read.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:23 -07:00
Russell King (Oracle)
21234ef166 net: pcs: xpcs: use phylink_resolve_c73() helper
Use phylink_resolve_c73() to resolve the clause 73 autonegotiation
result.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:23 -07:00
Russell King (Oracle)
428d603fca net: pcs: xpcs: correct pause resolution
xpcs was indicating symmetric pause should be enabled regardless of
the advertisements by either party. Fix this to use
linkmode_resolve_pause() now that we're no longer obliterating the
link partner's advertisement by logically anding it with our own.

This is transitional, the function will be entirely replaced with
phylink_resolve_c73() in the following patch.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:22 -07:00
Russell King (Oracle)
1f94ba198b net: pcs: xpcs: correct lp_advertising contents
lp_advertising is supposed to reflect the link partner's advertisement
unmodified by the local advertisement, but xpcs bitwise ands it with
the local advertisement prior to calculating the resolution of the
negotiation.

Fix this by moving the bitwise and to xpcs_resolve_lpa_c73() so it can
place the results in a temporary bitmap before passing that to
ixpcs_get_max_usxgmii_speed().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:22 -07:00
Russell King (Oracle)
3f0360e09c net: pcs: xpcs: use mii_c73_to_linkmode() helper
Convert xpcs clause 73 reading to use the newly introduced
mii_c73_to_linkmode() helper to translate the link partner
advertisement to an ethtool bitmap.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:22 -07:00
Russell King (Oracle)
6f7b89b45f net: pcs: xpcs: clean up reading clause 73 link partner advertisement
Read the clause 73 link partner advertisement in a loop and then
translate to the ethtool modes.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:22 -07:00
Russell King (Oracle)
dad987484e net: phylink: add function to resolve clause 73 negotiation
Add a function to resolve clause 73 negotiation according to the
priority resolution function described in clause 73.3.6.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:22 -07:00
Russell King (Oracle)
dc7a51411e net: phylink: remove duplicated linkmode pause resolution
Phylink had two chunks of code virtually the same for resolving the
negotiated pause modes. Factor this down to one function.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:22 -07:00
Jiri Pirko
9277649c66 devlink: pass devlink_port pointer to ops->port_del() instead of index
Historically there was a reason why port_dev() along with for example
port_split() did get port_index instead of the devlink_port pointer.
With the locking changes that were done which ensured devlink instance
mutex is hold for every command, the port ops could get devlink_port
pointer directly. Change the forgotten port_dev() op to be as others
and pass devlink_port pointer instead of port_index.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 10:34:26 +01:00
Jiri Pirko
c496daeb86 devlink: remove duplicate port notification
The notification about created port is send from devl_port_register()
function called from ops->port_new(). No need to send it again here,
so remove the call and the helper function.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 10:34:26 +01:00
David S. Miller
ba46c96db9 mlx5-fixes-2023-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmRsUT8ACgkQSD+KveBX
 +j61Zwf9GyvzrD29Lmu0/BTsLAnf7GAyJi/SMzXJ09Tp1dAYSWmF2DE3fzKvNoQ/
 VT2udSKbZ96b2N9SGF396KZaV8gHxg23IAzILia1JDPd4Pn7YaNymAIWGU7vn+Tq
 ErG7atPVnJV5R1H6SwO2KpOClG7jOjUPMF87uDCl2g+IpYNgjKa9hcnt5bguztC2
 KBW/sV7BCYVWOUrmlSe1hH2Fn4djhga3i4JBIzjp55Dz1voIu5SHsT13Ou2/UuiC
 1RDqBTJ9WvnviAxICbI96TLMJTFnDo9HFGHPIQRhZ6k25PIuWX6GLMKaceVlfCd+
 BZvRG+PNOsDR9a9tFCjMBfx7KE0bMw==
 =dwjN
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2023-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-fixes-2023-05-22

This series provides bug fixes for the mlx5 driver.
Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:40:14 +01:00
Russell King (Oracle)
59088b5a94 net: phy: avoid kernel warning dump when stopping an errored PHY
When taking a network interface down (or removing a SFP module) after
the PHY has encountered an error, phy_stop() complains incorrectly
that it was called from HALTED state.

The reason this is incorrect is that the network driver will have
called phy_start() when the interface was brought up, and the fact
that the PHY has a problem bears no relationship to the administrative
state of the interface. Taking the interface administratively down
(which calls phy_stop()) is always the right thing to do after a
successful phy_start() call, whether or not the PHY has encountered
an error.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:28:27 +01:00
Arınç ÜNAL
04910d8cbf net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
The commit c6d96df9fa2c ("net: ethernet: mtk_eth_soc: drop generic vlan rx
offload, only use DSA untagging") makes VLAN RX offloading to be only used
on the SoCs without the MTK_NETSYS_V2 ability (which are not just MT7621
and MT7622). The commit disables the proper handling of special tagged
(DSA) frames, added with commit 87e3df4961f4 ("net-next: ethernet:
mediatek: add CDM able to recognize the tag for DSA"), for non
MTK_NETSYS_V2 SoCs when it finds a MAC that does not use DSA. So if the
other MAC uses DSA, the CDMQ component transmits DSA tagged frames to the
CPU improperly. This issue can be observed on frames with TCP, for example,
a TCP speed test using iperf3 won't work.

The commit disables the proper handling of special tagged (DSA) frames
because it assumes that these SoCs don't use more than one MAC, which is
wrong. Although I made Frank address this false assumption on the patch log
when they sent the patch on behalf of Felix, the code still made changes
with this assumption.

Therefore, the proper handling of special tagged (DSA) frames must be kept
enabled in all circumstances as it doesn't affect non DSA tagged frames.

Hardware DSA untagging, introduced with the commit 2d7605a72906 ("net:
ethernet: mtk_eth_soc: enable hardware DSA untagging"), and VLAN RX
offloading are operations on the two CDM components of the frame engine,
CDMP and CDMQ, which connect to Packet DMA (PDMA) and QoS DMA (QDMA) and
are between the MACs and the CPU. These operations apply to all MACs of the
SoC so if one MAC uses DSA and the other doesn't, the hardware DSA
untagging operation will cause the CDMP component to transmit non DSA
tagged frames to the CPU improperly.

Since the VLAN RX offloading feature configuration was dropped, VLAN RX
offloading can only be used along with hardware DSA untagging. So, for the
case above, we need to disable both features and leave it to the CPU,
therefore software, to untag the DSA and VLAN tags.

So the correct way to handle this is:

For all SoCs:

Enable the proper handling of special tagged (DSA) frames
(MTK_CDMQ_IG_CTRL).

For non MTK_NETSYS_V2 SoCs:

Enable hardware DSA untagging (MTK_CDMP_IG_CTRL).
Enable VLAN RX offloading (MTK_CDMP_EG_CTRL).

When a non MTK_NETSYS_V2 SoC MAC does not use DSA:

Disable hardware DSA untagging (MTK_CDMP_IG_CTRL).
Disable VLAN RX offloading (MTK_CDMP_EG_CTRL).

Fixes: c6d96df9fa2c ("net: ethernet: mtk_eth_soc: drop generic vlan rx offload, only use DSA untagging")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:04:41 +01:00
Coco Li
a695641c8e gve: Support IPv6 Big TCP on DQ
Add support for using IPv6 Big TCP on DQ which can handle large TSO/GRO
packets. See https://lwn.net/Articles/895398/. This can improve the
throughput and CPU usage.

Perf test result:
ip -d link show $DEV
gso_max_size 185000 gso_max_segs 65535 tso_max_size 262143 tso_max_segs 65535 gro_max_size 185000

For performance, tested with neper using 9k MTU on hardware that supports 200Gb/s line rate.

In single streams when line rate is not saturated, we expect throughput improvements.
When the networking is performing at line rate, we expect cpu usage improvements.

Tcp_stream (unidirectional stream test, T=thread, F=flow):
skb=180kb, T=1, F=1, no zerocopy: throughput average=64576.88 Mb/s, sender stime=8.3, receiver stime=10.68
skb=64kb,  T=1, F=1, no zerocopy: throughput average=64862.54 Mb/s, sender stime=9.96, receiver stime=12.67
skb=180kb, T=1, F=1, yes zerocopy:  throughput average=146604.97 Mb/s, sender stime=10.61, receiver stime=5.52
skb=64kb,  T=1, F=1, yes zerocopy:  throughput average=131357.78 Mb/s, sender stime=12.11, receiver stime=12.25

skb=180kb, T=20, F=100, no zerocopy:  throughput average=182411.37 Mb/s, sender stime=41.62, receiver stime=79.4
skb=64kb,  T=20, F=100, no zerocopy:  throughput average=182892.02 Mb/s, sender stime=57.39, receiver stime=72.69
skb=180kb, T=20, F=100, yes zerocopy: throughput average=182337.65 Mb/s, sender stime=27.94, receiver stime=39.7
skb=64kb,  T=20, F=100, yes zerocopy: throughput average=182144.20 Mb/s, sender stime=47.06, receiver stime=39.01

Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522201552.3585421-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 21:11:35 -07:00
Sebastian Andrzej Siewior
d6c36cbc5e r8169: Use a raw_spinlock_t for the register locks.
The driver's interrupt service routine is requested with the
IRQF_NO_THREAD if MSI is available. This means that the routine is
invoked in hardirq context even on PREEMPT_RT. The routine itself is
relatively short and schedules a worker, performs register access and
schedules NAPI. On PREEMPT_RT, scheduling NAPI from hardirq results in
waking ksoftirqd for further processing so using NAPI threads with this
driver is highly recommended since it NULL routes the threaded-IRQ
efforts.

Adding rtl_hw_aspm_clkreq_enable() to the ISR is problematic on
PREEMPT_RT because the function uses spinlock_t locks which become
sleeping locks on PREEMPT_RT. The locks are only used to protect
register access and don't nest into other functions or locks. They are
also not used for unbounded period of time. Therefore it looks okay to
convert them to raw_spinlock_t.

Convert the three locks which are used from the interrupt service
routine to raw_spinlock_t.

Fixes: e1ed3e4d9111 ("r8169: disable ASPM during NAPI poll")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20230522134121.uxjax0F5@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:38:12 -07:00
Jaco Coetzee
57910a47ff nfp: add L4 RSS hashing on UDP traffic
Add layer 4 RSS hashing on UDP traffic to allow for the
utilization of multiple queues for multiple connections on
the same IP address.

Previously, since the introduction of the driver, RSS hashing
was only performed on the source and destination IP addresses
of UDP packets thereby limiting UDP traffic to a single queue
for multiple connections on the same IP address. The transport
layer is now included in RSS hashing for UDP traffic, which
was not previously the case. The reason behind the previous
limitation is unclear - either a historic limitation of the
NFP device, or an oversight.

Signed-off-by: Jaco Coetzee <jaco.coetzee@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20230522141335.22536-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:36:43 -07:00
Josua Mayer
ac2e8e3cfe net: sfp: add support for HXSX-ATRI-1 copper SFP+ module
Walsun offers commercial ("C") and industrial ("I") variants of
multi-rate copper SFP+ modules.

Add quirk for HXSX-ATRI-1 using same parameters as the already supported
commercial variant HXSX-ATRC-1.

Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230522145242.30192-2-josua@solid-run.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:34:22 -07:00
Horatiu Vultur
6007612459 lan966x: Fix unloading/loading of the driver
It was noticing that after a while when unloading/loading the driver and
sending traffic through the switch, it would stop working. It would stop
forwarding any traffic and the only way to get out of this was to do a
power cycle of the board. The root cause seems to be that the switch
core is initialized twice. Apparently initializing twice the switch core
disturbs the pointers in the queue systems in the HW, so after a while
it would stop sending the traffic.
Unfortunetly, it is not possible to use a reset of the switch here,
because the reset line is connected to multiple devices like MDIO,
SGPIO, FAN, etc. So then all the devices will get reseted when the
network driver will be loaded.
So the fix is to check if the core is initialized already and if that is
the case don't initialize it again.

Fixes: db8bcaad5393 ("net: lan966x: add the basic lan966x driver")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522120038.3749026-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-23 13:01:06 +02:00
Ratheesh Kannoth
b2e3406a38 octeontx2-pf: Add support for page pool
Page pool for each rx queue enhance rx side performance
by reclaiming buffers back to each queue specific pool. DMA
mapping is done only for first allocation of buffers.
As subsequent buffers allocation avoid DMA mapping,
it results in performance improvement.

Image        |  Performance
------------ | ------------
Vannila      |   3Mpps
             |
with this    |   42Mpps
change	     |
---------------------------

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://lore.kernel.org/r/20230522020404.152020-1-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-23 10:47:50 +02:00
Shay Drory
1da438c0ae net/mlx5: Fix indexing of mlx5_irq
After the cited patch, mlx5_irq xarray index can be different then
mlx5_irq MSIX table index.
Fix it by storing both mlx5_irq xarray index and MSIX table index.

Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:06 -07:00
Shay Drory
ef8c063cf8 net/mlx5: Fix irq affinity management
The cited patch deny the user of changing the affinity of mlx5 irqs,
which break backward compatibility.
Hence, allow the user to change the affinity of mlx5 irqs.

Fixes: bbac70c74183 ("net/mlx5: Use newer affinity descriptor")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:06 -07:00
Shay Drory
9c2d080109 net/mlx5: Free irqs only on shutdown callback
Whenever a shutdown is invoked, free irqs only and keep mlx5_irq
synthetic wrapper intact in order to avoid use-after-free on
system shutdown.

for example:
==================================================================
BUG: KASAN: use-after-free in _find_first_bit+0x66/0x80
Read of size 8 at addr ffff88823fc0d318 by task kworker/u192:0/13608

CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G    B   W  O  6.1.21-cloudflare-kasan-2023.3.21 #1
Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
Call Trace:
  <TASK>
  dump_stack_lvl+0x34/0x48
  print_report+0x170/0x473
  ? _find_first_bit+0x66/0x80
  kasan_report+0xad/0x130
  ? _find_first_bit+0x66/0x80
  _find_first_bit+0x66/0x80
  mlx5e_open_channels+0x3c5/0x3a10 [mlx5_core]
  ? console_unlock+0x2fa/0x430
  ? _raw_spin_lock_irqsave+0x8d/0xf0
  ? _raw_spin_unlock_irqrestore+0x42/0x80
  ? preempt_count_add+0x7d/0x150
  ? __wake_up_klogd.part.0+0x7d/0xc0
  ? vprintk_emit+0xfe/0x2c0
  ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core]
  ? dev_attr_show.cold+0x35/0x35
  ? devlink_health_do_dump.part.0+0x174/0x340
  ? devlink_health_report+0x504/0x810
  ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  ? process_one_work+0x680/0x1050
  mlx5e_safe_switch_params+0x156/0x220 [mlx5_core]
  ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core]
  ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core]
  mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core]
  ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0
  devlink_health_reporter_recover+0xa6/0x1f0
  devlink_health_report+0x2f7/0x810
  ? vsnprintf+0x854/0x15e0
  mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core]
  ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core]
  ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core]
  ? newidle_balance+0x9b7/0xe30
  ? psi_group_change+0x6a7/0xb80
  ? mutex_lock+0x96/0xf0
  ? __mutex_lock_slowpath+0x10/0x10
  mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  process_one_work+0x680/0x1050
  worker_thread+0x5a0/0xeb0
  ? process_one_work+0x1050/0x1050
  kthread+0x2a2/0x340
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>

Freed by task 1:
  kasan_save_stack+0x23/0x50
  kasan_set_track+0x21/0x30
  kasan_save_free_info+0x2a/0x40
  ____kasan_slab_free+0x169/0x1d0
  slab_free_freelist_hook+0xd2/0x190
  __kmem_cache_free+0x1a1/0x2f0
  irq_pool_free+0x138/0x200 [mlx5_core]
  mlx5_irq_table_destroy+0xf6/0x170 [mlx5_core]
  mlx5_core_eq_free_irqs+0x74/0xf0 [mlx5_core]
  shutdown+0x194/0x1aa [mlx5_core]
  pci_device_shutdown+0x75/0x120
  device_shutdown+0x35c/0x620
  kernel_restart+0x60/0xa0
  __do_sys_reboot+0x1cb/0x2c0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x4b/0xb5

The buggy address belongs to the object at ffff88823fc0d300
  which belongs to the cache kmalloc-192 of size 192
The buggy address is located 24 bytes inside of
  192-byte region [ffff88823fc0d300, ffff88823fc0d3c0)

The buggy address belongs to the physical page:
page:0000000010139587 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x23fc0c
head:0000000010139587 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004ca00
raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88823fc0d200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88823fc0d280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 >ffff88823fc0d300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
  ffff88823fc0d380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
  ffff88823fc0d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
general protection fault, probably for non-canonical address
0xdffffc005c40d7ac: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: probably user-memory-access in range [0x00000002e206bd60-0x00000002e206bd67]
CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G    B   W  O  6.1.21-cloudflare-kasan-2023.3.21 #1
Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
RIP: 0010:__alloc_pages+0x141/0x5c0
Call Trace:
  <TASK>
  ? sysvec_apic_timer_interrupt+0xa0/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? __alloc_pages_slowpath.constprop.0+0x1ec0/0x1ec0
  ? _raw_spin_unlock_irqrestore+0x3d/0x80
  __kmalloc_large_node+0x80/0x120
  ? kvmalloc_node+0x4e/0x170
  __kmalloc_node+0xd4/0x150
  kvmalloc_node+0x4e/0x170
  mlx5e_open_channels+0x631/0x3a10 [mlx5_core]
  ? console_unlock+0x2fa/0x430
  ? _raw_spin_lock_irqsave+0x8d/0xf0
  ? _raw_spin_unlock_irqrestore+0x42/0x80
  ? preempt_count_add+0x7d/0x150
  ? __wake_up_klogd.part.0+0x7d/0xc0
  ? vprintk_emit+0xfe/0x2c0
  ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core]
  ? dev_attr_show.cold+0x35/0x35
  ? devlink_health_do_dump.part.0+0x174/0x340
  ? devlink_health_report+0x504/0x810
  ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  ? process_one_work+0x680/0x1050
  mlx5e_safe_switch_params+0x156/0x220 [mlx5_core]
  ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core]
  ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core]
  mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core]
  ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0
  devlink_health_reporter_recover+0xa6/0x1f0
  devlink_health_report+0x2f7/0x810
  ? vsnprintf+0x854/0x15e0
  mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core]
  ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core]
  ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core]
  ? newidle_balance+0x9b7/0xe30
  ? psi_group_change+0x6a7/0xb80
  ? mutex_lock+0x96/0xf0
  ? __mutex_lock_slowpath+0x10/0x10
  mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  process_one_work+0x680/0x1050
  worker_thread+0x5a0/0xeb0
  ? process_one_work+0x1050/0x1050
  kthread+0x2a2/0x340
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>
---[ end trace 0000000000000000  ]---
RIP: 0010:__alloc_pages+0x141/0x5c0
Code: e0 39 a3 96 89 e9 b8 22 01 32 01 83 e1 0f 48 89 fa 01 c9 48 c1 ea
03 d3 f8 83 e0 03 89 44 24 6c 48 b8 00 00 00 00 00 fc ff df <80> 3c 02
00 0f 85 fc 03 00 00 89 e8 4a 8b 14 f5 e0 39 a3 96 4c 89
RSP: 0018:ffff888251f0f438 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 1ffff1104a3e1e8b RCX: 0000000000000000
RDX: 000000005c40d7ac RSI: 0000000000000003 RDI: 00000002e206bd60
RBP: 0000000000052dc0 R08: ffff8882b0044218 R09: ffff8882b0045e8a
R10: fffffbfff300fefc R11: ffff888167af4000 R12: 0000000000000003
R13: 0000000000000000 R14: 00000000696c7070 R15: ffff8882373f4380
FS:  0000000000000000(0000) GS:ffff88bf2be80000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005641d031eee8 CR3: 0000002e7ca14000 CR4: 0000000000350ee0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception  ]---]

Reported-by: Frederick Lawler <fred@cloudflare.com>
Link: https://lore.kernel.org/netdev/be5b9271-7507-19c5-ded1-fa78f1980e69@cloudflare.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:06 -07:00
Shay Drory
1f893f57a3 net/mlx5: Devcom, serialize devcom registration
From one hand, mlx5 driver is allowing to probe PFs in parallel.
From the other hand, devcom, which is a share resource between PFs, is
registered without any lock. This might resulted in memory problems.

Hence, use the global mlx5_dev_list_lock in order to serialize devcom
registration.

Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:06 -07:00
Shay Drory
af87194352 net/mlx5: Devcom, fix error flow in mlx5_devcom_register_device
In case devcom allocation is failed, mlx5 is always freeing the priv.
However, this priv might have been allocated by a different thread,
and freeing it might lead to use-after-free bugs.
Fix it by freeing the priv only in case it was allocated by the
running thread.

Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:06 -07:00
Shay Drory
8c253dfc89 net/mlx5: E-switch, Devcom, sync devcom events and devcom comp register
devcom events are sent to all registered component. Following the
cited patch, it is possible for two components, e.g.: two eswitches,
to send devcom events, while both components are registered. This
means eswitch layer will do double un/pairing, which is double
allocation and free of resources, even though only one un/pairing is
needed. flow example:

	cpu0					cpu1
	----					----

 mlx5_devlink_eswitch_mode_set(dev0)
  esw_offloads_devcom_init()
   mlx5_devcom_register_component(esw0)
                                         mlx5_devlink_eswitch_mode_set(dev1)
                                          esw_offloads_devcom_init()
                                           mlx5_devcom_register_component(esw1)
                                           mlx5_devcom_send_event()
   mlx5_devcom_send_event()

Hence, check whether the eswitches are already un/paired before
free/allocation of resources.

Fixes: 09b278462f16 ("net: devlink: enable parallel ops on netlink interface")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:06 -07:00
Paul Blakey
dfa1e46d60 net/mlx5e: TC, Fix using eswitch mapping in nic mode
Cited patch is using the eswitch object mapping pool while
in nic mode where it isn't initialized. This results in the
trace below [0].

Fix that by using either nic or eswitch object mapping pool
depending if eswitch is enabled or not.

[0]:
[  826.446057] ==================================================================
[  826.446729] BUG: KASAN: slab-use-after-free in mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[  826.447515] Read of size 8 at addr ffff888194485830 by task tc/6233

[  826.448243] CPU: 16 PID: 6233 Comm: tc Tainted: G        W          6.3.0-rc6+ #1
[  826.448890] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  826.449785] Call Trace:
[  826.450052]  <TASK>
[  826.450302]  dump_stack_lvl+0x33/0x50
[  826.450650]  print_report+0xc2/0x610
[  826.450998]  ? __virt_addr_valid+0xb1/0x130
[  826.451385]  ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[  826.451935]  kasan_report+0xae/0xe0
[  826.452276]  ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[  826.452829]  mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[  826.453368]  ? __kmalloc_node+0x5a/0x120
[  826.453733]  esw_add_restore_rule+0x20f/0x270 [mlx5_core]
[  826.454288]  ? mlx5_eswitch_add_send_to_vport_meta_rule+0x260/0x260 [mlx5_core]
[  826.455011]  ? mutex_unlock+0x80/0xd0
[  826.455361]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[  826.455862]  ? mapping_add+0x2cb/0x440 [mlx5_core]
[  826.456425]  mlx5e_tc_action_miss_mapping_get+0x139/0x180 [mlx5_core]
[  826.457058]  ? mlx5e_tc_update_skb_nic+0xb0/0xb0 [mlx5_core]
[  826.457636]  ? __kasan_kmalloc+0x77/0x90
[  826.458000]  ? __kmalloc+0x57/0x120
[  826.458336]  mlx5_tc_ct_flow_offload+0x325/0xe40 [mlx5_core]
[  826.458916]  ? ct_kernel_enter.constprop.0+0x48/0xa0
[  826.459360]  ? mlx5_tc_ct_parse_action+0xf0/0xf0 [mlx5_core]
[  826.459933]  ? mlx5e_mod_hdr_attach+0x491/0x520 [mlx5_core]
[  826.460507]  ? mlx5e_mod_hdr_get+0x12/0x20 [mlx5_core]
[  826.461046]  ? mlx5e_tc_attach_mod_hdr+0x154/0x170 [mlx5_core]
[  826.461635]  mlx5e_configure_flower+0x969/0x2110 [mlx5_core]
[  826.462217]  ? _raw_spin_lock_bh+0x85/0xe0
[  826.462597]  ? __mlx5e_add_fdb_flow+0x750/0x750 [mlx5_core]
[  826.463163]  ? kasan_save_stack+0x2e/0x40
[  826.463534]  ? down_read+0x115/0x1b0
[  826.463878]  ? down_write_killable+0x110/0x110
[  826.464288]  ? tc_setup_action.part.0+0x9f/0x3b0
[  826.464701]  ? mlx5e_is_uplink_rep+0x4c/0x90 [mlx5_core]
[  826.465253]  ? mlx5e_tc_reoffload_flows_work+0x130/0x130 [mlx5_core]
[  826.465878]  tc_setup_cb_add+0x112/0x250
[  826.466247]  fl_hw_replace_filter+0x230/0x310 [cls_flower]
[  826.466724]  ? fl_hw_destroy_filter+0x1a0/0x1a0 [cls_flower]
[  826.467212]  fl_change+0x14e1/0x2030 [cls_flower]
[  826.467636]  ? sock_def_readable+0x89/0x120
[  826.468019]  ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower]
[  826.468509]  ? kasan_unpoison+0x23/0x50
[  826.468873]  ? get_random_u16+0x180/0x180
[  826.469244]  ? __radix_tree_lookup+0x2b/0x130
[  826.469640]  ? fl_get+0x7b/0x140 [cls_flower]
[  826.470042]  ? fl_mask_put+0x200/0x200 [cls_flower]
[  826.470478]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[  826.470973]  ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower]
[  826.471427]  tc_new_tfilter+0x644/0x1050
[  826.471795]  ? tc_get_tfilter+0x860/0x860
[  826.472170]  ? __thaw_task+0x130/0x130
[  826.472525]  ? arch_stack_walk+0x98/0xf0
[  826.472892]  ? cap_capable+0x9f/0xd0
[  826.473235]  ? security_capable+0x47/0x60
[  826.473608]  rtnetlink_rcv_msg+0x1d5/0x550
[  826.473985]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
[  826.474383]  ? __stack_depot_save+0x35/0x4c0
[  826.474779]  ? kasan_save_stack+0x2e/0x40
[  826.475149]  ? kasan_save_stack+0x1e/0x40
[  826.475518]  ? __kasan_record_aux_stack+0x9f/0xb0
[  826.475939]  ? task_work_add+0x77/0x1c0
[  826.476305]  netlink_rcv_skb+0xe0/0x210
[  826.476661]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
[  826.477057]  ? netlink_ack+0x7c0/0x7c0
[  826.477412]  ? rhashtable_jhash2+0xef/0x150
[  826.477796]  ? _copy_from_iter+0x105/0x770
[  826.484386]  netlink_unicast+0x346/0x490
[  826.484755]  ? netlink_attachskb+0x400/0x400
[  826.485145]  ? kernel_text_address+0xc2/0xd0
[  826.485535]  netlink_sendmsg+0x3b0/0x6c0
[  826.485902]  ? kernel_text_address+0xc2/0xd0
[  826.486296]  ? netlink_unicast+0x490/0x490
[  826.486671]  ? iovec_from_user.part.0+0x7a/0x1a0
[  826.487083]  ? netlink_unicast+0x490/0x490
[  826.487461]  sock_sendmsg+0x73/0xc0
[  826.487803]  ____sys_sendmsg+0x364/0x380
[  826.488186]  ? import_iovec+0x7/0x10
[  826.488531]  ? kernel_sendmsg+0x30/0x30
[  826.488893]  ? __copy_msghdr+0x180/0x180
[  826.489258]  ? kasan_save_stack+0x2e/0x40
[  826.489629]  ? kasan_save_stack+0x1e/0x40
[  826.490002]  ? __kasan_record_aux_stack+0x9f/0xb0
[  826.490424]  ? __call_rcu_common.constprop.0+0x46/0x580
[  826.490876]  ___sys_sendmsg+0xdf/0x140
[  826.491231]  ? copy_msghdr_from_user+0x110/0x110
[  826.491649]  ? fget_raw+0x120/0x120
[  826.491988]  ? ___sys_recvmsg+0xd9/0x130
[  826.492355]  ? folio_batch_add_and_move+0x80/0xa0
[  826.492776]  ? _raw_spin_lock+0x7a/0xd0
[  826.493137]  ? _raw_spin_lock+0x7a/0xd0
[  826.493500]  ? _raw_read_lock_irq+0x30/0x30
[  826.493880]  ? kasan_set_track+0x21/0x30
[  826.494249]  ? kasan_save_free_info+0x2a/0x40
[  826.494650]  ? do_sys_openat2+0xff/0x270
[  826.495016]  ? __fget_light+0x1b5/0x200
[  826.495377]  ? __virt_addr_valid+0xb1/0x130
[  826.495763]  __sys_sendmsg+0xb2/0x130
[  826.496118]  ? __sys_sendmsg_sock+0x20/0x20
[  826.496501]  ? __x64_sys_rseq+0x2e0/0x2e0
[  826.496874]  ? do_user_addr_fault+0x276/0x820
[  826.497273]  ? fpregs_assert_state_consistent+0x52/0x60
[  826.497727]  ? exit_to_user_mode_prepare+0x30/0x120
[  826.498158]  do_syscall_64+0x3d/0x90
[  826.498502]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  826.498949] RIP: 0033:0x7f9b67f4f887
[  826.499294] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  826.500742] RSP: 002b:00007fff5d1a5498 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  826.501395] RAX: ffffffffffffffda RBX: 0000000064413ce6 RCX: 00007f9b67f4f887
[  826.501975] RDX: 0000000000000000 RSI: 00007fff5d1a5500 RDI: 0000000000000003
[  826.502556] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[  826.503135] R10: 00007f9b67e08708 R11: 0000000000000246 R12: 0000000000000001
[  826.503714] R13: 0000000000000001 R14: 00007fff5d1a9800 R15: 0000000000485400
[  826.504304]  </TASK>

[  826.504753] Allocated by task 3764:
[  826.505090]  kasan_save_stack+0x1e/0x40
[  826.505453]  kasan_set_track+0x21/0x30
[  826.505810]  __kasan_kmalloc+0x77/0x90
[  826.506164]  __mlx5_create_flow_table+0x16d/0xbb0 [mlx5_core]
[  826.506742]  esw_offloads_enable+0x60d/0xfb0 [mlx5_core]
[  826.507292]  mlx5_eswitch_enable_locked+0x4d3/0x680 [mlx5_core]
[  826.507885]  mlx5_devlink_eswitch_mode_set+0x2a3/0x580 [mlx5_core]
[  826.508513]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[  826.508969]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[  826.509427]  genl_rcv_msg+0x28d/0x3e0
[  826.509772]  netlink_rcv_skb+0xe0/0x210
[  826.510133]  genl_rcv+0x24/0x40
[  826.510448]  netlink_unicast+0x346/0x490
[  826.510810]  netlink_sendmsg+0x3b0/0x6c0
[  826.511179]  sock_sendmsg+0x73/0xc0
[  826.511519]  __sys_sendto+0x18d/0x220
[  826.511867]  __x64_sys_sendto+0x72/0x80
[  826.512232]  do_syscall_64+0x3d/0x90
[  826.512576]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[  826.513220] Freed by task 5674:
[  826.513535]  kasan_save_stack+0x1e/0x40
[  826.513893]  kasan_set_track+0x21/0x30
[  826.514245]  kasan_save_free_info+0x2a/0x40
[  826.514629]  ____kasan_slab_free+0x11a/0x1b0
[  826.515021]  __kmem_cache_free+0x14d/0x280
[  826.515399]  tree_put_node+0x109/0x1c0 [mlx5_core]
[  826.515907]  mlx5_destroy_flow_table+0x119/0x630 [mlx5_core]
[  826.516481]  esw_offloads_steering_cleanup+0xe7/0x150 [mlx5_core]
[  826.517084]  esw_offloads_disable+0xe0/0x160 [mlx5_core]
[  826.517632]  mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core]
[  826.518225]  mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core]
[  826.518834]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[  826.519286]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[  826.519748]  genl_rcv_msg+0x28d/0x3e0
[  826.520101]  netlink_rcv_skb+0xe0/0x210
[  826.520458]  genl_rcv+0x24/0x40
[  826.520771]  netlink_unicast+0x346/0x490
[  826.521137]  netlink_sendmsg+0x3b0/0x6c0
[  826.521505]  sock_sendmsg+0x73/0xc0
[  826.521842]  __sys_sendto+0x18d/0x220
[  826.522191]  __x64_sys_sendto+0x72/0x80
[  826.522554]  do_syscall_64+0x3d/0x90
[  826.522894]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[  826.523540] Last potentially related work creation:
[  826.523969]  kasan_save_stack+0x1e/0x40
[  826.524331]  __kasan_record_aux_stack+0x9f/0xb0
[  826.524739]  insert_work+0x30/0x130
[  826.525078]  __queue_work+0x34b/0x690
[  826.525426]  queue_work_on+0x48/0x50
[  826.525766]  __rhashtable_remove_fast_one+0x4af/0x4d0 [mlx5_core]
[  826.526365]  del_sw_flow_group+0x1b5/0x270 [mlx5_core]
[  826.526898]  tree_put_node+0x109/0x1c0 [mlx5_core]
[  826.527407]  esw_offloads_steering_cleanup+0xd3/0x150 [mlx5_core]
[  826.528009]  esw_offloads_disable+0xe0/0x160 [mlx5_core]
[  826.528616]  mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core]
[  826.529218]  mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core]
[  826.529823]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[  826.530276]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[  826.530733]  genl_rcv_msg+0x28d/0x3e0
[  826.531079]  netlink_rcv_skb+0xe0/0x210
[  826.531439]  genl_rcv+0x24/0x40
[  826.531755]  netlink_unicast+0x346/0x490
[  826.532123]  netlink_sendmsg+0x3b0/0x6c0
[  826.532487]  sock_sendmsg+0x73/0xc0
[  826.532825]  __sys_sendto+0x18d/0x220
[  826.533175]  __x64_sys_sendto+0x72/0x80
[  826.533533]  do_syscall_64+0x3d/0x90
[  826.533877]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[  826.534521] The buggy address belongs to the object at ffff888194485800
                which belongs to the cache kmalloc-512 of size 512
[  826.535506] The buggy address is located 48 bytes inside of
                freed 512-byte region [ffff888194485800, ffff888194485a00)

[  826.536666] The buggy address belongs to the physical page:
[  826.537138] page:00000000d75841dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x194480
[  826.537915] head:00000000d75841dd order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  826.538595] flags: 0x200000000010200(slab|head|node=0|zone=2)
[  826.539089] raw: 0200000000010200 ffff888100042c80 ffffea0004523800 dead000000000002
[  826.539755] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[  826.540417] page dumped because: kasan: bad access detected

[  826.541095] Memory state around the buggy address:
[  826.541519]  ffff888194485700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  826.542149]  ffff888194485780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  826.542773] >ffff888194485800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  826.543400]                                      ^
[  826.543822]  ffff888194485880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  826.544452]  ffff888194485900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  826.545079] ==================================================================

Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:06 -07:00
Rahul Rameshbabu
7aa5038019 net/mlx5e: Fix SQ wake logic in ptp napi_poll context
Check in the mlx5e_ptp_poll_ts_cq context if the ptp tx sq should be woken
up. Before change, the ptp tx sq may never wake up if the ptp tx ts skb
fifo is full when mlx5e_poll_tx_cq checks if the queue should be woken up.

Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Vlad Buslov
691c041bf2 net/mlx5e: Fix deadlock in tc route query code
Cited commit causes ABBA deadlock[0] when peer flows are created while
holding the devcom rw semaphore. Due to peer flows offload implementation
the lock is taken much higher up the call chain and there is no obvious way
to easily fix the deadlock. Instead, since tc route query code needs the
peer eswitch structure only to perform a lookup in xarray and doesn't
perform any sleeping operations with it, refactor the code for lockless
execution in following ways:

- RCUify the devcom 'data' pointer. When resetting the pointer
synchronously wait for RCU grace period before returning. This is fine
since devcom is currently only used for synchronization of
pairing/unpairing of eswitches which is rare and already expensive as-is.

- Wrap all usages of 'paired' boolean in {READ|WRITE}_ONCE(). The flag has
already been used in some unlocked contexts without proper
annotations (e.g. users of mlx5_devcom_is_paired() function), but it wasn't
an issue since all relevant code paths checked it again after obtaining the
devcom semaphore. Now it is also used by mlx5_devcom_get_peer_data_rcu() as
"best effort" check to return NULL when devcom is being unpaired. Note that
while RCU read lock doesn't prevent the unpaired flag from being changed
concurrently it still guarantees that reader can continue to use 'data'.

- Refactor mlx5e_tc_query_route_vport() function to use new
mlx5_devcom_get_peer_data_rcu() API which fixes the deadlock.

[0]:

[  164.599612] ======================================================
[  164.600142] WARNING: possible circular locking dependency detected
[  164.600667] 6.3.0-rc3+ #1 Not tainted
[  164.601021] ------------------------------------------------------
[  164.601557] handler1/3456 is trying to acquire lock:
[  164.601998] ffff88811f1714b0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}, at: mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.603078]
               but task is already holding lock:
[  164.603617] ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[  164.604459]
               which lock already depends on the new lock.

[  164.605190]
               the existing dependency chain (in reverse order) is:
[  164.605848]
               -> #1 (&comp->sem){++++}-{3:3}:
[  164.606380]        down_read+0x39/0x50
[  164.606772]        mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[  164.607336]        mlx5e_tc_query_route_vport+0x86/0xc0 [mlx5_core]
[  164.607914]        mlx5e_tc_tun_route_lookup+0x1a4/0x1d0 [mlx5_core]
[  164.608495]        mlx5e_attach_decap_route+0xc6/0x1e0 [mlx5_core]
[  164.609063]        mlx5e_tc_add_fdb_flow+0x1ea/0x360 [mlx5_core]
[  164.609627]        __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[  164.610175]        mlx5e_configure_flower+0x952/0x1a20 [mlx5_core]
[  164.610741]        tc_setup_cb_add+0xd4/0x200
[  164.611146]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[  164.611661]        fl_change+0xc95/0x18a0 [cls_flower]
[  164.612116]        tc_new_tfilter+0x3fc/0xd20
[  164.612516]        rtnetlink_rcv_msg+0x418/0x5b0
[  164.612936]        netlink_rcv_skb+0x54/0x100
[  164.613339]        netlink_unicast+0x190/0x250
[  164.613746]        netlink_sendmsg+0x245/0x4a0
[  164.614150]        sock_sendmsg+0x38/0x60
[  164.614522]        ____sys_sendmsg+0x1d0/0x1e0
[  164.614934]        ___sys_sendmsg+0x80/0xc0
[  164.615320]        __sys_sendmsg+0x51/0x90
[  164.615701]        do_syscall_64+0x3d/0x90
[  164.616083]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  164.616568]
               -> #0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}:
[  164.617210]        __lock_acquire+0x159e/0x26e0
[  164.617638]        lock_acquire+0xc2/0x2a0
[  164.618018]        __mutex_lock+0x92/0xcd0
[  164.618401]        mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.618943]        post_process_attr+0x153/0x2d0 [mlx5_core]
[  164.619471]        mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[  164.620021]        __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[  164.620564]        mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[  164.621125]        tc_setup_cb_add+0xd4/0x200
[  164.621531]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[  164.622047]        fl_change+0xc95/0x18a0 [cls_flower]
[  164.622500]        tc_new_tfilter+0x3fc/0xd20
[  164.622906]        rtnetlink_rcv_msg+0x418/0x5b0
[  164.623324]        netlink_rcv_skb+0x54/0x100
[  164.623727]        netlink_unicast+0x190/0x250
[  164.624138]        netlink_sendmsg+0x245/0x4a0
[  164.624544]        sock_sendmsg+0x38/0x60
[  164.624919]        ____sys_sendmsg+0x1d0/0x1e0
[  164.625340]        ___sys_sendmsg+0x80/0xc0
[  164.625731]        __sys_sendmsg+0x51/0x90
[  164.626117]        do_syscall_64+0x3d/0x90
[  164.626502]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  164.626995]
               other info that might help us debug this:

[  164.627725]  Possible unsafe locking scenario:

[  164.628268]        CPU0                    CPU1
[  164.628683]        ----                    ----
[  164.629098]   lock(&comp->sem);
[  164.629421]                                lock(&esw->offloads.encap_tbl_lock);
[  164.630066]                                lock(&comp->sem);
[  164.630555]   lock(&esw->offloads.encap_tbl_lock);
[  164.630993]
                *** DEADLOCK ***

[  164.631575] 3 locks held by handler1/3456:
[  164.631962]  #0: ffff888124b75130 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
[  164.632703]  #1: ffff888116e512b8 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]
[  164.633552]  #2: ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[  164.634435]
               stack backtrace:
[  164.634883] CPU: 17 PID: 3456 Comm: handler1 Not tainted 6.3.0-rc3+ #1
[  164.635431] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  164.636340] Call Trace:
[  164.636616]  <TASK>
[  164.636863]  dump_stack_lvl+0x47/0x70
[  164.637217]  check_noncircular+0xfe/0x110
[  164.637601]  __lock_acquire+0x159e/0x26e0
[  164.637977]  ? mlx5_cmd_set_fte+0x5b0/0x830 [mlx5_core]
[  164.638472]  lock_acquire+0xc2/0x2a0
[  164.638828]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.639339]  ? lock_is_held_type+0x98/0x110
[  164.639728]  __mutex_lock+0x92/0xcd0
[  164.640074]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.640576]  ? __lock_acquire+0x382/0x26e0
[  164.640958]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.641468]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.641965]  mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.642454]  ? lock_release+0xbf/0x240
[  164.642819]  post_process_attr+0x153/0x2d0 [mlx5_core]
[  164.643318]  mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[  164.643835]  __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[  164.644340]  mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[  164.644862]  ? lock_acquire+0xc2/0x2a0
[  164.645219]  tc_setup_cb_add+0xd4/0x200
[  164.645588]  fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[  164.646067]  fl_change+0xc95/0x18a0 [cls_flower]
[  164.646488]  tc_new_tfilter+0x3fc/0xd20
[  164.646861]  ? tc_del_tfilter+0x810/0x810
[  164.647236]  rtnetlink_rcv_msg+0x418/0x5b0
[  164.647621]  ? rtnl_setlink+0x160/0x160
[  164.647982]  netlink_rcv_skb+0x54/0x100
[  164.648348]  netlink_unicast+0x190/0x250
[  164.648722]  netlink_sendmsg+0x245/0x4a0
[  164.649090]  sock_sendmsg+0x38/0x60
[  164.649434]  ____sys_sendmsg+0x1d0/0x1e0
[  164.649804]  ? copy_msghdr_from_user+0x6d/0xa0
[  164.650213]  ___sys_sendmsg+0x80/0xc0
[  164.650563]  ? lock_acquire+0xc2/0x2a0
[  164.650926]  ? lock_acquire+0xc2/0x2a0
[  164.651286]  ? __fget_files+0x5/0x190
[  164.651644]  ? find_held_lock+0x2b/0x80
[  164.652006]  ? __fget_files+0xb9/0x190
[  164.652365]  ? lock_release+0xbf/0x240
[  164.652723]  ? __fget_files+0xd3/0x190
[  164.653079]  __sys_sendmsg+0x51/0x90
[  164.653435]  do_syscall_64+0x3d/0x90
[  164.653784]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  164.654229] RIP: 0033:0x7f378054f8bd
[  164.654577] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a c3 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 be c3 f4 ff 48
[  164.656041] RSP: 002b:00007f377fa114b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[  164.656701] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f378054f8bd
[  164.657297] RDX: 0000000000000000 RSI: 00007f377fa11540 RDI: 0000000000000014
[  164.657885] RBP: 00007f377fa12278 R08: 0000000000000000 R09: 000000000000015c
[  164.658472] R10: 00007f377fa123d0 R11: 0000000000000293 R12: 0000560962d99bd0
[  164.665317] R13: 0000000000000000 R14: 0000560962d99bd0 R15: 00007f377fa11540

Fixes: f9d196bd632b ("net/mlx5e: Use correct eswitch for stack devices with lag")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Roi Dayan
a65735148e net/mlx5: Fix error message when failing to allocate device memory
Fix spacing for the error and also the correct error code pointer.

Fixes: c9b9dcb430b3 ("net/mlx5: Move device memory management to mlx5_core")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Vlad Buslov
be071cdb16 net/mlx5e: Use correct encap attribute during invalidation
With introduction of post action infrastructure most of the users of encap
attribute had been modified in order to obtain the correct attribute by
calling mlx5e_tc_get_encap_attr() helper instead of assuming encap action
is always on default attribute. However, the cited commit didn't modify
mlx5e_invalidate_encap() which prevents it from destroying correct modify
header action which leads to a warning [0]. Fix the issue by using correct
attribute.

[0]:

Feb 21 09:47:35 c-237-177-40-045 kernel: WARNING: CPU: 17 PID: 654 at drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:684 mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel: RIP: 0010:mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:47:35 c-237-177-40-045 kernel:  <TASK>
Feb 21 09:47:35 c-237-177-40-045 kernel:  mlx5e_tc_fib_event_work+0x8e3/0x1f60 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? mlx5e_take_all_encap_flows+0xe0/0xe0 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lock_downgrade+0x6d0/0x6d0
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel:  process_one_work+0x7c2/0x1310
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? pwq_dec_nr_in_flight+0x230/0x230
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? rwlock_bug.part.0+0x90/0x90
Feb 21 09:47:35 c-237-177-40-045 kernel:  worker_thread+0x59d/0xec0
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? __kthread_parkme+0xd9/0x1d0

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Yevgeny Kliteynik
c7dd225bc2 net/mlx5: DR, Check force-loopback RC QP capability independently from RoCE
SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB
(loopback), and FL (force-loopback) QP is preferred for performance. FL is
available when RoCE is enabled or disabled based on RoCE caps.
This patch adds reading of FL capability from HCA caps in addition to the
existing reading from RoCE caps, thus fixing the case where we didn't
have loopback enabled when RoCE was disabled.

Fixes: 7304d603a57a ("net/mlx5: DR, Add support for force-loopback QP")
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Erez Shitrit
1e5daf5565 net/mlx5: DR, Fix crc32 calculation to work on big-endian (BE) CPUs
When calculating crc for hash index we use the function crc32 that
calculates for little-endian (LE) arch.
Then we convert it to network endianness using htonl(), but it's wrong
to do the conversion in BE archs since the crc32 value is already LE.

The solution is to switch the bytes from the crc result for all types
of arc.

Fixes: 40416d8ede65 ("net/mlx5: DR, Replace CRC32 implementation to use kernel lib")
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Shay Drory
2be5bd42a5 net/mlx5: Handle pairing of E-switch via uplink un/load APIs
In case user switch a device from switchdev mode to legacy mode, mlx5
first unpair the E-switch and afterwards unload the uplink vport.
From the other hand, in case user remove or reload a device, mlx5
first unload the uplink vport and afterwards unpair the E-switch.

The latter is causing a bug[1], hence, handle pairing of E-switch as
part of uplink un/load APIs.

[1]
In case VF_LAG is used, every tc fdb flow is duplicated to the peer
esw. However, the original esw keeps a pointer to this duplicated
flow, not the peer esw.
e.g.: if user create tc fdb flow over esw0, the flow is duplicated
over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated
flow.
During module unload while a peer tc fdb flow is still offloaded, in
case the first device to be removed is the peer device (esw1 in the
example above), the peer net-dev is destroyed, and so the mlx5e_priv
is memset to 0.
Afterwards, the peer device is trying to unpair himself from the
original device (esw0 in the example above). Unpair API invoke the
original device to clear peer flow from its eswitch (esw0), but the
peer flow, which is stored over the original eswitch (esw0), is
trying to use the peer mlx5e_priv, which is memset to 0 and result in
bellow kernel-oops.

[  157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60
[  157.964662 ] #PF: supervisor read access in kernel mode
[  157.965123 ] #PF: error_code(0x0000) - not-present page
[  157.965582 ] PGD 0 P4D 0
[  157.965866 ] Oops: 0000 [#1] SMP
[  157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core]
[  157.976164 ] Call Trace:
[  157.976437 ]  <TASK>
[  157.976690 ]  __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core]
[  157.977230 ]  mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core]
[  157.977767 ]  mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core]
[  157.984653 ]  mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core]
[  157.985212 ]  mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core]
[  157.985714 ]  esw_offloads_disable+0x5a/0x110 [mlx5_core]
[  157.986209 ]  mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core]
[  157.986757 ]  mlx5_eswitch_disable+0x51/0x80 [mlx5_core]
[  157.987248 ]  mlx5_unload+0x2a/0xb0 [mlx5_core]
[  157.987678 ]  mlx5_uninit_one+0x5f/0xd0 [mlx5_core]
[  157.988127 ]  remove_one+0x64/0xe0 [mlx5_core]
[  157.988549 ]  pci_device_remove+0x31/0xa0
[  157.988933 ]  device_release_driver_internal+0x18f/0x1f0
[  157.989402 ]  driver_detach+0x3f/0x80
[  157.989754 ]  bus_remove_driver+0x70/0xf0
[  157.990129 ]  pci_unregister_driver+0x34/0x90
[  157.990537 ]  mlx5_cleanup+0xc/0x1c [mlx5_core]
[  157.990972 ]  __x64_sys_delete_module+0x15a/0x250
[  157.991398 ]  ? exit_to_user_mode_prepare+0xea/0x110
[  157.991840 ]  do_syscall_64+0x3d/0x90
[  157.992198 ]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:04 -07:00
Shay Drory
2a0a935fb6 net/mlx5: Collect command failures data only for known commands
DEVX can issue a general command, which is not used by mlx5 driver.
In case such command is failed, mlx5 is trying to collect the failure
data, However, mlx5 doesn't create a storage for this command, since
mlx5 doesn't use it. This lead to array-index-out-of-bounds error.

Fix it by checking whether the command is known before collecting the
failure data.

Fixes: 34f46ae0d4b3 ("net/mlx5: Add command failures data to debugfs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:04 -07:00
Christophe JAILLET
640bf95b2c 3c589_cs: Fix an error handling path in tc589_probe()
Should tc589_config() fail, some resources need to be released as already
done in the remove function.

Fixes: 15b99ac17295 ("[PATCH] pcmcia: add return value to _config() functions")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/d8593ae867b24c79063646e36f9b18b0790107cb.1684575975.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-22 19:17:58 -07:00
Christophe JAILLET
5b17a4971d forcedeth: Fix an error handling path in nv_probe()
If an error occures after calling nv_mgmt_acquire_sema(), it should be
undone with a corresponding nv_mgmt_release_sema() call.

Add it in the error handling path of the probe as already done in the
remove function.

Fixes: cac1c52c3621 ("forcedeth: mgmt unit interface")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Link: https://lore.kernel.org/r/355e9a7d351b32ad897251b6f81b5886fcdc6766.1684571393.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-22 19:17:28 -07:00