111378 Commits

Author SHA1 Message Date
Li Qiong
0d462a681d net: lan966x: fix checking for return value of platform_get_irq_byname()
[ Upstream commit 40b4ac880e21d917da7f3752332fa57564a4c202 ]

platform_get_irq_byname() returns a non-zero IRQ number on success
or a negative error number on failure, so "if (irq)" is always true.
Change it to "if (irq > 0)".
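
The difference between the two checks can be shown with a small
user-space sketch. The helper names below are hypothetical and are not
part of the lan966x driver; they only model the return contract
described above (positive IRQ number or negative errno).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helpers, for illustration only. */

/* Old check: any non-zero value passes, including negative errnos. */
static bool irq_usable_buggy(int irq)
{
	return irq;
}

/* New check: only a positive IRQ number passes. */
static bool irq_usable_fixed(int irq)
{
	return irq > 0;
}
```

With a failed lookup such as -ENXIO, the old check still treats the
value as a usable IRQ, while the new check rejects it.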

Signed-off-by: Li Qiong <liqiong@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-05 10:31:32 +02:00
Aleksander Jan Bajkowski
3ef2786e32 net: lantiq_xrx200: restore buffer if memory allocation failed
[ Upstream commit c9c3b1775f80fa21f5bff874027d2ccb10f5d90c ]

In a situation where memory allocation fails, an invalid buffer address
is stored. When this descriptor is used again, the system panics in the
build_skb() function when accessing memory.

Fixes: 7ea6cd16f159 ("lantiq: net: fix duplicated skb in rx descriptor ring")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:12 +02:00
Aleksander Jan Bajkowski
0d9981b063 net: lantiq_xrx200: fix lock under memory pressure
[ Upstream commit c4b6e9341f930e4dd089231c0414758f5f1f9dbd ]

When the xrx200_hw_receive() function returns -ENOMEM, the NAPI poll
function immediately returns an error.
This is incorrect for two reasons:
* the function terminates without enabling interrupts or scheduling NAPI,
* the error code (-ENOMEM) is returned instead of the number of received
packets.

After the first memory allocation failure occurs, packet reception is
locked due to disabled interrupts from DMA.
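
One way to satisfy both points is sketched below as a user-space model.
The struct, the mock receive function and the irq_enabled flag are all
assumptions made for illustration; this is not the xrx200 driver code,
and the real completion path uses the kernel's NAPI helpers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one RX channel. */
struct mock_chan {
	int pending;       /* frames waiting in the ring */
	int fail_at;       /* frame index whose allocation fails, -1 = never */
	bool irq_enabled;  /* stands in for the DMA interrupt enable */
};

/* Mock receive: fails with -ENOMEM at the configured frame index. */
static int mock_hw_receive(struct mock_chan *ch, int idx)
{
	if (idx == ch->fail_at)
		return -ENOMEM;
	ch->pending--;
	return 0;
}

static int mock_poll(struct mock_chan *ch, int budget)
{
	int rx = 0;

	while (rx < budget && ch->pending > 0) {
		if (mock_hw_receive(ch, rx) < 0)
			break;  /* stop, but still run the completion path */
		rx++;
	}

	/* Budget not exhausted: complete polling and re-enable interrupts. */
	if (rx < budget)
		ch->irq_enabled = true;

	return rx;  /* always a packet count, never a negative errno */
}
```

The early "return ret" is replaced by a "break", so the function always
reaches the completion path and reports the number of packets received.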

Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:11 +02:00
Aleksander Jan Bajkowski
73f4758652 net: lantiq_xrx200: confirm skb is allocated before using
[ Upstream commit c8b043702dc0894c07721c5b019096cebc8c798f ]

xrx200_hw_receive() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.

Add a check in case build_skb() failed to allocate and return NULL.

Fixes: e015593573b3 ("net: lantiq_xrx200: convert to build_skb")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:11 +02:00
Heiner Kallweit
27a5ab8fec net: stmmac: work around sporadic tx issue on link-up
[ Upstream commit a3a57bf07de23fe1ff779e0fdf710aa581c3ff73 ]

This is a follow-up to the discussion in [0]. It seems to me that
at least the IP version used on Amlogic SoC's sometimes has a problem
if register MAC_CTRL_REG is written whilst the chip is still processing
a previous write. But that's just a guess.
Adding a delay between two writes to this register helps, but we can
also simply omit the offending second write. This patch uses the second
approach and is based on a suggestion from Qi Duan.
A benefit of this approach is that we save a few register writes, even
on chip versions that are not affected.

[0] https://www.spinics.net/lists/netdev/msg831526.html

Fixes: bfab27a146ed ("stmmac: add the experimental PCI support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/e99857ce-bd90-5093-ca8c-8cd480b5a0a2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:11 +02:00
R Mohamed Shah
c830d71201 ionic: VF initial random MAC address if no assigned mac
[ Upstream commit 19058be7c48ceb3e60fa3948e24da1059bd68ee4 ]

Assign a random mac address to the VF interface station
address if it boots with a zero mac address in order to match
similar behavior seen in other VF drivers.  Handle the errors
where the older firmware does not allow the VF to set its own
station address.

Newer firmware will allow the VF to set the station mac address
if it hasn't already been set administratively through the PF.
Setting it will also be allowed if the VF has trust.
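
The "random address if zero" step can be sketched in user space as
follows. The function names and SKETCH_ETH_ALEN are hypothetical, and
rand() only stands in for the kernel's own random-address helpers; the
bit handling (clear the multicast bit, set the locally administered
bit) is the usual convention for randomly generated MAC addresses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6  /* hypothetical name; length of a MAC address */

static bool mac_is_zero(const uint8_t *mac)
{
	static const uint8_t zero[SKETCH_ETH_ALEN];

	return memcmp(mac, zero, SKETCH_ETH_ALEN) == 0;
}

/* Leave an already assigned address alone; otherwise pick a random one. */
static void mac_assign_random_if_zero(uint8_t *mac)
{
	if (!mac_is_zero(mac))
		return;

	for (int i = 0; i < SKETCH_ETH_ALEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);

	mac[0] &= 0xfe;  /* clear the multicast bit */
	mac[0] |= 0x02;  /* set the locally administered bit */
}
```

The sketch covers only address generation. Handling firmware that
refuses the new address, as described above, is not modeled.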

Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
Signed-off-by: R Mohamed Shah <mohamed@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:11 +02:00
Shannon Nelson
79e77fb156 ionic: fix up issues with handling EAGAIN on FW cmds
[ Upstream commit 0fc4dd452d6c14828eed6369155c75c0ac15bab3 ]

In looping on FW update tests we occasionally see the
FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
waiting for the FW activate step to finish inside the FW.  The
firmware is complaining that the done bit is set when a new
dev_cmd is going to be processed.

Doing a clean on the cmd registers and doorbell before exiting
the wait-for-done and cleaning the done bit before the sleep
prevents this from occurring.

Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:11 +02:00
Shannon Nelson
94d71d99e5 ionic: clear broken state on generation change
[ Upstream commit 9cb9dadb8f45c67e4310e002c2f221b70312b293 ]

There is a case found in heavy testing where a link flap happens just
before a firmware Recovery event and the driver gets stuck in the
BROKEN state.  This comes from the driver getting interrupted by a FW
generation change when coming back up from the link flap, and the call
to ionic_start_queues() in ionic_link_status_check() fails.  This can be
addressed by having the fw_up code clear the BROKEN bit if seen, rather
than waiting for a user to manually force the interface down and then
back up.

Fixes: 9e8eaf8427b6 ("ionic: stop watchdog when in broken state")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:11 +02:00
Lorenzo Bianconi
b886aebd0c net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
[ Upstream commit 0cf731f9ebb5bf6f252055bebf4463a5c0bd490b ]

Properly report hw rx hash for mt7986 chipset according to the new dma
descriptor layout.

Fixes: 197c9e9b17b11 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:10 +02:00
Lorenzo Bianconi
2f23757084 net: ethernet: mtk_eth_soc: enable rx cksum offload for MTK_NETSYS_V2
[ Upstream commit da6e113ff010815fdd21ee1e9af2e8d179a2680f ]

Enable rx checksum offload for mt7986 chipset.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c8699805c18f7fd38315fcb8da2787676d83a32c.1654544585.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:10 +02:00
Sylwester Dziedziuch
82fd140276 i40e: Fix incorrect address type for IPv6 flow rules
[ Upstream commit bcf3a156429306070afbfda5544f2b492d25e75b ]

It was not possible to create a 1-tuple flow director
rule for the IPv6 flow type. It was caused by incorrectly
checking for source IP address when validating user provided
destination IP address.

Fix this by changing ip6src to correct ip6dst address
in destination IP address validation for IPv6 flow type.

Fixes: efca91e89b67 ("i40e: Add flow director support for IPv6")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:10 +02:00
Jacob Keller
c2b99b2a24 ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
[ Upstream commit 25d7a5f5a6bb15a2dae0a3f39ea5dda215024726 ]

The ixgbe_ptp_start_cyclecounter is intended to be called whenever the
cyclecounter parameters need to be changed.

Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x
devices"), this function has cleared the SYSTIME registers and reset the
TSAUXC DISABLE_SYSTIME bit.

While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
them during ixgbe_ptp_start_cyclecounter. This function may be called
during both reset and link status change. When link changes, the SYSTIME
counter is still operating normally, but the cyclecounter should be updated
to account for the possibly changed parameters.

Clearing SYSTIME when link changes causes the timecounter to jump because
the cycle counter now reads zero.

Extract the SYSTIME initialization out to a new function and call this
during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
an unnecessary reset of the current time.

This also restores the original SYSTIME clearing that occurred during
ixgbe_ptp_reset before the commit above.

Reported-by: Steve Payne <spayne@aurora.tech>
Reported-by: Ilya Evenbach <ievenbach@aurora.tech>
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:10 +02:00
Vikas Gupta
c5101ebeb2 bnxt_en: fix LRO/GRO_HW features in ndo_fix_features callback
[ Upstream commit 366c304741729e64d778c80555d9eb422cf5cc89 ]

LRO/GRO_HW should be disabled if there is an attached XDP program.
BNXT_FLAG_TPA is the current setting of the LRO/GRO_HW.  Using
BNXT_FLAG_TPA to disable LRO/GRO_HW will cause these features to be
permanently disabled once they are disabled.

Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:06 +02:00
Vikas Gupta
2ec3dc278d bnxt_en: fix NQ resource accounting during vf creation on 57500 chips
[ Upstream commit 09a89cc59ad67794a11e1d3dd13c5b3172adcc51 ]

There are 2 issues:

1. We should decrement hw_resc->max_nqs instead of hw_resc->max_irqs
   with the number of NQs assigned to the VFs.  The IRQs are fixed
   on each function and cannot be re-assigned.  Only the NQs are being
   assigned to the VFs.

2. vf_msix is the total number of NQs to be assigned to the VFs.  So
   we should decrement vf_msix from hw_resc->max_nqs.

Fixes: b16b68918674 ("bnxt_en: Add SR-IOV support for 57500 chips.")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:06 +02:00
Vikas Gupta
46195aec86 bnxt_en: set missing reload flag in devlink features
[ Upstream commit 574b2bb9692fd3d45ed631ac447176d4679f3010 ]

Add the missing devlink_set_features() call so that the reload_down
and reload_up callbacks function.

Fixes: 228ea8c187d8 ("bnxt_en: implement devlink dev reload driver_reinit")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:06 +02:00
Pavan Chebbi
51ca62d327 bnxt_en: Use PAGE_SIZE to init buffer when multi buffer XDP is not in use
[ Upstream commit 7dd3de7cb1d657a918c6b2bc673c71e318aa0c05 ]

Using BNXT_PAGE_MODE_BUF_SIZE + offset as buffer length value is not
sufficient when running single buffer XDP programs doing redirect
operations. The stack will complain on missing skb tail room. Fix it
by using PAGE_SIZE when calling xdp_init_buff() for single buffer
programs.

Fixes: b231c3f3414c ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:06 +02:00
Maciej Żenczykowski
223fbc2f50 net: ipvtap - add __init/__exit annotations to module init/exit funcs
[ Upstream commit 4b2e3a17e9f279325712b79fb01d1493f9e3e005 ]

Looks to have been left out in an oversight.

Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Sainath Grandhi <sainath.grandhi@intel.com>
Fixes: 235a9d89da97 ('ipvtap: IP-VLAN based tap driver')
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20220821130808.12143-1-zenczykowski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:05 +02:00
Jonathan Toppins
3ff47e5994 bonding: 802.3ad: fix no transmission of LACPDUs
[ Upstream commit d745b5062ad2b5da90a5e728d7ca884fc07315fd ]

This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, so the
periodic timer never fires.

To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.

line 44: ip link add fbond type bond mode 4 miimon 200 \
            xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
    bond->dev->dev_addr = new_mac

line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 60: ip link set veth1-bond down master fbond
given:
    params.ad_actor_system = 0
    params.mode = BOND_MODE_8023AD
    ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
    bond_enslave
    -> bond_3ad_initialize(); because first slave
       -> if ad.system.sys_mac_addr != bond->dev->dev_addr
          return
results:
     Nothing is run in bond_3ad_initialize() because dev_addr equals
     sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
     never initialized anywhere else.

The if check around the contents of bond_3ad_initialize() is no longer
needed due to commit 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings
changes immediately"), which sets ad.system.sys_mac_addr whenever any
bonding parameter whose set function calls
bond_3ad_update_ad_actor_settings() is changed. If
ad.system.sys_mac_addr is zero, it is set to the current bond MAC
address, so the if check can never be true.
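
The effect of an uninitialised ticks-per-second value can be shown with
a small user-space model. The names below are hypothetical stand-ins
for the global and the conversion helper, and the conversion is
simplified to a single multiplication; it is not the bonding driver
code.

```c
#include <assert.h>

/* Models the global: zero until the initialise function runs. */
static int sketch_ticks_per_sec;

/* Stand-in for the initialisation that the early return skipped. */
static void sketch_3ad_initialize(int timer_interval_ms)
{
	assert(timer_interval_ms > 0);
	sketch_ticks_per_sec = 1000 / timer_interval_ms;
}

/* Simplified conversion: a timeout in seconds to timer ticks. */
static int sketch_timer_to_ticks(int seconds)
{
	return seconds * sketch_ticks_per_sec;
}
```

While the global is still zero, every timeout converts to zero ticks.
Once the initialisation has run, the same timeout converts to a
non-zero tick count.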

Fixes: 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:05 +02:00
Sergei Antonov
b8bd96a46c net: moxa: get rid of asymmetry in DMA mapping/unmapping
[ Upstream commit 0ee7828dfc56e97d71e51e6374dc7b4eb2b6e081 ]

Since priv->rx_mapping[i] is mapped in moxart_mac_open(), we
should unmap it from moxart_mac_stop(). Fixes 2 warnings.

1. During error unwinding in moxart_mac_probe(): "goto init_fail;",
then moxart_mac_free_memory() calls dma_unmap_single() with
priv->rx_mapping[i] pointers zeroed.

WARNING: CPU: 0 PID: 1 at kernel/dma/debug.c:963 check_unmap+0x704/0x980
DMA-API: moxart-ethernet 92000000.mac: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=1600 bytes]
CPU: 0 PID: 1 Comm: swapper Not tainted 5.19.0+ #60
Hardware name: Generic DT based system
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x34/0x44
 dump_stack_lvl from __warn+0xbc/0x1f0
 __warn from warn_slowpath_fmt+0x94/0xc8
 warn_slowpath_fmt from check_unmap+0x704/0x980
 check_unmap from debug_dma_unmap_page+0x8c/0x9c
 debug_dma_unmap_page from moxart_mac_free_memory+0x3c/0xa8
 moxart_mac_free_memory from moxart_mac_probe+0x190/0x218
 moxart_mac_probe from platform_probe+0x48/0x88
 platform_probe from really_probe+0xc0/0x2e4

2. After commands:
 ip link set dev eth0 down
 ip link set dev eth0 up

WARNING: CPU: 0 PID: 55 at kernel/dma/debug.c:570 add_dma_entry+0x204/0x2ec
DMA-API: moxart-ethernet 92000000.mac: cacheline tracking EEXIST, overlapping mappings aren't supported
CPU: 0 PID: 55 Comm: ip Not tainted 5.19.0+ #57
Hardware name: Generic DT based system
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x34/0x44
 dump_stack_lvl from __warn+0xbc/0x1f0
 __warn from warn_slowpath_fmt+0x94/0xc8
 warn_slowpath_fmt from add_dma_entry+0x204/0x2ec
 add_dma_entry from dma_map_page_attrs+0x110/0x328
 dma_map_page_attrs from moxart_mac_open+0x134/0x320
 moxart_mac_open from __dev_open+0x11c/0x1ec
 __dev_open from __dev_change_flags+0x194/0x22c
 __dev_change_flags from dev_change_flags+0x14/0x44
 dev_change_flags from devinet_ioctl+0x6d4/0x93c
 devinet_ioctl from inet_ioctl+0x1ac/0x25c

v1 -> v2:
Extraneous change removed.

Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver")
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220819110519.1230877-1-saproj@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:05 +02:00
Xiaolei Wang
162571b774 net: phy: Don't WARN for PHY_READY state in mdio_bus_phy_resume()
[ Upstream commit 6dbe852c379ff032a70a6b13a91914918c82cb07 ]

Some MAC drivers set mac_managed_pm to true in their
->ndo_open() callback. So before mac_managed_pm is set to true,
we still want to leverage the mdio_bus_phy_suspend()/resume() for
the phy device suspend and resume. In this case, the phy device is
in PHY_READY, and we shouldn't warn about this. It also seems that
the check of mac_managed_pm in WARN_ON is redundant since we already
check this in the entry of mdio_bus_phy_resume(), so drop it.

Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220819082451.1992102-1-xiaolei.wang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:05 +02:00
Alex Elder
834a5483bf net: ipa: don't assume SMEM is page-aligned
[ Upstream commit b8d4380365c515d8e0351f2f46d371738dd19be1 ]

In ipa_smem_init(), a Qualcomm SMEM region is allocated (if needed)
and then its virtual address is fetched using qcom_smem_get().  The
physical address associated with that region is also fetched.

The physical address is adjusted so that it is page-aligned, and an
attempt is made to update the size of the region to compensate for
any non-zero adjustment.

But that adjustment isn't done properly.  The physical address is
aligned twice, and as a result the size is never actually adjusted.

Fix this by *not* aligning the "addr" local variable, and instead
making the "phys" local variable be the adjusted "addr" value.
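
The double alignment and its fix can be modeled in user space as below.
The SKETCH_PAGE_* constants, struct sketch_region and both function
names are assumptions made for illustration, with a 4096-byte page
chosen arbitrarily; this is not the ipa_smem_init() code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uintptr_t)4096)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct sketch_region {
	uintptr_t phys;  /* page-aligned start of the mapping */
	size_t size;     /* page-aligned length of the mapping */
};

static size_t sketch_page_align(size_t n)
{
	return (n + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
}

/* Old behaviour: addr is aligned first, so addr - phys is always 0. */
static struct sketch_region region_buggy(uintptr_t addr, size_t size)
{
	struct sketch_region r;

	addr &= SKETCH_PAGE_MASK;
	r.phys = addr & SKETCH_PAGE_MASK;
	r.size = sketch_page_align(size + (addr - r.phys));
	return r;
}

/* Fixed behaviour: addr stays unaligned, only phys is aligned down. */
static struct sketch_region region_fixed(uintptr_t addr, size_t size)
{
	struct sketch_region r;

	r.phys = addr & SKETCH_PAGE_MASK;
	r.size = sketch_page_align(size + (addr - r.phys));
	return r;
}
```

For a 0x200-byte region starting at 0x1f00, the old computation yields
a single page starting at 0x1000, which stops short of the region's
end at 0x2100. The fixed computation yields two pages and covers it.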

Fixes: a0036bb413d5b ("net: ipa: define SMEM memory region for IPA")
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20220818134206.567618-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:05 +02:00
Vladimir Oltean
67426e99a1 net: dsa: microchip: keep compatibility with device tree blobs with no phy-mode
[ Upstream commit 5fbb08eb7f945c7e8896ea39f03143ce66dfa4c7 ]

DSA has multiple ways of specifying a MAC connection to an internal PHY.
One requires a DT description like this:

	port@0 {
		reg = <0>;
		phy-handle = <&internal_phy>;
		phy-mode = "internal";
	};

(which is IMO the recommended approach, as it is the clearest
description)

but it is also possible to leave the specification as just:

	port@0 {
		reg = <0>;
	};

and if the driver implements ds->ops->phy_read and ds->ops->phy_write,
the DSA framework "knows" it should create a ds->slave_mii_bus, and it
should connect to a non-OF-based internal PHY on this MDIO bus, at an
MDIO address equal to the port address.

There is also an intermediary way of describing things:

	port@0 {
		reg = <0>;
		phy-handle = <&internal_phy>;
	};

In case 2, DSA calls phylink_connect_phy() and in case 3, it calls
phylink_of_phy_connect(). In both cases, phylink_create() has been
called with a phy_interface_t of PHY_INTERFACE_MODE_NA, and in both
cases, PHY_INTERFACE_MODE_NA is translated into phy->interface.

It is important to note that phy_device_create() initializes
dev->interface = PHY_INTERFACE_MODE_GMII, and so, when we use
phylink_create(PHY_INTERFACE_MODE_NA), no one will override this, and we
will end up with a PHY_INTERFACE_MODE_GMII interface inherited from the
PHY.

All this means that in order to maintain compatibility with device tree
blobs where the phy-mode property is missing, we need to allow the
"gmii" phy-mode and treat it as "internal".

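The compatibility rule can be sketched as a mode check that accepts
both values for a port with an internal PHY. The enum and the function
name are hypothetical and cover only a few modes; the real driver
expresses this through phylink's supported-interfaces handling rather
than a helper like this.

```c
#include <stdbool.h>

/* Hypothetical subset of interface modes, for illustration only. */
enum sketch_phy_mode {
	SKETCH_MODE_NA,
	SKETCH_MODE_INTERNAL,
	SKETCH_MODE_GMII,
	SKETCH_MODE_RGMII,
};

/* Is this mode acceptable for a port with an internal PHY? */
static bool internal_port_mode_ok(enum sketch_phy_mode mode)
{
	switch (mode) {
	case SKETCH_MODE_INTERNAL:
	case SKETCH_MODE_GMII:  /* what the PHY reports if phy-mode is absent */
		return true;
	default:
		return false;
	}
}
```

Accepting "gmii" here keeps device tree blobs without a phy-mode
property working, while other modes are still rejected for these ports.
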
Fixes: 2c709e0bdad4 ("net: dsa: microchip: ksz8795: add phylink support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216320
Reported-by: Craig McQueen <craig@mcqueen.id.au>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Tested-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20220818143250.2797111-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:05 +02:00
Arun Ramadoss
dfee8aec73 net: dsa: microchip: update the ksz_phylink_get_caps
[ Upstream commit 7012033ce10e0968e6cb82709aa0ed7f2080b61e ]

This patch assigns the phylink_get_caps in ksz8795 and ksz9477 to
ksz_phylink_get_caps and updates their mac_capabilities in the
respective ksz_dev_ops.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:04 +02:00
Arun Ramadoss
bb015bf77a net: dsa: microchip: move the port mirror to ksz_common
[ Upstream commit 00a298bbc23876288b1cd04c38752d8e7ed53ae2 ]

This patch moves the common port mirror add/del dsa_switch_ops into
ksz_common.c. Each switch's implementation is invoked through the
ksz_dev_ops function pointers.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:04 +02:00
Arun Ramadoss
d719d680a5 net: dsa: microchip: move vlan functionality to ksz_common
[ Upstream commit f0d997e31bb307c7aa046c4992c568547fd25195 ]

This patch moves the vlan dsa_switch_ops such as vlan_add, vlan_del and
vlan_filtering from the individual files ksz8795.c, ksz9477.c to
ksz_common.c file.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:04 +02:00
Arun Ramadoss
23fcd52165 net: dsa: microchip: move tag_protocol to ksz_common
[ Upstream commit 534a0431e9e68959e2c0d71c141d5b911d66ad7c ]

This patch moves the dsa hook get_tag_protocol to the ksz_common file.
The tag_protocol is returned based on the dev->chip_id.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:04 +02:00
Arun Ramadoss
eafb01efb8 net: dsa: microchip: move switch chip_id detection to ksz_common
[ Upstream commit 91a98917a8839923d404a77c21646ca5fc9e330a ]

KSZ87xx and KSZ88xx have their chip_id at register location 0, while
KSZ9477-compatible and LAN937x switches share the same chip_id
detection at locations 0x01 and 0x02. To provide common switch
detection for ksz switches, the ksz_switch_detect function is
introduced.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:04 +02:00
Arun Ramadoss
422e808ba8 net: dsa: microchip: ksz9477: cleanup the ksz9477_switch_detect
[ Upstream commit 27faa0aa85f6696d411bbbebaed9f0f723c2a175 ]

ksz9477_switch_detect reads the chip id from location 0x00 and also
checks gigabit compatibility and the number of ports based on the
global_options0 register. To prepare for the common ksz switch detect
function, everything other than the chip id read is moved to
ksz9477_switch_init.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:04 +02:00
Maor Dickman
eaa08e3c5a net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off
[ Upstream commit 550f96432e6f6770efdaee0e65239d61431062a1 ]

The cited commit reintroduced the ability to set hw-tc-offload
in switchdev mode by reusing NIC mode calls without modifying them
to support both modes. This can cause an illegal memory access
when trying to turn hw-tc-offload off.

Fix this by using the right TC_FLAG when checking if tc rules
are installed while disabling hw-tc-offload.

Fixes: d3cbd4254df8 ("net/mlx5e: Add ndo_set_feature for uplink representor")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:04 +02:00
Aya Levin
160967199c net/mlx5e: Fix wrong application of the LRO state
[ Upstream commit 7b3707fc79044871ab8f3d5fa5e9603155bb5577 ]

The driver caches the packet merge type in the mlx5e_params instance,
which must be in perfect sync with the netdev_feature's bit.
Prior to this patch, in certain conditions (*) LRO state was set in
mlx5e_params while the netdev_feature's bit was off, causing LRO to
be applied on the RQs (HW level).

(*) This can happen only on profile init (mlx5e_build_nic_params()),
when the RQ expects non-linear SKBs and PCI is fast enough in
comparison to link width.

Solution: remove setting of packet merge type from
mlx5e_build_nic_params() as netdev features are not updated.

Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:03 +02:00
Moshe Shemesh
b0faef5159 net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
[ Upstream commit d59b73a66e5e0682442b6d7b4965364e57078b80 ]

Add a lock_class_key per mlx5 device to avoid a false positive
"possible circular locking dependency" warning by lockdep, on flows
which lock more than one mlx5 device, such as adding SF.

kernel log:
 ======================================================
 WARNING: possible circular locking dependency detected
 5.19.0-rc8+ #2 Not tainted
 ------------------------------------------------------
 kworker/u20:0/8 is trying to acquire lock:
 ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]

 but task is already holding lock:
 ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}:
        down_write+0x90/0x150
        blocking_notifier_chain_register+0x53/0xa0
        mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
        mlx5_init_one+0x261/0x490 [mlx5_core]
        probe_one+0x430/0x680 [mlx5_core]
        local_pci_probe+0xd6/0x170
        work_for_cpu_fn+0x4e/0xa0
        process_one_work+0x7c2/0x1340
        worker_thread+0x6f6/0xec0
        kthread+0x28f/0x330
        ret_from_fork+0x1f/0x30

 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
        __lock_acquire+0x2fc7/0x6720
        lock_acquire+0x1c1/0x550
        __mutex_lock+0x12c/0x14b0
        mlx5_init_one+0x2e/0x490 [mlx5_core]
        mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
        auxiliary_bus_probe+0x9d/0xe0
        really_probe+0x1e0/0xaa0
        __driver_probe_device+0x219/0x480
        driver_probe_device+0x49/0x130
        __device_attach_driver+0x1b8/0x280
        bus_for_each_drv+0x123/0x1a0
        __device_attach+0x1a3/0x460
        bus_probe_device+0x1a2/0x260
        device_add+0x9b1/0x1b40
        __auxiliary_device_add+0x88/0xc0
        mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
        blocking_notifier_call_chain+0xd5/0x130
        mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
        process_one_work+0x7c2/0x1340
        worker_thread+0x59d/0xec0
        kthread+0x28f/0x330
        ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&(&notifier->n_head)->rwsem);
                                lock(&dev->intf_state_mutex);
                                lock(&(&notifier->n_head)->rwsem);
   lock(&dev->intf_state_mutex);

  *** DEADLOCK ***

 4 locks held by kworker/u20:0/8:
  #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
  #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
  #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
  #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460

 stack backtrace:
 CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x7d
  check_noncircular+0x278/0x300
  ? print_circular_bug+0x460/0x460
  ? lock_chain_count+0x20/0x20
  ? register_lock_class+0x1880/0x1880
  __lock_acquire+0x2fc7/0x6720
  ? register_lock_class+0x1880/0x1880
  ? register_lock_class+0x1880/0x1880
  lock_acquire+0x1c1/0x550
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  __mutex_lock+0x12c/0x14b0
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? _raw_read_unlock+0x1f/0x30
  ? mutex_lock_io_nested+0x1320/0x1320
  ? __ioremap_caller.constprop.0+0x306/0x490
  ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
  ? iounmap+0x160/0x160
  mlx5_init_one+0x2e/0x490 [mlx5_core]
  mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
  ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
  auxiliary_bus_probe+0x9d/0xe0
  really_probe+0x1e0/0xaa0
  __driver_probe_device+0x219/0x480
  ? auxiliary_match_id+0xe9/0x140
  driver_probe_device+0x49/0x130
  __device_attach_driver+0x1b8/0x280
  ? driver_allows_async_probing+0x140/0x140
  bus_for_each_drv+0x123/0x1a0
  ? bus_for_each_dev+0x1a0/0x1a0
  ? lockdep_hardirqs_on_prepare+0x286/0x400
  ? trace_hardirqs_on+0x2d/0x100
  __device_attach+0x1a3/0x460
  ? device_driver_attach+0x1e0/0x1e0
  ? kobject_uevent_env+0x22d/0xf10
  bus_probe_device+0x1a2/0x260
  device_add+0x9b1/0x1b40
  ? dev_set_name+0xab/0xe0
  ? __fw_devlink_link_to_suppliers+0x260/0x260
  ? memset+0x20/0x40
  ? lockdep_init_map_type+0x21a/0x7d0
  __auxiliary_device_add+0x88/0xc0
  ? auxiliary_device_init+0x86/0xa0
  mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
  blocking_notifier_call_chain+0xd5/0x130
  mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
  ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
  ? lock_downgrade+0x6e0/0x6e0
  ? lockdep_hardirqs_on_prepare+0x286/0x400
  process_one_work+0x7c2/0x1340
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? pwq_dec_nr_in_flight+0x230/0x230
  ? rwlock_bug.part.0+0x90/0x90
  worker_thread+0x59d/0xec0
  ? process_one_work+0x1340/0x1340
  kthread+0x28f/0x330
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Fixes: 6a3273217469 ("net/mlx5: SF, Port function state change support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:03 +02:00
Roy Novich
0ea1abf797 net/mlx5: Fix cmd error logging for manage pages cmd
[ Upstream commit 090f3e4f4089ab8041ed7d632c7851c2a42fcc10 ]

When the driver unloads, give/reclaim_pages may fail because the PF driver
is in teardown flow. The current code then prints the following kernel log
message: 'failed reclaiming pages: err 0'.

Fix it to restore the behavior from before the cited commits by calling
mlx5_cmd_check before handling the error state. mlx5_cmd_check verifies
whether the returned error is an actual error that the driver needs to
handle and returns an appropriate value.

Fixes: 8d564292a166 ("net/mlx5: Remove redundant error on reclaim pages")
Fixes: 4dac2f10ada0 ("net/mlx5: Remove redundant notify fail on give pages")
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:03 +02:00
Vlad Buslov
cddad6c98f net/mlx5: Disable irq when locking lag_lock
[ Upstream commit 8e93f29422ffe968d7161f91acdf0d47f5323727 ]

The lag_lock is taken from both process and softirq contexts, which results
in a lockdep warning [0] about a potential deadlock. However, just disabling
softirqs by using the *_bh spinlock API is not enough, since it will cause a
warning in some contexts where the lock is obtained with hard irqs
disabled. To fix the issue, save the current irq state, disable irqs before
obtaining the lock, and restore the saved irq state after releasing it.

[0]:

[Sun Aug  7 13:12:29 2022] ================================
[Sun Aug  7 13:12:29 2022] WARNING: inconsistent lock state
[Sun Aug  7 13:12:29 2022] 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 Not tainted
[Sun Aug  7 13:12:29 2022] --------------------------------
[Sun Aug  7 13:12:29 2022] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[Sun Aug  7 13:12:29 2022] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[Sun Aug  7 13:12:29 2022] ffffffffa06dc0d8 (lag_lock){+.?.}-{2:2}, at: mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug  7 13:12:29 2022] {SOFTIRQ-ON-W} state was registered at:
[Sun Aug  7 13:12:29 2022]   lock_acquire+0x1c1/0x550
[Sun Aug  7 13:12:29 2022]   _raw_spin_lock+0x2c/0x40
[Sun Aug  7 13:12:29 2022]   mlx5_lag_add_netdev+0x13b/0x480 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   mlx5e_nic_enable+0x114/0x470 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   mlx5e_attach_netdev+0x30e/0x6a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   mlx5e_resume+0x105/0x160 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   mlx5e_probe+0xac3/0x14f0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   auxiliary_bus_probe+0x9d/0xe0
[Sun Aug  7 13:12:29 2022]   really_probe+0x1e0/0xaa0
[Sun Aug  7 13:12:29 2022]   __driver_probe_device+0x219/0x480
[Sun Aug  7 13:12:29 2022]   driver_probe_device+0x49/0x130
[Sun Aug  7 13:12:29 2022]   __driver_attach+0x1e4/0x4d0
[Sun Aug  7 13:12:29 2022]   bus_for_each_dev+0x11e/0x1a0
[Sun Aug  7 13:12:29 2022]   bus_add_driver+0x3f4/0x5a0
[Sun Aug  7 13:12:29 2022]   driver_register+0x20f/0x390
[Sun Aug  7 13:12:29 2022]   __auxiliary_driver_register+0x14e/0x260
[Sun Aug  7 13:12:29 2022]   mlx5e_init+0x38/0x90 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   vhost_iotlb_itree_augment_rotate+0xcb/0x180 [vhost_iotlb]
[Sun Aug  7 13:12:29 2022]   do_one_initcall+0xc4/0x400
[Sun Aug  7 13:12:29 2022]   do_init_module+0x18a/0x620
[Sun Aug  7 13:12:29 2022]   load_module+0x563a/0x7040
[Sun Aug  7 13:12:29 2022]   __do_sys_finit_module+0x122/0x1d0
[Sun Aug  7 13:12:29 2022]   do_syscall_64+0x3d/0x90
[Sun Aug  7 13:12:29 2022]   entry_SYSCALL_64_after_hwframe+0x46/0xb0
[Sun Aug  7 13:12:29 2022] irq event stamp: 3596508
[Sun Aug  7 13:12:29 2022] hardirqs last  enabled at (3596508): [<ffffffff813687c2>] __local_bh_enable_ip+0xa2/0x100
[Sun Aug  7 13:12:29 2022] hardirqs last disabled at (3596507): [<ffffffff813687da>] __local_bh_enable_ip+0xba/0x100
[Sun Aug  7 13:12:29 2022] softirqs last  enabled at (3596488): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug  7 13:12:29 2022] softirqs last disabled at (3596495): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug  7 13:12:29 2022]
                           other info that might help us debug this:
[Sun Aug  7 13:12:29 2022]  Possible unsafe locking scenario:

[Sun Aug  7 13:12:29 2022]        CPU0
[Sun Aug  7 13:12:29 2022]        ----
[Sun Aug  7 13:12:29 2022]   lock(lag_lock);
[Sun Aug  7 13:12:29 2022]   <Interrupt>
[Sun Aug  7 13:12:29 2022]     lock(lag_lock);
[Sun Aug  7 13:12:29 2022]
                            *** DEADLOCK ***

[Sun Aug  7 13:12:29 2022] 4 locks held by swapper/0/0:
[Sun Aug  7 13:12:29 2022]  #0: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: mlx5e_napi_poll+0x43/0x20a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  #1: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0x2d7/0xd60
[Sun Aug  7 13:12:29 2022]  #2: ffff888144a18b58 (&br->hash_lock){+.-.}-{2:2}, at: br_fdb_update+0x301/0x570
[Sun Aug  7 13:12:29 2022]  #3: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain+0x5/0x1d0
[Sun Aug  7 13:12:29 2022]
                           stack backtrace:
[Sun Aug  7 13:12:29 2022] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0_for_upstream_debug_2022_08_04_16_06 #1
[Sun Aug  7 13:12:29 2022] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[Sun Aug  7 13:12:29 2022] Call Trace:
[Sun Aug  7 13:12:29 2022]  <IRQ>
[Sun Aug  7 13:12:29 2022]  dump_stack_lvl+0x57/0x7d
[Sun Aug  7 13:12:29 2022]  mark_lock.part.0.cold+0x5f/0x92
[Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? unwind_next_frame+0x1c4/0x1b50
[Sun Aug  7 13:12:29 2022]  ? secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug  7 13:12:29 2022]  ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? stack_access_ok+0x1d0/0x1d0
[Sun Aug  7 13:12:29 2022]  ? start_kernel+0x3a7/0x3c5
[Sun Aug  7 13:12:29 2022]  __lock_acquire+0x1260/0x6720
[Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
[Sun Aug  7 13:12:29 2022]  ? mark_lock.part.0+0xed/0x3060
[Sun Aug  7 13:12:29 2022]  ? stack_trace_save+0x91/0xc0
[Sun Aug  7 13:12:29 2022]  lock_acquire+0x1c1/0x550
[Sun Aug  7 13:12:29 2022]  ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
[Sun Aug  7 13:12:29 2022]  _raw_spin_lock+0x2c/0x40
[Sun Aug  7 13:12:29 2022]  ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  mlx5_esw_bridge_rep_vport_num_vhca_id_get+0x1a0/0x600 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? mlx5_esw_bridge_update_work+0x90/0x90 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? lock_acquire+0x1c1/0x550
[Sun Aug  7 13:12:29 2022]  mlx5_esw_bridge_switchdev_event+0x185/0x8f0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? mlx5_esw_bridge_port_obj_attr_set+0x3e0/0x3e0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  atomic_notifier_call_chain+0xd7/0x1d0
[Sun Aug  7 13:12:29 2022]  br_switchdev_fdb_notify+0xea/0x100
[Sun Aug  7 13:12:29 2022]  ? br_switchdev_set_port_flag+0x310/0x310
[Sun Aug  7 13:12:29 2022]  fdb_notify+0x11b/0x150
[Sun Aug  7 13:12:29 2022]  br_fdb_update+0x34c/0x570
[Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? br_fdb_add_local+0x50/0x50
[Sun Aug  7 13:12:29 2022]  ? br_allowed_ingress+0x5f/0x1070
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  br_handle_frame_finish+0x786/0x18e0
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
[Sun Aug  7 13:12:29 2022]  ? sctp_inet_bind_verify+0x4d/0x190
[Sun Aug  7 13:12:29 2022]  ? xlog_unpack_data+0x2e0/0x310
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  br_nf_hook_thresh+0x227/0x380 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? setup_pre_routing+0x460/0x460 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? br_nf_pre_routing_ipv6+0x48b/0x69c [br_netfilter]
[Sun Aug  7 13:12:29 2022]  br_nf_pre_routing_finish_ipv6+0x5c2/0xbf0 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  br_nf_pre_routing_ipv6+0x4c6/0x69c [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_validate_ipv6+0x9e0/0x9e0 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_nf_forward_arp+0xb70/0xb70 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_nf_pre_routing+0xacf/0x1160 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  br_handle_frame+0x8a9/0x1270
[Sun Aug  7 13:12:29 2022]  ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? bond_handle_frame+0xf9/0xac0 [bonding]
[Sun Aug  7 13:12:29 2022]  ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug  7 13:12:29 2022]  __netif_receive_skb_core+0x7c0/0x2c70
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  ? generic_xdp_tx+0x5b0/0x5b0
[Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
[Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  __netif_receive_skb_list_core+0x2d7/0x8a0
[Sun Aug  7 13:12:29 2022]  ? lock_acquire+0x1c1/0x550
[Sun Aug  7 13:12:29 2022]  ? process_backlog+0x960/0x960
[Sun Aug  7 13:12:29 2022]  ? lockdep_hardirqs_on_prepare+0x129/0x400
[Sun Aug  7 13:12:29 2022]  ? kvm_clock_get_cycles+0x14/0x20
[Sun Aug  7 13:12:29 2022]  netif_receive_skb_list_internal+0x5f4/0xd60
[Sun Aug  7 13:12:29 2022]  ? do_xdp_generic+0x150/0x150
[Sun Aug  7 13:12:29 2022]  ? mlx5e_poll_rx_cq+0xf6b/0x2960 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? mlx5e_poll_ico_cq+0x3d/0x1590 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  napi_complete_done+0x188/0x710
[Sun Aug  7 13:12:29 2022]  mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? __queue_work+0x53c/0xeb0
[Sun Aug  7 13:12:29 2022]  __napi_poll+0x9f/0x540
[Sun Aug  7 13:12:29 2022]  net_rx_action+0x420/0xb70
[Sun Aug  7 13:12:29 2022]  ? napi_threaded_poll+0x470/0x470
[Sun Aug  7 13:12:29 2022]  ? __common_interrupt+0x79/0x1a0
[Sun Aug  7 13:12:29 2022]  __do_softirq+0x271/0x92c
[Sun Aug  7 13:12:29 2022]  irq_exit_rcu+0x11a/0x170
[Sun Aug  7 13:12:29 2022]  common_interrupt+0x7d/0xa0
[Sun Aug  7 13:12:29 2022]  </IRQ>
[Sun Aug  7 13:12:29 2022]  <TASK>
[Sun Aug  7 13:12:29 2022]  asm_common_interrupt+0x22/0x40
[Sun Aug  7 13:12:29 2022] RIP: 0010:default_idle+0x42/0x60
[Sun Aug  7 13:12:29 2022] Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 6b f1 22 02 85 c0 7e 07 0f 00 2d 80 3b 4a 00 fb f4 <c3> 48 c7 c7 e0 07 7e 85 e8 21 bd 40 fe eb de 66 66 2e 0f 1f 84 00
[Sun Aug  7 13:12:29 2022] RSP: 0018:ffffffff84407e18 EFLAGS: 00000242
[Sun Aug  7 13:12:29 2022] RAX: 0000000000000001 RBX: ffffffff84ec4a68 RCX: 1ffffffff0afc0fc
[Sun Aug  7 13:12:29 2022] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835b1fac
[Sun Aug  7 13:12:29 2022] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8884d2c44ac3
[Sun Aug  7 13:12:29 2022] R10: ffffed109a588958 R11: 00000000ffffffff R12: 0000000000000000
[Sun Aug  7 13:12:29 2022] R13: ffffffff84efac20 R14: 0000000000000000 R15: dffffc0000000000
[Sun Aug  7 13:12:29 2022]  ? default_idle_call+0xcc/0x460
[Sun Aug  7 13:12:29 2022]  default_idle_call+0xec/0x460
[Sun Aug  7 13:12:29 2022]  do_idle+0x394/0x450
[Sun Aug  7 13:12:29 2022]  ? arch_cpu_idle_exit+0x40/0x40
[Sun Aug  7 13:12:29 2022]  cpu_startup_entry+0x19/0x20
[Sun Aug  7 13:12:29 2022]  rest_init+0x156/0x250
[Sun Aug  7 13:12:29 2022]  arch_call_rest_init+0xf/0x15
[Sun Aug  7 13:12:29 2022]  start_kernel+0x3a7/0x3c5
[Sun Aug  7 13:12:29 2022]  secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug  7 13:12:29 2022]  </TASK>

Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:03 +02:00
Eli Cohen
3325cb4f2d net/mlx5: Eswitch, Fix forwarding decision to uplink
[ Upstream commit 942fca7e762be39204e5926e91a288a343a97c72 ]

Make sure to modify the rule for uplink forwarding only for the case
where destination vport number is MLX5_VPORT_UPLINK.

Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:03 +02:00
Eli Cohen
4c040acf57 net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY
[ Upstream commit a6e675a66175869b7d87c0e1dd0ddf93e04f8098 ]

Only set MLX5_LAG_FLAG_NDEVS_READY if both netdevices are registered.
Doing so guarantees that both ldev->pf[MLX5_LAG_P0].dev and
ldev->pf[MLX5_LAG_P1].dev have valid pointers when
MLX5_LAG_FLAG_NDEVS_READY is set.

The core issue is asymmetry in setting MLX5_LAG_FLAG_NDEVS_READY and
clearing it. Setting it is done wrongly when both
ldev->pf[MLX5_LAG_P0].dev and ldev->pf[MLX5_LAG_P1].dev are set;
clearing it is done right when either of ldev->pf[i].netdev is cleared.

Consider the following scenario:
1. PF0 loads and sets ldev->pf[MLX5_LAG_P0].dev to a valid pointer
2. PF1 loads and sets both ldev->pf[MLX5_LAG_P1].dev and
   ldev->pf[MLX5_LAG_P1].netdev with valid pointers. This results in
   MLX5_LAG_FLAG_NDEVS_READY being set.
3. PF0 is unloaded before setting ldev->pf[MLX5_LAG_P0].netdev.
   MLX5_LAG_FLAG_NDEVS_READY remains set.

Further execution of mlx5_do_bond() will result in a null pointer
dereference when calling mlx5_lag_is_multipath().
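
The symmetry argued for above can be sketched in plain userspace C. This is
only a model of the flag logic, not the mlx5 code: struct lag_sketch,
lag_update_ready(), lag_add_netdev() and lag_remove_netdev() are
hypothetical names, and the ndevs_ready field merely stands in for
MLX5_LAG_FLAG_NDEVS_READY.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the LAG bookkeeping; not the mlx5 structs. */
enum { LAG_P0, LAG_P1, LAG_PORTS };

struct lag_pf {
	const void *dev;    /* set when the PF core device loads */
	const void *netdev; /* set later, when the netdevice registers */
};

struct lag_sketch {
	struct lag_pf pf[LAG_PORTS];
	bool ndevs_ready;   /* stands in for MLX5_LAG_FLAG_NDEVS_READY */
};

/* Recompute the flag from the netdev pointers only, so that setting and
 * clearing are symmetric: ready means every port has a netdev. */
static void lag_update_ready(struct lag_sketch *ldev)
{
	bool ready = true;

	for (int i = 0; i < LAG_PORTS; i++)
		if (!ldev->pf[i].netdev)
			ready = false;

	ldev->ndevs_ready = ready;
}

static void lag_add_netdev(struct lag_sketch *ldev, int port, const void *netdev)
{
	ldev->pf[port].netdev = netdev;
	lag_update_ready(ldev);
}

static void lag_remove_netdev(struct lag_sketch *ldev, int port)
{
	ldev->pf[port].netdev = NULL;
	lag_update_ready(ldev);
}
```

In this model, replaying the scenario above (both .dev pointers set, only
PF1's netdev registered) leaves ndevs_ready false, because the .dev pointers
never take part in the decision.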

This patch fixes the following call trace actually encountered:

[ 1293.475195] BUG: kernel NULL pointer dereference, address: 00000000000009a8
[ 1293.478756] #PF: supervisor read access in kernel mode
[ 1293.481320] #PF: error_code(0x0000) - not-present page
[ 1293.483686] PGD 0 P4D 0
[ 1293.484434] Oops: 0000 [#1] SMP PTI
[ 1293.485377] CPU: 1 PID: 23690 Comm: kworker/u16:2 Not tainted 5.18.0-rc5_for_upstream_min_debug_2022_05_05_10_13 #1
[ 1293.488039] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 1293.490836] Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
[ 1293.492448] RIP: 0010:mlx5_lag_is_multipath+0x5/0x50 [mlx5_core]
[ 1293.494044] Code: e8 70 40 ff e0 48 8b 14 24 48 83 05 5c 1a 1b 00 01 e9 19 ff ff ff 48 83 05 47 1a 1b 00 01 eb d7 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 87 a8 09 00 00 48 85 c0 74 26 48 83 05 a7 1b 1b 00 01 41 b8
[ 1293.498673] RSP: 0018:ffff88811b2fbe40 EFLAGS: 00010202
[ 1293.500152] RAX: ffff88818a94e1c0 RBX: ffff888165eca6c0 RCX: 0000000000000000
[ 1293.501841] RDX: 0000000000000001 RSI: ffff88818a94e1c0 RDI: 0000000000000000
[ 1293.503585] RBP: 0000000000000000 R08: ffff888119886740 R09: ffff888165eca73c
[ 1293.505286] R10: 0000000000000018 R11: 0000000000000018 R12: ffff88818a94e1c0
[ 1293.506979] R13: ffff888112729800 R14: 0000000000000000 R15: ffff888112729858
[ 1293.508753] FS:  0000000000000000(0000) GS:ffff88852cc40000(0000) knlGS:0000000000000000
[ 1293.510782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1293.512265] CR2: 00000000000009a8 CR3: 00000001032d4002 CR4: 0000000000370ea0
[ 1293.514001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1293.515806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 8a66e4585979 ("net/mlx5: Change ownership model for lag")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:03 +02:00
Vlad Buslov
1155eb7baf net/mlx5e: Properly disable vlan strip on non-UL reps
[ Upstream commit f37044fd759b6bc40b6398a978e0b1acdf717372 ]

When querying mlx5 non-uplink representor capabilities with ethtool,
rx-vlan-offload is marked as "off [fixed]". However, it is actually always
enabled because mlx5e_params->vlan_strip_disable is 0 by default when a
struct mlx5e_params instance is initialized. Fix the issue by explicitly
setting vlan_strip_disable to 'true' for non-uplink representors.

Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:03 +02:00
Maciej Fijalkowski
952efbc7a0 ice: xsk: use Rx ring's XDP ring when picking NAPI context
[ Upstream commit 9ead7e74bfd6dd54db12ef133b8604add72511de ]

The ice driver allocates per-CPU XDP queues so that the redirect path can
safely use smp_processor_id() as an index into the array. At the same time,
though, XDP rings are used to pick the NAPI context on which to call
napi_schedule() or set NAPIF_STATE_MISSED. When the user reduces the queue
count, say to 8, and num_possible_cpus() of the underlying platform is 44,
queue vectors with correlated NAPI contexts will carry several XDP queues.

This in turn can result in broken behavior where the NAPI context of
interest is never scheduled and the AF_XDP socket does not process any
traffic.

To fix this, let us change the way XDP rings are assigned to Rx rings and
use this information later on when setting the ice_tx_ring::xsk_pool
pointer. For each Rx ring, grab the associated queue vector and walk
through the linked list of its Tx rings. Once we stumble upon an XDP ring
in it, assign this ring to ice_rx_ring::xdp_ring.
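
The list walk described above can be sketched in userspace C as follows.
The structures are simplified, hypothetical stand-ins (tx_ring_sketch,
q_vector_sketch, rx_ring_sketch) rather than the ice driver's own types,
and the is_xdp flag is an assumption used here to tell XDP rings apart.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's ring structures. */
struct tx_ring_sketch {
	bool is_xdp;                 /* true for an XDP Tx ring */
	struct tx_ring_sketch *next; /* next Tx ring on the same queue vector */
};

struct q_vector_sketch {
	struct tx_ring_sketch *tx_head; /* head of the vector's Tx ring list */
};

struct rx_ring_sketch {
	struct q_vector_sketch *q_vector;
	struct tx_ring_sketch *xdp_ring; /* ring used later to pick the NAPI context */
};

/* Assign to the Rx ring the first XDP ring found on its own queue vector.
 * Returns true if one was found, false (and a NULL xdp_ring) otherwise. */
static bool rx_ring_map_xdp(struct rx_ring_sketch *rx)
{
	for (struct tx_ring_sketch *r = rx->q_vector->tx_head; r; r = r->next) {
		if (r->is_xdp) {
			rx->xdp_ring = r;
			return true;
		}
	}
	rx->xdp_ring = NULL;
	return false;
}
```

Because the XDP ring is looked up through the Rx ring's own queue vector,
the ring and the NAPI context it is used to pick always belong to the same
vector in this model.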

The previous approach [0] to fixing this issue was for the txonly scenario,
because of the described grouping of XDP rings across queue vectors. So,
relying on the Rx ring meant that the NAPI context could be scheduled with
a queue vector that has no XDP ring with an associated XSK pool.

[0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:02 +02:00
Maciej Fijalkowski
03a3f29fe5 ice: xsk: prohibit usage of non-balanced queue id
[ Upstream commit 5a42f112d367bb4700a8a41f5c12724fde6bfbb9 ]

Fix the following scenario:
1. ethtool -L $IFACE rx 8 tx 96
2. xdpsock -q 10 -t -z

The above refers to a case where the user would like to attach an XSK
socket in txonly mode at a queue id that does not have a corresponding Rx
queue. At this moment ice's XSK logic is tightly bound to act on a "queue
pair", e.g. both Tx and Rx queues at a given queue id are disabled/enabled
and both of them get an XSK pool assigned, which is broken for the
presented queue configuration. This results in the splat included at the
bottom, which is basically an OOB access to the Rx ring array.

To fix this, allow using the ids only in scope of "combined" queues
reported by ethtool. However, the logic should be rewritten to allow such
configurations later on, which would end up as a complete rewrite of the
control path, so let us go with this temporary fix.
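
A minimal sketch of such a range check, in userspace C. It assumes that the
"combined" range is the set of ids for which both an Rx and a Tx queue
exist, i.e. min(rx, tx); the helper xsk_qid_check() and its parameters are
hypothetical and not taken from the ice driver.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: accept a queue id only if both an Rx and a Tx queue
 * exist at that index, i.e. the id lies within the "combined" range. */
static int xsk_qid_check(uint16_t qid, uint16_t num_rx, uint16_t num_tx)
{
	uint16_t combined = num_rx < num_tx ? num_rx : num_tx;

	return qid < combined ? 0 : -EINVAL;
}
```

With the configuration from the scenario (rx 8, tx 96), queue id 10 is
rejected under this assumption, while ids 0 through 7 are accepted.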

[420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082
[420160.566359] #PF: supervisor read access in kernel mode
[420160.572657] #PF: error_code(0x0000) - not-present page
[420160.579002] PGD 0 P4D 0
[420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI
[420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G           OE     5.19.0-rc7+ #10
[420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420160.693211] FS:  00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420160.703645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420160.740707] PKRU: 55555554
[420160.745960] Call Trace:
[420160.750962]  <TASK>
[420160.755597]  ? kmalloc_large_node+0x79/0x90
[420160.762703]  ? __kmalloc_node+0x3f5/0x4b0
[420160.769341]  xp_assign_dev+0xfd/0x210
[420160.775661]  ? shmem_file_read_iter+0x29a/0x420
[420160.782896]  xsk_bind+0x152/0x490
[420160.788943]  __sys_bind+0xd0/0x100
[420160.795097]  ? exit_to_user_mode_prepare+0x20/0x120
[420160.802801]  __x64_sys_bind+0x16/0x20
[420160.809298]  do_syscall_64+0x38/0x90
[420160.815741]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[420160.823731] RIP: 0033:0x7fa86a0dd2fb
[420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48
[420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb
[420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003
[420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000
[420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0
[420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000
[420160.919817]  </TASK>
[420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice]
[420160.977576] CR2: 0000000000000082
[420160.985037] ---[ end trace 0000000000000000 ]---
[420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420161.201505] FS:  00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420161.213628] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420161.257052] PKRU: 55555554

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:02 +02:00
Hayes Wang
edbcbe37c3 r8152: fix the RX FIFO settings when suspending
[ Upstream commit b75d612014447e04abdf0e37ffb8f2fd8b0b49d6 ]

The RX FIFO would be changed when suspending, so the related settings
have to be modified, too. Otherwise, the flow control would work
abnormally.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216333
Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
Fixes: cdf0b86b250f ("r8152: fix a WOL issue")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:02 +02:00
Hayes Wang
3eb8eb6e2e r8152: fix the units of some registers for RTL8156A
[ Upstream commit 6dc4df12d741c0fe8f885778a43039e0619b9cd9 ]

The units of PLA_RX_FIFO_FULL and PLA_RX_FIFO_EMPTY are 16 bytes.

Fixes: 195aae321c82 ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:02 +02:00
Sabrina Dubroca
4eac2ff103 Revert "net: macsec: update SCI upon MAC address change."
[ Upstream commit e82c649e851c9c25367fb7a2a6cf3479187de467 ]

This reverts commit 6fc498bc82929ee23aa2f35a828c6178dfd3f823.

Commit 6fc498bc8292 states:

    SCI should be updated, because it contains MAC in its first 6
    octets.

That's not entirely correct. The SCI can be based on the MAC address,
but doesn't have to be. We can also use any 64-bit number as the
SCI. When the SCI is based on the MAC address, it uses a 16-bit "port
number" provided by userspace, which commit 6fc498bc8292 overwrites
with 1.
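
For illustration, a MAC-based SCI can be composed as below. This is a
userspace sketch, not the macsec driver code: sci_from_mac() and
ETH_ALEN_SKETCH are hypothetical names, and placing the port number in
network byte order in the last two octets is an assumption of the sketch.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6

/* Build an 8-octet SCI: MAC address in the first 6 octets, followed by the
 * 16-bit port number in network byte order. */
static void sci_from_mac(uint8_t sci[8], const uint8_t mac[ETH_ALEN_SKETCH],
			 uint16_t port)
{
	memcpy(sci, mac, ETH_ALEN_SKETCH);
	sci[6] = (uint8_t)(port >> 8);
	sci[7] = (uint8_t)(port & 0xff);
}
```

Rebuilding the SCI with a hard-coded port of 1 after a MAC address change
would silently replace whatever port number userspace chose, which is the
overwrite described above.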

In addition, changing the SCI after macsec has been set up can just
confuse the receiver. If we configure the RXSC on the peer based on
the original SCI, we should keep the same SCI on TX.

When the macsec device is being managed by a userspace key negotiation
daemon such as wpa_supplicant, commit 6fc498bc8292 would also
overwrite the SCI defined by userspace.

Fixes: 6fc498bc8292 ("net: macsec: update SCI upon MAC address change.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/9b1a9d28327e7eb54550a92eebda45d25e54dd0d.1660667033.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31 17:18:01 +02:00
Deren Wu
4500aa1135 mt76: mt7921: fix command timeout in AP stop period
commit 9d958b60ebc2434f2b7eae83d77849e22d1059eb upstream.

Because the AP is stopped improperly, the mt7921 driver can hit random
command timeouts caused by a chip fw problem. Migrate the AP start/stop
process to .start_ap/.stop_ap and configure BSS network settings in both
hooks.

The new flow is shown below.
* AP start
    .start_ap()
      configure BSS network resource
      set BSS to connected state
    .bss_info_changed()
      enable fw beacon offload

* AP stop
    .bss_info_changed()
      disable fw beacon offload (skip this command)
    .stop_ap()
      set BSS to disconnected state (beacon offload disabled automatically)
      destroy BSS network resource

Fixes: 116c69603b01 ("mt76: mt7921: Add AP mode support")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-31 17:18:00 +02:00
Vladimir Oltean
e124bab08a net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats
[ Upstream commit e780e3193e889fd8358b862f7cd18ec5a4901caf ]

Rather than reading the stats64 counters directly from the 32-bit
hardware, it's better to rely on the output produced by the periodic
ocelot_port_update_stats().

It would be even better to call ocelot_port_update_stats() right from
ocelot_get_stats64() to make sure we report the current values rather
than the ones from 2 seconds ago. But we need to export
ocelot_port_update_stats() from the switch lib towards the switchdev
driver for that, and future work will largely undo that.

There are more ocelot-based drivers waiting to be introduced, an example
of which is the SPI-controlled VSC7512. In that driver's case, it will
be impossible to call ocelot_port_update_stats() from ndo_get_stats64
context, since the latter is atomic, and reading the stats over SPI is
sleepable. So the compromise taken here, which will also hold going
forward, is to report 64-bit counters to stats64, which are not 100% up
to date.

Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25 11:45:39 +02:00
Vladimir Oltean
e07a74dc0f net: mscc: ocelot: make struct ocelot_stat_layout array indexable
[ Upstream commit 9190460084ddd0e9235f55eab0fdd5456b5f2fd5 ]

The ocelot counters are 32-bit and require periodic reading, every 2
seconds, by ocelot_port_update_stats(), so that wraparounds are
detected.
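
The wraparound handling can be sketched in userspace C as follows. The
helper stat_accumulate() is a hypothetical name, and the sketch assumes, as
the periodic read is meant to guarantee, that the hardware counter wraps at
most once between two consecutive reads.

```c
#include <stdint.h>

/* Fold a fresh 32-bit hardware reading into a 64-bit software counter.
 * Correct only if the hardware counter wraps at most once between reads. */
static uint64_t stat_accumulate(uint64_t acc, uint32_t hw)
{
	uint32_t last = (uint32_t)acc; /* low 32 bits seen at the previous read */

	if (hw < last)                 /* reading went backwards: it wrapped */
		acc += (uint64_t)1 << 32;

	/* Keep the accumulated upper half, take the lower half from hardware. */
	return (acc & ~(uint64_t)UINT32_MAX) | hw;
}
```

If two wraps happen between reads, the second one goes unnoticed, which is
why the read period has to be short relative to the fastest possible
counter rate.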

Currently, the counters reported by ocelot_get_stats64() come from the
32-bit hardware counters directly, rather than from the 64-bit
accumulated ocelot->stats, and this is a problem for their integrity.

The strategy is to make ocelot_get_stats64() able to cherry-pick
individual stats from ocelot->stats the way in which it currently reads
them out from SYS_COUNT_* registers. But currently it can't, because
ocelot->stats is an opaque u64 array that's used only to feed data into
ethtool -S.

To solve that problem, we need to make ocelot->stats indexable, and
associate each element with an element of struct ocelot_stat_layout used
by ethtool -S.

This makes ocelot_stat_layout a fat (and possibly sparse) array, so we
need to change the way in which we access it. We no longer need
OCELOT_STAT_END as a sentinel, because we know the array's size
(OCELOT_NUM_STATS). We just need to skip the array elements that were
left unpopulated for the switch revision (ocelot, felix, seville).
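
A minimal sketch of such an enum-indexed, possibly sparse layout array follows. All identifiers (demo_stat, demo_stat_layout, demo_layout, demo_count_populated) and the offsets are hypothetical, and using a NULL name to mark an unpopulated slot is an assumption made for illustration; the driver may mark empty entries differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stat indices; a real driver would define many more. */
enum demo_stat {
	DEMO_STAT_RX_OCTETS,
	DEMO_STAT_RX_UNICAST,
	DEMO_STAT_TX_OCTETS,
	DEMO_NUM_STATS, /* known array size, so no sentinel entry is needed */
};

struct demo_stat_layout {
	uint32_t offset;  /* made-up register offset */
	const char *name; /* NULL marks a slot this revision leaves empty */
};

/* Layout for an imaginary switch revision that lacks RX_UNICAST. */
static const struct demo_stat_layout demo_layout[DEMO_NUM_STATS] = {
	[DEMO_STAT_RX_OCTETS] = { .offset = 0x00, .name = "rx_octets" },
	[DEMO_STAT_TX_OCTETS] = { .offset = 0x40, .name = "tx_octets" },
};

/* Walk the whole array and skip the slots that were left unpopulated. */
static size_t demo_count_populated(const struct demo_stat_layout *layout)
{
	size_t n = 0;

	assert(layout != NULL);

	for (size_t i = 0; i < DEMO_NUM_STATS; i++) {
		if (!layout[i].name)
			continue;
		n++;
	}

	return n;
}
```

Because every element sits at a fixed index, a caller can read one specific counter (for example the slot at DEMO_STAT_TX_OCTETS) without scanning the array.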

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25 11:45:39 +02:00
Vladimir Oltean
3aa635bf2f net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work
[ Upstream commit 18d8e67df184081bc6ce6220a2dd965cfd3d7e6b ]

The 2 methods can run concurrently, and one will change the window of
counters (SYS_STAT_CFG_STAT_VIEW) that the other sees. The fix is
similar to what commit 7fbf6795d127 ("net: mscc: ocelot: fix mutex lock
error during ethtool stats read") has done for ethtool -S.

Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25 11:45:38 +02:00
Vladimir Oltean
0d05a55803 net: mscc: ocelot: turn stats_lock into a spinlock
[ Upstream commit 22d842e3efe56402c33b5e6e303bb71ce9bf9334 ]

ocelot_get_stats64() currently runs unlocked and therefore may collide
with ocelot_port_update_stats() which indirectly accesses the same
counters. However, ocelot_get_stats64() runs in atomic context, and we
cannot simply take the sleepable ocelot->stats_lock mutex. We need to
convert it to an atomic spinlock first. Do that as a preparatory change.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25 11:45:38 +02:00
Lin Ma
64c0c233a8 igb: Add lock to avoid data race
commit 6faee3d4ee8be0f0367d0c3d826afb3571b7a5e0 upstream.

The commit c23d92b80e0b ("igb: Teardown SR-IOV before
unregister_netdev()") places the unregister_netdev() call after the
igb_disable_sriov() call to avoid a functionality issue.

However, it introduces several race conditions when detaching a device.
For example, when .remove() is called, the below interleaving leads to
use-after-free.

 (FREE from device detaching)      |   (USE from netdev core)
igb_remove                         |  igb_ndo_get_vf_config
 igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
  kfree(adapter->vf_data)          |
  adapter->vfs_allocated_count = 0 |
                                   |    memcpy(... adapter->vf_data[vf]

Moreover, igb_disable_sriov() also suffers from a data race with
requests from the VF driver.

 (FREE from device detaching)      |   (USE from requests)
igb_remove                         |  igb_msix_other
 igb_disable_sriov                 |   igb_msg_task
  kfree(adapter->vf_data)          |    vf < adapter->vfs_allocated_count
  adapter->vfs_allocated_count = 0 |

To this end, this commit first eliminates the data races from the netdev
core by using rtnl_lock (similar to commit 719479230893 ("dpaa2-eth: add
MAC/PHY support through phylink")). It then adds a spinlock to
eliminate races from driver requests (similar to commit 1e53834ce541
("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero")).

Fixes: c23d92b80e0b ("igb: Teardown SR-IOV before unregister_netdev()")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220817184921.735244-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25 11:45:37 +02:00
Christophe JAILLET
9400aeb419 stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove()
commit 5c23d6b717e4e956376f3852b90f58e262946b50 upstream.

Commit 09f012e64e4b ("stmmac: intel: Fix clock handling on error and remove
paths") added this clk_disable_unprepare() call.

This was partly reverted by commit ac322f86b56c ("net: stmmac: Fix clock
handling on remove path"), which removed this clk_disable_unprepare()
because:
"
   While unloading the dwmac-intel driver, clk_disable_unprepare() is
   being called twice in stmmac_dvr_remove() and
   intel_eth_pci_remove(). This causes kernel panic on the second call.
"

However, later on, commit 5ec55823438e8 ("net: stmmac: add clocks management
for gmac driver") updated stmmac_dvr_remove(), which no longer calls
clk_disable_unprepare().

So clk_disable_unprepare() should now be called from intel_eth_pci_remove().

Fixes: 5ec55823438e8 ("net: stmmac: add clocks management for gmac driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/d7c8c1dadf40df3a7c9e643f76ffadd0ccc1ad1b.1660659689.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25 11:45:37 +02:00
Csókás Bence
330eccd73f fec: Fix timer capture timing in fec_ptp_enable_pps()
commit 61d5e2a251fb20c2c5e998c3f1d52ed6d5360319 upstream.

The code reimplements functionality already in `fec_ptp_read()`,
but misses the check for FEC_QUIRK_BUG_CAPTURE. Replace it with a function call.

Fixes: 28b5f058cf1d ("net: fec: ptp: fix convergence issue to support LinuxPTP stack")
Signed-off-by: Csókás Bence <csokas.bence@prolan.hu>
Link: https://lore.kernel.org/r/20220811101348.13755-1-csokas.bence@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25 11:45:36 +02:00
Alan Brady
7c9ebb648c i40e: Fix to stop tx_timeout recovery if GLOBR fails
commit 57c942bc3bef0970f0b21f8e0998e76a900ea80d upstream.

When a tx_timeout fires, the PF attempts to recover by incrementally
resetting.  First we try a PFR, then CORER and finally a GLOBR.  If the
GLOBR fails, then we keep hitting the tx_timeout and incrementing the
recovery level and issuing dmesgs, which is both annoying to the user
and accomplishes nothing.

If the GLOBR fails, then we're pretty much totally hosed, and there's
not much else we can do to recover, so this makes it such that we just
kill the VSI and stop hitting the tx_timeout in such a case.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25 11:45:36 +02:00