linux/drivers/net/ethernet
Ivan Vecera 5cb1ebdbc4 ice: Fix race condition during interface enslave
Commit 5dbbbd01cb ("ice: Avoid RTNL lock when re-creating
auxiliary device") changed how the aux device is re-created, so
that ice_plug_aux_dev() is now called from ice_service_task() context.
Unfortunately, this opens a race window that can result in a deadlock
when an interface leaves a LAG and immediately enters a LAG again.

Reproducer:
```
#!/bin/bash

ip link add lag0 type bond mode 1 miimon 100
ip link set lag0

for n in {1..10}; do
        echo Cycle: $n
        ip link set ens7f0 master lag0
        sleep 1
        ip link set ens7f0 nomaster
done
```

This results in:
[20976.208697] Workqueue: ice ice_service_task [ice]
[20976.213422] Call Trace:
[20976.215871]  __schedule+0x2d1/0x830
[20976.219364]  schedule+0x35/0xa0
[20976.222510]  schedule_preempt_disabled+0xa/0x10
[20976.227043]  __mutex_lock.isra.7+0x310/0x420
[20976.235071]  enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
[20976.251215]  ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
[20976.256192]  ib_cache_setup_one+0x33/0xa0 [ib_core]
[20976.261079]  ib_register_device+0x40d/0x580 [ib_core]
[20976.266139]  irdma_ib_register_device+0x129/0x250 [irdma]
[20976.281409]  irdma_probe+0x2c1/0x360 [irdma]
[20976.285691]  auxiliary_bus_probe+0x45/0x70
[20976.289790]  really_probe+0x1f2/0x480
[20976.298509]  driver_probe_device+0x49/0xc0
[20976.302609]  bus_for_each_drv+0x79/0xc0
[20976.306448]  __device_attach+0xdc/0x160
[20976.310286]  bus_probe_device+0x9d/0xb0
[20976.314128]  device_add+0x43c/0x890
[20976.321287]  __auxiliary_device_add+0x43/0x60
[20976.325644]  ice_plug_aux_dev+0xb2/0x100 [ice]
[20976.330109]  ice_service_task+0xd0c/0xed0 [ice]
[20976.342591]  process_one_work+0x1a7/0x360
[20976.350536]  worker_thread+0x30/0x390
[20976.358128]  kthread+0x10a/0x120
[20976.365547]  ret_from_fork+0x1f/0x40
...
[20976.438030] task:ip              state:D stack:    0 pid:213658 ppid:213627 flags:0x00004084
[20976.446469] Call Trace:
[20976.448921]  __schedule+0x2d1/0x830
[20976.452414]  schedule+0x35/0xa0
[20976.455559]  schedule_preempt_disabled+0xa/0x10
[20976.460090]  __mutex_lock.isra.7+0x310/0x420
[20976.464364]  device_del+0x36/0x3c0
[20976.467772]  ice_unplug_aux_dev+0x1a/0x40 [ice]
[20976.472313]  ice_lag_event_handler+0x2a2/0x520 [ice]
[20976.477288]  notifier_call_chain+0x47/0x70
[20976.481386]  __netdev_upper_dev_link+0x18b/0x280
[20976.489845]  bond_enslave+0xe05/0x1790 [bonding]
[20976.494475]  do_setlink+0x336/0xf50
[20976.502517]  __rtnl_newlink+0x529/0x8b0
[20976.543441]  rtnl_newlink+0x43/0x60
[20976.546934]  rtnetlink_rcv_msg+0x2b1/0x360
[20976.559238]  netlink_rcv_skb+0x4c/0x120
[20976.563079]  netlink_unicast+0x196/0x230
[20976.567005]  netlink_sendmsg+0x204/0x3d0
[20976.570930]  sock_sendmsg+0x4c/0x50
[20976.574423]  ____sys_sendmsg+0x1eb/0x250
[20976.586807]  ___sys_sendmsg+0x7c/0xc0
[20976.606353]  __sys_sendmsg+0x57/0xa0
[20976.609930]  do_syscall_64+0x5b/0x1a0
[20976.613598]  entry_SYSCALL_64_after_hwframe+0x65/0xca

1. The command 'ip link ... set nomaster' causes ice_plug_aux_dev()
   to be called from ice_service_task() context; the aux device is
   created and the associated device->lock is taken.
2. The command 'ip link ... set master...' calls ice's notifier under
   the RTNL lock, and that notifier calls ice_unplug_aux_dev(). That
   function tries to take the aux device->lock, but it is already held
   by ice_plug_aux_dev() from step 1.
3. Later, ice_plug_aux_dev() tries to take the RTNL lock, but it is
   already held from step 2.
4. Deadlock.
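
The lock-order inversion in the steps above can be replayed in miniature.
What follows is a minimal userspace sketch, not driver code: two
atomic_flag objects stand in for the RTNL lock and the aux device->lock,
and the names rtnl, dev_lock, try_lock(), unlock() and
simulate_inversion() are all hypothetical. It runs the interleaving on a
single thread with non-blocking lock attempts, so it reports the stuck
state instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel locks involved. */
static atomic_flag rtnl = ATOMIC_FLAG_INIT;     /* models the RTNL lock */
static atomic_flag dev_lock = ATOMIC_FLAG_INIT; /* models aux device->lock */

/* Non-blocking acquire: true if the lock was free and is now held. */
static bool try_lock(atomic_flag *l)
{
	return !atomic_flag_test_and_set(l);
}

static void unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/*
 * Replays steps 1-3 in order. Returns true when each context ends up
 * waiting for the lock that the other one already holds.
 */
static bool simulate_inversion(void)
{
	/* Step 1: the service task starts plugging and holds the device lock. */
	bool service_has_dev = try_lock(&dev_lock);
	/* Step 2: 'ip link ... set master' enters the notifier holding RTNL... */
	bool ip_has_rtnl = try_lock(&rtnl);
	/* ...and the unplug path then wants the device lock. */
	bool ip_got_dev = try_lock(&dev_lock);
	/* Step 3: the plug path now wants RTNL. */
	bool service_got_rtnl = try_lock(&rtnl);

	/* Both locks must have been free when the replay started. */
	assert(service_has_dev && ip_has_rtnl);

	/* Release everything so the replay can be repeated. */
	unlock(&dev_lock);
	unlock(&rtnl);

	return !ip_got_dev && !service_got_rtnl;
}
```

Calling simulate_inversion() from a main() returns true: neither context
can make progress, which is step 4. With real blocking mutexes the two
failed attempts would be the two tasks shown sleeping in __mutex_lock in
the traces above.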

The patch fixes this issue with the following changes:
- Bit ICE_FLAG_PLUG_AUX_DEV stays set for the duration of the
  ice_plug_aux_dev() call in ice_service_task().
- The bit is checked in ice_clear_rdma_cap(), and ice_unplug_aux_dev()
  is called only if it is not set. If it is set (in other words,
  plugging of the aux device was requested and ice_plug_aux_dev() is
  potentially running), the function only clears the bit.
- Once the ice_plug_aux_dev() call (in ice_service_task()) has finished,
  the bit ICE_FLAG_PLUG_AUX_DEV is cleared, and the code also checks
  whether it had already been cleared by ice_clear_rdma_cap(). If so,
  the aux device is unplugged.
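
The handshake described by these three points can be sketched as a small
userspace model. This is an illustration under stated assumptions, not
the patch itself: struct model_pf, test_and_clear() and the model_*()
functions are hypothetical names, an atomic_bool plays the role of
ICE_FLAG_PLUG_AUX_DEV, and the during_plug callback is a single-threaded
stand-in for an event that arrives while plugging is in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are driver symbols. */
struct model_pf {
	atomic_bool plug_requested; /* plays the role of ICE_FLAG_PLUG_AUX_DEV */
	bool plugged;               /* whether the modeled aux device exists */
	int unplug_calls;           /* how often the unplug path ran */
};

/* Atomically clear the flag and report whether it had been set. */
static bool test_and_clear(atomic_bool *flag)
{
	return atomic_exchange(flag, false);
}

static void model_unplug(struct model_pf *pf)
{
	pf->plugged = false;
	pf->unplug_calls++;
}

/* Models the check described for ice_clear_rdma_cap(). */
static void model_clear_rdma_cap(struct model_pf *pf)
{
	/*
	 * Flag clear: no plug is pending or running, unplug directly.
	 * Flag set: only cancel the request; the service task unplugs later.
	 */
	if (!test_and_clear(&pf->plug_requested))
		model_unplug(pf);
}

/* Models the plug handling described for ice_service_task(). */
static void model_service_task(struct model_pf *pf,
			       void (*during_plug)(struct model_pf *))
{
	if (!atomic_load(&pf->plug_requested))
		return;

	pf->plugged = true;          /* plug; the flag stays set meanwhile */
	if (during_plug)
		during_plug(pf);     /* simulated concurrent event */

	/* Flag already cleared by someone else: an unplug was requested. */
	if (!test_and_clear(&pf->plug_requested))
		model_unplug(pf);
}
```

Passing NULL as during_plug leaves the modeled device plugged with the
flag cleared. Passing model_clear_rdma_cap makes the service task itself
perform the unplug once plugging is done, so the notifier path never has
to wait for the device lock while holding RTNL.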

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 15:07:46 -08:00
3com ethernet: 3com/typhoon: don't write directly to netdev->dev_addr 2022-01-26 15:40:01 +00:00
8390 net:mcf8390: Use platform_get_irq() to get the interrupt 2022-03-09 12:14:30 +00:00
actions
adaptec
aeroflex
agere et131x: Remove useless DMA-32 fallback configuration 2022-01-09 16:52:19 -08:00
alacritech
allwinner net: ethernet: sun4i-emac: Fix an error handling path in emac_probe() 2022-01-15 22:34:52 +00:00
alteon net: alteon: Simplify DMA setting 2022-01-09 16:52:18 -08:00
altera net: altera: set a couple error code in probe() 2021-12-03 14:23:11 +00:00
amazon net: ena: Extract recurring driver reset code into a function 2022-01-07 19:25:52 -08:00
amd net: amd-xgbe: disable interrupts during pci removal 2022-02-09 12:52:59 +00:00
apm
apple net: apple: bmac: Fix build since dev_addr constification 2022-01-14 11:22:57 +00:00
aquantia net: atlantic: Use the bitmap API instead of hand-writing it 2022-01-24 12:57:01 +00:00
arc net: arc_emac: Fix use after free in arc_mdio_probe() 2022-03-10 14:49:21 -08:00
asix
atheros atl1c: fix tx timeout after link flap on Mikrotik 10/25G NIC 2022-02-11 14:41:02 -08:00
broadcom net: bcmgenet: Don't claim WOL when its not available 2022-03-10 14:54:32 -08:00
brocade bna: Simplify DMA setting 2022-01-09 16:52:18 -08:00
cadence net: macb: Fix lost RX packet wakeup race in NAPI receive 2022-03-04 12:05:54 +00:00
calxeda
cavium Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
chelsio net: chelsio: cxgb3: check the return value of pci_find_capability() 2022-02-25 12:52:05 +00:00
cirrus
cisco Updates for the interrupt subsystem: 2022-01-13 08:53:45 -08:00
cortina net: gemini: allow any RGMII interface mode 2022-01-05 10:31:22 -08:00
davicom
dec
dlink
emulex Updates for the interrupt subsystem: 2022-01-13 08:53:45 -08:00
engleder tsnep: Fix s390 devm_ioremap_resource warning 2021-12-17 19:22:57 -08:00
ezchip
faraday drivers/net/ftgmac100: fix DHCP potential failure with systemd 2022-02-23 12:50:19 +00:00
freescale gianfar: ethtool: Fix refcount leak in gfar_get_ts_info 2022-03-10 12:20:54 -08:00
fujitsu
google gve: Recording rx queue before sending to napi 2022-02-08 16:52:31 -08:00
hisilicon net: hns3: handle empty unknown interrupt for VF 2022-01-25 13:08:05 +00:00
huawei Updates for the interrupt subsystem: 2022-01-13 08:53:45 -08:00
i825xx ethernet: i825xx: don't write directly to netdev->dev_addr 2022-01-26 15:40:01 +00:00
ibm ibmvnic: Allow queueing resets during probe 2022-02-25 10:57:47 +00:00
intel ice: Fix race condition during interface enslave 2022-03-10 15:07:46 -08:00
litex net: ethernet: litex: Add the dependency on HAS_IOMEM 2022-02-08 20:43:40 -08:00
marvell net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr 2022-03-09 12:16:32 +00:00
mediatek net: ethernet: mtk_eth_soc: fix error checking in mtk_mac_config() 2022-01-15 22:33:17 +00:00
mellanox net/mlx5e: SHAMPO, reduce TIR indication 2022-03-09 11:39:35 -08:00
micrel Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-12-23 16:09:58 -08:00
microchip net: sparx5: Add #include to remove warning 2022-02-28 11:34:26 +00:00
microsoft Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
moxa
mscc net: mscc: ocelot: fix use-after-free in ocelot_vlan_del() 2022-02-15 14:38:20 +00:00
myricom myri10ge: Simplify DMA setting 2022-01-09 16:52:18 -08:00
natsemi
neterion net: vxge: Use dma_set_mask_and_coherent() and simplify code 2022-01-03 10:42:58 +00:00
netronome nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac() 2022-02-18 21:08:14 -08:00
ni
nvidia
nxp net: ethernet: lpc_eth: Handle error for clk_enable 2022-03-09 12:15:20 +00:00
oki-semi net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX 2021-12-14 12:28:24 +00:00
packetengines
pasemi
pensando Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-12-30 12:12:12 -08:00
qlogic qed: return status of qed_iov_get_link 2022-03-07 12:22:28 +00:00
qualcomm
rdc
realtek Power management updates for 5.17-rc1 2022-01-10 20:34:00 -08:00
renesas net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX 2021-12-14 12:28:24 +00:00
rocker Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-01-09 17:00:17 -08:00
samsung net: sxgbe: fix return value of __setup handler 2022-02-25 08:53:13 -08:00
seeq ethernet: seeq/ether3: don't write directly to netdev->dev_addr 2022-01-26 15:40:01 +00:00
sfc sfc: extend the locking on mcdi->seqno 2022-03-03 14:11:58 +00:00
sgi
silan
sis
smsc ethernet: smc911x: fix indentation in get/set EEPROM 2022-02-01 19:59:03 -08:00
socionext Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
stmicro net: stmmac: fix return value of __setup handler 2022-02-25 08:53:17 -08:00
sun ethernet: sun: Free the coherent when failing in probing 2022-03-07 11:32:22 +00:00
synopsys
tehuti tehuti: Use dma_set_mask_and_coherent() and simplify code 2022-01-02 12:21:16 +00:00
ti net: ethernet: ti: cpts: Handle error for clk_enable 2022-03-09 12:12:46 +00:00
toshiba
tundra ethernet: tundra: don't write directly to netdev->dev_addr 2022-01-26 15:40:01 +00:00
vertexcom Revert "net: vertexcom: default to disabled on kbuild" 2022-01-10 21:11:07 -08:00
via
wiznet
xilinx ethernet: Fix error handling in xemaclite_of_probe 2022-03-08 22:15:01 -08:00
xircom
xscale net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX 2021-12-14 12:28:24 +00:00
dnet.c
dnet.h
ec_bhf.c
ethoc.c net: ethoc: Use platform_get_irq() to get the interrupt 2021-12-27 12:22:19 +00:00
fealnx.c
jme.c
jme.h
Kconfig net: vertexcom: Add MSE102x SPI support 2021-12-13 14:15:41 +00:00
korina.c
lantiq_etop.c net: lantiq_etop: remove unnecessary space in cast 2021-12-30 13:20:23 +00:00
lantiq_xrx200.c net: lantiq_xrx200: fix use after free bug 2022-03-07 11:29:35 +00:00
Makefile net: vertexcom: Add MSE102x SPI support 2021-12-13 14:15:41 +00:00