linux/drivers/infiniband/hw/mlx5
Shay Drory 374012b004 RDMA/mlx5: Fix mkey cache possible deadlock on cleanup
Fix the deadlock by refactoring the MR cache cleanup flow to flush the
workqueue without holding the rb_lock.
This introduces a race between cache cleanup and the creation of new
entries, which we resolve by denying creation of new entries once cache
cleanup has started.
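
The ordering described above can be sketched in userspace C with pthreads.
This is only an illustration under stated assumptions, not the driver code:
struct demo_cache, its disable flag, and every demo_* function are
hypothetical names, pthread_join() stands in for flushing the workqueue, and
a plain pthread mutex stands in for rb_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mkey cache; not the driver's struct. */
struct demo_cache {
	pthread_mutex_t rb_lock;  /* plays the role of dev->cache.rb_lock */
	bool disable;             /* assumed flag: set once cleanup has started */
	int nr_entries;           /* number of cache entries currently held */
	pthread_t worker;         /* plays the role of the queued work item */
};

/* Creation path: refuses to add an entry once cleanup has begun. */
static int demo_add_entry(struct demo_cache *c)
{
	int ret = 0;

	pthread_mutex_lock(&c->rb_lock);
	if (c->disable)
		ret = -1;
	else
		c->nr_entries++;
	pthread_mutex_unlock(&c->rb_lock);
	return ret;
}

/* Worker body: needs rb_lock, like delayed_cache_work_func in the trace. */
static void *demo_work(void *arg)
{
	demo_add_entry(arg);
	return NULL;
}

/* Cleanup: returns the result of a late creation attempt (expected -1). */
static int demo_cleanup(struct demo_cache *c)
{
	/* Step 1: deny new entries while holding the lock. */
	pthread_mutex_lock(&c->rb_lock);
	c->disable = true;
	pthread_mutex_unlock(&c->rb_lock);

	/* Step 2: wait for the worker WITHOUT holding rb_lock, so a worker
	 * that is blocked on rb_lock can still make progress and finish. */
	pthread_join(c->worker, NULL);

	/* Step 3: no work is running and none can add entries; tear down. */
	pthread_mutex_lock(&c->rb_lock);
	c->nr_entries = 0;
	pthread_mutex_unlock(&c->rb_lock);

	return demo_add_entry(c);
}

/* Runs one worker against one cleanup; returns 0 if the ordering held. */
static int demo_run(void)
{
	struct demo_cache c = { .disable = false, .nr_entries = 0 };
	int late, left;

	pthread_mutex_init(&c.rb_lock, NULL);
	if (pthread_create(&c.worker, NULL, demo_work, &c) != 0)
		return 1;
	late = demo_cleanup(&c);
	left = c.nr_entries;
	pthread_mutex_destroy(&c.rb_lock);
	return (late == -1 && left == 0) ? 0 : 1;
}
```

Calling demo_run() from a main() exercises the sketch. Whether the worker
wins or loses the race for the lock, cleanup never waits for it while
holding rb_lock, and any creation attempt after step 1 is refused.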

Lockdep:
WARNING: possible circular locking dependency detected
 [ 2785.326074 ] 6.2.0-rc6_for_upstream_debug_2023_01_31_14_02 #1 Not tainted
 [ 2785.339778 ] ------------------------------------------------------
 [ 2785.340848 ] devlink/53872 is trying to acquire lock:
 [ 2785.341701 ] ffff888124f8c0c8 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0xc8/0x900
 [ 2785.343403 ]
 [ 2785.343403 ] but task is already holding lock:
 [ 2785.344464 ] ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]
 [ 2785.346273 ]
 [ 2785.346273 ] which lock already depends on the new lock.
 [ 2785.346273 ]
 [ 2785.347720 ]
 [ 2785.347720 ] the existing dependency chain (in reverse order) is:
 [ 2785.349003 ]
 [ 2785.349003 ] -> #1 (&dev->cache.rb_lock){+.+.}-{3:3}:
 [ 2785.350160 ]        __mutex_lock+0x14c/0x15c0
 [ 2785.350962 ]        delayed_cache_work_func+0x2d1/0x610 [mlx5_ib]
 [ 2785.352044 ]        process_one_work+0x7c2/0x1310
 [ 2785.352879 ]        worker_thread+0x59d/0xec0
 [ 2785.353636 ]        kthread+0x28f/0x330
 [ 2785.354370 ]        ret_from_fork+0x1f/0x30
 [ 2785.355135 ]
 [ 2785.355135 ] -> #0 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}:
 [ 2785.356515 ]        __lock_acquire+0x2d8a/0x5fe0
 [ 2785.357349 ]        lock_acquire+0x1c1/0x540
 [ 2785.358121 ]        __flush_work+0xe8/0x900
 [ 2785.358852 ]        __cancel_work_timer+0x2c7/0x3f0
 [ 2785.359711 ]        mlx5_mkey_cache_cleanup+0xfb/0x250 [mlx5_ib]
 [ 2785.360781 ]        mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x16/0x30 [mlx5_ib]
 [ 2785.361969 ]        __mlx5_ib_remove+0x68/0x120 [mlx5_ib]
 [ 2785.362960 ]        mlx5r_remove+0x63/0x80 [mlx5_ib]
 [ 2785.363870 ]        auxiliary_bus_remove+0x52/0x70
 [ 2785.364715 ]        device_release_driver_internal+0x3c1/0x600
 [ 2785.365695 ]        bus_remove_device+0x2a5/0x560
 [ 2785.366525 ]        device_del+0x492/0xb80
 [ 2785.367276 ]        mlx5_detach_device+0x1a9/0x360 [mlx5_core]
 [ 2785.368615 ]        mlx5_unload_one_devl_locked+0x5a/0x110 [mlx5_core]
 [ 2785.369934 ]        mlx5_devlink_reload_down+0x292/0x580 [mlx5_core]
 [ 2785.371292 ]        devlink_reload+0x439/0x590
 [ 2785.372075 ]        devlink_nl_cmd_reload+0xaef/0xff0
 [ 2785.372973 ]        genl_family_rcv_msg_doit.isra.0+0x1bd/0x290
 [ 2785.374011 ]        genl_rcv_msg+0x3ca/0x6c0
 [ 2785.374798 ]        netlink_rcv_skb+0x12c/0x360
 [ 2785.375612 ]        genl_rcv+0x24/0x40
 [ 2785.376295 ]        netlink_unicast+0x438/0x710
 [ 2785.377121 ]        netlink_sendmsg+0x7a1/0xca0
 [ 2785.377926 ]        sock_sendmsg+0xc5/0x190
 [ 2785.378668 ]        __sys_sendto+0x1bc/0x290
 [ 2785.379440 ]        __x64_sys_sendto+0xdc/0x1b0
 [ 2785.380255 ]        do_syscall_64+0x3d/0x90
 [ 2785.381031 ]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
 [ 2785.381967 ]
 [ 2785.381967 ] other info that might help us debug this:
 [ 2785.381967 ]
 [ 2785.383448 ]  Possible unsafe locking scenario:
 [ 2785.383448 ]
 [ 2785.384544 ]        CPU0                    CPU1
 [ 2785.385383 ]        ----                    ----
 [ 2785.386193 ]   lock(&dev->cache.rb_lock);
 [ 2785.386940 ]				lock((work_completion)(&(&ent->dwork)->work));
 [ 2785.388327 ]				lock(&dev->cache.rb_lock);
 [ 2785.389425 ]   lock((work_completion)(&(&ent->dwork)->work));
 [ 2785.390414 ]
 [ 2785.390414 ]  *** DEADLOCK ***
 [ 2785.390414 ]
 [ 2785.391579 ] 6 locks held by devlink/53872:
 [ 2785.392341 ]  #0: ffffffff84c17a50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
 [ 2785.393630 ]  #1: ffff888142280218 (&devlink->lock_key){+.+.}-{3:3}, at: devlink_get_from_attrs_lock+0x12d/0x2d0
 [ 2785.395324 ]  #2: ffff8881422d3c38 (&dev->lock_key){+.+.}-{3:3}, at: mlx5_unload_one_devl_locked+0x4a/0x110 [mlx5_core]
 [ 2785.397322 ]  #3: ffffffffa0e59068 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_detach_device+0x60/0x360 [mlx5_core]
 [ 2785.399231 ]  #4: ffff88810e3cb0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600
 [ 2785.400864 ]  #5: ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]

Fixes: b958451783 ("RDMA/mlx5: Change the cache structure to an RB-tree")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-09-26 12:33:53 +03:00
ah.c
cmd.c RDMA/mlx5: Use query_special_contexts for mkeys 2023-02-17 16:22:23 -04:00
cmd.h RDMA/mlx5: Use query_special_contexts for mkeys 2023-02-17 16:22:23 -04:00
cong.c IB/mlx5: Extend debug control for CC parameters 2023-02-19 11:50:59 +02:00
counters.c IB/mlx5: Add HW counter called rx_dct_connect 2023-07-31 11:40:32 +03:00
counters.h
cq.c net/mlx5: Allocate completion EQs dynamically 2023-08-07 10:53:52 -07:00
devx.c net/mlx5: Allocate completion EQs dynamically 2023-08-07 10:53:52 -07:00
devx.h RDMA/mlx5: Attach ndescs to mlx5_ib_mkey 2021-10-19 14:42:53 +03:00
dm.c RDMA/mlx5: Support handling of modify-header pattern ICM area 2022-06-13 14:58:01 -07:00
dm.h RDMA/mlx5: Expose UAPI to query DM 2021-04-13 19:36:37 -03:00
doorbell.c net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
fs.c RDMA/mlx5: Fix mutex unlocking on error flow for steering anchor creation 2023-09-26 12:29:40 +03:00
fs.h RDMA/mlx5: Create an indirect flow table for steering anchor 2023-06-11 11:25:34 +03:00
gsi.c net/mlx5: Lag, expose number of lag ports 2022-05-09 22:54:00 -07:00
ib_rep.c {net/RDMA}/mlx5: introduce lag_for_each_peer 2023-06-07 14:00:42 -07:00
ib_rep.h
ib_virt.c RDMA/mlx5: Delete useless module.h include 2022-01-28 13:03:12 -04:00
Kconfig
macsec.c RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion 2023-08-20 12:35:24 +03:00
macsec.h RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion 2023-08-20 12:35:24 +03:00
mad.c RDMA/mlx: Remove unnecessary variable initializations 2023-07-31 10:05:23 +03:00
main.c RDMA/mlx5: Fix NULL string error 2023-09-26 12:29:44 +03:00
Makefile RDMA/mlx5: Implement MACsec gid addition and deletion 2023-08-20 12:35:24 +03:00
mem.c IB/mlx5: Remove duplicate header inclusion related to ODP 2022-08-23 11:22:13 +03:00
mlx5_ib.h RDMA/mlx5: Fix mkey cache possible deadlock on cleanup 2023-09-26 12:33:53 +03:00
mr.c RDMA/mlx5: Fix mkey cache possible deadlock on cleanup 2023-09-26 12:33:53 +03:00
odp.c Merge mlx5-next into rdma.git for-next 2023-02-17 16:24:14 -04:00
qos.c
qp.c RDMA/mlx5: Fix affinity assignment 2023-06-11 11:27:17 +03:00
qp.h RDMA/mlx5: Handle DCT QP logic separately from low level QP interface 2023-06-11 11:21:40 +03:00
qpc.c RDMA/mlx5: Return the firmware result upon destroying QP/RQ 2023-06-11 11:21:46 +03:00
restrack.c
restrack.h
srq_cmd.c
srq.c RDMA/mlx5: Use query_special_contexts for mkeys 2023-02-17 16:22:23 -04:00
srq.h
std_types.c RDMA/mlx5: Fill port info based on the relevant eswitch 2021-08-05 13:49:24 -07:00
umr.c RDMA/mlx5: Allow relaxed ordering read in VFs and VMs 2023-04-16 13:29:26 +03:00
umr.h RDMA/mlx5: Allow relaxed ordering read in VFs and VMs 2023-04-16 13:29:26 +03:00
wr.c RDMA/mlx5: Use query_special_contexts for mkeys 2023-02-17 16:22:23 -04:00
wr.h RDMA/mlx5: Expose wqe posting helpers outside of wr.c 2022-04-25 11:53:00 -03:00