linux

iv/linux

Author	SHA1	Message	Date
Patrisious Haddad	885faf3c7e	RDMA/mlx5: Change the key being sent for MPV device affiliation commit 02e7d139e5e24abb5fde91934fc9dc0344ac1926 upstream. Change the key that we send from IB driver to EN driver regarding the MPV device affiliation, since at that stage the IB device is not yet initialized, so its index would be zero for different IB devices and cause wrong associations between unrelated master and slave devices. Instead use a unique value from inside the core device which is already initialized at this stage. Fixes: 0d293714ac32 ("RDMA/mlx5: Send events from IB driver about device affiliation state") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/ac7e66357d963fc68d7a419515180212c96d137d.1697705185.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-12-20 17:02:06 +01:00
Patrisious Haddad	fdd350fe5e	RDMA/mlx5: Send events from IB driver about device affiliation state [ Upstream commit 0d293714ac32650bfb669ceadf7cc2fad8161401 ] Send blocking events from IB driver whenever the device is done being affiliated or if it is removed from an affiliation. This is useful since now the EN driver can register to those event and know when a device is affiliated or not. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 762a55a54eec ("net/mlx5e: Disable IPsec offload support if not FW steering") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-20 17:01:44 +01:00
Shifeng Li	fc054130cd	RDMA/irdma: Avoid free the non-cqp_request scratch [ Upstream commit e3e82fcb79eeb3f1a88a89f676831773caff514a ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:16 +01:00
Mike Marciniszyn	926c1c7a8f	RDMA/irdma: Fix support for 64k pages [ Upstream commit 03769f72d66edab82484449ed594cb6b00ae0223 ] Virtual QP and CQ require a 4K HW page size but the driver passes PAGE_SIZE to ib_umem_find_best_pgsz() instead. Fix this by using the appropriate 4k value in the bitmap passed to ib_umem_find_best_pgsz(). Fixes: 693a5386eff0 ("RDMA/irdma: Split mr alloc and free into new functions") Link: https://lore.kernel.org/r/20231129202143.1434-4-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:16 +01:00
Mike Marciniszyn	12a77574f0	RDMA/irdma: Ensure iWarp QP queue memory is OS paged aligned [ Upstream commit 0a5ec366de7e94192669ba08de6ed336607fd282 ] The SQ is shared for between kernel and used by storing the kernel page pointer and passing that to a kmap_atomic(). This then requires that the alignment is PAGE_SIZE aligned. Fix by adding an iWarp specific alignment check. Fixes: e965ef0e7b2c ("RDMA/irdma: Split QP handler into irdma_reg_user_mr_type_qp") Link: https://lore.kernel.org/r/20231129202143.1434-3-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:16 +01:00
Mike Marciniszyn	d103c131ef	RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz [ Upstream commit 4fbc3a52cd4d14de3793f4b2c721d7306ea84cf9 ] 64k pages introduce the situation in this diagram when the HCA 4k page size is being used: +-------------------------------------------+ <--- 64k aligned VA \| \| \| HCA 4k page \| \| \| +-------------------------------------------+ \| o \| \| \| \| o \| \| \| \| o \| +-------------------------------------------+ \| \| \| HCA 4k page \| \| \| +-------------------------------------------+ <--- Live HCA page \|OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO\| <--- offset \| \| <--- VA \| MR data \| +-------------------------------------------+ \| \| \| HCA 4k page \| \| \| +-------------------------------------------+ \| o \| \| \| \| o \| \| \| \| o \| +-------------------------------------------+ \| \| \| HCA 4k page \| \| \| +-------------------------------------------+ The VA addresses are coming from rdma-core in this diagram can be arbitrary, but for 64k pages, the VA may be offset by some number of HCA 4k pages and followed by some number of HCA 4k pages. The current iterator doesn't account for either the preceding 4k pages or the following 4k pages. Fix the issue by extending the ib_block_iter to contain the number of DMA pages like comment [1] says and by using __sg_advance to start the iterator at the first live HCA page. The changes are contained in a parallel set of iterator start and next functions that are umem aware and specific to umem since there is one user of the rdma_for_each_block() without umem. These two fixes prevents the extra pages before and after the user MR data. Fix the preceding pages by using the __sq_advance field to start at the first 4k page containing MR data. Fix the following pages by saving the number of pgsz blocks in the iterator state and downcounting on each next. This fix allows for the elimination of the small page crutch noted in the Fixes. Fixes: 10c75ccb54e4 ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()") Link: https://lore.kernel.org/r/20231129202143.1434-2-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:16 +01:00
Shifeng Li	55b6b95737	RDMA/irdma: Fix UAF in irdma_sc_ccq_get_cqe_info() [ Upstream commit 2b78832f50c4d711e161b166d7d8790968051546 ] When removing the irdma driver or unplugging its aux device, the ccq queue is released before destorying the cqp_cmpl_wq queue. But in the window, there may still be completion events for wqes. That will cause a UAF in irdma_sc_ccq_get_cqe_info(). [34693.333191] BUG: KASAN: use-after-free in irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333194] Read of size 8 at addr ffff889097f80818 by task kworker/u67:1/26327 [34693.333194] [34693.333199] CPU: 9 PID: 26327 Comm: kworker/u67:1 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [34693.333200] Hardware name: SANGFOR Inspur/NULL, BIOS 4.1.13 08/01/2016 [34693.333211] Workqueue: cqp_cmpl_wq cqp_compl_worker [irdma] [34693.333213] Call Trace: [34693.333220] dump_stack+0x71/0xab [34693.333226] print_address_description+0x6b/0x290 [34693.333238] ? irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333240] kasan_report+0x14a/0x2b0 [34693.333251] irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333264] ? irdma_free_cqp_request+0x151/0x1e0 [irdma] [34693.333274] irdma_cqp_ce_handler+0x1fb/0x3b0 [irdma] [34693.333285] ? irdma_ctrl_init_hw+0x2c20/0x2c20 [irdma] [34693.333290] ? __schedule+0x836/0x1570 [34693.333293] ? strscpy+0x83/0x180 [34693.333296] process_one_work+0x56a/0x11f0 [34693.333298] worker_thread+0x8f/0xf40 [34693.333301] ? __kthread_parkme+0x78/0xf0 [34693.333303] ? rescuer_thread+0xc50/0xc50 [34693.333305] kthread+0x2a0/0x390 [34693.333308] ? kthread_destroy_worker+0x90/0x90 [34693.333310] ret_from_fork+0x1f/0x40 Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Shifeng Li <lishifeng1992@126.com> Link: https://lore.kernel.org/r/20231121101236.581694-1-lishifeng1992@126.com Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:14 +01:00
Kalesh AP	cfaede20f5	RDMA/bnxt_re: Correct module description string [ Upstream commit 422b19f7f006e813ee0865aadce6a62b3c263c42 ] The word "Driver" is repeated twice in the "modinfo bnxt_re" output description. Fix it. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1700555387-6277-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:14 +01:00
Jack Wang	ca19116cca	RDMA/rtrs-clt: Remove the warnings for req in_use check [ Upstream commit 0c8bb6eb70ca41031f663b4481aac9ac78b53bc6 ] As we chain the WR during write request: memory registration, rdma write, local invalidate, if only the last WR fail to send due to send queue overrun, the server can send back the reply, while client mark the req->in_use to false in case of error in rtrs_clt_req when error out from rtrs_post_rdma_write_sg. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-8-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:13 +01:00
Jack Wang	4b13d47c90	RDMA/rtrs-clt: Fix the max_send_wr setting [ Upstream commit 6d09f6f7d7584e099633282ea915988914f86529 ] For each write request, we need Request, Response Memory Registration, Local Invalidate. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-7-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:13 +01:00
Md Haris Iqbal	5aafdb5e93	RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flight [ Upstream commit c4d32e77fc1006f99eeb78417efc3d81a384072a ] Destroying path files may lead to the freeing of rdma_stats. This creates the following race. An IO is in-flight, or has just passed the session state check in process_read/process_write. The close_work gets triggered and the function rtrs_srv_close_work() starts and does destroy path which frees the rdma_stats. After this the function process_read/process_write resumes and tries to update the stats through the function rtrs_srv_update_rdma_stats This commit solves the problem by moving the destroy path function to a later point. This point makes sure any inflights are completed. This is done by qp drain, and waiting for all in-flights through ops_id. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Santosh Kumar Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-6-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:13 +01:00
Md Haris Iqbal	7bb1941e7f	RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is true [ Upstream commit 3a71cd6ca0ce33d1af019ecf1d7167406fa54400 ] Since srv_mr->iu is allocated and used only when always_invalidate is true, free it only when always_invalidate is true. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-5-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:13 +01:00
Md Haris Iqbal	dfb398fdcb	RDMA/rtrs-srv: Check return values while processing info request [ Upstream commit ed1e52aefa16f15dc2f04054a3baf11726a7460e ] While processing info request, it could so happen that the srv_path goes to CLOSING state, cause of any of the error events from RDMA. That state change should be picked up while trying to change the state in process_info_req, by checking the return value. In case the state change call in process_info_req fails, we fail the processing. We should also check the return value for rtrs_srv_path_up, since it sends a link event to the client above, and the client can fail for any reason. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-4-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:13 +01:00
Jack Wang	30614ed295	RDMA/rtrs-clt: Start hb after path_up [ Upstream commit 3e44a61b5db873612e20e7b7922468d7d1ac2d22 ] If we start hb too early, it will confuse server side to close the session. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-3-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:13 +01:00
Jack Wang	b659d48fc4	RDMA/rtrs-srv: Do not unconditionally enable irq [ Upstream commit 3ee7ecd712048ade6482bea4b2f3dcaf039c0348 ] When IO is completed, rtrs can be called in softirq context, unconditionally enabling irq could cause panic. To be on safe side, use spin_lock_irqsave and spin_unlock_irqrestore instread. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:13 +01:00
Mustafa Ismail	9b1b8ab2bd	RDMA/irdma: Add wait for suspend on SQD [ Upstream commit bd6da690c27d75cae432c09162d054b34fa2156f ] Currently, there is no wait for the QP suspend to complete on a modify to SQD state. Add a wait, after the modify to SQD state, for the Suspend Complete AE. While we are at it, update the suspend timeout value in irdma_prep_tc_change to use IRDMA_EVENT_TIMEOUT_MS too. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20231114170246.238-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:12 +01:00
Mustafa Ismail	951c6d336e	RDMA/irdma: Do not modify to SQD on error [ Upstream commit ba12ab66aa83a2340a51ad6e74b284269745138c ] Remove the modify to SQD before going to ERROR state. It is not needed. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20231114170246.238-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:12 +01:00
Junxian Huang	38772f6672	RDMA/hns: Fix unnecessary err return when using invalid congest control algorithm [ Upstream commit efb9cbf66440482ceaa90493d648226ab7ec2ebf ] Add a default congest control algorithm so that driver won't return an error when the configured algorithm is invalid. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231028093242.670325-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:12 +01:00
Shigeru Yoshida	24436fe77e	RDMA/core: Fix uninit-value access in ib_get_eth_speed() [ Upstream commit 0550d4604e2ca4e653dc13f0c009fc42106b6bfc ] KMSAN reported the following uninit-value access issue: lo speed is unknown, defaulting to 1000 ===================================================== BUG: KMSAN: uninit-value in ib_get_width_and_speed drivers/infiniband/core/verbs.c:1889 [inline] BUG: KMSAN: uninit-value in ib_get_eth_speed+0x546/0xaf0 drivers/infiniband/core/verbs.c:1998 ib_get_width_and_speed drivers/infiniband/core/verbs.c:1889 [inline] ib_get_eth_speed+0x546/0xaf0 drivers/infiniband/core/verbs.c:1998 siw_query_port drivers/infiniband/sw/siw/siw_verbs.c:173 [inline] siw_get_port_immutable+0x6f/0x120 drivers/infiniband/sw/siw/siw_verbs.c:203 setup_port_data drivers/infiniband/core/device.c:848 [inline] setup_device drivers/infiniband/core/device.c:1244 [inline] ib_register_device+0x1589/0x1df0 drivers/infiniband/core/device.c:1383 siw_device_register drivers/infiniband/sw/siw/siw_main.c:72 [inline] siw_newlink+0x129e/0x13d0 drivers/infiniband/sw/siw/siw_main.c:490 nldev_newlink+0x8fd/0xa60 drivers/infiniband/core/nldev.c:1763 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0xe8a/0x1120 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0xf4b/0x1230 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x1242/0x1420 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x997/0xd60 net/socket.c:2588 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2642 __sys_sendmsg net/socket.c:2671 [inline] __do_sys_sendmsg net/socket.c:2680 [inline] __se_sys_sendmsg net/socket.c:2678 [inline] __x64_sys_sendmsg+0x2fa/0x4a0 net/socket.c:2678 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b Local variable lksettings created at: ib_get_eth_speed+0x4b/0xaf0 drivers/infiniband/core/verbs.c:1974 siw_query_port drivers/infiniband/sw/siw/siw_verbs.c:173 [inline] siw_get_port_immutable+0x6f/0x120 drivers/infiniband/sw/siw/siw_verbs.c:203 CPU: 0 PID: 11257 Comm: syz-executor.1 Not tainted 6.6.0-14500-g1c41041124bd #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 ===================================================== If __ethtool_get_link_ksettings() fails, `netdev_speed` is set to the default value, SPEED_1000. In this case, if `lanes` field of struct ethtool_link_ksettings is not initialized, an uninitialized value is passed to ib_get_width_and_speed(). This causes the above issue. This patch resolves the issue by initializing `lanes` to 0. Fixes: cb06b6b3f6cb ("RDMA/core: Get IB width and speed from netdev") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://lore.kernel.org/r/20231108143113.1360567-1-syoshida@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:11 +01:00
Ilpo Järvinen	69319be69f	RDMA/hfi1: Use FIELD_GET() to extract Link Width [ Upstream commit 8bf7187d978610b9e327a3d92728c8864a575ebd ] Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of custom masking and shifting, and remove extract_width() which only wraps that FIELD_GET(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230919125648.1920-2-ilpo.jarvinen@linux.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-28 17:19:42 +00:00
George Kennedy	6387f269d8	IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF [ Upstream commit 2ef422f063b74adcc4a4a9004b0a87bb55e0a836 ] In the unlikely event that workqueue allocation fails and returns NULL in mlx5_mkey_cache_init(), delete the call to mlx5r_umr_resource_cleanup() (which frees the QP) in mlx5_ib_stage_post_ib_reg_umr_init(). This will avoid attempted double free of the same QP when __mlx5_ib_add() does its cleanup. Resolves a splat: Syzkaller reported a UAF in ib_destroy_qp_user workqueue: Failed to create a rescuer kthread for wq "mkey_cache": -EINTR infiniband mlx5_0: mlx5_mkey_cache_init:981:(pid 1642): failed to create work queue infiniband mlx5_0: mlx5_ib_stage_post_ib_reg_umr_init:4075:(pid 1642): mr cache init failed -12 ================================================================== BUG: KASAN: slab-use-after-free in ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) Read of size 8 at addr ffff88810da310a8 by task repro_upstream/1642 Call Trace: <TASK> kasan_report (mm/kasan/report.c:590) ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4178) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... </TASK> Allocated by task 1642: __kmalloc (./include/linux/kasan.h:198 mm/slab_common.c:1026 mm/slab_common.c:1039) create_qp (./include/linux/slab.h:603 ./include/linux/slab.h:720 ./include/rdma/ib_verbs.h:2795 drivers/infiniband/core/verbs.c:1209) ib_create_qp_kernel (drivers/infiniband/core/verbs.c:1347) mlx5r_umr_resource_init (drivers/infiniband/hw/mlx5/umr.c:164) mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4070) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... Freed by task 1642: __kmem_cache_free (mm/slub.c:1826 mm/slub.c:3809 mm/slub.c:3822) ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2112) mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4076 drivers/infiniband/hw/mlx5/main.c:4065) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... Fixes: 04876c12c19e ("RDMA/mlx5: Move init and cleanup of UMR to umr.c") Link: https://lore.kernel.org/r/1698170518-4006-1-git-send-email-george.kennedy@oracle.com Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:23 +01:00
Leon Romanovsky	6ce9304d26	RDMA/hfi1: Workaround truncation compilation error [ Upstream commit d4b2d165714c0ce8777d5131f6e0aad617b7adc4 ] Increase name array to be large enough to overcome the following compilation error. drivers/infiniband/hw/hfi1/efivar.c: In function ‘read_hfi1_efi_var’: drivers/infiniband/hw/hfi1/efivar.c:124:44: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 124 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^ drivers/infiniband/hw/hfi1/efivar.c:124:9: note: ‘snprintf’ output 2 or more bytes (assuming 65) into a destination of size 64 124 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/efivar.c:133:52: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 133 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^ drivers/infiniband/hw/hfi1/efivar.c:133:17: note: ‘snprintf’ output 2 or more bytes (assuming 65) into a destination of size 64 133 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:243: drivers/infiniband/hw/hfi1/efivar.o] Error 1 Fixes: c03c08d50b3d ("IB/hfi1: Check upper-case EFI variables") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/238fa39a8fd60e87a5ad7e1ca6584fcdf32e9519.1698159993.git.leonro@nvidia.com Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:22 +01:00
Junxian Huang	cd294f7c60	RDMA/hns: Fix init failure of RoCE VF and HIP08 [ Upstream commit 07f06e0e5cd99555c861e874716d9e2627655fd5 ] During device init, a struct for HW stats will be allocated. As HW stats are not supported for VF and HIP08, currently hns_roce_alloc_hw_port_stats() returns NULL in this case. However, ib-core considers the returned NULL pointer as memory allocation failure and returns ENOMEM, eventually leading to the failure of VF and HIP08 init. In the case where the driver does not support the .alloc_hw_port_stats() ops, ib-core will return EOPNOTSUPP and ignore this error code in the upper layer function. So for VF and HIP08, just don't set the HW stats ops to ib-core. Fixes: 5a87279591a1 ("RDMA/hns: Support hns HW stats") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Junxian Huang	1210137c7f	RDMA/hns: Fix unnecessary port_num transition in HW stats allocation [ Upstream commit b4a797b894dc91a541ea230db6fa00cc74683bfd ] The num_ports capability of devices should be compared with the number of port(i.e. the input param "port_num") but not the port index(i.e. port_num - 1). Fixes: 5a87279591a1 ("RDMA/hns: Support hns HW stats") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Luoyouming	ceb48ed167	RDMA/hns: The UD mode can only be configured with DCQCN [ Upstream commit 27c5fd271d8b8730fc0bb1b6cae953ad7808a874 ] Due to hardware limitations, only DCQCN is supported for UD. Therefore, the default algorithm for UD is set to DCQCN. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Luoyouming	74161f98a4	RDMA/hns: Add check for SL [ Upstream commit 5e617c18b1f34ec57ad5dce44f09de603cf6bd6c ] SL set by users may exceed the capability of devices. So add check for this situation. Fixes: fba429fcf9a5 ("RDMA/hns: Fix missing fields in address vector") Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Fixes: f0cb411aad23 ("RDMA/hns: Use new interface to modify QP context") Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Chengchang Tang	b2dcb5207e	RDMA/hns: Fix signed-unsigned mixed comparisons [ Upstream commit b5f9efff101b06fd06a5e280a2b00b1335f5f476 ] The ib_mtu_enum_to_int() and uverbs_attr_get_len() may returns a negative value. In this case, mixed comparisons of signed and unsigned types will throw wrong results. This patch adds judgement for this situation. Fixes: 30b707886aeb ("RDMA/hns: Support inline data in extented sge space for RC") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Chengchang Tang	d7e674b71f	RDMA/hns: Fix uninitialized ucmd in hns_roce_create_qp_common() [ Upstream commit c64e9710f9241e38a1c761ed1c1a30854784da66 ] ucmd in hns_roce_create_qp_common() are not initialized. But it works fine until new member sdb_addr is added to struct hns_roce_ib_create_qp. If the user-mode driver uses an old version ABI, then the value of the new member will be undefined after ib_copy_from_udata(). This patch fixes it by initialize this variable to 0. And the default value of the new member sdb_addr will be 0 which is invalid. Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Chengchang Tang	1a9db0d265	RDMA/hns: Fix printing level of asynchronous events [ Upstream commit 9faef73ef4f6666b97e04d99734ac09251098185 ] The current driver will print all asynchronous events. Some of the print levels are set improperly, e.g. SRQ limit reach and SRQ last wqe reach, which may also occur during normal operation of the software. Currently, the information of these event is printed as a warning, which causes a large amount of printing even during normal use of the application. As a result, the service performance deteriorates. This patch fixes the printing storms by modifying the print level. Fixes: b00a92c8f2ca ("RDMA/hns: Move all prints out of irq handle") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Patrisious Haddad	2eeff1f91a	IB/mlx5: Fix rdma counter binding for RAW QP [ Upstream commit c1336bb4aa5e809a622a87d74311275514086596 ] Previously when we had a RAW QP, we bound a counter to it when it moved to INIT state, using the counter context inside RQC. But when we try to modify that counter later in RTS state we used modify QP which tries to change the counter inside QPC instead of RQC. Now we correctly modify the counter set_id inside of RQC instead of QPC for the RAW QP. Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/2e5ab6713784a8fe997d19c508187a0dfecf2dfc.1696847964.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Gustavo A. R. Silva	119d4350a5	RDMA/core: Use size_{add,sub,mul}() in calls to struct_size() [ Upstream commit 81760bedc65194ff38e1e4faefd5f9f0c95c19a4 ] If, for any reason, the open-coded arithmetic causes a wraparound, the protection that `struct_size()` provides against potential integer overflows is defeated. Fix this by hardening calls to `struct_size()` with `size_add()`, `size_sub()` and `size_mul()`. Fixes: 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes") Fixes: a4676388e2e2 ("RDMA/core: Simplify how the gid_attrs sysfs is created") Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()") Fixes: 324e227ea7c9 ("RDMA/device: Add ib_device_get_by_netdev()") Fixes: 5aad26a7eac5 ("IB/core: Use struct_size() in kzalloc()") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZQdt4NsJFwwOYxUR@work Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:18 +01:00
Leon Romanovsky	c38d23a544	RDMA/core: Require admin capabilities to set system parameters Like any other set command, require admin permissions to do it. Cc: stable@vger.kernel.org Fixes: 2b34c5580226 ("RDMA/core: Add command to set ib_core device net namspace sharing mode") Link: https://lore.kernel.org/r/75d329fdd7381b52cbdf87910bef16c9965abb1f.1696443438.git.leon@kernel.org Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-10-05 20:01:47 +03:00
Leon Romanovsky	c99a7457e5	RDMA/mlx5: Remove not-used cache disable flag During execution of mlx5_mkey_cache_cleanup(), there is a guarantee that MR are not registered and/or destroyed. It means that we don't need newly introduced cache disable flag. Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup") Link: https://lore.kernel.org/r/c7e9c9f98c8ae4a7413d97d9349b29f5b0a23dbe.1695921626.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-10-02 14:32:44 +03:00
Mark Zhang	e0fe97efdb	RDMA/cma: Initialize ib_sa_multicast structure to 0 when join Initialize the structure to 0 so that it's fields won't have random values. For example fields like rec.traffic_class (as well as rec.flow_label and rec.sl) is used to generate the user AH through: cma_iboe_join_multicast cma_make_mc_event ib_init_ah_from_mcmember And a random traffic_class causes a random IP DSCP in RoCEv2. Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/20230927090511.603595-1-markzhang@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 13:10:40 +03:00
Shay Drory	374012b004	RDMA/mlx5: Fix mkey cache possible deadlock on cleanup Fix the deadlock by refactoring the MR cache cleanup flow to flush the workqueue without holding the rb_lock. This adds a race between cache cleanup and creation of new entries which we solve by denied creation of new entries after cache cleanup started. Lockdep: WARNING: possible circular locking dependency detected [ 2785.326074 ] 6.2.0-rc6_for_upstream_debug_2023_01_31_14_02 #1 Not tainted [ 2785.339778 ] ------------------------------------------------------ [ 2785.340848 ] devlink/53872 is trying to acquire lock: [ 2785.341701 ] ffff888124f8c0c8 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0xc8/0x900 [ 2785.343403 ] [ 2785.343403 ] but task is already holding lock: [ 2785.344464 ] ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib] [ 2785.346273 ] [ 2785.346273 ] which lock already depends on the new lock. [ 2785.346273 ] [ 2785.347720 ] [ 2785.347720 ] the existing dependency chain (in reverse order) is: [ 2785.349003 ] [ 2785.349003 ] -> #1 (&dev->cache.rb_lock){+.+.}-{3:3}: [ 2785.350160 ] __mutex_lock+0x14c/0x15c0 [ 2785.350962 ] delayed_cache_work_func+0x2d1/0x610 [mlx5_ib] [ 2785.352044 ] process_one_work+0x7c2/0x1310 [ 2785.352879 ] worker_thread+0x59d/0xec0 [ 2785.353636 ] kthread+0x28f/0x330 [ 2785.354370 ] ret_from_fork+0x1f/0x30 [ 2785.355135 ] [ 2785.355135 ] -> #0 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}: [ 2785.356515 ] __lock_acquire+0x2d8a/0x5fe0 [ 2785.357349 ] lock_acquire+0x1c1/0x540 [ 2785.358121 ] __flush_work+0xe8/0x900 [ 2785.358852 ] __cancel_work_timer+0x2c7/0x3f0 [ 2785.359711 ] mlx5_mkey_cache_cleanup+0xfb/0x250 [mlx5_ib] [ 2785.360781 ] mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x16/0x30 [mlx5_ib] [ 2785.361969 ] __mlx5_ib_remove+0x68/0x120 [mlx5_ib] [ 2785.362960 ] mlx5r_remove+0x63/0x80 [mlx5_ib] [ 2785.363870 ] auxiliary_bus_remove+0x52/0x70 [ 2785.364715 ] device_release_driver_internal+0x3c1/0x600 [ 2785.365695 ] bus_remove_device+0x2a5/0x560 [ 2785.366525 ] device_del+0x492/0xb80 [ 2785.367276 ] mlx5_detach_device+0x1a9/0x360 [mlx5_core] [ 2785.368615 ] mlx5_unload_one_devl_locked+0x5a/0x110 [mlx5_core] [ 2785.369934 ] mlx5_devlink_reload_down+0x292/0x580 [mlx5_core] [ 2785.371292 ] devlink_reload+0x439/0x590 [ 2785.372075 ] devlink_nl_cmd_reload+0xaef/0xff0 [ 2785.372973 ] genl_family_rcv_msg_doit.isra.0+0x1bd/0x290 [ 2785.374011 ] genl_rcv_msg+0x3ca/0x6c0 [ 2785.374798 ] netlink_rcv_skb+0x12c/0x360 [ 2785.375612 ] genl_rcv+0x24/0x40 [ 2785.376295 ] netlink_unicast+0x438/0x710 [ 2785.377121 ] netlink_sendmsg+0x7a1/0xca0 [ 2785.377926 ] sock_sendmsg+0xc5/0x190 [ 2785.378668 ] __sys_sendto+0x1bc/0x290 [ 2785.379440 ] __x64_sys_sendto+0xdc/0x1b0 [ 2785.380255 ] do_syscall_64+0x3d/0x90 [ 2785.381031 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 2785.381967 ] [ 2785.381967 ] other info that might help us debug this: [ 2785.381967 ] [ 2785.383448 ] Possible unsafe locking scenario: [ 2785.383448 ] [ 2785.384544 ] CPU0 CPU1 [ 2785.385383 ] ---- ---- [ 2785.386193 ] lock(&dev->cache.rb_lock); [ 2785.386940 ] lock((work_completion)(&(&ent->dwork)->work)); [ 2785.388327 ] lock(&dev->cache.rb_lock); [ 2785.389425 ] lock((work_completion)(&(&ent->dwork)->work)); [ 2785.390414 ] [ 2785.390414 ] * DEADLOCK * [ 2785.390414 ] [ 2785.391579 ] 6 locks held by devlink/53872: [ 2785.392341 ] #0: ffffffff84c17a50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 [ 2785.393630 ] #1: ffff888142280218 (&devlink->lock_key){+.+.}-{3:3}, at: devlink_get_from_attrs_lock+0x12d/0x2d0 [ 2785.395324 ] #2: ffff8881422d3c38 (&dev->lock_key){+.+.}-{3:3}, at: mlx5_unload_one_devl_locked+0x4a/0x110 [mlx5_core] [ 2785.397322 ] #3: ffffffffa0e59068 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_detach_device+0x60/0x360 [mlx5_core] [ 2785.399231 ] #4: ffff88810e3cb0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600 [ 2785.400864 ] #5: ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib] Fixes: b95845178328 ("RDMA/mlx5: Change the cache structure to an RB-tree") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-09-26 12:33:53 +03:00
Shay Drory	dab994bcc6	RDMA/mlx5: Fix NULL string error checkpath is complaining about NULL string, change it to 'Unknown'. Fixes: 37aa5c36aa70 ("IB/mlx5: Add UARs write-combining and non-cached mapping") Signed-off-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/8638e5c14fadbde5fa9961874feae917073af920.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:44 +03:00
Hamdan Igbaria	2fad8f06a5	RDMA/mlx5: Fix mutex unlocking on error flow for steering anchor creation The mutex was not unlocked on some of the error flows. Moved the unlock location to include all the error flow scenarios. Fixes: e1f4a52ac171 ("RDMA/mlx5: Create an indirect flow table for steering anchor") Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Link: https://lore.kernel.org/r/1244a69d783da997c0af0b827c622eb00495492e.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:40 +03:00
Michael Guralnik	4f14c6c021	RDMA/mlx5: Fix assigning access flags to cache mkeys After the change to use dynamic cache structure, new cache entries can be added and the mkey allocation can no longer assume that all mkeys created for the cache have access_flags equal to zero. Example of a flow that exposes the issue: A user registers MR with RO on a HCA that cannot UMR RO and the mkey is created outside of the cache. When the user deregisters the MR, a new cache entry is created to store mkeys with RO. Later, the user registers 2 MRs with RO. The first MR is reused from the new cache entry. When we try to get the second mkey from the cache we see the entry is empty so we go to the MR cache mkey allocation flow which would have allocated a mkey with no access flags, resulting the user getting a MR without RO. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Reviewed-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/8a802700b82def3ace3f77cd7a9ad9d734af87e7.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:33 +03:00
Christophe JAILLET	d7f393430a	IB/mlx4: Fix the size of a buffer in add_port_entries() In order to be sure that 'buff' is never truncated, its size should be 12, not 11. When building with W=1, this fixes the following warnings: drivers/infiniband/hw/mlx4/sysfs.c: In function ‘add_port_entries’: drivers/infiniband/hw/mlx4/sysfs.c:268:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=] 268 \| sprintf(buff, "%d", i); \| ^ drivers/infiniband/hw/mlx4/sysfs.c:268:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11 268 \| sprintf(buff, "%d", i); \| ^~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/sysfs.c:286:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=] 286 \| sprintf(buff, "%d", i); \| ^ drivers/infiniband/hw/mlx4/sysfs.c:286:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11 286 \| sprintf(buff, "%d", i); \| ^~~~~~~~~~~~~~~~~~~~~~ Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0bb1443eb47308bc9be30232cc23004c4d4cf43e.1695448530.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-23 21:53:24 +03:00
Selvin Xavier	a83c692789	RDMA/bnxt_re: Decrement resource stats correctly rc_qp_count and ud_qp_count is not decremented during qp destroy. Fix this. Fixes: cb95709e0dca ("bnxt_re: Update the hw counters for resource stats") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1695199280-13520-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-21 11:56:23 +03:00
Selvin Xavier	9fc5f9a92f	RDMA/bnxt_re: Fix the handling of control path response data Flag that indicate control path command completion should be cleared only after copying the command response data. As soon as the is_in_used flag is clear, the waiting thread can proceed with wrong response data. This wrong data is causing multiple issues like wrong lkey used in data traffic and wrong AH Id etc. Use a memory barrier to ensure that the response data is copied and visible to the process waiting on a different cpu core before clearing the is_in_used flag. Clear the is_in_used after copying the command response. Fixes: bcfee4ce3e01 ("RDMA/bnxt_re: remove redundant cmdq_bitmap") Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1695199280-13520-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-21 11:56:23 +03:00
Leon Romanovsky	18126c7676	RDMA/cma: Fix truncation compilation warning in make_cma_ports The following compilation error is false alarm as RDMA devices don't have such large amount of ports to actually cause to format truncation. drivers/infiniband/core/cma_configfs.c: In function ‘make_cma_ports’: drivers/infiniband/core/cma_configfs.c:223:57: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 223 \| snprintf(port_str, sizeof(port_str), "%u", i + 1); \| ^ drivers/infiniband/core/cma_configfs.c:223:17: note: ‘snprintf’ output between 2 and 11 bytes into a destination of size 10 223 \| snprintf(port_str, sizeof(port_str), "%u", i + 1); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[5]: *** [scripts/Makefile.build:243: drivers/infiniband/core/cma_configfs.o] Error 1 Fixes: 045959db65c6 ("IB/cma: Add configfs for rdma_cm") Link: https://lore.kernel.org/r/a7e3b347ee134167fa6a3787c56ef231a04bc8c2.1694434639.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-09-18 11:52:10 +03:00
Cheng Xu	b2abdffb50	RDMA/erdma: Fix NULL pointer access in regmr_cmd Fix the crash of regmr_cmd called by erdma_ib_alloc_mr. The reason is that mr->mem.mtt is not initialized but it is accessed in regmr_cmd. The call trace information: BUG: kernel NULL pointer dereference, address: 0000000000000000 <...> RIP: 0010:regmr_cmd+0x170/0x1c0 [erdma] <...> Call Trace: ? __die+0x20/0x70 ? page_fault_oops+0x66/0x150 ? do_user_addr_fault+0x61/0x660 ? exc_page_fault+0x65/0x140 ? asm_exc_page_fault+0x22/0x30 ? regmr_cmd+0x170/0x1c0 [erdma] ? preempt_count_add+0x70/0xa0 ? _raw_spin_lock_irqsave+0x19/0x50 ? _raw_spin_unlock_irqrestore+0x1b/0x40 ? erdma_alloc_idx+0x51/0x90 [erdma] erdma_get_dma_mr+0xa3/0x120 [erdma] __ib_alloc_pd+0xeb/0x1c0 [ib_core] Fixes: 7244b4aa4221 ("RDMA/erdma: Refactor the storage structure of MTT entries") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/3d140c1d-524a-4dbe-a51c-aee4f7ecafdb@moroto.mountain/ Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230908060559.80203-1-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-18 10:42:19 +03:00
Dan Carpenter	6b5f0749ce	RDMA/erdma: Fix error code in erdma_create_scatter_mtt() The erdma_create_scatter_mtt() function is supposed to return error pointers. Returning NULL will lead to an Oops. Fixes: ed10435d3583 ("RDMA/erdma: Implement hierarchical MTT") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/1eb400d5-d8a3-4a8e-b3da-c43c6c377f86@moroto.mountain Acked-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-18 09:12:00 +03:00
Konstantin Meskhidze	c489800e0d	RDMA/uverbs: Fix typo of sizeof argument Since size of 'hdr' pointer and '*hdr' structure is equal on 64-bit machines issue probably didn't cause any wrong behavior. But anyway, fixing of typo is required. Fixes: da0f60df7bd5 ("RDMA/uverbs: Prohibit write() calls with too small buffers") Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com> Signed-off-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com> Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20230905103258.1738246-1-konstantin.meskhidze@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 15:01:09 +03:00
Artem Chernyshev	8fb8a82086	RDMA/cxgb4: Check skb value for failure to allocate get_skb() can fail to allocate skb, so check it. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 5be78ee924ae ("RDMA/cxgb4: Fix LE hash collision bug for active open connection") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Link: https://lore.kernel.org/r/20230905124048.284165-1-artem.chernyshev@red-soft.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:55:53 +03:00
Bernard Metzler	53a3f77704	RDMA/siw: Fix connection failure handling In case immediate MPA request processing fails, the newly created endpoint unlinks the listening endpoint and is ready to be dropped. This special case was not handled correctly by the code handling the later TCP socket close, causing a NULL dereference crash in siw_cm_work_handler() when dereferencing a NULL listener. We now also cancel the useless MPA timeout, if immediate MPA request processing fails. This patch furthermore simplifies MPA processing in general: Scheduling a useless TCP socket read in sk_data_ready() upcall is now surpressed, if the socket is already moved out of TCP_ESTABLISHED state. Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230905145822.446263-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:53:23 +03:00
Bart Van Assche	e193b7955d	RDMA/srp: Do not call scsi_done() from srp_abort() After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler callback, it performs one of the following actions: * Call scsi_queue_insert(). * Call scsi_finish_command(). * Call scsi_eh_scmd_add(). Hence, SCSI abort handlers must not call scsi_done(). Otherwise all the above actions would trigger a use-after-free. Hence remove the scsi_done() call from srp_abort(). Keep the srp_free_req() call before returning SUCCESS because we may not see the command again if SUCCESS is returned. Cc: Bob Pearson <rpearsonhpe@gmail.com> Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: d8536670916a ("IB/srp: Avoid having aborted requests hang") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230823205727.505681-1-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:15:58 +03:00
Linus Torvalds	b89b029377	SCSI misc on 20230902 Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and the usual minor updates and bug fixes but no significant core changes. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZPLlcCYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXH4AP9Mm8Hk ICCySnXJ/VcxrAU2ekUlPxGT8CNT2WqRhIqKegEAu0jagPyNcTjX+oBCNVOM33zU p5uU6IEHDA3d36qu04w= =8gks -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and the usual minor updates and bug fixes but no significant core changes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (116 commits) scsi: storvsc: Handle additional SRB status values scsi: libsas: Delete sas_ata_task.retry_count scsi: libsas: Delete sas_ata_task.stp_affil_pol scsi: libsas: Delete sas_ata_task.set_affil_pol scsi: libsas: Delete sas_ssp_task.task_prio scsi: libsas: Delete sas_ssp_task.enable_first_burst scsi: libsas: Delete sas_ssp_task.retry_count scsi: libsas: Delete struct scsi_core scsi: libsas: Delete enum sas_phy_type scsi: libsas: Delete enum sas_class scsi: libsas: Delete sas_ha_struct.lldd_module scsi: target: Fix write perf due to unneeded throttling scsi: lpfc: Do not abuse UUID APIs and LPFC_COMPRESS_VMID_SIZE scsi: pm8001: Remove unused declarations scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock scsi: elx: sli4: Remove code duplication scsi: bfa: Replace one-element array with flexible-array member in struct fc_rscn_pl_s scsi: qla2xxx: Remove unused declarations scsi: pmcraid: Use pci_dev_id() to simplify the code scsi: pm80xx: Set RETFIS when requested by libsas ...	2023-09-02 12:02:41 -07:00
Linus Torvalds	f7e97ce269	v6.6 merge window RDMA pull request Many small changes across the subystem, some highlights: - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca, hns, and bnxt_re - siw now works over tunnel and other netdevs with a MAC address by removing assumptions about a MAC/GID from the connection manager - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow userspace to slow down the doorbell rings if the HW gets full - irdma egress VLAN priority, better QP/WQ sizing - rxe bug fixes in queue draining and srq resizing - Support more ethernet speed options in the core layer - DMABUF support for bnxt_re - Multi-stage MTT support for erdma to allow much bigger MR registrations - A irdma fix with a CVE that came in too late to go to -rc, missing bounds checking for 0 length MRs -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZPEqkAAKCRCFwuHvBreF YZrNAPoCBfU+VjCKNr2yqF7s52os5ZdBV7Uuh4txHcXWW9H7GAD/f19i2u62fzNu C27jj4cztemMBb8mgwyxPw/wLg7NLwY= =pC6k -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Many small changes across the subystem, some highlights: - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca, hns, and bnxt_re - siw now works over tunnel and other netdevs with a MAC address by removing assumptions about a MAC/GID from the connection manager - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow userspace to slow down the doorbell rings if the HW gets full - irdma egress VLAN priority, better QP/WQ sizing - rxe bug fixes in queue draining and srq resizing - Support more ethernet speed options in the core layer - DMABUF support for bnxt_re - Multi-stage MTT support for erdma to allow much bigger MR registrations - A irdma fix with a CVE that came in too late to go to -rc, missing bounds checking for 0 length MRs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits) IB/hfi1: Reduce printing of errors during driver shut down RDMA/hfi1: Move user SDMA system memory pinning code to its own file RDMA/hfi1: Use list_for_each_entry() helper RDMA/mlx5: Fix trailing */ formatting in block comment RDMA/rxe: Fix redundant break statement in switch-case. RDMA/efa: Fix wrong resources deallocation order RDMA/siw: Call llist_reverse_order in siw_run_sq RDMA/siw: Correct wrong debug message RDMA/siw: Balance the reference of cep->kref in the error path Revert "IB/isert: Fix incorrect release of isert connection" RDMA/bnxt_re: Fix kernel doc errors RDMA/irdma: Prevent zero-length STAG registration RDMA/erdma: Implement hierarchical MTT RDMA/erdma: Refactor the storage structure of MTT entries RDMA/erdma: Renaming variable names and field names of struct erdma_mem RDMA/hns: Support hns HW stats RDMA/hns: Dump whole QP/CQ/MR resource in raw RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp() RDMA/mlx4: Copy union directly RDMA/irdma: Drop unused kernel push code ...	2023-09-01 16:49:33 -07:00

1 2 3 4 5 ...

13975 Commits