linux

iv/linux

Author	SHA1	Message	Date
Chengchang Tang	e785018a7f	RDMA/hns: Fix memory leak in free_mr_init() [ Upstream commit 288f535951aa81ed674f5e5477ab11b9d9351b8c ] When a reserved QP fails to be created, the memory of the remaining created reserved QPs is leaked. Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231207114231.2872104-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-01-25 15:35:34 -08:00
Junxian Huang	ada27426b0	RDMA/hns: Fix inappropriate err code for unsupported operations [ Upstream commit f45b83ad39f8033e717b1eee57e81811113d5a84 ] EOPNOTSUPP is more situable than EINVAL for allocating XRCD while XRC is not supported and unsupported resizing SRQ. Fixes: 32548870d438 ("RDMA/hns: Add support for XRC on HIP09") Fixes: 221109e64316 ("RDMA/hns: Add interception for resizing SRQs") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231114123449.1106162-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-01-25 15:35:31 -08:00
Leon Romanovsky	fff32018b0	RDMA/usnic: Silence uninitialized symbol smatch warnings [ Upstream commit b9a85e5eec126d6ae6c362f94b447c223e8fe6e4 ] The patch 1da177e4c3f4: "Linux-2.6.12-rc2" from Apr 16, 2005 (linux-next), leads to the following Smatch static checker warning: drivers/infiniband/hw/mthca/mthca_cmd.c:644 mthca_SYS_EN() error: uninitialized symbol 'out'. drivers/infiniband/hw/mthca/mthca_cmd.c 636 int mthca_SYS_EN(struct mthca_dev dev) 637 { 638 u64 out; 639 int ret; 640 641 ret = mthca_cmd_imm(dev, 0, &out, 0, 0, CMD_SYS_EN, CMD_TIME_CLASS_D); We pass out here and it gets used without being initialized. err = mthca_cmd_post(dev, in_param, out_param ? out_param : 0, ^^^^^^^^^^ in_modifier, op_modifier, op, context->token, 1); It's the same in mthca_cmd_wait() and mthca_cmd_poll(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/533bc3df-8078-4397-b93d-d1f6cec9b636@moroto.mountain Link: https://lore.kernel.org/r/c559cb7113158c02d75401ac162652072ef1b5f0.1699867650.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-01-25 15:35:31 -08:00
Moshe Shemesh	04ebb29dc9	RDMA/mlx5: Fix mkey cache WQ flush [ Upstream commit a53e215f90079f617360439b1b6284820731e34c ] The cited patch tries to ensure no pending works on the mkey cache workqueue by disabling adding new works and call flush_workqueue(). But this workqueue also has delayed works which might still be pending the delay time to be queued. Add cancel_delayed_work() for the delayed works which waits to be queued and then the flush_workqueue() will flush all works which are already queued and running. Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup") Link: https://lore.kernel.org/r/b8722f14e7ed81452f791764a26d2ed4cfa11478.1698256179.git.leon@kernel.org Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-01-10 17:16:55 +01:00
Patrisious Haddad	885faf3c7e	RDMA/mlx5: Change the key being sent for MPV device affiliation commit 02e7d139e5e24abb5fde91934fc9dc0344ac1926 upstream. Change the key that we send from IB driver to EN driver regarding the MPV device affiliation, since at that stage the IB device is not yet initialized, so its index would be zero for different IB devices and cause wrong associations between unrelated master and slave devices. Instead use a unique value from inside the core device which is already initialized at this stage. Fixes: 0d293714ac32 ("RDMA/mlx5: Send events from IB driver about device affiliation state") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/ac7e66357d963fc68d7a419515180212c96d137d.1697705185.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-12-20 17:02:06 +01:00
Patrisious Haddad	fdd350fe5e	RDMA/mlx5: Send events from IB driver about device affiliation state [ Upstream commit 0d293714ac32650bfb669ceadf7cc2fad8161401 ] Send blocking events from IB driver whenever the device is done being affiliated or if it is removed from an affiliation. This is useful since now the EN driver can register to those event and know when a device is affiliated or not. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 762a55a54eec ("net/mlx5e: Disable IPsec offload support if not FW steering") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-20 17:01:44 +01:00
Shifeng Li	fc054130cd	RDMA/irdma: Avoid free the non-cqp_request scratch [ Upstream commit e3e82fcb79eeb3f1a88a89f676831773caff514a ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:16 +01:00
Mike Marciniszyn	926c1c7a8f	RDMA/irdma: Fix support for 64k pages [ Upstream commit 03769f72d66edab82484449ed594cb6b00ae0223 ] Virtual QP and CQ require a 4K HW page size but the driver passes PAGE_SIZE to ib_umem_find_best_pgsz() instead. Fix this by using the appropriate 4k value in the bitmap passed to ib_umem_find_best_pgsz(). Fixes: 693a5386eff0 ("RDMA/irdma: Split mr alloc and free into new functions") Link: https://lore.kernel.org/r/20231129202143.1434-4-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:16 +01:00
Mike Marciniszyn	12a77574f0	RDMA/irdma: Ensure iWarp QP queue memory is OS paged aligned [ Upstream commit 0a5ec366de7e94192669ba08de6ed336607fd282 ] The SQ is shared for between kernel and used by storing the kernel page pointer and passing that to a kmap_atomic(). This then requires that the alignment is PAGE_SIZE aligned. Fix by adding an iWarp specific alignment check. Fixes: e965ef0e7b2c ("RDMA/irdma: Split QP handler into irdma_reg_user_mr_type_qp") Link: https://lore.kernel.org/r/20231129202143.1434-3-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:16 +01:00
Shifeng Li	55b6b95737	RDMA/irdma: Fix UAF in irdma_sc_ccq_get_cqe_info() [ Upstream commit 2b78832f50c4d711e161b166d7d8790968051546 ] When removing the irdma driver or unplugging its aux device, the ccq queue is released before destorying the cqp_cmpl_wq queue. But in the window, there may still be completion events for wqes. That will cause a UAF in irdma_sc_ccq_get_cqe_info(). [34693.333191] BUG: KASAN: use-after-free in irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333194] Read of size 8 at addr ffff889097f80818 by task kworker/u67:1/26327 [34693.333194] [34693.333199] CPU: 9 PID: 26327 Comm: kworker/u67:1 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [34693.333200] Hardware name: SANGFOR Inspur/NULL, BIOS 4.1.13 08/01/2016 [34693.333211] Workqueue: cqp_cmpl_wq cqp_compl_worker [irdma] [34693.333213] Call Trace: [34693.333220] dump_stack+0x71/0xab [34693.333226] print_address_description+0x6b/0x290 [34693.333238] ? irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333240] kasan_report+0x14a/0x2b0 [34693.333251] irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333264] ? irdma_free_cqp_request+0x151/0x1e0 [irdma] [34693.333274] irdma_cqp_ce_handler+0x1fb/0x3b0 [irdma] [34693.333285] ? irdma_ctrl_init_hw+0x2c20/0x2c20 [irdma] [34693.333290] ? __schedule+0x836/0x1570 [34693.333293] ? strscpy+0x83/0x180 [34693.333296] process_one_work+0x56a/0x11f0 [34693.333298] worker_thread+0x8f/0xf40 [34693.333301] ? __kthread_parkme+0x78/0xf0 [34693.333303] ? rescuer_thread+0xc50/0xc50 [34693.333305] kthread+0x2a0/0x390 [34693.333308] ? kthread_destroy_worker+0x90/0x90 [34693.333310] ret_from_fork+0x1f/0x40 Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Shifeng Li <lishifeng1992@126.com> Link: https://lore.kernel.org/r/20231121101236.581694-1-lishifeng1992@126.com Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:14 +01:00
Kalesh AP	cfaede20f5	RDMA/bnxt_re: Correct module description string [ Upstream commit 422b19f7f006e813ee0865aadce6a62b3c263c42 ] The word "Driver" is repeated twice in the "modinfo bnxt_re" output description. Fix it. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1700555387-6277-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:14 +01:00
Mustafa Ismail	9b1b8ab2bd	RDMA/irdma: Add wait for suspend on SQD [ Upstream commit bd6da690c27d75cae432c09162d054b34fa2156f ] Currently, there is no wait for the QP suspend to complete on a modify to SQD state. Add a wait, after the modify to SQD state, for the Suspend Complete AE. While we are at it, update the suspend timeout value in irdma_prep_tc_change to use IRDMA_EVENT_TIMEOUT_MS too. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20231114170246.238-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:12 +01:00
Mustafa Ismail	951c6d336e	RDMA/irdma: Do not modify to SQD on error [ Upstream commit ba12ab66aa83a2340a51ad6e74b284269745138c ] Remove the modify to SQD before going to ERROR state. It is not needed. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20231114170246.238-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:12 +01:00
Junxian Huang	38772f6672	RDMA/hns: Fix unnecessary err return when using invalid congest control algorithm [ Upstream commit efb9cbf66440482ceaa90493d648226ab7ec2ebf ] Add a default congest control algorithm so that driver won't return an error when the configured algorithm is invalid. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231028093242.670325-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-13 18:45:12 +01:00
Ilpo Järvinen	69319be69f	RDMA/hfi1: Use FIELD_GET() to extract Link Width [ Upstream commit 8bf7187d978610b9e327a3d92728c8864a575ebd ] Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of custom masking and shifting, and remove extract_width() which only wraps that FIELD_GET(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230919125648.1920-2-ilpo.jarvinen@linux.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-28 17:19:42 +00:00
George Kennedy	6387f269d8	IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF [ Upstream commit 2ef422f063b74adcc4a4a9004b0a87bb55e0a836 ] In the unlikely event that workqueue allocation fails and returns NULL in mlx5_mkey_cache_init(), delete the call to mlx5r_umr_resource_cleanup() (which frees the QP) in mlx5_ib_stage_post_ib_reg_umr_init(). This will avoid attempted double free of the same QP when __mlx5_ib_add() does its cleanup. Resolves a splat: Syzkaller reported a UAF in ib_destroy_qp_user workqueue: Failed to create a rescuer kthread for wq "mkey_cache": -EINTR infiniband mlx5_0: mlx5_mkey_cache_init:981:(pid 1642): failed to create work queue infiniband mlx5_0: mlx5_ib_stage_post_ib_reg_umr_init:4075:(pid 1642): mr cache init failed -12 ================================================================== BUG: KASAN: slab-use-after-free in ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) Read of size 8 at addr ffff88810da310a8 by task repro_upstream/1642 Call Trace: <TASK> kasan_report (mm/kasan/report.c:590) ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4178) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... </TASK> Allocated by task 1642: __kmalloc (./include/linux/kasan.h:198 mm/slab_common.c:1026 mm/slab_common.c:1039) create_qp (./include/linux/slab.h:603 ./include/linux/slab.h:720 ./include/rdma/ib_verbs.h:2795 drivers/infiniband/core/verbs.c:1209) ib_create_qp_kernel (drivers/infiniband/core/verbs.c:1347) mlx5r_umr_resource_init (drivers/infiniband/hw/mlx5/umr.c:164) mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4070) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... Freed by task 1642: __kmem_cache_free (mm/slub.c:1826 mm/slub.c:3809 mm/slub.c:3822) ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2112) mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4076 drivers/infiniband/hw/mlx5/main.c:4065) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... Fixes: 04876c12c19e ("RDMA/mlx5: Move init and cleanup of UMR to umr.c") Link: https://lore.kernel.org/r/1698170518-4006-1-git-send-email-george.kennedy@oracle.com Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:23 +01:00
Leon Romanovsky	6ce9304d26	RDMA/hfi1: Workaround truncation compilation error [ Upstream commit d4b2d165714c0ce8777d5131f6e0aad617b7adc4 ] Increase name array to be large enough to overcome the following compilation error. drivers/infiniband/hw/hfi1/efivar.c: In function ‘read_hfi1_efi_var’: drivers/infiniband/hw/hfi1/efivar.c:124:44: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 124 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^ drivers/infiniband/hw/hfi1/efivar.c:124:9: note: ‘snprintf’ output 2 or more bytes (assuming 65) into a destination of size 64 124 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/efivar.c:133:52: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 133 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^ drivers/infiniband/hw/hfi1/efivar.c:133:17: note: ‘snprintf’ output 2 or more bytes (assuming 65) into a destination of size 64 133 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:243: drivers/infiniband/hw/hfi1/efivar.o] Error 1 Fixes: c03c08d50b3d ("IB/hfi1: Check upper-case EFI variables") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/238fa39a8fd60e87a5ad7e1ca6584fcdf32e9519.1698159993.git.leonro@nvidia.com Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:22 +01:00
Junxian Huang	cd294f7c60	RDMA/hns: Fix init failure of RoCE VF and HIP08 [ Upstream commit 07f06e0e5cd99555c861e874716d9e2627655fd5 ] During device init, a struct for HW stats will be allocated. As HW stats are not supported for VF and HIP08, currently hns_roce_alloc_hw_port_stats() returns NULL in this case. However, ib-core considers the returned NULL pointer as memory allocation failure and returns ENOMEM, eventually leading to the failure of VF and HIP08 init. In the case where the driver does not support the .alloc_hw_port_stats() ops, ib-core will return EOPNOTSUPP and ignore this error code in the upper layer function. So for VF and HIP08, just don't set the HW stats ops to ib-core. Fixes: 5a87279591a1 ("RDMA/hns: Support hns HW stats") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Junxian Huang	1210137c7f	RDMA/hns: Fix unnecessary port_num transition in HW stats allocation [ Upstream commit b4a797b894dc91a541ea230db6fa00cc74683bfd ] The num_ports capability of devices should be compared with the number of port(i.e. the input param "port_num") but not the port index(i.e. port_num - 1). Fixes: 5a87279591a1 ("RDMA/hns: Support hns HW stats") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Luoyouming	ceb48ed167	RDMA/hns: The UD mode can only be configured with DCQCN [ Upstream commit 27c5fd271d8b8730fc0bb1b6cae953ad7808a874 ] Due to hardware limitations, only DCQCN is supported for UD. Therefore, the default algorithm for UD is set to DCQCN. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Luoyouming	74161f98a4	RDMA/hns: Add check for SL [ Upstream commit 5e617c18b1f34ec57ad5dce44f09de603cf6bd6c ] SL set by users may exceed the capability of devices. So add check for this situation. Fixes: fba429fcf9a5 ("RDMA/hns: Fix missing fields in address vector") Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Fixes: f0cb411aad23 ("RDMA/hns: Use new interface to modify QP context") Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Chengchang Tang	b2dcb5207e	RDMA/hns: Fix signed-unsigned mixed comparisons [ Upstream commit b5f9efff101b06fd06a5e280a2b00b1335f5f476 ] The ib_mtu_enum_to_int() and uverbs_attr_get_len() may returns a negative value. In this case, mixed comparisons of signed and unsigned types will throw wrong results. This patch adds judgement for this situation. Fixes: 30b707886aeb ("RDMA/hns: Support inline data in extented sge space for RC") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Chengchang Tang	d7e674b71f	RDMA/hns: Fix uninitialized ucmd in hns_roce_create_qp_common() [ Upstream commit c64e9710f9241e38a1c761ed1c1a30854784da66 ] ucmd in hns_roce_create_qp_common() are not initialized. But it works fine until new member sdb_addr is added to struct hns_roce_ib_create_qp. If the user-mode driver uses an old version ABI, then the value of the new member will be undefined after ib_copy_from_udata(). This patch fixes it by initialize this variable to 0. And the default value of the new member sdb_addr will be 0 which is invalid. Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Chengchang Tang	1a9db0d265	RDMA/hns: Fix printing level of asynchronous events [ Upstream commit 9faef73ef4f6666b97e04d99734ac09251098185 ] The current driver will print all asynchronous events. Some of the print levels are set improperly, e.g. SRQ limit reach and SRQ last wqe reach, which may also occur during normal operation of the software. Currently, the information of these event is printed as a warning, which causes a large amount of printing even during normal use of the application. As a result, the service performance deteriorates. This patch fixes the printing storms by modifying the print level. Fixes: b00a92c8f2ca ("RDMA/hns: Move all prints out of irq handle") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Patrisious Haddad	2eeff1f91a	IB/mlx5: Fix rdma counter binding for RAW QP [ Upstream commit c1336bb4aa5e809a622a87d74311275514086596 ] Previously when we had a RAW QP, we bound a counter to it when it moved to INIT state, using the counter context inside RQC. But when we try to modify that counter later in RTS state we used modify QP which tries to change the counter inside QPC instead of RQC. Now we correctly modify the counter set_id inside of RQC instead of QPC for the RAW QP. Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/2e5ab6713784a8fe997d19c508187a0dfecf2dfc.1696847964.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-11-20 11:59:21 +01:00
Leon Romanovsky	c99a7457e5	RDMA/mlx5: Remove not-used cache disable flag During execution of mlx5_mkey_cache_cleanup(), there is a guarantee that MR are not registered and/or destroyed. It means that we don't need newly introduced cache disable flag. Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup") Link: https://lore.kernel.org/r/c7e9c9f98c8ae4a7413d97d9349b29f5b0a23dbe.1695921626.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-10-02 14:32:44 +03:00
Shay Drory	374012b004	RDMA/mlx5: Fix mkey cache possible deadlock on cleanup Fix the deadlock by refactoring the MR cache cleanup flow to flush the workqueue without holding the rb_lock. This adds a race between cache cleanup and creation of new entries which we solve by denied creation of new entries after cache cleanup started. Lockdep: WARNING: possible circular locking dependency detected [ 2785.326074 ] 6.2.0-rc6_for_upstream_debug_2023_01_31_14_02 #1 Not tainted [ 2785.339778 ] ------------------------------------------------------ [ 2785.340848 ] devlink/53872 is trying to acquire lock: [ 2785.341701 ] ffff888124f8c0c8 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0xc8/0x900 [ 2785.343403 ] [ 2785.343403 ] but task is already holding lock: [ 2785.344464 ] ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib] [ 2785.346273 ] [ 2785.346273 ] which lock already depends on the new lock. [ 2785.346273 ] [ 2785.347720 ] [ 2785.347720 ] the existing dependency chain (in reverse order) is: [ 2785.349003 ] [ 2785.349003 ] -> #1 (&dev->cache.rb_lock){+.+.}-{3:3}: [ 2785.350160 ] __mutex_lock+0x14c/0x15c0 [ 2785.350962 ] delayed_cache_work_func+0x2d1/0x610 [mlx5_ib] [ 2785.352044 ] process_one_work+0x7c2/0x1310 [ 2785.352879 ] worker_thread+0x59d/0xec0 [ 2785.353636 ] kthread+0x28f/0x330 [ 2785.354370 ] ret_from_fork+0x1f/0x30 [ 2785.355135 ] [ 2785.355135 ] -> #0 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}: [ 2785.356515 ] __lock_acquire+0x2d8a/0x5fe0 [ 2785.357349 ] lock_acquire+0x1c1/0x540 [ 2785.358121 ] __flush_work+0xe8/0x900 [ 2785.358852 ] __cancel_work_timer+0x2c7/0x3f0 [ 2785.359711 ] mlx5_mkey_cache_cleanup+0xfb/0x250 [mlx5_ib] [ 2785.360781 ] mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x16/0x30 [mlx5_ib] [ 2785.361969 ] __mlx5_ib_remove+0x68/0x120 [mlx5_ib] [ 2785.362960 ] mlx5r_remove+0x63/0x80 [mlx5_ib] [ 2785.363870 ] auxiliary_bus_remove+0x52/0x70 [ 2785.364715 ] device_release_driver_internal+0x3c1/0x600 [ 2785.365695 ] bus_remove_device+0x2a5/0x560 [ 2785.366525 ] device_del+0x492/0xb80 [ 2785.367276 ] mlx5_detach_device+0x1a9/0x360 [mlx5_core] [ 2785.368615 ] mlx5_unload_one_devl_locked+0x5a/0x110 [mlx5_core] [ 2785.369934 ] mlx5_devlink_reload_down+0x292/0x580 [mlx5_core] [ 2785.371292 ] devlink_reload+0x439/0x590 [ 2785.372075 ] devlink_nl_cmd_reload+0xaef/0xff0 [ 2785.372973 ] genl_family_rcv_msg_doit.isra.0+0x1bd/0x290 [ 2785.374011 ] genl_rcv_msg+0x3ca/0x6c0 [ 2785.374798 ] netlink_rcv_skb+0x12c/0x360 [ 2785.375612 ] genl_rcv+0x24/0x40 [ 2785.376295 ] netlink_unicast+0x438/0x710 [ 2785.377121 ] netlink_sendmsg+0x7a1/0xca0 [ 2785.377926 ] sock_sendmsg+0xc5/0x190 [ 2785.378668 ] __sys_sendto+0x1bc/0x290 [ 2785.379440 ] __x64_sys_sendto+0xdc/0x1b0 [ 2785.380255 ] do_syscall_64+0x3d/0x90 [ 2785.381031 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 2785.381967 ] [ 2785.381967 ] other info that might help us debug this: [ 2785.381967 ] [ 2785.383448 ] Possible unsafe locking scenario: [ 2785.383448 ] [ 2785.384544 ] CPU0 CPU1 [ 2785.385383 ] ---- ---- [ 2785.386193 ] lock(&dev->cache.rb_lock); [ 2785.386940 ] lock((work_completion)(&(&ent->dwork)->work)); [ 2785.388327 ] lock(&dev->cache.rb_lock); [ 2785.389425 ] lock((work_completion)(&(&ent->dwork)->work)); [ 2785.390414 ] [ 2785.390414 ] * DEADLOCK * [ 2785.390414 ] [ 2785.391579 ] 6 locks held by devlink/53872: [ 2785.392341 ] #0: ffffffff84c17a50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 [ 2785.393630 ] #1: ffff888142280218 (&devlink->lock_key){+.+.}-{3:3}, at: devlink_get_from_attrs_lock+0x12d/0x2d0 [ 2785.395324 ] #2: ffff8881422d3c38 (&dev->lock_key){+.+.}-{3:3}, at: mlx5_unload_one_devl_locked+0x4a/0x110 [mlx5_core] [ 2785.397322 ] #3: ffffffffa0e59068 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_detach_device+0x60/0x360 [mlx5_core] [ 2785.399231 ] #4: ffff88810e3cb0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600 [ 2785.400864 ] #5: ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib] Fixes: b95845178328 ("RDMA/mlx5: Change the cache structure to an RB-tree") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-09-26 12:33:53 +03:00
Shay Drory	dab994bcc6	RDMA/mlx5: Fix NULL string error checkpath is complaining about NULL string, change it to 'Unknown'. Fixes: 37aa5c36aa70 ("IB/mlx5: Add UARs write-combining and non-cached mapping") Signed-off-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/8638e5c14fadbde5fa9961874feae917073af920.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:44 +03:00
Hamdan Igbaria	2fad8f06a5	RDMA/mlx5: Fix mutex unlocking on error flow for steering anchor creation The mutex was not unlocked on some of the error flows. Moved the unlock location to include all the error flow scenarios. Fixes: e1f4a52ac171 ("RDMA/mlx5: Create an indirect flow table for steering anchor") Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Link: https://lore.kernel.org/r/1244a69d783da997c0af0b827c622eb00495492e.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:40 +03:00
Michael Guralnik	4f14c6c021	RDMA/mlx5: Fix assigning access flags to cache mkeys After the change to use dynamic cache structure, new cache entries can be added and the mkey allocation can no longer assume that all mkeys created for the cache have access_flags equal to zero. Example of a flow that exposes the issue: A user registers MR with RO on a HCA that cannot UMR RO and the mkey is created outside of the cache. When the user deregisters the MR, a new cache entry is created to store mkeys with RO. Later, the user registers 2 MRs with RO. The first MR is reused from the new cache entry. When we try to get the second mkey from the cache we see the entry is empty so we go to the MR cache mkey allocation flow which would have allocated a mkey with no access flags, resulting the user getting a MR without RO. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Reviewed-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/8a802700b82def3ace3f77cd7a9ad9d734af87e7.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:33 +03:00
Christophe JAILLET	d7f393430a	IB/mlx4: Fix the size of a buffer in add_port_entries() In order to be sure that 'buff' is never truncated, its size should be 12, not 11. When building with W=1, this fixes the following warnings: drivers/infiniband/hw/mlx4/sysfs.c: In function ‘add_port_entries’: drivers/infiniband/hw/mlx4/sysfs.c:268:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=] 268 \| sprintf(buff, "%d", i); \| ^ drivers/infiniband/hw/mlx4/sysfs.c:268:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11 268 \| sprintf(buff, "%d", i); \| ^~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/sysfs.c:286:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=] 286 \| sprintf(buff, "%d", i); \| ^ drivers/infiniband/hw/mlx4/sysfs.c:286:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11 286 \| sprintf(buff, "%d", i); \| ^~~~~~~~~~~~~~~~~~~~~~ Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0bb1443eb47308bc9be30232cc23004c4d4cf43e.1695448530.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-23 21:53:24 +03:00
Selvin Xavier	a83c692789	RDMA/bnxt_re: Decrement resource stats correctly rc_qp_count and ud_qp_count is not decremented during qp destroy. Fix this. Fixes: cb95709e0dca ("bnxt_re: Update the hw counters for resource stats") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1695199280-13520-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-21 11:56:23 +03:00
Selvin Xavier	9fc5f9a92f	RDMA/bnxt_re: Fix the handling of control path response data Flag that indicate control path command completion should be cleared only after copying the command response data. As soon as the is_in_used flag is clear, the waiting thread can proceed with wrong response data. This wrong data is causing multiple issues like wrong lkey used in data traffic and wrong AH Id etc. Use a memory barrier to ensure that the response data is copied and visible to the process waiting on a different cpu core before clearing the is_in_used flag. Clear the is_in_used after copying the command response. Fixes: bcfee4ce3e01 ("RDMA/bnxt_re: remove redundant cmdq_bitmap") Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1695199280-13520-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-21 11:56:23 +03:00
Cheng Xu	b2abdffb50	RDMA/erdma: Fix NULL pointer access in regmr_cmd Fix the crash of regmr_cmd called by erdma_ib_alloc_mr. The reason is that mr->mem.mtt is not initialized but it is accessed in regmr_cmd. The call trace information: BUG: kernel NULL pointer dereference, address: 0000000000000000 <...> RIP: 0010:regmr_cmd+0x170/0x1c0 [erdma] <...> Call Trace: ? __die+0x20/0x70 ? page_fault_oops+0x66/0x150 ? do_user_addr_fault+0x61/0x660 ? exc_page_fault+0x65/0x140 ? asm_exc_page_fault+0x22/0x30 ? regmr_cmd+0x170/0x1c0 [erdma] ? preempt_count_add+0x70/0xa0 ? _raw_spin_lock_irqsave+0x19/0x50 ? _raw_spin_unlock_irqrestore+0x1b/0x40 ? erdma_alloc_idx+0x51/0x90 [erdma] erdma_get_dma_mr+0xa3/0x120 [erdma] __ib_alloc_pd+0xeb/0x1c0 [ib_core] Fixes: 7244b4aa4221 ("RDMA/erdma: Refactor the storage structure of MTT entries") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/3d140c1d-524a-4dbe-a51c-aee4f7ecafdb@moroto.mountain/ Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230908060559.80203-1-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-18 10:42:19 +03:00
Dan Carpenter	6b5f0749ce	RDMA/erdma: Fix error code in erdma_create_scatter_mtt() The erdma_create_scatter_mtt() function is supposed to return error pointers. Returning NULL will lead to an Oops. Fixes: ed10435d3583 ("RDMA/erdma: Implement hierarchical MTT") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/1eb400d5-d8a3-4a8e-b3da-c43c6c377f86@moroto.mountain Acked-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-18 09:12:00 +03:00
Artem Chernyshev	8fb8a82086	RDMA/cxgb4: Check skb value for failure to allocate get_skb() can fail to allocate skb, so check it. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 5be78ee924ae ("RDMA/cxgb4: Fix LE hash collision bug for active open connection") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Link: https://lore.kernel.org/r/20230905124048.284165-1-artem.chernyshev@red-soft.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:55:53 +03:00
Linus Torvalds	f7e97ce269	v6.6 merge window RDMA pull request Many small changes across the subystem, some highlights: - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca, hns, and bnxt_re - siw now works over tunnel and other netdevs with a MAC address by removing assumptions about a MAC/GID from the connection manager - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow userspace to slow down the doorbell rings if the HW gets full - irdma egress VLAN priority, better QP/WQ sizing - rxe bug fixes in queue draining and srq resizing - Support more ethernet speed options in the core layer - DMABUF support for bnxt_re - Multi-stage MTT support for erdma to allow much bigger MR registrations - A irdma fix with a CVE that came in too late to go to -rc, missing bounds checking for 0 length MRs -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZPEqkAAKCRCFwuHvBreF YZrNAPoCBfU+VjCKNr2yqF7s52os5ZdBV7Uuh4txHcXWW9H7GAD/f19i2u62fzNu C27jj4cztemMBb8mgwyxPw/wLg7NLwY= =pC6k -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Many small changes across the subystem, some highlights: - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca, hns, and bnxt_re - siw now works over tunnel and other netdevs with a MAC address by removing assumptions about a MAC/GID from the connection manager - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow userspace to slow down the doorbell rings if the HW gets full - irdma egress VLAN priority, better QP/WQ sizing - rxe bug fixes in queue draining and srq resizing - Support more ethernet speed options in the core layer - DMABUF support for bnxt_re - Multi-stage MTT support for erdma to allow much bigger MR registrations - A irdma fix with a CVE that came in too late to go to -rc, missing bounds checking for 0 length MRs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits) IB/hfi1: Reduce printing of errors during driver shut down RDMA/hfi1: Move user SDMA system memory pinning code to its own file RDMA/hfi1: Use list_for_each_entry() helper RDMA/mlx5: Fix trailing */ formatting in block comment RDMA/rxe: Fix redundant break statement in switch-case. RDMA/efa: Fix wrong resources deallocation order RDMA/siw: Call llist_reverse_order in siw_run_sq RDMA/siw: Correct wrong debug message RDMA/siw: Balance the reference of cep->kref in the error path Revert "IB/isert: Fix incorrect release of isert connection" RDMA/bnxt_re: Fix kernel doc errors RDMA/irdma: Prevent zero-length STAG registration RDMA/erdma: Implement hierarchical MTT RDMA/erdma: Refactor the storage structure of MTT entries RDMA/erdma: Renaming variable names and field names of struct erdma_mem RDMA/hns: Support hns HW stats RDMA/hns: Dump whole QP/CQ/MR resource in raw RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp() RDMA/mlx4: Copy union directly RDMA/irdma: Drop unused kernel push code ...	2023-09-01 16:49:33 -07:00
Linus Torvalds	bd6c11bc43	Networking changes for 6.6. Core ---- - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations. - Store netdevs in an xarray, to simplify iterating over netdevs. - Refactor nexthop selection for multipath routes. - Improve sched class lifetime handling. - Add backup nexthop ID support for bridge. - Implement drop reasons support in openvswitch. - Several data races annotations and fixes. - Constify the sk parameter of routing functions. - Prepend kernel version to netconsole message. Protocols --------- - Implement support for TCP probing the peer being under memory pressure. - Remove hard coded limitation on IPv6 specific info placement inside the socket struct. - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor. - Scaling-up the IPv6 expired route GC via a separated list of expiring routes. - In-kernel support for the TLS alert protocol. - Better support for UDP reuseport with connected sockets. - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size. - Get rid of additional ancillary per MPTCP connection struct socket. - Implement support for BPF-based MPTCP packet schedulers. - Format MPTCP subtests selftests results in TAP. - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation. BPF --- - Multi-buffer support in AF_XDP. - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds. - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it. - Add SO_REUSEPORT support for TC bpf_sk_assign. - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64. - Support defragmenting IPv(4\|6) packets in BPF. - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling. - Introduce bpf map element count and enable it for all program types. - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy. - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress. - Add uprobe support for the bpf_get_func_ip helper. - Check skb ownership against full socket. - Support for up to 12 arguments in BPF trampoline. - Extend link_info for kprobe_multi and perf_event links. Netfilter --------- - Speed-up process exit by aborting ruleset validation if a fatal signal is pending. - Allow NLA_POLICY_MASK to be used with BE16/BE32 types. Driver API ---------- - Page pool optimizations, to improve data locality and cache usage. - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers. - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info. - Extend and use the yaml devlink specs to [re]generate the split ops. - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes. - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool. - Remove phylink legacy mode support. - Support offload LED blinking to phy. - Add devlink port function attributes for IPsec. New hardware / drivers ---------------------- - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers ------- - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq 9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO /QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz 6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1 /daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3 Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n 4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP DCW8VGFOZjZM =9CsR -----END PGP SIGNATURE----- Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations - Store netdevs in an xarray, to simplify iterating over netdevs - Refactor nexthop selection for multipath routes - Improve sched class lifetime handling - Add backup nexthop ID support for bridge - Implement drop reasons support in openvswitch - Several data races annotations and fixes - Constify the sk parameter of routing functions - Prepend kernel version to netconsole message Protocols: - Implement support for TCP probing the peer being under memory pressure - Remove hard coded limitation on IPv6 specific info placement inside the socket struct - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor - Scaling-up the IPv6 expired route GC via a separated list of expiring routes - In-kernel support for the TLS alert protocol - Better support for UDP reuseport with connected sockets - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size - Get rid of additional ancillary per MPTCP connection struct socket - Implement support for BPF-based MPTCP packet schedulers - Format MPTCP subtests selftests results in TAP - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation BPF: - Multi-buffer support in AF_XDP - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it - Add SO_REUSEPORT support for TC bpf_sk_assign - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64 - Support defragmenting IPv(4\|6) packets in BPF - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling - Introduce bpf map element count and enable it for all program types - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress - Add uprobe support for the bpf_get_func_ip helper - Check skb ownership against full socket - Support for up to 12 arguments in BPF trampoline - Extend link_info for kprobe_multi and perf_event links Netfilter: - Speed-up process exit by aborting ruleset validation if a fatal signal is pending - Allow NLA_POLICY_MASK to be used with BE16/BE32 types Driver API: - Page pool optimizations, to improve data locality and cache usage - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info - Extend and use the yaml devlink specs to [re]generate the split ops - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool - Remove phylink legacy mode support - Support offload LED blinking to phy - Add devlink port function attributes for IPsec New hardware / drivers: - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers: - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering" * tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ...	2023-08-29 11:33:01 -07:00
Linus Torvalds	615e95831e	v6.6-vfs.ctime -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTKAAKCRCRxhvAZXjc oifJAQCzi/p+AdQu8LA/0XvR7fTwaq64ZDCibU4BISuLGT2kEgEAuGbuoFZa0rs2 XYD/s4+gi64p9Z01MmXm2XO1pu3GPg0= =eJz5 -----END PGP SIGNATURE----- Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...	2023-08-28 09:31:32 -07:00
Jakub Kicinski	3c5066c6b0	Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== mlx5 MACsec RoCEv2 support From Patrisious: This series extends previously added MACsec offload support to cover RoCE traffic either. In order to achieve that, we need configure MACsec with offload between the two endpoints, like below: REMOTE_MAC=10:70:fd:43:71:c0 * ip addr add 1.1.1.1/16 dev eth2 * ip link set dev eth2 up * ip link add link eth2 macsec0 type macsec encrypt on * ip macsec offload macsec0 mac * ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16 * ip macsec add macsec0 rx port 1 address $REMOTE_MAC * ip macsec add macsec0 rx port 1 address $REMOTE_MAC sa 0 pn 1 on key 01 ead3664f508eb06c40ac7104cdae4ce5 * ip addr add 10.1.0.1/16 dev macsec0 * ip link set dev macsec0 up And in a similar manner on the other machine, while noting the keys order would be reversed and the MAC address of the other machine. RDMA traffic is separated through relevant GID entries and in case of IP ambiguity issue - meaning we have a physical GIDs and a MACsec GIDs with the same IP/GID, we disable our physical GID in order to force the user to only use the MACsec GID. v0: https://lore.kernel.org/netdev/20230813064703.574082-1-leon@kernel.org/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion net/mlx5: Add RoCE MACsec steering infrastructure in core net/mlx5: Configure MACsec steering for ingress RoCEv2 traffic net/mlx5: Configure MACsec steering for egress RoCEv2 traffic IB/core: Reorder GID delete code for RoCE net/mlx5: Add MACsec priorities in RDMA namespaces RDMA/mlx5: Implement MACsec gid addition and deletion net/mlx5: Maintain fs_id xarray per MACsec device inside macsec steering net/mlx5: Remove netdevice from MACsec steering net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core net/mlx5e: Rename MACsec flow steering functions/parameters to suit core naming style net/mlx5: Remove dependency of macsec flow steering on ethernet net/mlx5e: Move MACsec flow steering operations to be used as core library macsec: add functions to get macsec real netdevice and check offload ==================== Link: https://lore.kernel.org/r/20230821073833.59042-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-24 11:32:18 -07:00
Petr Pavlu	7d22b1cb9d	mlx4: Connect the infiniband part to the auxiliary bus Use the auxiliary bus to perform device management of the infiniband part of the mlx4 driver. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-23 08:25:28 +01:00
Petr Pavlu	73d68002a0	mlx4: Replace the mlx4_interface.event callback with a notifier Use a notifier to implement mlx4_dispatch_event() in preparation to switch mlx4_en and mlx4_ib to be an auxiliary device. A problem is that if the mlx4_interface.event callback was replaced with something as mlx4_adrv.event then the implementation of mlx4_dispatch_event() would need to acquire a lock on a given device before executing this callback. That is necessary because otherwise there is no guarantee that the associated driver cannot get unbound when the callback is running. However, taking this lock is not possible because mlx4_dispatch_event() can be invoked from the hardirq context. Using an atomic notifier allows the driver to accurately record when it wants to receive these events and solves this problem. A handler registration is done by both mlx4_en and mlx4_ib at the end of their mlx4_interface.add callback. This matches the current situation when mlx4_add_device() would enable events for a given device immediately after this callback, by adding the device on the mlx4_priv.list. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-23 08:25:27 +01:00
Petr Pavlu	7ba189ac52	mlx4: Use 'void ' as the event param of mlx4_dispatch_event() Function mlx4_dispatch_event() takes an 'unsigned long' as its event parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR), a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit integer (remaining events). In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device, the mlx4_interface.event callback is replaced with a notifier and function mlx4_dispatch_event() gets updated to invoke atomic_notifier_call_chain(). This requires forwarding the input 'param' value from the former function to the latter. A problem is that the notifier call takes 'void ' as its 'param' value, compared to 'unsigned long' used by mlx4_dispatch_event(). Re-passing the value would need either punning it to 'void ' or passing down the address of the input 'param'. Both approaches create a number of unnecessary casts. Change instead the input 'param' of mlx4_dispatch_event() from 'unsigned long' to 'void '. A mlx4_eqe pointer can be passed directly, callers using an int value are adjusted to pass its address. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-23 08:25:27 +01:00
Petr Pavlu	71ab55a9af	mlx4: Get rid of the mlx4_interface.get_dev callback Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev() and the associated mlx4_interface.get_dev callbacks. This is done in preparation to use an auxiliary bus to model the mlx4 driver structure. The change is motivated by the following situation: * The mlx4_en interface is being initialized by mlx4_en_add() and mlx4_en_activate(). * The latter activate function calls mlx4_en_init_netdev() -> register_netdev() to register a new net_device. * A netdev event NETDEV_REGISTER is raised for the device. * The netdev notififier mlx4_ib_netdev_event() is called and it invokes mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() -> mlx4_en_get_netdev() [via mlx4_interface.get_dev]. This chain creates a problem when mlx4_en gets switched to be an auxiliary driver. It contains two device calls which would both need to take a respective device lock. Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer call mlx4_get_protocol_dev() but instead to utilize the information passed in net_device.parent and net_device.dev_port. This data is sufficient to determine that an updated port is one that the mlx4_ib driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to date. Following that, update mlx4_ib_get_netdev() to also not call mlx4_get_protocol_dev() and instead scan all current netdevs to find find a matching one. Note that mlx4_ib_get_netdev() is called early from ib_register_device() and cannot use data tracked in mlx4_ib_dev.iboe.netdevs which is not at that point yet set. Finally, remove function mlx4_get_protocol_dev() and the mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they became unused. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-23 08:25:27 +01:00
Douglas Miller	f5acc36b07	IB/hfi1: Reduce printing of errors during driver shut down The driver prints unnecessary prints for error conditions on shutdown remove them to quiet it down. Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/169271327832.1855761.3756156924805531643.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:31:45 +03:00
Brendan Cunningham	d2c0234634	RDMA/hfi1: Move user SDMA system memory pinning code to its own file Move user SDMA system memory page-pinning code from user_sdma.c to pin_system.c. Put declarations for non-static functions in pinning.h. System memory pinning is necessary for processing user SDMA requests but actual steps are invisible to user SDMA request-processing code. Moving system memory pinning code for user SDMA to its own file makes this distinction apparent. These changes have no effect on userspace. Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/169271327311.1855761.4736714053318724062.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:31:45 +03:00
Jinjie Ruan	3d91dfe72a	RDMA/hfi1: Use list_for_each_entry() helper Convert list_for_each() to list_for_each_entry() so that the pos list_head pointer and list_entry() call are no longer needed, which can reduce a few lines of code. No functional changed. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20230822033539.3692453-1-ruanjinjie@huawei.com Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:28:27 +03:00
Rohit Chavan	d3c2245754	RDMA/mlx5: Fix trailing / formatting in block comment Resolved a formatting issue where the trailing / in a block comment was placed on a same line instead of separate line. Signed-off-by: Rohit Chavan <roheetchavan@gmail.com> Link: https://lore.kernel.org/r/20230822120451.8215-1-roheetchavan@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:27:16 +03:00
Yonatan Nachum	dc202c57e9	RDMA/efa: Fix wrong resources deallocation order When trying to destroy QP or CQ, we first decrease the refcount and potentially free memory regions allocated for the object and then request the device to destroy the object. If the device fails, the object isn't fully destroyed so the user/IB core can try to destroy the object again which will lead to underflow when trying to decrease an already zeroed refcount. Deallocate resources in reverse order of allocating them to safely free them. Fixes: ff6629f88c52 ("RDMA/efa: Do not delay freeing of DMA pages") Reviewed-by: Michael Margolin <mrgolin@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Link: https://lore.kernel.org/r/20230822082725.31719-1-ynachum@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:21:53 +03:00
Leon Romanovsky	c6c0052df2	RDMA/bnxt_re: Fix kernel doc errors Fix set of the following errors due to use of wrong kernel doc format to describe function parameters: drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:68: warning: Function parameter or member 'rcfw' not described in '__wait_for_resp' Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308180600.oOnkIAQV-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202308180401.iaj2ktTc-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202308180214.Lt9NAhbM-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202308180055.6zM4AK6V-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202308172136.ipx1wvs6-lkp@intel.com/ Link: https://lore.kernel.org/r/4b22c385f1b68590ace8f82f2985d14b20999432.1692539554.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-08-21 10:40:38 +03:00

1 2 3 4 5 ...

8575 Commits