linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	af3877265d	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual wide collection of unrelated items in drivers: - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe, usnic, usnic, bnxt_re, ocrdma, iser: - remove unnecessary NULL checks - kmap obsolescence - pci_enable_pcie_error_reporting() obsolescence - unused variables and macros - trace event related warnings - casting warnings - Code cleanups for irdm and erdma - EFA reporting of 128 byte PCIe TLP support - mlx5 more agressively uses the out of order HW feature - Big rework of how state machines and tasks work in rxe - Fix a syzkaller found crash netdev refcount leak in siw - bnxt_re revises their HW description header - Congestion control for bnxt_re - Use mmu_notifiers more safely in hfi1 - mlx5 gets better support for PCIe relaxed ordering inside VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits) RDMA/efa: Add rdma write capability to device caps RDMA/mlx5: Use correct device num_ports when modify DC RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call RDMA/rxe: Fix spinlock recursion deadlock on requester RDMA/mlx5: Fix flow counter query via DEVX RDMA/rxe: Protect QP state with qp->state_lock RDMA/rxe: Move code to check if drained to subroutine RDMA/rxe: Remove qp->req.state RDMA/rxe: Remove qp->comp.state RDMA/rxe: Remove qp->resp.state RDMA/mlx5: Allow relaxed ordering read in VFs and VMs net/mlx5: Update relaxed ordering read HCA capabilities RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write RDMA: Add ib_virt_dma_to_page() RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame() RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c IB/hfi1: Place struct mmu_rb_handler on cache line start IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests ...	2023-04-29 17:21:24 -07:00
Linus Torvalds	556eb8b791	Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[\|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...	2023-04-27 11:53:57 -07:00
Linus Torvalds	b9dff2195f	Merge tag 'iter-ubuf.2-2023-04-21' of git://git.kernel.dk/linux Pull ITER_UBUF updates from Jens Axboe: "This turns singe vector imports into ITER_UBUF, rather than ITER_IOVEC. The former is more trivial to iterate and advance, and hence a bit more efficient. From some very unscientific testing, ~60% of all iovec imports are single vector" * tag 'iter-ubuf.2-2023-04-21' of git://git.kernel.dk/linux: iov_iter: Mark copy_compat_iovec_from_user() noinline iov_iter: import single vector iovecs as ITER_UBUF iov_iter: convert import_single_range() to ITER_UBUF iov_iter: overlay struct iovec and ubuf/len iov_iter: set nr_segs = 1 for ITER_UBUF iov_iter: remove iov_iter_iovec() iov_iter: add iter_iov_addr() and iter_iov_len() helpers ALSA: pcm: check for user backed iterator, not specific iterator type IB/qib: check for user backed iterator, not specific iterator type IB/hfi1: check for user backed iterator, not specific iterator type iov_iter: add iter_iovec() helper block: ensure bio_alloc_map_data() deals with ITER_UBUF correctly	2023-04-24 10:29:28 -07:00
Yonatan Nachum	531094dc71	RDMA/efa: Add rdma write capability to device caps Add rdma write capability that is propagated from the device to rdma-core. Enable MR creation with remote write permissions according to this device capability. Link: https://lore.kernel.org/r/20230404154313.35194-1-ynachum@amazon.com Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-21 19:18:58 -03:00
Mark Zhang	746aa3c8cb	RDMA/mlx5: Use correct device num_ports when modify DC Just like other QP types, when modify DC, the port_num should be compared with dev->num_ports, instead of HCA_CAP.num_ports. Otherwise Multi-port vHCA on DC may not work. Fixes: `776a3906b6` ("IB/mlx5: Add support for DC target QP") Link: https://lore.kernel.org/r/20230420013906.1244185-1-markzhang@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-21 12:36:47 -03:00
Tejun Heo	109205b40a	RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call Workqueue is in the process of cleaning up the distinction between unbound workqueues w/ @nr_active==1 and ordered workqueues. Explicit WQ_UNBOUND isn't needed for alloc_ordered_workqueue() and will trigger a warning in the future. Let's remove it. This doesn't cause any functional changes. Link: https://lore.kernel.org/r/ZEGW-IcFReR1juVM@slm.duckdns.org Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-21 12:35:31 -03:00
Mark Bloch	3e358ea861	RDMA/mlx5: Fix flow counter query via DEVX Commit cited in "fixes" tag added bulk support for flow counters but it didn't account that's also possible to query a counter using a non-base id if the counter was allocated as bulk. When a user performs a query, validate the flow counter id given in the mailbox is inside the valid range taking bulk value into account. Fixes: `208d70f562` ("IB/mlx5: Support flow counters offset for bulk counters") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://lore.kernel.org/r/79d7fbe291690128e44672418934256254d93115.1681377114.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-18 08:47:10 +03:00
Avihai Horon	bd4ba605c4	RDMA/mlx5: Allow relaxed ordering read in VFs and VMs According to PCIe spec, Enable Relaxed Ordering value in the VF's PCI config space is wired to 0 and PF relaxed ordering (RO) setting should be applied to the VF. In QEMU (and maybe others), when assigning VFs, the RO bit in PCI config space is not emulated properly and is always set to 0. Therefore, pcie_relaxed_ordering_enabled() always returns 0 for VFs and VMs and thus MKeys can't be created with RO read even if the PF supports it. pcie_relaxed_ordering_enabled() check was added to avoid a syndrome when creating a MKey with relaxed ordering (RO) enabled when the driver's relaxed_ordering_read_pci_enabled HCA capability is out of sync with FW. With the new relaxed_ordering_read capability this can't happen, as it's set regardless of RO value in PCI config space and thus can't change during runtime. Hence, to allow RO read in VFs and VMs, use the new HCA capability relaxed_ordering_read without checking pcie_relaxed_ordering_enabled(). The old capability checks are kept for backward compatibility with older FWs. Allowing RO in VFs and VMs is valuable since it can greatly improve performance on some setups. For example, testing throughput of a VF on an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance improvement. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Link: https://lore.kernel.org/r/e7048640d66c341a8fa0465e099926e7989184bc.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-16 13:29:26 +03:00
Avihai Horon	ccbbfe0682	net/mlx5: Update relaxed ordering read HCA capabilities Rename existing HCA capability relaxed_ordering_read to relaxed_ordering_read_pci_enabled. This is in accordance with recent PRM change to better describe the capability, as it's set only if both the device supports relaxed ordering (RO) read and RO is enabled in PCI config space. In addition, add new HCA capability relaxed_ordering_read which is set if the device supports RO read, regardless of RO in PCI config space. This will be used in the following patch to allow RO in VFs and VMs. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/caa0002fd8135086357dfcc368e2f5cc73b08480.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-16 13:29:19 +03:00
Avihai Horon	d43b020b0f	RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR relaxed_ordering_read HCA capability is set if both the device supports relaxed ordering (RO) read and RO is set in PCI config space. RO in PCI config space can change during runtime. This will change the value of relaxed_ordering_read HCA capability in FW, but the driver will not see it since it queries the capabilities only once. This can lead to the following scenario: 1. RO in PCI config space is enabled. 2. User creates MKey without RO. 3. RO in PCI config space is disabled. As a result, relaxed_ordering_read HCA capability is turned off in FW but remains on in driver copy of the capabilities. 4. User requests to reconfig the MKey with RO via UMR. 5. Driver will try to reconfig the MKey with RO read although it shouldn't (as relaxed_ordering_read HCA capability is really off). To fix this, check pcie_relaxed_ordering_enabled() before setting RO read in UMR. Fixes: `896ec97353` ("RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/8d39eb8317e7bed1a354311a20ae707788fd94ed.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-16 13:29:14 +03:00
Avihai Horon	ed4b0661cc	RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write pcie_relaxed_ordering_enabled() check was added to avoid a syndrome when creating a MKey with relaxed ordering (RO) enabled when the driver's relaxed_ordering_{read,write} HCA capabilities are out of sync with FW. While this can happen with relaxed_ordering_read, it can't happen with relaxed_ordering_write as it's set if the device supports RO write, regardless of RO in PCI config space, and thus can't change during runtime. Therefore, drop the pcie_relaxed_ordering_enabled() check for relaxed_ordering_write while keeping it for relaxed_ordering_read. Doing so will also allow the usage of RO write in VFs and VMs (where RO in PCI config space is not reported/emulated properly). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/7e8f55e31572c1702d69cae015a395d3a824a38a.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-16 13:29:07 +03:00
Christophe JAILLET	a2e20b29cf	RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame() There is no need to zero 'pktsize' bytes of 'buf', only the header needs to be cleared, to be safe. All the other bytes are already written with some memcpy() at the end of the function. Doing so also gives the opportunity to the compiler to avoid the memset() call. It can be inlined now that the length is known as compile time. Link: https://lore.kernel.org/r/098e3c397be0436f1867899245ecfe656c472110.1675369386.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-13 12:17:45 -03:00
Patrick Kelsey	866694afd6	IB/hfi1: Place struct mmu_rb_handler on cache line start Place struct mmu_rb_handler on cache line start like so: struct mmu_rb_handler h; void free_ptr; int ret; free_ptr = kzalloc(sizeof(*h) + cache_line_size() - 1, GFP_KERNEL); if (!free_ptr) return -ENOMEM; h = PTR_ALIGN(free_ptr, cache_line_size()); Additionally, move struct mmu_rb_handler fields "root" and "ops_args" to start after the next cacheline using the "____cacheline_aligned_in_smp" annotation. Allocating an additional cache_line_size() - 1 bytes to place struct mmu_rb_handler on a cache line start does increase memory consumption. However, few struct mmu_rb_handler are created when hfi1 is in use. As mmu_rb_handler->root and mmu_rb_handler->ops_args are accessed frequently, the advantage of having them both within a cache line is expected to outweigh the disadvantage of the additional memory consumption per struct mmu_rb_handler. Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/168088636963.3027109.16959757980497822530.stgit@252.162.96.66.static.eigbox.net Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-09 13:27:34 +03:00
Patrick Kelsey	00cbce5cbf	IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests hfi1 user SDMA request processing has two bugs that can cause data corruption for user SDMA requests that have multiple payload iovecs where an iovec other than the tail iovec does not run up to the page boundary for the buffer pointed to by that iovec.a Here are the specific bugs: 1. user_sdma_txadd() does not use struct user_sdma_iovec->iov.iov_len. Rather, user_sdma_txadd() will add up to PAGE_SIZE bytes from iovec to the packet, even if some of those bytes are past iovec->iov.iov_len and are thus not intended to be in the packet. 2. user_sdma_txadd() and user_sdma_send_pkts() fail to advance to the next iovec in user_sdma_request->iovs when the current iovec is not PAGE_SIZE and does not contain enough data to complete the packet. The transmitted packet will contain the wrong data from the iovec pages. This has not been an issue with SDMA packets from hfi1 Verbs or PSM2 because they only produce iovecs that end short of PAGE_SIZE as the tail iovec of an SDMA request. Fixing these bugs exposes other bugs with the SDMA pin cache (struct mmu_rb_handler) that get in way of supporting user SDMA requests with multiple payload iovecs whose buffers do not end at PAGE_SIZE. So this commit fixes those issues as well. Here are the mmu_rb_handler bugs that non-PAGE_SIZE-end multi-iovec payload user SDMA requests can hit: 1. Overlapping memory ranges in mmu_rb_handler will result in duplicate pinnings. 2. When extending an existing mmu_rb_handler entry (struct mmu_rb_node), the mmu_rb code (1) removes the existing entry under a lock, (2) releases that lock, pins the new pages, (3) then reacquires the lock to insert the extended mmu_rb_node. If someone else comes in and inserts an overlapping entry between (2) and (3), insert in (3) will fail. The failure path code in this case unpins _all_ pages in either the original mmu_rb_node or the new mmu_rb_node that was inserted between (2) and (3). 3. In hfi1_mmu_rb_remove_unless_exact(), mmu_rb_node->refcount is incremented outside of mmu_rb_handler->lock. As a result, mmu_rb_node could be evicted by another thread that gets mmu_rb_handler->lock and checks mmu_rb_node->refcount before mmu_rb_node->refcount is incremented. 4. Related to #2 above, SDMA request submission failure path does not check mmu_rb_node->refcount before freeing mmu_rb_node object. If there are other SDMA requests in progress whose iovecs have pointers to the now-freed mmu_rb_node(s), those pointers to the now-freed mmu_rb nodes will be dereferenced when those SDMA requests complete. Fixes: `7be85676f1` ("IB/hfi1: Don't remove RB entry when not needed.") Fixes: `7724105686` ("IB/hfi1: add driver files") Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/168088636445.3027109.10054635277810177889.stgit@252.162.96.66.static.eigbox.net Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-09 13:27:34 +03:00
Patrick Kelsey	9fe8fec5e4	IB/hfi1: Fix SDMA mmu_rb_node not being evicted in LRU order hfi1_mmu_rb_remove_unless_exact() did not move mmu_rb_node objects in mmu_rb_handler->lru_list after getting a cache hit on an mmu_rb_node. As a result, hfi1_mmu_rb_evict() was not guaranteed to evict truly least-recently used nodes. This could be a performance issue for an application when that application: - Uses some long-lived buffers frequently. - Uses a large number of buffers once. - Hits the mmu_rb_handler cache size or pinned-page limits, forcing mmu_rb_handler cache entries to be evicted. In this case, the one-time use buffers cause the long-lived buffer entries to eventually filter to the end of the LRU list where hfi1_mmu_rb_evict() will consider evicting a frequently-used long-lived entry instead of evicting one of the one-time use entries. Fix this by inserting new mmu_rb_node at the tail of mmu_rb_handler->lru_list and move mmu_rb_ndoe to the tail of mmu_rb_handler->lru_list when the mmu_rb_node is a hit in hfi1_mmu_rb_remove_unless_exact(). Change hfi1_mmu_rb_evict() to evict from the head of mmu_rb_handler->lru_list instead of the tail. Fixes: `0636e9ab83` ("IB/hfi1: Add cache evict LRU list") Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/168088635931.3027109.10423156330761536044.stgit@252.162.96.66.static.eigbox.net Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-09 13:27:34 +03:00
Ehab Ababneh	cf0455f1a9	IB/hfi1: Suppress useless compiler warnings These warnings can cause build failure: In file included from ./include/trace/define_trace.h:102, from drivers/infiniband/hw/hfi1/trace_dbg.h:111, from drivers/infiniband/hw/hfi1/trace.h:15, from drivers/infiniband/hw/hfi1/trace.c:6: drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_get_offsets_hfi1_trace_template’: ./include/trace/trace_events.h:261:9: warning: function ‘trace_event_get_offsets_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] struct trace_event_raw_##call __maybe_unused entry; \ ^~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’ DECLARE_EVENT_CLASS(hfi1_trace_template, ^~~~~~~~~~~~~~~~~~~ In file included from ./include/trace/define_trace.h:102, from drivers/infiniband/hw/hfi1/trace_dbg.h:111, from drivers/infiniband/hw/hfi1/trace.h:15, from drivers/infiniband/hw/hfi1/trace.c:6: drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_raw_event_hfi1_trace_template’: ./include/trace/trace_events.h:386:9: warning: function ‘trace_event_raw_event_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] struct trace_event_raw_##call entry; \ ^~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’ DECLARE_EVENT_CLASS(hfi1_trace_template, ^~~~~~~~~~~~~~~~~~~ In file included from ./include/trace/define_trace.h:103, from drivers/infiniband/hw/hfi1/trace_dbg.h:111, from drivers/infiniband/hw/hfi1/trace.h:15, from drivers/infiniband/hw/hfi1/trace.c:6: drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘perf_trace_hfi1_trace_template’: ./include/trace/perf.h:70:9: warning: function ‘perf_trace_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] struct hlist_head *head; \ ^~~~~~~~~~ drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’ DECLARE_EVENT_CLASS(hfi1_trace_template, ^~~~~~~~~~~~~~~~~~~ Solution adapted here is similar to the one in `fbbc95a49d` Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/168088635415.3027109.5711716700328939402.stgit@252.162.96.66.static.eigbox.net Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-09 13:27:34 +03:00
Dean Luick	d2590edc93	IB/hfi1: Remove trace newlines The hfi1_cdbg trace mechanism appends a newline. Remove trailing newlines from all format strings. Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/168088634897.3027109.10401662436950683555.stgit@252.162.96.66.static.eigbox.net Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-09 13:27:34 +03:00
Selvin Xavier	f13bcef04b	RDMA/bnxt_re: Enable congestion control by default Enable Congesion control by default. Issue FW command enable the CC during driver load and disable it during unload. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-8-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-04 09:17:21 +03:00
Selvin Xavier	c682c6eda0	RDAM/bnxt_re: Use tlv apis while processing the slow path commands Use the new TLV APIs for existing slow path commands. The TLV APIs will be used to populate extended headers for some of the Firmware commands, which will be introduced in the patches that follow. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-04 09:17:21 +03:00
Selvin Xavier	0722f1f7bf	RDMA/bnxt_re: RoCE slow path TLV support Header file to support TLV encapsulated commands. These functions will be used by the driver in the follow up patches. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-04 09:17:21 +03:00
Selvin Xavier	ff015bcd21	RDMA/bnxt_re: Reduce number of argumets to control path command APIs Reducing the number of arguments to bnxt_qplib_rcfw_send_message by enclosing all its arguments into a command message structure. Use the same struct while passing the command information to send_message. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-04 09:17:21 +03:00
Selvin Xavier	e576adf583	RDMA/bnxt_re: Convert RCFW_CMD_PREP macro to static inline function Convert RCFW_CMD_PREP macro to static inline function. Also, remove the cmd_flags passed as none of the functions are using it. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-04 09:17:21 +03:00
Selvin Xavier	b400acee06	RDMA/bnxt_re: Remove HW queue mapping from RoCE Driver bnxt_en driver does the queue mapping for RoCE traffic. Removing the queue mapping from RoCE driver. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-04 09:17:21 +03:00
Selvin Xavier	a9a457f338	RDMA/bnxt_re: Update HW interface headers Updating the HW structures to the latest version. This is copied from the code maintained internally. No functionality changes in this patch. Code is re-organized to match the file maintained in the internal tree. Also, New HW interface structures are added, which will be used by the drivers in future. CC: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-04 09:17:21 +03:00
Tom Rix	e7706c4bbf	IB/qib: Remove unused cnt variable clang with W=1 reports drivers/infiniband/hw/qib/qib_file_ops.c:487:20: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable] u32 tid, ctxttid, cnt, limit, tidcnt; ^ drivers/infiniband/hw/qib/qib_file_ops.c:1771:9: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable] int i, cnt = 0, maxtid = ctxt_tidbase + dd->rcvtidcnt; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230330235800.1845815-1-trix@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-03 15:46:53 +03:00
Tom Rix	081c27b3bc	RDMA/mlx5: Remove unused num_alloc_xa_entries variable clang with W=1 reports drivers/infiniband/hw/mlx5/devx.c:1996:6: error: variable 'num_alloc_xa_entries' set but not used [-Werror,-Wunused-but-set-variable] int num_alloc_xa_entries = 0; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230330153607.1838750-1-trix@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-03 15:46:47 +03:00
Jens Axboe	da67ba07b4	IB/qib: check for user backed iterator, not specific iterator type In preparation for switching single segment iterators to using ITER_UBUF, swap the check for whether we are user backed or not. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-30 08:12:29 -06:00
Jens Axboe	23ecdcd0c0	IB/hfi1: check for user backed iterator, not specific iterator type In preparation for switching single segment iterators to using ITER_UBUF, swap the check for whether we are user backed or not. While at it, move it outside the srcu locking area to clean up the code a bit. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-30 08:12:29 -06:00
Jens Axboe	de4f5fed3f	iov_iter: add iter_iovec() helper This returns a pointer to the current iovec entry in the iterator. Only useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF and ITER_IOVEC identically for the first segment. Rename struct iov_iter->iov to iov_iter->__iov to find any potentially troublesome spots, and also to prevent anyone from adding new code that accesses iter->iov directly. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-30 08:12:29 -06:00
Tom Rix	cba968e33e	RDMA/ocrdma: remove unused discard_cnt variable clang with W=1 reports drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1592:6: error: variable 'discard_cnt' set but not used [-Werror,-Wunused-but-set-variable] int discard_cnt = 0; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230326120959.1351948-1-trix@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-29 14:05:38 +03:00
Tom Rix	1b69f1e3d7	RDMA/bnxt_re: remove unused num_srqne_processed and num_cqne_processed variables clang with W=1 reports drivers/infiniband/hw/bnxt_re/qplib_fp.c:303:6: error: variable 'num_srqne_processed' set but not used [-Werror,-Wunused-but-set-variable] int num_srqne_processed = 0; ^ drivers/infiniband/hw/bnxt_re/qplib_fp.c:304:6: error: variable 'num_cqne_processed' set but not used [-Werror,-Wunused-but-set-variable] int num_cqne_processed = 0; ^ These variables are not used so remove them. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230325140559.1336056-1-trix@redhat.com Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-29 14:03:56 +03:00
Cai Huoqing	fc36ce35e9	RDMA/usnic: Remove redundant pci_clear_master Remove pci_clear_master to simplify the code, the bus-mastering is also cleared in do_pci_disable_device, like this: ./drivers/pci/pci.c:2197 static void do_pci_disable_device(struct pci_dev *dev) { u16 pci_command; pci_read_config_word(dev, PCI_COMMAND, &pci_command); if (pci_command & PCI_COMMAND_MASTER) { pci_command &= ~PCI_COMMAND_MASTER; pci_write_config_word(dev, PCI_COMMAND, pci_command); } pcibios_disable_device(dev); }. And dev->is_busmaster is set to 0 in pci_disable_device. Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev> Link: https://lore.kernel.org/r/20230323115742.13836-1-cai.huoqing@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-29 13:08:22 +03:00
Patrisious Haddad	d22467a71e	RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics Previously for switchdev only per device counters were supported. Currently we allocate counters for switchdev per port, which also includes the ports that belong to VF representors in order to expose them to users through the rdma tool, allowing the host to track the VFs statistics through their representors counters. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/ea31e1103c125cd27931ba213f307cde30d2eaed.1679566038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-29 10:09:23 +03:00
Selvin Xavier	d54bd5abf4	RDMA/bnxt_re: Add resize_cq support Add resize_cq verb support for user space CQs. Resize operation for kernel CQs are not supported now. Driver should free the current CQ only after user library polls for all the completions and switch to new CQ. So after the resize_cq is returned from the driver, user library polls for existing completions and store it as temporary data. Once library reaps all completions in the current CQ, it invokes the ibv_cmd_poll_cq to inform the driver about the resize_cq completion. Adding a check for user CQs in driver's poll_cq and complete the resize operation for user CQs. Updating uverbs_cmd_mask with poll_cq to support this. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1678868215-23626-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-29 10:03:43 +03:00
Cheng Xu	d649c638dc	RDMA/erdma: Use fixed hardware page size Hardware's page size is 4096, but the kernel's page size may vary. Driver should use hardware's page size when communicating with hardware. Fixes: `1550557717` ("RDMA/erdma: Add verbs implementation") Link: https://lore.kernel.org/r/20230307102924.70577-2-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:31:22 -03:00
Leon Romanovsky	602fb42057	Enable IB out-of-order by default in mlx5 This series from Or changes default of IB out-of-order feature and allows to the RDMA users to decide if they need to wait for completion for all segments or it is enough to wait for last segment completion only. Thanks Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-23 10:22:32 +02:00
Or Har-Toov	742948cc02	RDMA/mlx5: Disable out-of-order in integrity enabled QPs Set retry_mode to GO_BACK_N when qp is created with INTEGRITY_EN flag because out-of-order is not supported when doing HW offload of signature operations. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/362de42cdc7a541afa5b1fd0ec6ae706061764a2.1679230449.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-23 10:21:13 +02:00
Yonatan Nachum	6dddd93938	RDMA/efa: Add data polling capability feature bit Add feature bit to existing device caps field. EFA supports data polling of 128 bytes blocks. The flag indicates that the NIC guarentees that a 128 byte aligned block is written in order, ie that observing the last 8 bits of the block mean the prior 127 bytes are also written. It is useful for "last data polling" acceleration techniques. Link: https://lore.kernel.org/r/20230219081328.10419-1-mrgolin@amazon.com Reviewed-by: Yehuda Yitschak <yehuday@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Acked-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-22 14:41:58 -03:00
Cheng Xu	901d9d6241	RDMA/erdma: Minor refactor of device init flow After necessary configuration, driver should wait hardware finishing initialization. The wait sets at CMDQ related function though it has nothing to do with CMDQ. Refactor this part to make code cleaner. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230322093319.84045-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-22 13:10:46 +02:00
Cheng Xu	72769dba6d	RDMA/erdma: Eliminate unnecessary casting of EQ doorbells Using void * to define EQ doorbell pointer can eliminate unnecessary casting when performing assignment. Also rename db_addr to db for a shorter name. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230322093319.84045-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-22 13:10:46 +02:00
Cheng Xu	de19ec778c	RDMA/erdma: Unify byte ordering APIs usage Replace __be32_to_cpu/__cpu_to_be16 with be32_to_cpu/cpu_to_be16. And use be32_to_cpu_array to copy and swap byte order to hide the loop. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230322093319.84045-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-22 13:10:46 +02:00
Cheng Xu	6bd1bca858	RDMA/erdma: Defer probing if netdevice can not be found ERDMA device may be probed before its associated netdevice, returning -EPROBE_DEFER allows OS try to probe erdma device later. Fixes: `d55e6fb480` ("RDMA/erdma: Add the erdma module") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230320084652.16807-5-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-20 12:53:24 +02:00
Cheng Xu	0dd83a4d77	RDMA/erdma: Inline mtt entries into WQE if supported The max inline mtt count supported is ERDMA_MAX_INLINE_MTT_ENTRIES. When mr->mem.mtt_nents == ERDMA_MAX_INLINE_MTT_ENTRIES, inline mtt is also supported, fix it. Fixes: `1550557717` ("RDMA/erdma: Add verbs implementation") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230320084652.16807-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-20 12:53:24 +02:00
Cheng Xu	6256aa9ae9	RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192 Max EQ depth of hardware is 32K, the current default EQ depth is too small for some applications, so change the default depth to 4096. Max send WRs the hardware can support is 8K, but the driver limits the value to 4K. Remove this limitation. Fixes: `be3cff0f24` ("RDMA/erdma: Add the hardware related definitions") Fixes: `db23ae64ca` ("RDMA/erdma: Add verbs header file") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230320084652.16807-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-20 12:53:24 +02:00
Cheng Xu	3fe26c0493	RDMA/erdma: Fix some typos FAA is short for atomic fetch and add, not FAD. Fix this. Fixes: `0ca9c2e284` ("RDMA/erdma: Implement atomic operations support") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230320084652.16807-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-20 12:53:24 +02:00
Maher Sanalla	88c9483faf	IB/mlx5: Add support for 400G_8X lane speed Currently, when driver queries PTYS to report which link speed is being used on its RoCE ports, it does not check the case of having 400Gbps transmitted over 8 lanes. Thus it fails to report the said speed and instead it defaults to report 10G over 4 lanes. Add a check for the said speed when querying PTYS and report it back correctly when needed. Fixes: `08e8676f16` ("IB/mlx5: Add support for 50Gbps per lane link modes") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/ec9040548d119d22557d6a4b4070d6f421701fd4.1678973994.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-20 09:51:17 +02:00
Bjorn Helgaas	697d5cf073	IB/qib: Drop redundant pci_enable_pcie_error_reporting() pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since `f26e58bf6f` ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-19 16:16:06 +02:00
Bjorn Helgaas	512ed1199e	IB/hfi1: Drop redundant pci_enable_pcie_error_reporting() pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since `f26e58bf6f` ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-19 16:15:44 +02:00
Rohit Chavan	c4526fe2e4	RDMA/mlx5: Coding style fix reported by checkpatch Block comments should align the * on each line on line 2849 Avoid line continuations in quoted strings on line 3848 Signed-off-by: Rohit Chavan <roheetchavan@gmail.com> Link: https://lore.kernel.org/r/20230319100847.5566-1-roheetchavan@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-19 14:03:34 +02:00
Tatyana Nikolova	e4522c097e	RDMA/irdma: Add ipv4 check to irdma_find_listener() Add ipv4 check to irdma_find_listener(). Otherwise the function incorrectly finds and returns a listener with a different addr family for the zero IP addr, if a listener with a zero IP addr and the same port as the one searched for has already been created. Fixes: `146b9756f1` ("RDMA/irdma: Add connection manager") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145231.931-5-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-19 11:37:56 +02:00

1 2 3 4 5 ...

8365 Commits