13945 Commits

Author SHA1 Message Date
Tom Rix
cba968e33e RDMA/ocrdma: remove unused discard_cnt variable
clang with W=1 reports
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1592:6: error: variable
  'discard_cnt' set but not used [-Werror,-Wunused-but-set-variable]
        int discard_cnt = 0;
            ^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230326120959.1351948-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 14:05:38 +03:00
Tom Rix
1b69f1e3d7 RDMA/bnxt_re: remove unused num_srqne_processed and num_cqne_processed variables
clang with W=1 reports
drivers/infiniband/hw/bnxt_re/qplib_fp.c:303:6: error: variable
  'num_srqne_processed' set but not used [-Werror,-Wunused-but-set-variable]
        int num_srqne_processed = 0;
            ^
drivers/infiniband/hw/bnxt_re/qplib_fp.c:304:6: error: variable
  'num_cqne_processed' set but not used [-Werror,-Wunused-but-set-variable]
        int num_cqne_processed = 0;
            ^
These variables are not used so remove them.

Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230325140559.1336056-1-trix@redhat.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 14:03:56 +03:00
Cai Huoqing
fc36ce35e9 RDMA/usnic: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
	u16 pci_command;

	pci_read_config_word(dev, PCI_COMMAND, &pci_command);
	if (pci_command & PCI_COMMAND_MASTER) {
		pci_command &= ~PCI_COMMAND_MASTER;
		pci_write_config_word(dev, PCI_COMMAND, pci_command);
	}

	pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/r/20230323115742.13836-1-cai.huoqing@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 13:08:22 +03:00
Patrisious Haddad
d22467a71e RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics
Previously for switchdev only per device counters were supported.

Currently we allocate counters for switchdev per port, which also
includes the ports that belong to VF representors in order to expose
them to users through the rdma tool, allowing the host to track the VFs
statistics through their representors counters.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/ea31e1103c125cd27931ba213f307cde30d2eaed.1679566038.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 10:09:23 +03:00
Selvin Xavier
d54bd5abf4 RDMA/bnxt_re: Add resize_cq support
Add resize_cq verb support for user space CQs. Resize operation for
kernel CQs are not supported now.

Driver should free the current CQ only after user library polls
for all the completions and switch to new CQ. So after the resize_cq
is returned from the driver, user library polls for existing completions
and store it as temporary data. Once library reaps all completions in the
current CQ, it invokes the ibv_cmd_poll_cq to inform the driver about
the resize_cq completion. Adding a check for user CQs in driver's
poll_cq and complete the resize operation for user CQs.
Updating uverbs_cmd_mask with poll_cq to support this.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1678868215-23626-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 10:03:43 +03:00
Greg Kroah-Hartman
75a2d4226b driver core: class: mark the struct class for sysfs callbacks as constant
struct class should never be modified in a sysfs callback as there is
nothing in the structure to modify, and frankly, the structure is almost
never used in a sysfs callback, so mark it as constant to allow struct
class to be moved to read-only memory.

While we are touching all class sysfs callbacks also mark the attribute
as constant as it can not be modified.  The bonding code still uses this
structure so it can not be removed from the function callbacks.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steve French <sfrench@samba.org>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: linux-cifs@vger.kernel.org
Cc: linux-gpio@vger.kernel.org
Cc: linux-mtd@lists.infradead.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: netdev@vger.kernel.org
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230325084537.3622280-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-29 07:54:58 +02:00
Martin K. Petersen
62d15dba0a Merge patch series "Constify most SCSI host templates"
Bart Van Assche <bvanassche@acm.org> says:

It helps humans and the compiler if it is made explicit that SCSI host
templates are not modified. Hence this patch series that constifies most
SCSI host templates. Please consider this patch series for the next merge
window.

Link: https://lore.kernel.org/r/20230322195515.1267197-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24 20:13:03 -04:00
Bart Van Assche
80602aca4f scsi: iscsi: Declare SCSI host template const
Make it explicit that the SCSI host template is not modified.

Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230322195515.1267197-50-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24 19:19:57 -04:00
Bart Van Assche
4281af9d9f scsi: RDMA/srp: Declare the SCSI host template const
Make it explicit that the SRP host template is not modified.

Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230322195515.1267197-7-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24 19:19:19 -04:00
Martin K. Petersen
ae2fb3cb0f Merge patch series "target: TMF and recovery fixes"
Mike Christie <michael.christie@oracle.com> says:

The following patches apply over Martin's 6.4 branches and Linus's tree.
They fix a couple regressions in iscsit that occur when there are TMRs
executing and a connection is closed. It also includes Dimitry's fixes in
related code paths for cmd cleanup when ERL2 is used and the write pending
hang during conn cleanup.

This version of the patchset brings it back to just regressions and fixes
for bugs we have a lot of users hitting. I'm going to fix isert and get it
hooked into iscsit properly in a second patchset, because this one was
getting so large. I've also moved my cleanup type of patches for a 3rd
patchset.

Link: https://lore.kernel.org/r/20230319015620.96006-1-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24 17:39:15 -04:00
Mike Christie
6d256bee60 scsi: target: iscsit: isert: Alloc per conn cmd counter
This has iscsit allocate a per conn cmd counter and converts iscsit/isert
to use it instead of the per session one.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230319015620.96006-5-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24 17:32:23 -04:00
Cheng Xu
d649c638dc RDMA/erdma: Use fixed hardware page size
Hardware's page size is 4096, but the kernel's page size may vary. Driver
should use hardware's page size when communicating with hardware.

Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation")
Link: https://lore.kernel.org/r/20230307102924.70577-2-chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:31:22 -03:00
Bob Pearson
d946716325 RDMA/rxe: Rewrite rxe_task.c
This patch is a major rewrite of the tasklet routines in rxe_task.c.  The
main motivation for this is the realization that the code violates the
safety of the qp pointer by correct reference counting.  When a tasklet is
scheduled from a verbs API the calling thread has a valid reference to the
qp and schedules the tasklet to run at a later time carrying a pointer to
the qp. Once the calling code returns however the qp can be destroyed at
any time. In order to correct this a reference to the qp must be taken
when the task is scheduled and held until it finishes running. This is
complicated by the tasklet library not alwys running a task that is
scheduled depending on whether someone else has scheduled it.

This patch moves the logic for deciding whether to run or schedule a task
outside of do_task() and guarantees that there is only one copy of the
task scheduled or running at a time.

Secondly the separate flags controlling teardown and draining of the task
are included in the task state machine and all references to the state are
protected by spinlocks to avoid consistency and memory barrier issues.

Link: https://lore.kernel.org/r/20230304174533.11296-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
f455a1bc97 RDMA/rxe: Make tasks schedule each other
Replace rxe_run_task() by rxe_sched_task() when tasks call each other.
These are not performance critical and mainly involve error paths but they
run the risk of causing deadlocks.

Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
960ebe97e5 RDMA/rxe: Remove __rxe_do_task()
The subroutine __rxe_do_task is not thread safe and it has no way to
guarantee that the tasks, which are designed with the assumption that they
are non-reentrant, are not reentered. All of its uses are non-performance
critical.

This patch replaces calls to __rxe_do_task with calls to
rxe_sched_task. It also removes irrelevant or unneeded if tests.

Instead of calling the task machinery a single call to the tasklet
function (rxe_requester, etc.) is sufficient to draing the queues if task
execution has been disabled or stopped.

Together these changes allow the removal of __rxe_do_task.

Link: https://lore.kernel.org/r/20230304174533.11296-7-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
a246aa2e8a RDMA/rxe: Remove qp reference counting in tasks
Currently each of the three tasklets requester, completer and responder in
the rxe driver take and release a reference to the qp argument at the
beginning and end of the subroutines. The caller passing in the qp
argument should be responsible for holding a reference to qp so these are
not required. Further doing so breaks the qp cleanup code in
rxe_qp_do_cleanup which calls these routines after all the references have
been dropped so they cannot drain the packet and work request queues as
intended.

In fact if these routines are deferred by calling tasklet_schedule there
is no guarantee that the calling code does have a qp reference.  That is a
bug in rxe_task.c which will be fixed later in this series.

Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
fbdeb828a2 RDMA/rxe: Cleanup error state handling in rxe_comp.c
Cleanup the handling of qp in the error state, reset state and during
rxe_qp_do_cleanup. Make the same as rxe_resp.c

Link: https://lore.kernel.org/r/20230304174533.11296-5-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
49dc9c1f0c RDMA/rxe: Cleanup reset state handling in rxe_resp.c
Cleanup the handling of qp in the error state, reset state and during
rxe_qp_do_cleanup. The error state does about the same thing as the others
but has code spread all over.

This patch combines them in a cleaner way.

Link: https://lore.kernel.org/r/20230304174533.11296-4-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
3946fc2a42 RDMA/rxe: Convert tasklet args to queue pairs
Originally is was thought that the tasklet machinery in rxe_task.c would
be used in other applications but that has not happened for years. This
patch replaces the 'void *arg' by struct 'rxe_qp *qp' in the parameters to
the tasklet calls. This change will have no affect on performance but may
make the code a little clearer.

Link: https://lore.kernel.org/r/20230304174533.11296-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:14:38 -03:00
Bob Pearson
5bf944f241 RDMA/rxe: Add error messages
This patch adds error and debug messages so that every interaction
with rdma-core through a verbs API call or a completion error return
will generate at least one error message backed up by debug messages
with more detail.

With dynamic debugging one can follow up after seeing an error message
by turning on the appropriate debug messages.

Link: https://lore.kernel.org/r/20230303221623.8053-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
9ac01f434a RDMA/rxe: Extend dbg log messages to err and info
Extend the dbg log messages (e.g. rxe_dbg_xxx) to include
err and info types. rxe.c is modified to use these new log
messages as examples.

Link: https://lore.kernel.org/r/20230303221623.8053-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
a9fb328721 RDMA/rxe: Change rxe_dbg to rxe_dbg_dev
Replace the name rxe_dbg with rxe_dbg_dev which better matches
the remaining rxe_dbg_xxx macros for debug messages with a
rxe device parameter. Reuse the name rxe_dbg for debug messages
which do not have a rxe device parameter.

Link: https://lore.kernel.org/r/20230303221623.8053-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
9168d125ea RDMA/rxe: Replace exists by rxe in rxe.c
'exists' looks like a boolean. This patch replaces it by the
normal name used for the rxe device, 'rxe', which should be a
little less confusing. The second rxe_dbg() message is
incorrect since rxe is known to be NULL and this will cause a
seg fault if this message were ever sent. Replace it by pr_debug
for the moment.

Fixes: c6aba5ea0055 ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c")
Link: https://lore.kernel.org/r/20230303221623.8053-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:48 -03:00
Gustavo A. R. Silva
aa4d540b41 RDMA/core: Fix multiple -Warray-bounds warnings
GCC-13 (and Clang)[1] does not like to access a partially allocated
object, since it cannot reason about it for bounds checking.

In this case 140 bytes are allocated for an object of type struct
ib_umad_packet:

        packet = kzalloc(sizeof(*packet) + IB_MGMT_RMPP_HDR, GFP_KERNEL);

However, notice that sizeof(*packet) is only 104 bytes:

struct ib_umad_packet {
        struct ib_mad_send_buf *   msg;                  /*     0     8 */
        struct ib_mad_recv_wc *    recv_wc;              /*     8     8 */
        struct list_head           list;                 /*    16    16 */
        int                        length;               /*    32     4 */

        /* XXX 4 bytes hole, try to pack */

        struct ib_user_mad         mad __attribute__((__aligned__(8))); /*    40    64 */

        /* size: 104, cachelines: 2, members: 5 */
        /* sum members: 100, holes: 1, sum holes: 4 */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 4 */
        /* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));

and 36 bytes extra bytes are allocated for a flexible-array member in
struct ib_user_mad:

include/rdma/ib_mad.h:
120 enum {
...
123         IB_MGMT_RMPP_HDR = 36,
... }

struct ib_user_mad {
        struct ib_user_mad_hdr     hdr;                  /*     0    64 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        __u64                      data[] __attribute__((__aligned__(8))); /*    64     0 */

        /* size: 64, cachelines: 1, members: 2 */
        /* forced alignments: 1 */
} __attribute__((__aligned__(8)));

So we have sizeof(*packet) + IB_MGMT_RMPP_HDR == 140 bytes

Then the address of the flex-array member (for which only 36 bytes were
allocated) is casted and copied into a pointer to struct ib_rmpp_mad,
which, in turn, is of size 256 bytes:

        rmpp_mad = (struct ib_rmpp_mad *) packet->mad.data;

struct ib_rmpp_mad {
        struct ib_mad_hdr          mad_hdr;              /*     0    24 */
        struct ib_rmpp_hdr         rmpp_hdr;             /*    24    12 */
        u8                         data[220];            /*    36   220 */

        /* size: 256, cachelines: 4, members: 3 */
};

The thing is that those 36 bytes allocated for flex-array member data
in struct ib_user_mad onlly account for the size of both struct ib_mad_hdr
and struct ib_rmpp_hdr, but nothing is left for array u8 data[220].
So, the compiler is legitimately complaining about accessing an object
for which not enough memory was allocated.

Apparently, the only members of struct ib_rmpp_mad that are relevant
(that are actually being used) in function ib_umad_write() are mad_hdr
and rmpp_hdr. So, instead of casting packet->mad.data to
(struct ib_rmpp_mad *) create a new structure

struct ib_rmpp_mad_hdr {
        struct ib_mad_hdr       mad_hdr;
        struct ib_rmpp_hdr      rmpp_hdr;
} __packed;

and cast packet->mad.data to (struct ib_rmpp_mad_hdr *).

Notice that

        IB_MGMT_RMPP_HDR == sizeof(struct ib_rmpp_mad_hdr) == 36 bytes

Refactor the rest of the code, accordingly.

Fix the following warnings seen under GCC-13 and -Warray-bounds:
drivers/infiniband/core/user_mad.c:564:50: warning: array subscript ‘struct ib_rmpp_mad[0]’ is partly outside array bounds of ‘unsigned char[140]’ [-Warray-bounds=]
drivers/infiniband/core/user_mad.c:566:42: warning: array subscript ‘struct ib_rmpp_mad[0]’ is partly outside array bounds of ‘unsigned char[140]’ [-Warray-bounds=]
drivers/infiniband/core/user_mad.c:618:25: warning: array subscript ‘struct ib_rmpp_mad[0]’ is partly outside array bounds of ‘unsigned char[140]’ [-Warray-bounds=]
drivers/infiniband/core/user_mad.c:622:44: warning: array subscript ‘struct ib_rmpp_mad[0]’ is partly outside array bounds of ‘unsigned char[140]’ [-Warray-bounds=]

Link: https://github.com/KSPP/linux/issues/273
Link: https://godbolt.org/z/oYWaGM4Yb [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZBpB91qQcB10m3Fw@work
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-23 12:19:01 +02:00
Mark Zhang
58e84f6b3e RDMA/cma: Allow UD qp_type to join multicast only
As for multicast:
- The SIDR is the only mode that makes sense;
- Besides PS_UDP, other port spaces like PS_IB is also allowed, as it is
  UD compatible. In this case qkey also needs to be set [1].

This patch allows only UD qp_type to join multicast, and set qkey to
default if it's not set, to fix an uninit-value error: the ib->rec.qkey
field is accessed without being initialized.

=====================================================
BUG: KMSAN: uninit-value in cma_set_qkey drivers/infiniband/core/cma.c:510 [inline]
BUG: KMSAN: uninit-value in cma_make_mc_event+0xb73/0xe00 drivers/infiniband/core/cma.c:4570
 cma_set_qkey drivers/infiniband/core/cma.c:510 [inline]
 cma_make_mc_event+0xb73/0xe00 drivers/infiniband/core/cma.c:4570
 cma_iboe_join_multicast drivers/infiniband/core/cma.c:4782 [inline]
 rdma_join_multicast+0x2b83/0x30a0 drivers/infiniband/core/cma.c:4814
 ucma_process_join+0xa76/0xf60 drivers/infiniband/core/ucma.c:1479
 ucma_join_multicast+0x1e3/0x250 drivers/infiniband/core/ucma.c:1546
 ucma_write+0x639/0x6d0 drivers/infiniband/core/ucma.c:1732
 vfs_write+0x8ce/0x2030 fs/read_write.c:588
 ksys_write+0x28c/0x520 fs/read_write.c:643
 __do_sys_write fs/read_write.c:655 [inline]
 __se_sys_write fs/read_write.c:652 [inline]
 __ia32_sys_write+0xdb/0x120 fs/read_write.c:652
 do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline]
 __do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180
 do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205
 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248
 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Local variable ib.i created at:
cma_iboe_join_multicast drivers/infiniband/core/cma.c:4737 [inline]
rdma_join_multicast+0x586/0x30a0 drivers/infiniband/core/cma.c:4814
ucma_process_join+0xa76/0xf60 drivers/infiniband/core/ucma.c:1479

CPU: 0 PID: 29874 Comm: syz-executor.3 Not tainted 5.16.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
=====================================================

[1] https://lore.kernel.org/linux-rdma/20220117183832.GD84788@nvidia.com/

Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join")
Reported-by: syzbot+8fcbb77276d43cc8b693@syzkaller.appspotmail.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/58a4a98323b5e6b1282e83f6b76960d06e43b9fa.1679309909.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-23 10:28:59 +02:00
Leon Romanovsky
602fb42057 Enable IB out-of-order by default in mlx5
This series from Or changes default of IB out-of-order feature and
allows to the RDMA users to decide if they need to wait for completion
for all segments or it is enough to wait for last segment completion only.

Thanks

Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-23 10:22:32 +02:00
Or Har-Toov
742948cc02 RDMA/mlx5: Disable out-of-order in integrity enabled QPs
Set retry_mode to GO_BACK_N when qp is created with INTEGRITY_EN flag
because out-of-order is not supported when doing HW offload of signature
operations.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/362de42cdc7a541afa5b1fd0ec6ae706061764a2.1679230449.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-23 10:21:13 +02:00
Yonatan Nachum
6dddd93938 RDMA/efa: Add data polling capability feature bit
Add feature bit to existing device caps field. EFA supports data polling
of 128 bytes blocks.

The flag indicates that the NIC guarentees that a 128 byte aligned block
is written in order, ie that observing the last 8 bits of the block mean
the prior 127 bytes are also written.

It is useful for "last data polling" acceleration techniques.

Link: https://lore.kernel.org/r/20230219081328.10419-1-mrgolin@amazon.com
Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
Acked-by: Gal Pressman <gal.pressman@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-22 14:41:58 -03:00
Cheng Xu
901d9d6241 RDMA/erdma: Minor refactor of device init flow
After necessary configuration, driver should wait hardware finishing
initialization. The wait sets at CMDQ related function though it has
nothing to do with CMDQ. Refactor this part to make code cleaner.

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230322093319.84045-4-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-22 13:10:46 +02:00
Cheng Xu
72769dba6d RDMA/erdma: Eliminate unnecessary casting of EQ doorbells
Using void * to define EQ doorbell pointer can eliminate unnecessary
casting when performing assignment. Also rename *db_addr* to *db* for a
shorter name.

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230322093319.84045-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-22 13:10:46 +02:00
Cheng Xu
de19ec778c RDMA/erdma: Unify byte ordering APIs usage
Replace __be32_to_cpu/__cpu_to_be16 with be32_to_cpu/cpu_to_be16.
And use be32_to_cpu_array to copy and swap byte order to hide the
loop.

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230322093319.84045-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-22 13:10:46 +02:00
Cheng Xu
6bd1bca858 RDMA/erdma: Defer probing if netdevice can not be found
ERDMA device may be probed before its associated netdevice, returning
-EPROBE_DEFER allows OS try to probe erdma device later.

Fixes: d55e6fb4803c ("RDMA/erdma: Add the erdma module")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-5-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20 12:53:24 +02:00
Cheng Xu
0dd83a4d77 RDMA/erdma: Inline mtt entries into WQE if supported
The max inline mtt count supported is ERDMA_MAX_INLINE_MTT_ENTRIES.
When mr->mem.mtt_nents == ERDMA_MAX_INLINE_MTT_ENTRIES, inline mtt
is also supported, fix it.

Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-4-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20 12:53:24 +02:00
Cheng Xu
6256aa9ae9 RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192
Max EQ depth of hardware is 32K, the current default EQ depth is too small
for some applications, so change the default depth to 4096.
Max send WRs the hardware can support is 8K, but the driver limits the
value to 4K. Remove this limitation.

Fixes: be3cff0f242d ("RDMA/erdma: Add the hardware related definitions")
Fixes: db23ae64caac ("RDMA/erdma: Add verbs header file")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20 12:53:24 +02:00
Cheng Xu
3fe26c0493 RDMA/erdma: Fix some typos
FAA is short for atomic fetch and add, not FAD. Fix this.

Fixes: 0ca9c2e2844a ("RDMA/erdma: Implement atomic operations support")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20 12:53:24 +02:00
Maher Sanalla
88c9483faf IB/mlx5: Add support for 400G_8X lane speed
Currently, when driver queries PTYS to report which link speed is being
used on its RoCE ports, it does not check the case of having 400Gbps
transmitted over 8 lanes. Thus it fails to report the said speed and
instead it defaults to report 10G over 4 lanes.

Add a check for the said speed when querying PTYS and report it back
correctly when needed.

Fixes: 08e8676f1607 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/ec9040548d119d22557d6a4b4070d6f421701fd4.1678973994.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20 09:51:17 +02:00
Jason Baron
7ce9372909 dyndbg: cleanup dynamic usage in ib_srp.c
Currently, in dynamic_debug.h we only provide
DEFINE_DYNAMIC_DEBUG_METADATA() and DYNAMIC_DEBUG_BRANCH()
definitions if CONFIG_DYNAMIC_CORE is enabled. Thus, drivers
such as infiniband srp (see: drivers/infiniband/ulp/srp/ib_srp.c)
must provide their own definitions for !CONFIG_DYNAMIC_CORE.

Thus, let's move this !CONFIG_DYNAMIC_CORE case into dynamic_debug.h.
However, the dynamic debug interfaces should really only be defined
if CONFIG_DYNAMIC_DEBUG is set or CONFIG_DYNAMIC_CORE is set along
with DYNAMIC_DEBUG_MODULE, (see:
Documentation/admin-guide/dynamic-debug-howto.rst). Thus, the
undefined case becomes: !((CONFIG_DYNAMIC_DEBUG ||
(CONFIG_DYNAMIC_CORE && DYNAMIC_DEBUG_MODULE)).
With those changes in place, we can remove the !CONFIG_DYNAMIC_CORE
case from ib_srp.c

This change was prompted by a build breakeage in ib_srp.c stemming
from the inclusion of dynamic_debug.h unconditionally in module.h, due
to commit 7deabd674988 ("dyndbg: use the module notifier callbacks").
In that case, if we have CONFIG_DYNAMIC_CORE=y and
CONFIG_DYNAMIC_DEBUG=n then the definitions for
DEFINE_DYNAMIC_DEBUG_METADATA() and DYNAMIC_DEBUG_BRANCH() are defined
once in ib_srp.c and then again in the dynamic_debug.h. This had been
working prior to the above referenced commit because dynamic_debug.h
was only pulled into ib_srp.c conditinally via printk.h if
CONFIG_DYNAMIC_DEBUG was set.

Also, the exported functions in lib/dynamic_debug.c itself may
not have a prototype if CONFIG_DYNAMIC_DEBUG=n and
CONFIG_DYNAMIC_CORE=y. This would trigger the -Wmissing-prototypes
warning.

The exported functions are behind (include/linux/dynamic_debug.h):

if defined(CONFIG_DYNAMIC_DEBUG) || \
 (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))

Thus, by adding -DDYNAMIC_CONFIG_MODULE to the lib/Makefile we
can ensure that the exported functions have a prototype in all cases,
since lib/dynamic_debug.c is built whenever
CONFIG_DYNAMIC_DEBUG_CORE=y.

Fixes: 7deabd674988 ("dyndbg: use the module notifier callbacks")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303071444.sIbZTDCy-lkp@intel.com/
Signed-off-by: Jason Baron <jbaron@akamai.com>
[mcgrof: adjust commit log, and remove urldefense from URL]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-19 13:25:20 -07:00
Bjorn Helgaas
697d5cf073 IB/qib: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages.  Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver.

Note that this only controls ERR_* Messages from the device.  An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 16:16:06 +02:00
Bjorn Helgaas
512ed1199e IB/hfi1: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages.  Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver.

Note that this only controls ERR_* Messages from the device.  An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 16:15:44 +02:00
Rohit Chavan
c4526fe2e4 RDMA/mlx5: Coding style fix reported by checkpatch
Block comments should align the * on each line on line 2849
Avoid line continuations in quoted strings on line 3848

Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Link: https://lore.kernel.org/r/20230319100847.5566-1-roheetchavan@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 14:03:34 +02:00
Tatyana Nikolova
e4522c097e RDMA/irdma: Add ipv4 check to irdma_find_listener()
Add ipv4 check to irdma_find_listener(). Otherwise the function
incorrectly finds and returns a listener with a different addr family for
the zero IP addr, if a listener with a zero IP addr and the same port as
the one searched for has already been created.

Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-5-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:37:56 +02:00
Mustafa Ismail
8385a875c9 RDMA/irdma: Increase iWARP CM default rexmit count
When running perftest with large number of connections in iWARP mode, the
passive side could be slow to respond. Increase the rexmit counter default
to allow scaling connections.

Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-4-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:37:56 +02:00
Mustafa Ismail
b69a6979db RDMA/irdma: Fix memory leak of PBLE objects
On rmmod of irdma, the PBLE object memory is not being freed. PBLE object
memory are not statically pre-allocated at function initialization time
unlike other HMC objects. PBLEs objects and the Segment Descriptors (SD)
for it can be dynamically allocated during scale up and SD's remain
allocated till function deinitialization.

Fix this leak by adding IRDMA_HMC_IW_PBLE to the iw_hmc_obj_types[] table
and skip pbles in irdma_create_hmc_obj but not in irdma_del_hmc_objects().

Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-3-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:37:55 +02:00
Mustafa Ismail
30ed9ee9a1 RDMA/irdma: Do not generate SW completions for NOPs
Currently, artificial SW completions are generated for NOP wqes which can
generate unexpected completions with wr_id = 0. Skip the generation of
artificial completions for NOPs.

Fixes: 81091d7696ae ("RDMA/irdma: Add SW mechanism to generate completions on error")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-2-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:37:55 +02:00
Sindhu Devale
cc8997c94b RDMA/irdma: Refactor PBLE functions
Refactor PBLE functions using a bit mask to represent the PBLE level
desired versus 2 parameters use_pble and lvl_one_only which makes the
code confusing.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Link: https://lore.kernel.org/r/20230315145305.955-5-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:36:33 +02:00
Michal Swiatkowski
99f96b4552 RDMA/irdma: Change name of interrupts
Add more information in interrupt names.

Before this patch it was:
irdma
CEQ
CEQ
...

Now:
irdma-0000:18:00.0-AEQ
irdma-0000:18:00.0-CEQ-0
irdma-0000:18:00.0-CEQ-1
...

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Suggested-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145305.955-4-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:36:26 +02:00
Tatyana Nikolova
0219ad5d3a RDMA/irdma: Remove a redundant irdma_arp_table() call
Remove a redundant function call in irdma_modify_qp_roce, since
irdma_arp_table() with IRDMA_ARP_RESOLVE action is called after the if/else
ipv check as part of irdma_add_arp().

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145305.955-3-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:36:03 +02:00
Krzysztof Czurylo
5a711e5807 RDMA/irdma: Refactor HW statistics
Refactor HW statistics which,

- Unifies HW statistics support for all HW generations.

- Unifies support of 32- and 64-bit counters.

- Removes duplicated code and simplifies implementation.

- Fixes roll-over handling.

- Removes unneeded last_hw_stats.

With new implementation, there is no separate handling and no separate
arrays for 32- and 64-bit counters (offsets, regs, values). Instead,
there is a HW stats map array for each HW revision, which defines
HW-specific width and location of each counter in the statistics buffer.

Once the statistics are gathered (either via CQP op, or by reading HW
registers), counter values are extracted from the statistics buffer using
the stats map and the delta between the last and new values is computed.
Finally, the counter values in rdma_hw_stats are incremented by those
deltas.

From the OS perspective, all the counters are 64-bit and their order in
rdma_hw_stats->value[] array, as well as in irdma_hw_stat_names[], is the
same for all HW gens.  New statistics should always be added at the end.

Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Youvaraj Sagar <youvaraj.sagar@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145305.955-2-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:36:03 +02:00
Ira Weiny
6ea0c2de31 RDMA/qib: Remove deprecated kmap() call
kmap() has been deprecated in favor of the kmap_local_page() call.
kmap_local_page() is thread local.

In the sdma coalesce case the page allocated is potentially free'ed in a
different context through qib_sdma_get_complete() ->
qib_user_sdma_make_progress().  The use of kmap_local_page() is
inappropriate in this call path.  However, the page is allocated using
GFP_KERNEL and will never be from highmem.

Remove the use of kmap calls and use page_address() in this case.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20230217-kmap-qib-v1-1-e5a6fde167e0@intel.com
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:35:53 +02:00
Dan Carpenter
d50b3c73f1 RDMA/mlx4: Prevent shift wrapping in set_user_sq_size()
The ucmd->log_sq_bb_count variable is controlled by the user so this
shift can wrap.  Fix it by using check_shl_overflow() in the same way
that it was done in commit 515f60004ed9 ("RDMA/hns: Prevent undefined
behavior in hns_roce_set_user_sq_size()").

Fixes: 839041329fd3 ("IB/mlx4: Sanity check userspace send queue sizes")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/a8dfbd1d-c019-4556-930b-bab1ded73b10@kili.mountain
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19 11:35:46 +02:00