11458 Commits

Author SHA1 Message Date
Danil Kipnis
09e0dbbeed RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
In order to avoid all the clients to start reconnecting at the same time
schedule the reconnect dwork with a random jitter of +[0,8] seconds.

Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20200724111508.15734-2-haris.iqbal@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:26:53 -03:00
Leon Romanovsky
81530ab08e RDMA/mlx5: Allow providing extra scatter CQE QP flag
Scatter CQE feature relies on two flags MLX5_QP_FLAG_SCATTER_CQE and
MLX5_QP_FLAG_ALLOW_SCATTER_CQE, both of them can be provided without
relation to device capability.

Relax global validity check to allow MLX5_QP_FLAG_ALLOW_SCATTER_CQE QP
flag.

Existing user applications are failing on this new validity check.

Fixes: 90ecb37a751b ("RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags")
Fixes: 37518fa49f76 ("RDMA/mlx5: Process all vendor flags in one place")
Link: https://lore.kernel.org/r/20200728120255.805733-1-leon@kernel.org
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:19:01 -03:00
Jason Gunthorpe
f6a9d47ae6 RDMA/cma: Execute rdma_cm destruction from a handler properly
When a rdma_cm_id needs to be destroyed after a handler callback fails,
part of the destruction pattern is open coded into each call site.

Unfortunately the blind assignment to state discards important information
needed to do cma_cancel_operation(). This results in active operations
being left running after rdma_destroy_id() completes, and the
use-after-free bugs from KASAN.

Consolidate this entire pattern into destroy_id_handler_unlock() and
manage the locking correctly. The state should be set to
RDMA_CM_DESTROYING under the handler_lock to atomically ensure no futher
handlers are called.

Link: https://lore.kernel.org/r/20200723070707.1771101-5-leon@kernel.org
Reported-by: syzbot+08092148130652a6faae@syzkaller.appspotmail.com
Reported-by: syzbot+a929647172775e335941@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:10:02 -03:00
Jason Gunthorpe
cc9c037343 RDMA/cma: Remove unneeded locking for req paths
The REQ flows are concerned that once the handler is called on the new
cm_id the ULP can choose to trigger a rdma_destroy_id() concurrently at
any time.

However, this is not true, while the ULP can call rdma_destroy_id(), it
immediately blocks on the handler_mutex which prevents anything harmful
from running concurrently.

Remove the confusing extra locking and refcounts and make the
handler_mutex protecting state during destroy more clear.

Link: https://lore.kernel.org/r/20200723070707.1771101-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:10:02 -03:00
Jason Gunthorpe
3647a28de1 RDMA/cma: Using the standard locking pattern when delivering the removal event
Whenever an event is delivered to the handler it should be done under the
handler_mutex and upon any non-zero return from the handler it should
trigger destruction of the cm_id.

cma_process_remove() skips some steps here, it is not necessarily wrong
since the state change should prevent any races, but it is confusing and
unnecessary.

Follow the standard pattern here, with the slight twist that the
transition to RDMA_CM_DEVICE_REMOVAL includes a cma_cancel_operation().

Link: https://lore.kernel.org/r/20200723070707.1771101-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:10:02 -03:00
Jason Gunthorpe
d54f23c09e RDMA/cma: Simplify DEVICE_REMOVAL for internal_id
cma_process_remove() triggers an unconditional rdma_destroy_id() for
internal_id's and skips the event deliver and transition through
RDMA_CM_DEVICE_REMOVAL.

This is confusing and unnecessary. internal_id always has
cma_listen_handler() as the handler, have it catch the
RDMA_CM_DEVICE_REMOVAL event and directly consume it and signal removal.

This way the FSM sequence never skips the DEVICE_REMOVAL case and the
logic in this hard to test area is simplified.

Link: https://lore.kernel.org/r/20200723070707.1771101-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:10:01 -03:00
Gal Pressman
d4f9cb5c5b RDMA/efa: Add EFA 0xefa1 PCI ID
Add support for 0xefa1 devices.

Link: https://lore.kernel.org/r/20200722140312.3651-5-galpress@amazon.com
Reviewed-by: Shadi Ammouri <sammouri@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:23:40 -03:00
Gal Pressman
a5d87b6985 RDMA/efa: User/kernel compatibility handshake mechanism
Introduce a mechanism that performs an handshake between the userspace
provider and kernel driver which verifies that the user supports all
required features in order to operate correctly.

The handshake verifies the needed functionality by comparing the reported
device caps and the provider caps. If the device reports a non-zero
capability the appropriate comp mask is required from the userspace
provider in order to allocate the context.

Link: https://lore.kernel.org/r/20200722140312.3651-4-galpress@amazon.com
Reviewed-by: Shadi Ammouri <sammouri@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:23:40 -03:00
Gal Pressman
da2924bdca RDMA/efa: Expose minimum SQ size
The device reports the minimum SQ size required for creation.

This patch queries the min SQ size and reports it back to the userspace
library.

Link: https://lore.kernel.org/r/20200722140312.3651-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Shadi Ammouri <sammouri@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:23:39 -03:00
Gal Pressman
556c811f24 RDMA/efa: Expose maximum TX doorbell batch
The device reports the maximum number of bytes to be written before
ringing the doorbell (zero means unlimited).

This patch queries the max batch size and reports it back to the userspace
library.

Link: https://lore.kernel.org/r/20200722140312.3651-2-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:23:39 -03:00
Yamin Friedman
c804af2c1d IB/srpt: use new shared CQ mechanism
Have the driver use shared CQs provided by the rdma core driver.  This
provides the advantage of improved efficiency handling interrupts.

Link: https://lore.kernel.org/r/20200722135629.49467-3-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:10:32 -03:00
Yamin Friedman
c6e6630723 IB/isert: use new shared CQ mechanism
Have the driver use shared CQs provided by the rdma core driver.  Since
this provides similar functionality to iser_comp it has been removed.

Now there is no reason to allocate very large CQs when the driver is
loaded while gaining the advantage of shared CQs. Previously when a single
connection was opened a CQ was opened for every core with enough space for
eight connections, this is a very large overhead that in most cases will
not be utilized.

Link: https://lore.kernel.org/r/20200722135629.49467-2-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:10:31 -03:00
Yamin Friedman
d56a7852ec IB/iser: use new shared CQ mechanism
Have the driver use shared CQs provided by the rdma core driver.  Since
this provides similar functionality to iser_comp it has been removed. Now
there is no reason to allocate very large CQs when the driver is loaded
while gaining the advantage of shared CQs.

Link: https://lore.kernel.org/r/20200722135629.49467-1-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Acked-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:10:31 -03:00
Leon Romanovsky
71cab8ef5c RDMA/mlx5: Delete unreachable code
Delete two occurrences of unreachable code discovered by the Coverity.

Link: https://lore.kernel.org/r/20200727095746.495915-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-28 16:25:37 -03:00
Colin Ian King
3540669761 qed: fix assignment of n_rq_elems to incorrect params field
Currently n_rq_elems is being assigned to params.elem_size instead of the
field params.num_elems.  Coverity is detecting this as a double assingment
to params.elem_size and reporting this as an usused value on the first
assignment.  Fix this.

Addresses-Coverity: ("Unused value")
Fixes: b6db3f71c976 ("qed: simplify chain allocation with init params struct")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 12:46:28 -07:00
Michael Chan
bfc6e5fbcb bnxt_en: Update firmware interface to 1.10.1.54.
Main changes are 200G support and fixing the definitions of discard and
error counters to match the hardware definitions.

Because the HWRM_PORT_PHY_QCFG message size has now exceeded the max.
encapsulated response message size of 96 bytes from the PF to the VF,
we now need to cap this message to 96 bytes for forwarding.  The forwarded
response only needs to contain the basic link status and speed information
and can be capped without adding the new information.

v2: Fix bnxt_re compile error.

Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 11:47:33 -07:00
Li Heng
47fda651d5 RDMA/core: Fix return error value in _ib_modify_qp() to negative
The error codes in _ib_modify_qp() are supposed to be negative errno.

Fixes: 7a5c938b9ed0 ("IB/core: Check for rdma_protocol_ib only after validating port_num")
Link: https://lore.kernel.org/r/1595645787-20375-1-git-send-email-liheng40@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Li Heng <liheng40@huawei.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 12:13:34 -03:00
Jason Gunthorpe
5351a56b1a RDMA/mlx5: Fix prefetch memory leak if get_prefetchable_mr fails
destroy_prefetch_work() must always be called if the work is not going
to be queued. The num_sge also should have been set to i, not i-1
which avoids the condition where it shouldn't have been called in the
first place.

Cc: stable@vger.kernel.org
Fixes: fb985e278a30 ("RDMA/mlx5: Use SRCU properly in ODP prefetch")
Link: https://lore.kernel.org/r/20200727095712.495652-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:50:20 -03:00
Jason Gunthorpe
31142a4ba6 RDMA/cm: Add min length checks to user structure copies
These are missing throughout ucma, it harmlessly copies garbage from
userspace, but in this new code which uses min to compute the copy length
it can result in uninitialized stack memory. Check for minimum length at
the very start.

  BUG: KMSAN: uninit-value in ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091
  CPU: 0 PID: 8457 Comm: syz-executor069 Not tainted 5.8.0-rc5-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x1df/0x240 lib/dump_stack.c:118
   kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121
   __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
   ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091
   ucma_write+0x5c5/0x630 drivers/infiniband/core/ucma.c:1764
   do_loop_readv_writev fs/read_write.c:737 [inline]
   do_iter_write+0x710/0xdc0 fs/read_write.c:1020
   vfs_writev fs/read_write.c:1091 [inline]
   do_writev+0x42d/0x8f0 fs/read_write.c:1134
   __do_sys_writev fs/read_write.c:1207 [inline]
   __se_sys_writev+0x9b/0xb0 fs/read_write.c:1204
   __x64_sys_writev+0x4a/0x70 fs/read_write.c:1204
   do_syscall_64+0xb0/0x150 arch/x86/entry/common.c:386
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 34e2ab57a911 ("RDMA/ucma: Extend ucma_connect to receive ECE parameters")
Fixes: 0cb15372a615 ("RDMA/cma: Connect ECE to rdma_accept")
Link: https://lore.kernel.org/r/0-v1-d5b86dab17dc+28c25-ucma_syz_min_jgg@nvidia.com
Reported-by: syzbot+086ab5ca9eafd2379aa6@syzkaller.appspotmail.com
Reported-by: syzbot+7446526858b83c8828b2@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:50:00 -03:00
Jason Gunthorpe
7923774368 Merge branch 'mlx5_uar' into rdma.git /for-next
Meir Lichtinger says:

====================
ConnectX-7 supports setting relaxed ordering read/write mkey attribute by
UMR, indicated by new HCA capabilities, so extend mlx5_ib driver to
configure UMR control segment
====================

Based on the mlx5-next branch at
      git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_uar':
  RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
  RDMA/mlx5: Use MLX5_SET macro instead of local structure
  RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMR
2020-07-27 11:44:36 -03:00
Meir Lichtinger
896ec97353 RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access
flag. ConnectX-7 supports setting relaxed ordering read/write mkey
attribute by UMR, indicated by new HCA capabilities.

With ConnectX-7 driver uses UMR when user set relaxed ordering access
flag, in contrast to previous silicon models. Specifically it includes
setting relvant flags of mkey context mask in UMR control segment, and
relaxed ordering write and read flags in UMR mkey context segment.

Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
Meir Lichtinger
2224635938 RDMA/mlx5: Use MLX5_SET macro instead of local structure
Use generic mlx5 structure defined in mlx5_ifc.h to represent ConnectX
device data structures instead of using structure defined specifically for
mlx5_ib module.

Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
David S. Miller
a57066b1a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The UDP reuseport conflict was a little bit tricky.

The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.

At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.

This requires moving the reuseport_has_conns() logic into the callers.

While we are here, get rid of inline directives as they do not belong
in foo.c files.

The other changes were cases of more straightforward overlapping
modifications.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25 17:49:04 -07:00
Gustavo A. R. Silva
6f24b15925 IB/hfi1: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the
new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7-rc7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Link: https://lore.kernel.org/r/20200721133455.GA14363@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:59:55 -03:00
Leon Romanovsky
9b8d846924 RDMA/uverbs: Silence shiftTooManyBitsSigned warning
Fix reported by kbuild warning.

   drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned]
    BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31));
                                                 ^
Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:53:02 -03:00
Leon Romanovsky
29f3fe1d68 RDMA/uverbs: Remove redundant assignments
The kbuild reported the following warning, so clean whole uverbs_cmd.c
file.

   drivers/infiniband/core/uverbs_cmd.c:1066:6: warning: Variable 'ret' is reassigned a value before the old one has been used. [redundantAssignment]
    ret = uverbs_request(attrs, &cmd, sizeof(cmd));
        ^
   drivers/infiniband/core/uverbs_cmd.c:1064:0: note: Variable 'ret' is reassigned a value before the old one has been used.
    int    ret = -EINVAL;
   ^

Link: https://lore.kernel.org/r/20200720175627.1273096-2-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:53:02 -03:00
Maor Gottlieb
d4d7f59643 RDMA/mlx5: Add missing srcu_read_lock in ODP implicit flow
According to the locking scheme, mlx5_ib_update_xlt() should be called
with srcu_read_lock(dev->odp->srcu). Prefetch missed this. This fixes the
below WARN from lockdep_assert_held():

  WARNING: CPU: 1 PID: 1130 at drivers/infiniband/hw/mlx5/odp.c:132 mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
  Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core
  CPU: 1 PID: 1130 Comm: kworker/u16:11 Tainted: G        W 5.8.0-rc5_for_upstream_debug_2020_07_13_11_04 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
  RIP: 0010:mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
  Code: 08 e2 85 c0 0f 84 65 ff ff ff 49 8b 87 60 01 00 00 be ff ff ff ff 48 8d b8 b0 39 00 00 e8 93 e0 50 e1 85 c0 0f 85 45 ff ff ff <0f> 0b e9 3e ff ff ff 0f 0b eb c7 0f 1f 44 00 00 48 8b 87 98 0f 00
  RSP: 0018:ffff88840f44fc68 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88840cc9d000 RCX: ffff88840efcd940
  RDX: 0000000000000000 RSI: ffff88844871b9b0 RDI: ffff88840efce100
  RBP: ffff88840cc9d040 R08: 0000000000000040 R09: 0000000000000001
  R10: ffff88846ced3068 R11: 0000000000000000 R12: 00000000000156ec
  R13: 0000000000000004 R14: 0000000000000004 R15: ffff888439941000
  FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f8536d12430 CR3: 0000000437a5e006 CR4: 0000000000360ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   mlx5_ib_update_xlt+0x37c/0x7c0 [mlx5_ib]
   pagefault_mr+0x315/0x440 [mlx5_ib]
   mlx5_ib_prefetch_mr_work+0x56/0xa0 [mlx5_ib]
   process_one_work+0x215/0x5c0
   worker_thread+0x3c/0x380
   ? process_one_work+0x5c0/0x5c0
   kthread+0x133/0x150
   ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30

Hold the SRCU during prefetch, even though it strictly isn't needed since
prefetch is holding the num_deferred_work it does make it easier to reason
about.

Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065747.131157-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:44:06 -03:00
Leon Romanovsky
16e51f78a9 RDMA/core: Update write interface to use automatic object lifetime
The automatic object lifetime model allows us to change the write()
interface to have the same logic as the ioctl() path. Update the
create/alloc functions to be in the following format, so the code flow
will be the same:

 * Allocate objects
 * Initialize them
 * Call to the drivers, this is last step that is allowed to fail
 * Finalize object
 * Return response and allow to core code to handle abort/commit
   respectively.

Link: https://lore.kernel.org/r/20200719052223.75245-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:22:30 -03:00
Leon Romanovsky
0f63ef1dd5 RDMA/core: Align abort/commit object scheme for write() and ioctl() paths
Create the same logic flow for the write() interface as we have for the
ioctl() path by making sure that the object is committed or aborted
automatically after HW object creation.

Link: https://lore.kernel.org/r/20200719052223.75245-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 15:57:22 -03:00
Maor Gottlieb
c94e272b57 RDMA/mlx5: Allow SQ modification
Currently the SQ is set to a ready state when the RAW QP is modified to
INIT.  When the TIS is modified, e.g. to change the lag_tx_affinity, then
SQs which are already in the ready state will not be affected.

Open a window to modify the SQ behavior by setting the SQ as ready only
when QP was modified to RTS.

Link: https://lore.kernel.org/r/20200716105416.1423826-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 15:49:19 -03:00
Alexander Lobakin
155065866b qed: add support for different page sizes for chains
Extend current infrastructure to store chain page size in a struct
and use it in all functions instead of fixed QED_CHAIN_PAGE_SIZE.
Its value remains the default one, but can be overridden in
qed_chain_init_params before chain allocation.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
b6db3f71c9 qed: simplify chain allocation with init params struct
To simplify qed_chain_alloc() prototype and call sites, introduce struct
qed_chain_init_params to specify chain params, and pass a pointer to
filled struct to the actual qed_chain_alloc() instead of a long list
of separate arguments.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Jason Gunthorpe
a862192e92 RDMA/mlx5: Prevent prefetch from racing with implicit destruction
Prefetch work in mlx5_ib_prefetch_mr_work can be queued and able to run
concurrently with destruction of the implicit MR. The num_deferred_work
was intended to serialize this, but there is a race:

       CPU0                                          CPU1

    mlx5_ib_free_implicit_mr()
      xa_erase(odp_mkeys)
      synchronize_srcu()
      __xa_erase(implicit_children)
                                      mlx5_ib_prefetch_mr_work()
                                        pagefault_mr()
                                         pagefault_implicit_mr()
                                          implicit_get_child_mr()
                                           xa_cmpxchg()
                                        atomic_dec_and_test(num_deferred_mr)
      wait_event(imr->q_deferred_work)
      ib_umem_odp_release(odp_imr)
        kfree(odp_imr)

At this point in mlx5_ib_free_implicit_mr() the implicit_children list is
supposed to be empty forever so that destroy_unused_implicit_child_mr()
and related are not and will not be running.

Since it is not empty the destroy_unused_implicit_child_mr() flow ends up
touching deallocated memory as mlx5_ib_free_implicit_mr() already tore down the
imr parent.

The solution is to flush out the prefetch wq by driving num_deferred_work
to zero after creation of new prefetch work is blocked.

Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065435.130722-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-21 13:51:35 -03:00
Devesh Sharma
2bb3c32c5c RDMA/bnxt_re: Change wr posting logic to accommodate variable wqes
Modifying the post-send and post-recv to initialize the wqes slot by slot
dynamically depending on the number of max sges requested by consumer at
the time of QP creation.

Changed the QP creation logic to determine the size of SQ and RQ in 16B
slots based on the number of wqe and number of SGEs requested by consumer

Link: https://lore.kernel.org/r/1594822619-4098-6-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:50 -03:00
Devesh Sharma
54ace98443 RDMA/bnxt_re: Add helper data structures
Adding few helper data structure which are useful to initialize hardware
send wqe in variable wqe mode.

Adding a qp flag in HSI to indicate variable wqe is enabled for this qp.

Link: https://lore.kernel.org/r/1594822619-4098-5-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:50 -03:00
Devesh Sharma
5ac5396a6c RDMA/bnxt_re: Pull psn buffer dynamically based on prod
Changing the PSN management memory buffers from statically initialized to
dynamic pull scheme.

During create qp only the start pointers are initialized and during
post-send the psn buffer is pulled based on current producer index.

Adjusting post_send code to accommodate dynamic psn-pull and changing
post_recv code to match post-send code wrt pseudo flush wqe generation.

Link: https://lore.kernel.org/r/1594822619-4098-4-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Devesh Sharma
159fb4ceac RDMA/bnxt_re: introduce a function to allocate swq
The bnxt_re driver now allocates shadow sq and rq to maintain per wqe
wr_id and few other flags required to support variable wqe. Segregated the
allocation of shadow queue in a separate function and adjust the cqe
polling logic. The new polling logic is based on shadow queue indices.

Link: https://lore.kernel.org/r/1594822619-4098-3-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Devesh Sharma
1da968e0ef RDMA/bnxt_re: introduce wqe mode to select execution path
The bnxt_re driver need to decide on how much SQ and RQ memory should to
be allocated and which wqe posting/polling algorithm to use.

Making changes to set the wqe-mode to a default value during device
registration sequence. The wqe-mode is passed to the lower layer driver as
well. Going forward in the lower layer driver wqe-mode will be used to
decide execution path. Initializing the wqe-mode to static wqe type for
now.

Link: https://lore.kernel.org/r/1594822619-4098-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Kamal Heib
ca4beeee98 RDMA/qedr: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed from the common ops and moved to
the RoCE only ops within the qedr driver.

Link: https://lore.kernel.org/r/20200714183414.61069-8-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:17 -03:00
Kamal Heib
c1c5e9fd3a RDMA/i40iw: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-7-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
ce07f1c6a8 RDMA/cxgb4: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-6-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
c4995bd354 RDMA/siw: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
ab75a6cb8c RDMA/core: Remove query_pkey from the mandatory ops
The query_pkey() isn't mandatory for the iwarp providers, so remove this
requirement from the RDMA core.

Link: https://lore.kernel.org/r/20200714183414.61069-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
6f38efca9b RDMA/core: Allocate the pkey cache only if the pkey_tbl_len is set
Allocate the pkey cache only if the pkey_tbl_len is set by the provider,
also add checks to avoid accessing the pkey cache when it not initialized.

Link: https://lore.kernel.org/r/20200714183414.61069-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
90efc8b2d4 RDMA/core: Expose pkeys sysfs files only if pkey_tbl_len is set
Expose the pkeys sysfs files only if the pkey_tbl_len is set by the
providers.

Link: https://lore.kernel.org/r/20200714183414.61069-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Christoph Hellwig
2f9237d4f6 dma-mapping: make support for dma ops optional
Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-19 09:29:23 +02:00
Kees Cook
3f649ab728 treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:15 -07:00
Mikhail Malygin
5f0b2a6093 RDMA/rxe: Prevent access to wr->next ptr afrer wr is posted to send queue
rxe_post_send_kernel() iterates over linked list of wr's, until the
wr->next ptr is NULL.  However if we've got an interrupt after last wr is
posted, control may be returned to the code after send completion callback
is executed and wr memory is freed.

As a result, wr->next pointer may contain incorrect value leading to
panic. Store the wr->next on the stack before posting it.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200716190340.23453-1-m.malygin@yadro.com
Signed-off-by: Mikhail Malygin <m.malygin@yadro.com>
Signed-off-by: Sergey Kojushev <s.kojushev@yadro.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:12:07 -03:00
Michal Kalderon
eb7f84e379 RDMA/qedr: Add EDPM max size to alloc ucontext response
User space should receive the maximum edpm size from kernel driver,
similar to other edpm/ldpm related limits.  Add an additional parameter to
the alloc_ucontext_resp structure for the edpm maximum size.

In addition, pass an indication from user-space to kernel
(and not just kernel to user) that the DPM sizes are supported.

This is for supporting backward-forward compatibility between driver and
lib for everything related to DPM transaction and limit sizes.

This should have been part of commit mentioned in Fixes tag.

Link: https://lore.kernel.org/r/20200707063100.3811-3-michal.kalderon@marvell.com
Fixes: 93a3d05f9d68 ("RDMA/qedr: Add kernel capability flags for dpm enabled mode")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:01:55 -03:00
Michal Kalderon
bbe4f42452 RDMA/qedr: Add EDPM mode type for user-fw compatibility
In older FW versions the completion flag was treated as the ack flag in
edpm messages.  commit ff937b916eb6 ("qed: Add EDPM mode type for user-fw
compatibility") exposed the FW option of setting which mode the QP is in
by adding a flag to the qedr <-> qed API.

This patch adds the qedr <-> libqedr interface so that the libqedr can set
the flag appropriately and qedr can pass it down to FW.  Flag is added for
backward compatibility with libqedr.

For older libs, this flag didn't exist and therefore set to zero.

Fixes: ac1b36e55a51 ("qedr: Add support for user context verbs")
Link: https://lore.kernel.org/r/20200707063100.3811-2-michal.kalderon@marvell.com
Signed-off-by: Yuval Bason <yuval.bason@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:01:55 -03:00