73 Commits

Author SHA1 Message Date
Jason Gunthorpe
c85f4abe66 RDMA/core: Fix double destruction of uobject
Fix use after free when user user space request uobject concurrently for
the same object, within the RCU grace period.

In that case, remove_handle_idr_uobject() is called twice and we will have
an extra put on the uobject which cause use after free.  Fix it by leaving
the uobject write locked after it was removed from the idr.

Call to rdma_lookup_put_uobject with UVERBS_LOOKUP_DESTROY instead of
UVERBS_LOOKUP_WRITE will do the work.

  refcount_t: underflow; use-after-free.
  WARNING: CPU: 0 PID: 1381 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 0 PID: 1381 Comm: syz-executor.0 Not tainted 5.5.0-rc3 #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x94/0xce
   panic+0x234/0x56f
   __warn+0x1cc/0x1e1
   report_bug+0x200/0x310
   fixup_bug.part.11+0x32/0x80
   do_error_trap+0xd3/0x100
   do_invalid_op+0x31/0x40
   invalid_op+0x1e/0x30
  RIP: 0010:refcount_warn_saturate+0xfe/0x1a0
  Code: 0f 0b eb 9b e8 23 f6 6d ff 80 3d 6c d4 19 03 00 75 8d e8 15 f6 6d ff 48 c7 c7 c0 02 55 bd c6 05 57 d4 19 03 01 e8 a2 58 49 ff <0f> 0b e9 6e ff ff ff e8 f6 f5 6d ff 80 3d 42 d4 19 03 00 0f 85 5c
  RSP: 0018:ffffc90002df7b98 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff88810f6a193c RCX: ffffffffba649009
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811b0283cc
  RBP: 0000000000000003 R08: ffffed10236060e3 R09: ffffed10236060e3
  R10: 0000000000000001 R11: ffffed10236060e2 R12: ffff88810f6a193c
  R13: ffffc90002df7d60 R14: 0000000000000000 R15: ffff888116ae6a08
   uverbs_uobject_put+0xfd/0x140
   __uobj_perform_destroy+0x3d/0x60
   ib_uverbs_close_xrcd+0x148/0x170
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x465b49
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f759d122c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 0000000000465b49
  RDX: 000000000000000c RSI: 0000000020000080 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f759d1236bc
  R13: 00000000004ca27c R14: 000000000070de40 R15: 00000000ffffffff
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Kernel Offset: 0x39400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: 7452a3c745a2 ("IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate")
Link: https://lore.kernel.org/r/20200527135534.482279-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 14:22:57 -03:00
Jason Gunthorpe
c485b19d52 RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL event
The commit below moved all of the destruction to the disassociate step and
cleaned up the event channel during destroy_uobj.

However, when ib_uverbs_free_hw_resources() pushes IB_EVENT_DEVICE_FATAL
and then immediately goes to destroy all uobjects this causes
ib_uverbs_free_event_queue() to discard the queued event if userspace
hasn't already read() it.

Unlike all other event queues async FD needs to defer the
ib_uverbs_free_event_queue() until FD release. This still unregisters the
handler from the IB device during disassociation.

Fixes: 3e032c0e92aa ("RDMA/core: Make ib_uverbs_async_event_file into a uobject")
Link: https://lore.kernel.org/r/20200507063348.98713-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 17:02:25 -03:00
Leon Romanovsky
f0abc761bb RDMA/core: Fix race between destroy and release FD object
The call to ->lookup_put() was too early and it caused an unlock of the
read/write protection of the uobject after the FD was put. This allows a
race:

     CPU1                                 CPU2
 rdma_lookup_put_uobject()
   lookup_put_fd_uobject()
     fput()
				   fput()
				     uverbs_uobject_fd_release()
				       WARN_ON(uverbs_try_lock_object(uobj,
					       UVERBS_LOOKUP_WRITE));
   atomic_dec(usecnt)

Fix the code by changing the order, first unlock and call to
->lookup_put() after that.

Fixes: 3832125624b7 ("IB/core: Add support for idr types")
Link: https://lore.kernel.org/r/20200423060122.6182-1-leon@kernel.org
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 15:40:41 -03:00
Leon Romanovsky
83a2670212 RDMA/core: Fix overwriting of uobj in case of error
In case of failure to get file, the uobj is overwritten and causes to
supply bad pointer as an input to uverbs_uobject_put().

  BUG: KASAN: null-ptr-deref in atomic_fetch_sub include/asm-generic/atomic-instrumented.h:199 [inline]
  BUG: KASAN: null-ptr-deref in refcount_sub_and_test include/linux/refcount.h:253 [inline]
  BUG: KASAN: null-ptr-deref in refcount_dec_and_test include/linux/refcount.h:281 [inline]
  BUG: KASAN: null-ptr-deref in kref_put include/linux/kref.h:64 [inline]
  BUG: KASAN: null-ptr-deref in uverbs_uobject_put+0x22/0x90 drivers/infiniband/core/rdma_core.c:57
  Write of size 4 at addr 0000000000000030 by task syz-executor.4/1691

  CPU: 1 PID: 1691 Comm: syz-executor.4 Not tainted 5.6.0 #17
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x94/0xce lib/dump_stack.c:118
   __kasan_report+0x10c/0x190 mm/kasan/report.c:515
   kasan_report+0x32/0x50 mm/kasan/common.c:625
   check_memory_region_inline mm/kasan/generic.c:187 [inline]
   check_memory_region+0x16d/0x1c0 mm/kasan/generic.c:193
   atomic_fetch_sub include/asm-generic/atomic-instrumented.h:199 [inline]
   refcount_sub_and_test include/linux/refcount.h:253 [inline]
   refcount_dec_and_test include/linux/refcount.h:281 [inline]
   kref_put include/linux/kref.h:64 [inline]
   uverbs_uobject_put+0x22/0x90 drivers/infiniband/core/rdma_core.c:57
   alloc_begin_fd_uobject+0x1d0/0x250 drivers/infiniband/core/rdma_core.c:486
   rdma_alloc_begin_uobject+0xa8/0xf0 drivers/infiniband/core/rdma_core.c:509
   __uobj_alloc include/rdma/uverbs_std_types.h:117 [inline]
   ib_uverbs_create_comp_channel+0x16d/0x230 drivers/infiniband/core/uverbs_cmd.c:982
   ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:665
   __vfs_write+0x7c/0x100 fs/read_write.c:494
   vfs_write+0x168/0x4a0 fs/read_write.c:558
   ksys_write+0xc8/0x200 fs/read_write.c:611
   do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:295
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x466479
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007efe9f6a7c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479
  RDX: 0000000000000018 RSI: 0000000020000040 RDI: 0000000000000003
  RBP: 00007efe9f6a86bc R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
  R13: 0000000000000bf2 R14: 00000000004cb80a R15: 00000000006fefc0

Fixes: 849e149063bd ("RDMA/core: Do not allow alloc_commit to fail")
Link: https://lore.kernel.org/r/20200421082929.311931-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:53:42 -03:00
Leon Romanovsky
0fb00941dc RDMA/core: Prevent mixed use of FDs between shared ufiles
FDs can only be used on the ufile that created them, they cannot be mixed
to other ufiles. We are lacking a check to prevent it.

  BUG: KASAN: null-ptr-deref in atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline]
  BUG: KASAN: null-ptr-deref in atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline]
  BUG: KASAN: null-ptr-deref in fput_many+0x1a/0x140 fs/file_table.c:336
  Write of size 8 at addr 0000000000000038 by task syz-executor179/284

  CPU: 0 PID: 284 Comm: syz-executor179 Not tainted 5.5.0-rc5+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x94/0xce lib/dump_stack.c:118
   __kasan_report+0x18f/0x1b7 mm/kasan/report.c:510
   kasan_report+0xe/0x20 mm/kasan/common.c:639
   check_memory_region_inline mm/kasan/generic.c:185 [inline]
   check_memory_region+0x15d/0x1b0 mm/kasan/generic.c:192
   atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline]
   atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline]
   fput_many+0x1a/0x140 fs/file_table.c:336
   rdma_lookup_put_uobject+0x85/0x130 drivers/infiniband/core/rdma_core.c:692
   uobj_put_read include/rdma/uverbs_std_types.h:96 [inline]
   _ib_uverbs_lookup_comp_file drivers/infiniband/core/uverbs_cmd.c:198 [inline]
   create_cq+0x375/0xba0 drivers/infiniband/core/uverbs_cmd.c:1006
   ib_uverbs_create_cq+0x114/0x140 drivers/infiniband/core/uverbs_cmd.c:1089
   ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:769
   __vfs_write+0x7c/0x100 fs/read_write.c:494
   vfs_write+0x168/0x4a0 fs/read_write.c:558
   ksys_write+0xc8/0x200 fs/read_write.c:611
   do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:294
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x44ef99
  Code: 00 b8 00 01 00 00 eb e1 e8 74 1c 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c4 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffc0b74c028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 00007ffc0b74c030 RCX: 000000000044ef99
  RDX: 0000000000000040 RSI: 0000000020000040 RDI: 0000000000000005
  RBP: 00007ffc0b74c038 R08: 0000000000401830 R09: 0000000000401830
  R10: 00007ffc0b74c038 R11: 0000000000000246 R12: 0000000000000000
  R13: 0000000000000000 R14: 00000000006be018 R15: 0000000000000000

Fixes: cf8966b3477d ("IB/core: Add support for fd objects")
Link: https://lore.kernel.org/r/20200421082929.311931-2-leon@kernel.org
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:53:23 -03:00
Jason Gunthorpe
da57db2567 RDMA/core: Remove ucontext_lock from the uverbs_destry_ufile_hw() path
This lock only serializes ucontext creation. Instead of checking the
ucontext_lock during destruction hold the existing hw_destroy_rwsem during
creation, which is the standard pattern for object creation.

The simplification of locking is needed for the next patch.

Link: https://lore.kernel.org/r/1578506740-22188-4-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:45 -04:00
Jason Gunthorpe
39e83af817 RDMA/core: Remove the ufile arg from rdma_alloc_begin_uobject
Now that all callers provide a non-NULL attrs the ufile is redundant.
Adjust things so that the context handling is done inside alloc_uobj,
and the ib_uverbs_get_ucontext_file() is avoided if we already have the
context.

Link: https://lore.kernel.org/r/1578504126-9400-13-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00
Jason Gunthorpe
849e149063 RDMA/core: Do not allow alloc_commit to fail
This is a left over from an earlier version that creates a lot of
complexity for error unwind, particularly for FD uobjects.

The only reason this was done is so that anon_inode_get_file() could be
called with the final fops and a fully setup uobject. Both need to be
setup since unwinding anon_inode_get_file() via fput will call the
driver's release().

Now that the driver does not provide release, we no longer need to worry
about this complicated sequence, simply create the struct file at the
start and allow the core code's release function to deal with the abort
case.

This allows all the confusing error paths around commit to be removed.

Link: https://lore.kernel.org/r/1578504126-9400-5-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:15 -04:00
Jason Gunthorpe
f7c8416cce RDMA/core: Simplify destruction of FD uobjects
FD uobjects have a weird split between the struct file and uobject
world. Simplify this to make them pure uobjects and use a generic release
method for all struct file operations.

This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always
called before erasing the linked list contents to make the concurrancy
simpler to understand.

For this to work the uobject destruction must fence anything that it is
cleaning up - the design must not rely on struct file lifetime.

Only deliver_event() relies on the struct file to when adding new events
to the queue, add a is_destroyed check under lock to block it.

Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:19 -04:00
Jason Gunthorpe
6898d1c661 RDMA/mlx5: Use RCU and direct refcounts to keep memory alive
dispatch_event_fd() runs from a notifier with minimal locking, and relies
on RCU and a file refcount to keep the uobject and eventfd alive.

As the next patch wants to remove the file_operations release function
from the drivers, re-organize things so that the devx_event_notifier()
path uses the existing RCU to manage the lifetime of the uobject and
eventfd.

Move the refcount puts to a call_rcu so that the objects are guaranteed to
exist and remove the indirect file refcount.

Link: https://lore.kernel.org/r/1578504126-9400-2-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:19 -04:00
Jason Gunthorpe
8bdf9dd984 RDMA/uverbs: Remove needs_kfree_rcu from uverbs_obj_type_class
After device disassociation the uapi_objects are destroyed and freed,
however it is still possible that core code can be holding a kref on the
uobject. When it finally goes to uverbs_uobject_free() via the kref_put()
it can trigger a use-after-free on the uapi_object.

Since needs_kfree_rcu is a micro optimization that only benefits file
uobjects, just get rid of it. There is no harm in using kfree_rcu even if
it isn't required, and the number of involved objects is small.

Link: https://lore.kernel.org/r/20200113143306.GA28717@ziepe.ca
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:18 -04:00
Michal Kalderon
3411f9f01b RDMA/core: Create mmap database and cookie helper functions
Create some common API's for adding entries to a xa_mmap. Searching for
an entry and freeing one.

The general approach is copied from the EFA driver and improved to be more
general and do more to help the drivers. Integration with the core allows
a reference counted scheme with a free function so that the driver can
know when its mmaps are all gone.

This significant new functionality will be helpful for drivers to have the
correct lifetime model for mmap objects.

Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:00 -04:00
Matthew Wilcox
b9b0f34531 uverbs: Convert idr to XArray
The word 'idr' is scattered throughout the API, so I haven't changed it,
but the 'idr' variable is now an XArray.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25 12:27:11 -03:00
Jason Gunthorpe
feec576a6a IB: When attrs.udata/ufile is available use that instead of uobject
The ucontext and ufile should not be accessed via the uobject, all these
cases have an attrs so use that instead.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Shamir Rabinovitch
a6a3797df2 IB: Pass uverbs_attr_bundle down uobject destroy path
Pass uverbs_attr_bundle down the uobject destroy path. The next patch will
use this to eliminate the dependecy of the drivers in ib_x->uobject
pointers.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:55:36 -03:00
Shamir Rabinovitch
70f06b26f0 IB: ucontext should be set properly for all cmd & ioctl paths
the Attempt to use the below commit to initialize the ucontext for the
uobject destroy path has shown that the below commit is incomplete.

Parts were reverted and the ucontext set up in the uverbs_attr_bundle was
moved to rdma_lookup_get_uobject which is called from the uobj_get_XXX
macros and rdma_alloc_begin_uobject which is called when uobject is
created.

Fixes: 3d9dfd060391 ("IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows")
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:55:03 -03:00
Leon Romanovsky
a2a074ef39 RDMA: Handle ucontext allocations by IB/core
Following the PD conversion patch, do the same for ucontext allocations.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22 14:11:37 -07:00
Shamir Rabinovitch
3d9dfd0603 IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows
Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows
as soon as the flow has ib_uobject.

In addition, remove rdma_get_ucontext helper function that is only used by
ib_umem_get.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 11:16:21 -07:00
Yishai Hadas
6bf8f22aea IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation.

This object is from type class FD and will be used to read DEVX async
commands completion.

The core layer should allow the driver to set object from type FD in a
safe mode, this option was added with a matching comment in place.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:32:43 -07:00
Kamal Heib
3023a1e936 RDMA: Start use ib_device_ops
Make all the required change to start use the ib_device_ops structure.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:16 -07:00
Yishai Hadas
4d7e8cc574 IB/core: Introduce UVERBS_IDR_ANY_OBJECT
Introduce the UVERBS_IDR_ANY_OBJECT type to match any IDR object.

Once used, the infrastructure skips checking for the IDR type, it
becomes the driver handler responsibility.

This enables drivers to get in a given method an object from various of
types.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04 13:46:41 -05:00
Leon Romanovsky
12d23a9198 RDMA/uverbs: Annotate alloc/deallloc paths with context tracking
Add restrack annotations to track allocations of ucontexts.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03 14:58:25 -05:00
Jason Gunthorpe
7106a97697 RDMA/uverbs: Make write() handlers return 0 on success
Currently they return the command length, while all other handlers return
0. This makes the write path closer to the write_ex and ioctl path.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26 16:48:07 -07:00
Jason Gunthorpe
8313c10fa8 RDMA/uverbs: Replace ib_uverbs_file with uverbs_attr_bundle for write
Now that we can add meta-data to the description of write() methods we
need to pass the uverbs_attr_bundle into all write based handlers so
future patches can use it as a container for any new data transferred out
of the core.

This is the first step to bringing the write() and ioctl() methods to a
common interface signature.

This is a simple search/replace, and we push the attr down into the uobj
and other APIs to keep changes minimal.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26 16:48:07 -07:00
Jason Gunthorpe
2a3ccfdbeb RDMA/uverbs: Get rid of ucontext->tgid
Nothing uses this now, just delete it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:58:36 -04:00
Jason Gunthorpe
ce92db1ca8 RDMA/ucontext: Get rid of the old disassociate flow
The disassociate_ucontext function in every driver is now empty, so we
don't need this ugly and wrong code that was messing with tgids.

rdma_user_mmap_io does this same work in a better way.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Jason Gunthorpe
5f9794dc94 RDMA/ucontext: Add a core API for mmaping driver IO memory
To support disassociation and PCI hot unplug, we have to track all the
VMAs that refer to the device IO memory. When disassociation occurs the
VMAs have to be revised to point to the zero page, not the IO memory, to
allow the physical HW to be unplugged.

The three drivers supporting this implemented three different versions
of this algorithm, all leaving something to be desired. This new common
implementation has a few differences from the driver versions:

- Track all VMAs, including splitting/truncating/etc. Tie the lifetime of
  the private data allocation to the lifetime of the vma. This avoids any
  tricks with setting vm_ops which Linus didn't like. (see link)
- Support multiple mms, and support properly tracking mmaps triggered by
  processes other than the one first opening the uverbs fd. This makes
  fork behavior of disassociation enabled drivers the same as fork support
  in normal drivers.
- Don't use crazy get_task stuff.
- Simplify the approach for to racing between vm_ops close and
  disassociation, fixing the related bugs most of the driver
  implementations had. Since we are in core code the tracking list can be
  placed in struct ib_uverbs_ufile, which has a lifetime strictly longer
  than any VMAs created by mmap on the uverbs FD.

Link: https://www.spinics.net/lists/stable/msg248747.html
Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Artemy Kovalyov
e4ff3d22c1 IB/core: Release object lock if destroy failed
The object lock was supposed to always be released during destroy, but
when the destruction retry series was integrated with the destroy series
it created a failure path that missed the unlock.

Keep with convention, if destroy fails the caller must undo all locking.

Fixes: 87ad80abc70d ("IB/uverbs: Consolidate uobject destruction")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-04 15:07:55 -06:00
Jason Gunthorpe
0a3173a5f0 Merge branch 'linus/master' into rdma.git for-next
rdma.git merge resolution for the 4.19 merge window

Conflicts:
 drivers/infiniband/core/rdma_core.c
   - Use the rdma code and revise with the new spelling for
     atomic_fetch_add_unless
 drivers/nvme/host/rdma.c
   - Replace max_sge with max_send_sge in new blk code
 drivers/nvme/target/rdma.c
   - Use the blk code and revise to use NULL for ib_post_recv when
     appropriate
   - Replace max_sge with max_recv_sge in new blk code
 net/rds/ib_send.c
   - Use the net code and revise to use NULL for ib_post_recv when
     appropriate

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 14:21:29 -06:00
Jason Gunthorpe
51d0a2b4cf IB/uverbs: Remove struct uverbs_root_spec and all supporting code
Everything now uses the uverbs_uapi data structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13 09:17:19 -06:00
Jason Gunthorpe
6b0d08f4a2 IB/uverbs: Use uverbs_api to manage the object type inside the uobject
Currently the struct uverbs_obj_type stored in the ib_uobject is part of
the .rodata segment of the module that defines the object. This is a
problem if drivers define new uapi objects as we will be left with a
dangling pointer after device disassociation.

Switch the uverbs_obj_type for struct uverbs_api_object, which is
allocated memory that is part of the uverbs_api and is guaranteed to
always exist. Further this moves the 'type_class' into this memory which
means access to the IDR/FD function pointers is also guaranteed. Drivers
cannot define new types.

This makes it safe to continue to use all uobjects, including driver
defined ones, after disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-10 16:06:24 -06:00
Jason Gunthorpe
0f50d88a6e IB/uverbs: Allow all DESTROY commands to succeed after disassociate
The disassociate function was broken by design because it failed all
commands. This prevents userspace from calling destroy on a uobject after
it has detected a device fatal error and thus reclaiming the resources in
userspace is prevented.

This fix is now straightforward, when anything destroys a uobject that is
not the user the object remains on the IDR with a NULL context and object
pointer. All lookup locking modes other than DESTROY will fail. When the
user ultimately calls the destroy function it is simply dropped from the
IDR while any related information is returned.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe
cc2e14e680 IB/uverbs: Lower the test for ongoing disassociation
Commands that are reading/writing to objects can test for an ongoing
disassociation during their initial call to rdma_lookup_get_uobject.  This
directly prevents all of these commands from conflicting with an ongoing
disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe
1e857e65d4 IB/uverbs: Allow uobject allocation to work concurrently with disassociate
After all the recent structural changes this is now straightforward, hold
the hw_destroy_rwsem across the entire uobject creation. We already take
this semaphore on the success path, so holding it a bit longer is not
going to change the performance.

After this change none of the create callbacks require the
disassociate_srcu lock to be correct.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe
7452a3c745 IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate
After all the recent structural changes this is now straightfoward, hoist
the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around
the uobject write lock as well as the destroy.

This is necessary as obtaining a write lock concurrently with
uverbs_destroy_ufile_hw() will cause malfunction.

After this change none of the destroy callbacks require the
disassociate_srcu lock to be correct.

This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the
IOCTL interface needs to hold an unlocked kref until all command
verification is completed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe
9867f5c669 IB/uverbs: Convert 'bool exclusive' into an enum
This is more readable, and future patches will need a 3rd lookup type.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe
87ad80abc7 IB/uverbs: Consolidate uobject destruction
There are several flows that can destroy a uobject and each one is
minimized and sprinkled throughout the code base, making it difficult to
understand and very hard to modify the destroy path.

Consolidate all of these into uverbs_destroy_uobject() and call it in all
cases where a uobject has to be destroyed.

This makes one change to the lifecycle, during any abort (eg when
alloc_commit is not called) we always call out to alloc_abort, even if
remove_commit needs to be called to delete a HW object.

This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to
clarify its actual usage and revises some of the comments to reflect what
the life cycle is for the type implementation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe
32ed5c00ac IB/uverbs: Make the write path destroy methods use the same flow as ioctl
The ridiculous dance with uobj_remove_commit() is not needed, the write
path can follow the same flow as ioctl - lock and destroy the HW object
then use the data left over in the uobject to form the response to
userspace.

Two helpers are introduced to make this flow straightforward for the
caller.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe
aa72c9a5f9 IB/uverbs: Remove rdma_explicit_destroy() from the ioctl methods
The core code will destroy the HW object on behalf of the method, if the
method provides an implementation it must simply copy data from the stub
uobj into the response. Destroy methods cannot touch the HW object.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:37 -06:00
Jason Gunthorpe
22fa27fbc6 IB/uverbs: Fix locking around struct ib_uverbs_file ucontext
We have a parallel unlocked reader and writer with ib_uverbs_get_context()
vs everything else, and nothing guarantees this works properly.

Audit and fix all of the places that access ucontext to use one of the
following locking schemes:
- Call ib_uverbs_get_ucontext() under SRCU and check for failure
- Access the ucontext through an struct ib_uobject context member
  while holding a READ or WRITE lock on the uobject.
  This value cannot be NULL and has no race.
- Hold the ucontext_lock and check for ufile->ucontext !NULL

This also re-implements ib_uverbs_get_ucontext() in a way that is safe
against concurrent ib_uverbs_get_context() and disassociation.

As a side effect, every access to ucontext in the commands is via
ib_uverbs_get_context() with an error check, or via the uobject, so there
is no longer any need for the core code to check ucontext on every command
call. These checks are also removed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:46 -06:00
Jason Gunthorpe
aba94548c9 IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit
Allocating the struct file during alloc_begin creates this strange
asymmetry with IDR, where the FD has two krefs pointing at it during the
pre-commit phase. In particular this makes the abort process for FD very
strange and confusing.

For instance abort currently calls the type's destroy_object twice, and
the fops release once if abort is done. This is very counter intuitive. No
fops should be called until alloc_commit succeeds, and destroy_object
should only ever be called once.

Moving the struct file allocation to the alloc_commit is now simple, as we
already support failure of rdma_alloc_commit_uobject, with all the
required rollback pieces.

This creates an understandable symmetry with IDR and simplifies/fixes the
abort handling for FD types.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
2c96eb7d62 IB/uverbs: Always propagate errors from rdma_alloc_commit_uobject()
The ioctl framework already does this correctly, but the write path did
not. This is trivially fixed by simply using a standard pattern to return
uobj_alloc_commit() as the last statement in every function.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
e951747a08 IB/uverbs: Rework the locking for cleaning up the ucontext
The locking here has always been a bit crazy and spread out, upon some
careful analysis we can simplify things.

Create a single function uverbs_destroy_ufile_hw() that internally handles
all locking. This pulls together pieces of this process that were
sprinkled all over the places into one place, and covers them with one
lock.

This eliminates several duplicate/confusing locks and makes the control
flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
simple.

Unfortunately we have to keep an extra mutex, ucontext_lock.  This lock is
logically part of the rwsem and provides the 'down write, fail if write
locked, wait if read locked' semantic we require.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
87064277c4 IB/uverbs: Revise and clarify the rwsem and uobjects_lock
Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call
to the type destroy function (aka 'hw' destroy). The main purpose of this
lock is to prevent normal add and destroy from running concurrently with
uverbs_cleanup_ufile()

Since the uobjects list is always manipulated under the 'hw_destroy_rwsem'
we can eliminate the uobjects_lock in the cleanup function. This allows
converting that lock to a very simple spinlock with a narrow critical
section.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
e6d5d5ddd0 IB/uverbs: Clarify and revise uverbs_close_fd
The locking requirements here have changed slightly now that we can rely
on the ib_uverbs_file always existing and containing all the necessary
locking infrastructure.

That means we can get rid of the cleanup_mutex usage (this was protecting
the check on !uboj->context).

Otherwise, follow the same pattern that IDR uses for destroy, acquire
exclusive write access, then call destroy and the undo the 'lookup'.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
5671f79b42 IB/uverbs: Revise the placement of get/puts on uobject
This wasn't wrong, but the placement of two krefs didn't make any
sense. Follow some simple rules.

- A kref is held inside uobjects_list
- A kref is held inside the IDR
- A kref is held inside file->private
- A stack based kref is passed bettwen alloc_begin and
  alloc_abort/alloc_commit

Any place we destroy one of the above pointers, we stick a put,
or 'move' the kref into another pointer.

The key functions have sensible semantics:
- alloc_uobj fully initializes the common members in uobj, including
  the list
- Get rid of the uverbs_idr_remove_uobj helper since IDR remove
  does require put, but it depends on the situation. Later
  patches will re-consolidate this differently.
- alloc_abort always consumes the passed kref, done in the type
- alloc_commit always consumes the passed kref, done in the type
- rdma_remove_commit_uobject always pairs with a lookup_get

After it is all done the only control flow change is to:
- move a get from alloc_commit_fd_uobject to rdma_alloc_commit_uobject
- add a put to remove_commit_idr_uobject
- Consistenly use rdma_lookup_put in rdma_remove_commit_uobject at
  the right place

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
c561c28846 IB/uverbs: Clarify the kref'ing ordering for alloc_commit
The alloc_commit callback makes the uobj visible to other threads,
and it does so using a 'move' semantic of the uobj kref on the stack
into the public storage (eg the IDR, uobject list and file_private_data)

Once this is done another thread could start up and trigger deletion
of the kref. Fortunately cleanup_rwsem happens to prevent this from
being a bug, but that is a fantastically unclear side effect.

Re-organize things so that alloc_commit is that last thing to touch
the uobj, get rid of the sneaky implicit dependency on cleanup_rwsem,
and add a comment reminding that uobj is no longer kref'd after
alloc_commit.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:21 -06:00
Jason Gunthorpe
1250c3048c IB/uverbs: Handle IDR and FD types without truncation
Our ABI for write() uses a s32 for FDs and a u32 for IDRs, but internally
we ended up implicitly casting these ABI values into an 'int'. For ioctl()
we use a s64 for FDs and a u64 for IDRs, again casting to an int.

The various casts to int are all missing range checks which can cause
userspace values that should be considered invalid to be accepted.

Fix this by making the generic lookup routine accept a s64, which does not
truncate the write API's u32/s32 or the ioctl API's s64. Then push the
detailed range checking down to the actual type implementations to be
shared by both interfaces.

Finally, change the copy of the uobj->id to sign extend into a s64, so eg,
if we ever wish to return a negative value for a FD it is carried
properly.

This ensures that userspace values are never weirdly interpreted due to
the various trunctations and everything that is really out of range gets
an EINVAL.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:21 -06:00
Jason Gunthorpe
3df593bfe6 IB/uverbs: Get rid of null_obj_type
If the method fails after calling rdma_explicit_destroy (eg if
copy_to_user faults) then it will trigger a kernel oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000000548d067 P4D 800000000548d067 PUD 54a0067 PMD 0
SMP PTI
CPU: 0 PID: 359 Comm: ibv_rc_pingpong Not tainted 4.18.0-rc1+ #28
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
RIP: 0010:          (null)
Code: Bad RIP value.
RSP: 0018:ffffc900001a3bf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000603bd00 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88000603bd00
RBP: 0000000000000001 R08: ffffc900001a3cf8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900001a3cf0
R13: 0000000000000000 R14: ffffc900001a3cf0 R15: 0000000000000000
FS:  00007fb00dda8700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000000548e004 CR4: 00000000003606b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rdma_lookup_put_uobject+0x22/0x50 [ib_uverbs]
 ? uverbs_finalize_object+0x3b/0x60 [ib_uverbs]
 ? uverbs_finalize_attrs+0x128/0x140 [ib_uverbs]
 ? ib_uverbs_cmd_verbs+0x698/0x7c0 [ib_uverbs]
 ? find_held_lock+0x2d/0x90
 ? __might_fault+0x39/0x90
 ? ib_uverbs_ioctl+0x111/0x1f0 [ib_uverbs]
 ? do_vfs_ioctl+0xa0/0x6d0
 ? trace_hardirqs_on_caller+0xed/0x180
 ? _raw_spin_unlock_irq+0x24/0x40
 ? syscall_trace_enter+0x138/0x1d0
 ? ksys_ioctl+0x35/0x60
 ? __x64_sys_ioctl+0x11/0x20
 ? do_syscall_64+0x5b/0x1c0
 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is because the type was replaced with the null_type during explicit
destroy that cannot complete the destruction.

One of the side effects of replacing the type is to make the object
handle totally unreachable - so no other command could attempt to use
it, even though it remains on the uboject list.

We can get the same end result by just fully destroying the object inside
rdma_explicit_destroy and leaving the caller the residual kref for the
uobj with no attached HW object, and no presence in the ubojects list.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-25 14:21:21 -06:00
Jason Gunthorpe
d0259e82e7 IB/uverbs: Remove ib_uobject_file
The only purpose for this structure was to hold the ib_uobject_file
pointer, but now that is part of the standard ib_uobject the structure
no longer makes any sense, so get rid of it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-09 11:26:17 -06:00