9620 Commits

Author SHA1 Message Date
Leon Romanovsky
0ad699c0ed RDMA/core: Simplify restrack interface
In the current implementation, we have one restrack root per-device and
all users are simply providing it directly. Let's simplify the interface
and have callers provide the ib_device and internally access the
restrack_root.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:15:47 -07:00
Leon Romanovsky
659067b0b5 RDMA/nldev: Prepare CAP_NET_ADMIN checks for .doit callbacks
The .doit callbacks don't have a netlink_callback to check capabilities so
in order to use the same fill_res_func for both .dump and .doit, we need
to do the capability check outside of those functions.

For .doit callbacks, it is possible to check CAP_NET_ADMIN directly on the
received sk_buff.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:12:33 -07:00
Leon Romanovsky
8be565e65f RDMA/nldev: Factor out the PID namespace check
The PID namespace is going to be used in the .doit callback, so generalize
its implementation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:11:45 -07:00
Leon Romanovsky
f732e7135b RDMA/nldev: Dynamically generate restrack dumpit callbacks
There is no need to manually write same callbacks, automatically generate
them using C-macro language.

This macro is going to be extended to generate doit callbacks too, so use
general name for this macro.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:10:21 -07:00
Parav Pandit
cf34e1fe52 IB/mlx5: Consider vlan of lower netdev for macvlan GID entries
When a given netdev of the GID entry is macvlan netdevice, and if the
lower netdevice is vlan device, GID entry for macvlan based IP address
needs to inherit the vlan of the lower netdevice.

Therefore, attempt to find out if the lower device exist and if so, if
it is vlan device and setup the vlan tag correctly.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 20:43:46 -07:00
Gal Pressman
cfc30ad3d0 IB/usnic: Remove stub functions
Lack of mandatory verbs no longer fail device registration, the device
will be marked as a non-kverbs provider.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Tested-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 20:32:25 -07:00
Gal Pressman
6780c4fa9d RDMA: Add indication for in kernel API support to IB device
Drivers that do not provide kernel verbs support should not be used by ib
kernel clients at all.

In case a device does not implement all mandatory verbs for kverbs usage
mark it as a non kverbs provider and prevent its usage for all clients
except for uverbs.

The device is marked as a non kverbs provider using the 'kverbs_provider'
flag which should only be set by the core code.  The clients can choose
whether kverbs are requested for its usage using the 'no_kverbs_req' flag
which is currently set for uverbs only.

This patch allows drivers to remove mandatory verbs stubs and simply set
the callbacks to NULL. The IB device will be registered as a non-kverbs
provider. Note that verbs that are required for the device registration
process must be implemented.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 20:32:25 -07:00
Leon Romanovsky
459cc69fa4 RDMA: Provide safe ib_alloc_device() function
All callers to ib_alloc_device() provide a larger size than struct
ib_device and rely on the fact that struct ib_device is embedded in their
driver specific structure as the first member.

Provide a safer variant of ib_alloc_device() that checks and enforces this
approach to make sure the drivers are using it right.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:52:30 -07:00
Kamal Heib
e5c1bb47cc IB/mlx5: Remove set but not used variable
Remove 'del_mkey' variable that is set but not used.

Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:17:17 -07:00
Kamal Heib
f3ffed0ce4 IB/mlx5: Make mlx5_ib_stage_odp_cleanup() static
The function mlx5_ib_stage_odp_cleanup() is only used in main.c

Fixes: d5d284b829a6 ("{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:17:17 -07:00
Bart Van Assche
0b5cb3300a RDMA/srp: Increase max_segment_size
The default behavior of the SCSI core is to set the block layer request
queue parameter max_segment_size to 64 KB. That means that elements of
scatterlists are limited to 64 KB. Since RDMA adapters support larger
sizes, increase max_segment_size for the SRP initiator.

Notes:
- The SCSI max_segment_size parameter was introduced in kernel v5.0. See
  also commit 50c2e9107f17 ("scsi: introduce a max_segment_size
  host_template parameters").
- Some other block drivers already set max_segment_size to UINT_MAX,
  e.g. nbd and rbd.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:04:16 -07:00
Michael J. Ruhl
db421a5499 IB/{hfi1, qib, rvt} Cleanup open coded sge usage
Several locations for manipulating sges use an open coded sequence
that is covered by helper functions.

Use the appropriate helper functions.

Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-30 14:22:32 -05:00
Michael J. Ruhl
87fc34b575 IB/{hfi1,qib}: Cleanup open coded sge sizing
Sge sizing is done in several places using an open coded method.

This can cause maintenance issues.  The open coded method is
encapsulated in a helper routine.  The helper was introduced with
commit:

1198fcea8a78 ("IB/hfi1, rdmavt: Move SGE state helper routines into
rdmavt")

Update all call sites that have the open coded path with the helper
routine.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-30 14:22:32 -05:00
Kamal Heib
ed0bc2658e IB/ipoib: Make ipoib_intercept_dev_id_attr() static
The function ipoib_intercept_dev_id_attr() is only used in ipoib_main.c

Fixes: f6350da41dc7 ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 21:50:36 -07:00
Adit Ranadive
8aa04ad3b3 RDMA/vmw_pvrdma: Support upto 64-bit PFNs
Update the driver to use the new device capability to report 64-bit UAR
PFNs.

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 21:40:28 -07:00
Yishai Hadas
7b21b69ab2 IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociate
The vma->vm_mm can become impossible to get before rdma_umap_close() is
called, in this case we must not try to get an mm that is already
undergoing process exit. In this case there is no need to wait for
anything as the VMA will be destroyed by another thread soon and is
already effectively 'unreachable' by userspace.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 1 PID: 2050 Comm: bash Tainted: G        W  OE 4.20.0-rc6+ #3
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__rb_erase_color+0xb9/0x280
 Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89
               58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d
               10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f
 RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246
 RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101
 RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828
 RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838
 R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000
 FS:  00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0
 Call Trace:
  unlink_file_vma+0x3b/0x50
  free_pgtables+0xa1/0x110
  exit_mmap+0xca/0x1a0
  ? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib]
  mmput+0x54/0x140
  uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs]
  uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs]
  ib_uverbs_remove_one+0xea/0x240 [ib_uverbs]
  ib_unregister_device+0xfb/0x200 [ib_core]
  mlx5_ib_remove+0x51/0xe0 [mlx5_ib]
  mlx5_remove_device+0xc1/0xd0 [mlx5_core]
  mlx5_unregister_device+0x3d/0xb0 [mlx5_core]
  remove_one+0x2a/0x90 [mlx5_core]
  pci_device_remove+0x3b/0xc0
  device_release_driver_internal+0x16d/0x240
  unbind_store+0xb2/0x100
  kernfs_fop_write+0x102/0x180
  __vfs_write+0x36/0x1a0
  ? __alloc_fd+0xa9/0x170
  ? set_close_on_exec+0x49/0x70
  vfs_write+0xad/0x1a0
  ksys_write+0x52/0xc0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Cc: <stable@vger.kernel.org> # 4.19
Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:57:22 -07:00
Jason Gunthorpe
55c293c38e Merge branch 'devx-async' into k.o/for-next
Yishai Hadas says:

Enable DEVX asynchronous query commands

This series enables querying a DEVX object in an asynchronous mode.

The userspace application won't block when calling the firmware and it will be
able to get the response back once that it will be ready.

To enable the above functionality:

- DEVX asynchronous command completion FD object was introduced.
- The applicable file operations were implemented to enable using it by
  the user application.
- Query asynchronous method was added to the DEVX object, it will call the
  firmware asynchronously and manages the response on the given input FD.
- Hot unplug support was added for the FD to work properly upon
  unbind/disassociate.
- mlx5 core fence for asynchronous commands was implemented and used to
  prevent racing upon unbind/disassociate.

This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

* branch 'devx-async':
  IB/mlx5: Implement DEVX hot unplug for async command FD
  IB/mlx5: Implement the file ops of DEVX async command FD
  IB/mlx5: Introduce async DEVX obj query API
  IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:49:31 -07:00
Yishai Hadas
eaebaf77e7 IB/mlx5: Implement DEVX hot unplug for async command FD
Implement DEVX hot unplug for the async command FD.

This is done by managing a list of the inflight commands and wait until
all launched work is completed as part of
devx_hot_unplug_async_cmd_event_file.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas
4accbb3fd2 IB/mlx5: Implement the file ops of DEVX async command FD
Implement the file ops of the DEVX async command FD, this enables using
the FD for reading the events and manage other options on the FD.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas
a124edba26 IB/mlx5: Introduce async DEVX obj query API
Introduce async DEVX obj query API to get the command response back to
user space once it's ready without blocking when calling the firmware.

The event's data includes a header with some meta data then the firmware
output command data.

The header includes:
- The input 'wr_id' to let application recognizing the response.

The input FD attribute is used to have the event data ready on.
Downstream patches from this series will implement the file ops to let
application read it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas
6bf8f22aea IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation.

This object is from type class FD and will be used to read DEVX async
commands completion.

The core layer should allow the driver to set object from type FD in a
safe mode, this option was added with a matching comment in place.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:32:43 -07:00
Masahiro Yamada
b360ce3b2b infiniband: prefix header search paths with $(srctree)/
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].

To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.

Having whitespaces after -I does not matter since commit 48f6e3cf5bc6
("kbuild: do not drop -I without parameter").

[1]: https://patchwork.kernel.org/patch/9632347/

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 15:28:50 -07:00
Masahiro Yamada
ed4cdf4a21 infiniband: remove unneeded header search paths
The included headers are located in include/target/. I was able to
build these drivers without the extra header search paths.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 15:28:50 -07:00
Feras Daoud
6ab4aba00f IB/ipoib: Fix for use-after-free in ipoib_cm_tx_start
The following BUG was reported by kasan:

 BUG: KASAN: use-after-free in ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib]
 Read of size 80 at addr ffff88034c30bcd0 by task kworker/u16:1/24020

 Workqueue: ipoib_wq ipoib_cm_tx_start [ib_ipoib]
 Call Trace:
  dump_stack+0x9a/0xeb
  print_address_description+0xe3/0x2e0
  kasan_report+0x18a/0x2e0
  ? ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib]
  memcpy+0x1f/0x50
  ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib]
  ? kvm_clock_read+0x1f/0x30
  ? ipoib_cm_skb_reap+0x610/0x610 [ib_ipoib]
  ? __lock_is_held+0xc2/0x170
  ? process_one_work+0x880/0x1960
  ? process_one_work+0x912/0x1960
  process_one_work+0x912/0x1960
  ? wq_pool_ids_show+0x310/0x310
  ? lock_acquire+0x145/0x440
  worker_thread+0x87/0xbb0
  ? process_one_work+0x1960/0x1960
  kthread+0x314/0x3d0
  ? kthread_create_worker_on_cpu+0xc0/0xc0
  ret_from_fork+0x3a/0x50

 Allocated by task 0:
  kasan_kmalloc+0xa0/0xd0
  kmem_cache_alloc_trace+0x168/0x3e0
  path_rec_create+0xa2/0x1f0 [ib_ipoib]
  ipoib_start_xmit+0xa98/0x19e0 [ib_ipoib]
  dev_hard_start_xmit+0x159/0x8d0
  sch_direct_xmit+0x226/0xb40
  __dev_queue_xmit+0x1d63/0x2950
  neigh_update+0x889/0x1770
  arp_process+0xc47/0x21f0
  arp_rcv+0x462/0x760
  __netif_receive_skb_core+0x1546/0x2da0
  netif_receive_skb_internal+0xf2/0x590
  napi_gro_receive+0x28e/0x390
  ipoib_ib_handle_rx_wc_rss+0x873/0x1b60 [ib_ipoib]
  ipoib_rx_poll_rss+0x17d/0x320 [ib_ipoib]
  net_rx_action+0x427/0xe30
  __do_softirq+0x28e/0xc42

 Freed by task 26680:
  __kasan_slab_free+0x11d/0x160
  kfree+0xf5/0x360
  ipoib_flush_paths+0x532/0x9d0 [ib_ipoib]
  ipoib_set_mode_rss+0x1ad/0x560 [ib_ipoib]
  set_mode+0xc8/0x150 [ib_ipoib]
  kernfs_fop_write+0x279/0x440
  __vfs_write+0xd8/0x5c0
  vfs_write+0x15e/0x470
  ksys_write+0xb8/0x180
  do_syscall_64+0x9b/0x420
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 The buggy address belongs to the object at ffff88034c30bcc8
                which belongs to the cache kmalloc-512 of size 512
 The buggy address is located 8 bytes inside of
                512-byte region [ffff88034c30bcc8, ffff88034c30bec8)
 The buggy address belongs to the page:

The following race between change mode and xmit flow is the reason for
this use-after-free:

Change mode     Send packet 1 to GID XX      Send packet 2 to GID XX
     |                    |                             |
   start                  |                             |
     |                    |                             |
     |                    |                             |
     |         Create new path for GID XX               |
     |           and update neigh path                  |
     |                    |                             |
     |                    |                             |
     |                    |                             |
 flush_paths              |                             |
                          |                             |
               queue_work(cm.start_task)                |
                          |                 Path for GID XX not found
                          |                      create new path
                          |
                          |
               start_task runs with old
                    released path

There is no locking to protect the lifetime of the path through the
ipoib_cm_tx struct, so delete it entirely and always use the newly looked
up path under the priv->lock.

Fixes: 546481c2816e ("IB/ipoib: Fix memory corruption in ipoib cm mode connect flow")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 12:17:54 -07:00
Yishai Hadas
f8ade8e242 IB/uverbs: Fix ioctl query port to consider device disassociation
Methods cannot peak into the ufile, the only way to get a ucontext and
hence a device is via the ib_uverbs_get_ucontext() call or inspecing a
locked uobject.

Otherwise during/after disassociation the pointers may be null or free'd.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
 PGD 800000005ece6067 P4D 800000005ece6067 PUD 5ece7067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 10631 Comm: ibv_ud_pingpong Tainted: GW  OE     4.20.0-rc6+ #3
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_PORT+0x53/0x191 [ib_uverbs]
 Code: 80 00 00 00 31 c0 48 8b 47 40 48 8d 5c 24 38 48 8d 6c 24
               08 48 89 df 48 8b 40 08 4c 8b a0 18 03 00 00 31 c0 f3 48 ab 48 89
               ef <49> 83 7c 24 78 00 b1 06 f3 48 ab 0f 84 89 00 00 00 45 31  c9 31 d2
 RSP: 0018:ffffb54802ccfb10 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffffb54802ccfb48 RCX:0000000000000000
 RDX: fffffffffffffffa RSI: ffffb54802ccfcf8 RDI:ffffb54802ccfb18
 RBP: ffffb54802ccfb18 R08: ffffb54802ccfd18 R09:0000000000000000
 R10: 0000000000000000 R11: 00000000000000d0 R12:0000000000000000
 R13: ffffb54802ccfcb0 R14: ffffb54802ccfc48 R15:ffff9f736e0059a0
 FS:  00007f55a6bd7740(0000) GS:ffff9f737ba00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000078 CR3: 0000000064214000 CR4:00000000000006f0
 Call Trace:
  ib_uverbs_cmd_verbs.isra.5+0x94d/0xa60 [ib_uverbs]
  ? copy_port_attr_to_resp+0x120/0x120 [ib_uverbs]
  ? arch_tlb_finish_mmu+0x16/0xc0
  ? tlb_finish_mmu+0x1f/0x30
  ? unmap_region+0xd9/0x120
  ib_uverbs_ioctl+0xbc/0x120 [ib_uverbs]
  do_vfs_ioctl+0xa9/0x620
  ? __do_munmap+0x29f/0x3a0
  ksys_ioctl+0x60/0x90
  __x64_sys_ioctl+0x16/0x20
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f55a62cb567

Fixes: 641d1207d2ed ("IB/core: Move query port to ioctl")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 11:58:06 -07:00
Mark Bloch
c1b03c25f5 RDMA/mlx5: Fix flow creation on representors
The intention of the flow_is_supported was to disable the entire tree and
methods that allow raw flow creation, but the grammar syntax has this
disable the entire UVERBS_FLOW object. Since the method requires a
MLX5_IB_OBJECT_FLOW_MATCHER there is no need to do anything, as it is
automatically disabled when matchers are disabled.

This restores the ability to create flow steering rules on representors
via regular verbs.

Fixes: a1462351b590 ("RDMA/mlx5: Fail early if user tries to create flows on IB representors")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 11:58:06 -07:00
Yishai Hadas
425784aa5b IB/uverbs: Fix OOPs upon device disassociation
The async_file might be freed before the disassociation has been ended,
causing qp shutdown to use after free on it.

Since uverbs_destroy_ufile_hw is not a fence, it returns if a
disassociation is ongoing in another thread. It has to be written this way
to avoid deadlock. However this means that the ufile FD close cannot
destroy anything that may still be used by an active kref, such as the the
async_file.

To fix that move the kref_put() to be in ib_uverbs_release_file().

 BUG: unable to handle kernel paging request at ffffffffba682787
 PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061
 Oops: 0003 [#1] SMP PTI
 CPU: 1 PID: 32410 Comm: bash Tainted: G           OE 4.20.0-rc6+ #3
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0
 Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d
		ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85
		d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83
 RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006
 RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001
 RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787
 RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294
 R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00
 FS:  00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0
 Call Trace:
  _raw_spin_lock_irq+0x27/0x30
  ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs]
  uverbs_free_qp+0x7e/0x90 [ib_uverbs]
  destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs]
  uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs]
  __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs]
  uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs]
  ib_uverbs_remove_one+0xea/0x240 [ib_uverbs]
  ib_unregister_device+0xfb/0x200 [ib_core]
  mlx5_ib_remove+0x51/0xe0 [mlx5_ib]
  mlx5_remove_device+0xc1/0xd0 [mlx5_core]
  mlx5_unregister_device+0x3d/0xb0 [mlx5_core]
  remove_one+0x2a/0x90 [mlx5_core]
  pci_device_remove+0x3b/0xc0
  device_release_driver_internal+0x16d/0x240
  unbind_store+0xb2/0x100
  kernfs_fop_write+0x102/0x180
  __vfs_write+0x36/0x1a0
  ? __alloc_fd+0xa9/0x170
  ? set_close_on_exec+0x49/0x70
  vfs_write+0xad/0x1a0
  ksys_write+0x52/0xc0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fac551aac60

Cc: <stable@vger.kernel.org> # 4.2
Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 11:58:06 -07:00
Artemy Kovalyov
a2093dd35f RDMA/umem: Add missing initialization of owning_mm
When allocating a umem leaf for implicit ODP MR during page fault the
field owning_mm was not set.

Initialize and take a reference on this field to avoid kernel panic when
trying to access this field.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
 PGD 800000022dfed067 P4D 800000022dfed067 PUD 22dfcf067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 634 Comm: kworker/u33:0 Not tainted 4.20.0-rc6+ #89
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
 RIP: 0010:ib_umem_odp_map_dma_pages+0xf3/0x710 [ib_core]
 Code: 45 c0 48 21 f3 48 89 75 b0 31 f6 4a 8d 04 33 48 89 45 a8 49 8b 44 24 60 48 8b 78 10 e8 66 16 a8 c5 49 8b 54 24 08 48 89 45 98 <8b> 42 58 85 c0 0f 84 8e 05 00 00 8d 48 01 48 8d 72 58 f0 0f b1 4a
 RSP: 0000:ffffb610813a7c20 EFLAGS: 00010202
 RAX: ffff95ace6e8ac80 RBX: 0000000000000000 RCX: 000000000000000c
 RDX: 0000000000000000 RSI: 0000000000000850 RDI: ffff95aceaadae80
 RBP: ffffb610813a7ce0 R08: 0000000000000000 R09: 0000000000080c77
 R10: ffff95acfffdbd00 R11: 0000000000000000 R12: ffff95aceaa20a00
 R13: 0000000000001000 R14: 0000000000001000 R15: 000000000000000c
 FS:  0000000000000000(0000) GS:ffff95acf7800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000058 CR3: 000000022c834001 CR4: 00000000001606f0
 Call Trace:
  pagefault_single_data_segment+0x1df/0xc60 [mlx5_ib]
  mlx5_ib_eqe_pf_action+0x7bc/0xa70 [mlx5_ib]
  ? __switch_to+0xe1/0x470
  process_one_work+0x174/0x390
  worker_thread+0x4f/0x3e0
  kthread+0x102/0x140
  ? drain_workqueue+0x130/0x130
  ? kthread_stop+0x110/0x110
  ret_from_fork+0x1f/0x30

Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 09:55:48 -07:00
Lijun Ou
9d9d4ff788 RDMA/hns: Update the kernel header file of hns
The hns_roce_ib_create_srq_resp is used to interact with the user for
data, this was open coded to use a u32 directly, instead use a properly
sized structure.

Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 09:55:48 -07:00
Yuval Avnery
535005ca8e IB/core: Destroy QP if XRC QP fails
The open-coded variant missed destroy of SELinux created QP, reuse already
existing ib_detroy_qp() call and use this opportunity to clean
ib_create_qp() from double prints and unclear exit paths.

Reported-by: Parav Pandit <parav@mellanox.com>
Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Ira Weiny
ff0244bb59 RDMA/qib: Use GUP longterm for PSM page pining
Similar to the core change commit 5f1d43de5416 ("IB/core: disable memory
registration of filesystem-dax vmas")

PSM should be prevented from using filesystem DAX pages.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Yangyang Li
0e40dc2f70 RDMA/hns: Add timer allocation support for hip08
This patch adds qpc timer and cqc timer allocation support for hardware
timeout retransmission in kernel space driver.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Yangyang Li
aa84fa1874 RDMA/hns: Add SCC context clr support for hip08
This patch adds SCC context clear support for DCQCN in kernel space
driver.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Yangyang Li
6a157f7d1b RDMA/hns: Add SCC context allocation support for hip08
This patch adds SCC context allocation and initialization support for
DCQCN in kernel space driver.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Moni Shoua
da6a496a34 IB/mlx5: Ranges in implicit ODP MR inherit its write access
A sub-range in ODP implicit MR should take its write permission from the
MR and not be set always to allow.

Fixes: d07d1d70ce1a ("IB/umem: Update on demand page (ODP) support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Jason Gunthorpe
8ba0ddd094 RDMA/iw_cxgb4: Drop __GFP_NOFAIL
There is no reason for this __GFP_NOFAIL, none of the other routines in
this file use it, and there is an error unwind here. NOFAIL should be
reserved for special cases, not used by network drivers.

Fixes: 6a0b6174d35a ("rdma/cxgb4: Add support for kernel mode SRQ's")
Reported-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Bart Van Assche
0a353c2e94 IB/mlx5: Declare local functions 'static'
This patch avoids that sparse complains about missing function
declarations.

Fixes: c9990ab39b6e ("RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mm")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Bart Van Assche
f373859190 IB/core: Declare local functions 'static'
This patch avoids that sparse complains about missing function
declarations.

Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman
2e061c691c infiniband: ipoib: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman
316bcda81d infiniband: usnic: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman
2537672966 infiniband: ocrdma: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman
73eb8f03f0 infiniband: mlx5: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman
0d0336cf54 infiniband: qib: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman
e775118025 infiniband: hfi1: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman
5c43276499 infiniband: hfi1: drop crazy DEBUGFS_SEQ_FILE_CREATE() macro
The macro was just making things harder to follow, and audit, so remove
it and call debugfs_create_file() directly.  Also, the macro did not
need to warn about the call failing as no one should ever care about any
debugfs functions failing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman
8283d78725 infiniband: cxgb4: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Parav Pandit
039d713a59 IB/umad: Do not check status of nonseekable_open()
As the comment block of nonseekable_open() describes, nonseekable_open()
can never fail. Several places in kernel depend on this behavior.
Therefore, simplify the umad module to depend on this basic kernel
functionality.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Jason Gunthorpe
e355477ed9 net/mlx5: Make mlx5_cmd_exec_cb() a safe API
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.

The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-24 14:25:26 +02:00
Parav Pandit
ee848721f6 IB/umad: Avoid additional device reference during open()/close()
ib_umad_init_port_dev() holds the reference of a ib_umad_device instance.
ib_umad_device contains standard core device and cdev.  cdev holds the
reference of its parent core device.  file ops holds the reference to cdev
using core kernel.

Therefore, there is no need to hold additional reference while opening
umad related char devices.

While at it, add comments to bring clarity on releasing references to
ib_umd_device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-23 11:37:05 -07:00
Maor Gottlieb
6113cc4401 IB/mlx5: Don't override existing ip_protocol
Two flow specifications can set the ip protocol field in
the flow table entry:

1) IB_FLOW_SPEC_TCP/UDP/GRE - set the ip protocol accordingly.
2) IB_FLOW_SPEC_IPV4/6 - has ip_protocol field for users
who want to receive specific L4 packets.

We need to avoid overriding of the ip_protocol with zeros,
in case that the user first put the L4 specification and
only then the L3.

Fixes: ca0d47538528b ('IB/mlx5: Add support in TOS and protocol to flow steering')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:10:06 -07:00