1013545 Commits

Author SHA1 Message Date
Bob Pearson
0a67c46d2e RDMA/rxe: Protect user space index loads/stores
Modify the queue APIs to protect all user space index loads with
smp_load_acquire() and all user space index stores with
smp_store_release(). Base this on the types of the queues which can be one
of ..KERNEL, ..FROM_USER, ..TO_USER. Kernel space indices are protected by
locks which also provide memory barriers.

Link: https://lore.kernel.org/r/20210527194748.662636-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-03 15:53:01 -03:00
Bob Pearson
59daff49f2 RDMA/rxe: Add a type flag to rxe_queue structs
To create optimal code only want to use smp_load_acquire() and
smp_store_release() for user indices in rxe_queue APIs since kernel
indices are protected by locks which also act as memory barriers. By
adding a type to the queues we can determine which indices need to be
protected.

Link: https://lore.kernel.org/r/20210527194748.662636-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-03 15:53:01 -03:00
Jason Gunthorpe
50971e3915 Merge branch 'irdma' into rdma.git for-next
Shiraz Saleem says:

====================
Add Intel Ethernet Protocol Driver for RDMA (irdma)

The following patch series introduces a unified Intel Ethernet Protocol
Driver for RDMA (irdma) for the X722 iWARP device and a new E810 device
which supports iWARP and RoCEv2. The irdma module replaces the legacy
i40iw module for X722 and extends the ABI already defined for i40iw. It is
backward compatible with legacy X722 rdma-core provider (libi40iw).

X722 and E810 are PCI network devices that are RDMA capable. The RDMA
block of this parent device is represented via an auxiliary device
exported to 'irdma' using the core auxiliary bus infrastructure recently
added for 5.11 kernel.  The parent PCI netdev drivers 'i40e' and 'ice'
register auxiliary RDMA devices with private data/ops encapsulated that
bind to auxiliary drivers registered in irdma module.

Currently, default is RoCEv2 for E810. Runtime support for protocol switch
to iWARP will be made available via devlink in a future patch.
====================

Link: https://lore.kernel.org/r/20210602205138.889-1-shiraz.saleem@intel.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

* branch 'irdma':
  RDMA/irdma: Update MAINTAINERS file
  RDMA/irdma: Add irdma Kconfig/Makefile and remove i40iw
  RDMA/irdma: Add ABI definitions
  RDMA/irdma: Add dynamic tracing for CM
  RDMA/irdma: Add miscellaneous utility definitions
  RDMA/irdma: Add user/kernel shared libraries
  RDMA/irdma: Add RoCEv2 UD OP support
  RDMA/irdma: Implement device supported verb APIs
  RDMA/irdma: Add PBLE resource manager
  RDMA/irdma: Add connection manager
  RDMA/irdma: Add QoS definitions
  RDMA/irdma: Add privileged UDA queue implementation
  RDMA/irdma: Add HMC backing store setup functions
  RDMA/irdma: Implement HW Admin Queue OPs
  RDMA/irdma: Implement device initialization definitions
  RDMA/irdma: Register auxiliary driver and implement private channel OPs
  i40e: Register auxiliary devices to provide RDMA
  i40e: Prep i40e header for aux bus conversion
  ice: Register auxiliary device to provide RDMA
  ice: Implement iidc operations
  ice: Initialize RDMA support
  iidc: Introduce iidc.h
  i40e: Replace one-element array with flexible-array member
2021-06-02 20:07:59 -03:00
Shiraz Saleem
f6d2bbdf3d RDMA/irdma: Update MAINTAINERS file
Add maintainer entry for irdma driver.

Link: https://lore.kernel.org/r/20210602205138.889-17-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 20:06:36 -03:00
Shiraz Saleem
fa0cf568fd RDMA/irdma: Add irdma Kconfig/Makefile and remove i40iw
Add Kconfig and Makefile to build irdma driver.

Remove i40iw driver and add an alias in irdma.

Remove legacy exported symbols i40e_register_client
and i40e_unregister_client from i40e as they are no
longer used.

irdma is the replacement driver that supports X722.

Link: https://lore.kernel.org/r/20210602205138.889-16-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 20:06:36 -03:00
Mustafa Ismail
48d6b3336a RDMA/irdma: Add ABI definitions
Add ABI definitions for irdma.

Link: https://lore.kernel.org/r/20210602205138.889-15-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:19 -03:00
Michael J. Ruhl
ddae5d62f3 RDMA/irdma: Add dynamic tracing for CM
Add dynamic tracing functionality to debug connection
management issues.

Link: https://lore.kernel.org/r/20210602205138.889-14-shiraz.saleem@intel.com
Signed-off-by: "Michael J. Ruhl" <michael.j.ruhl@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:19 -03:00
Mustafa Ismail
915cc7ac0f RDMA/irdma: Add miscellaneous utility definitions
Add miscellaneous utility functions and headers.

Link: https://lore.kernel.org/r/20210602205138.889-13-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:19 -03:00
Mustafa Ismail
551c46edc7 RDMA/irdma: Add user/kernel shared libraries
Building the WQE descriptors for different verb
operations are similar in kernel and user-space.
Add these shared libraries.

Link: https://lore.kernel.org/r/20210602205138.889-12-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:18 -03:00
Mustafa Ismail
dd90451fac RDMA/irdma: Add RoCEv2 UD OP support
Add the header, data structures and functions
to populate the WQE descriptors and issue the
Control QP commands that support RoCEv2 UD operations.

Link: https://lore.kernel.org/r/20210602205138.889-11-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:18 -03:00
Mustafa Ismail
b48c24c2d7 RDMA/irdma: Implement device supported verb APIs
Implement device supported verb APIs. The supported APIs
vary based on the underlying transport the ibdev is
registered as (i.e. iWARP or RoCEv2).

Link: https://lore.kernel.org/r/20210602205138.889-10-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:18 -03:00
Mustafa Ismail
e8c4dbc2fc RDMA/irdma: Add PBLE resource manager
Implement a Physical Buffer List Entry (PBLE) resource manager
to manage a pool of PBLE HMC resource objects.

Link: https://lore.kernel.org/r/20210602205138.889-9-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:18 -03:00
Mustafa Ismail
146b9756f1 RDMA/irdma: Add connection manager
Add connection management (CM) implementation for
iWARP including accept, reject, connect, create_listen,
destroy_listen and CM utility functions

Link: https://lore.kernel.org/r/20210602205138.889-8-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:18 -03:00
Mustafa Ismail
3ae331c751 RDMA/irdma: Add QoS definitions
Add definitions for managing the RDMA HW work scheduler (WS) tree.

A WS node is created via a control QP operation with the bandwidth
allocation, arbitration scheme, and traffic class of the QP specified.
The Qset handle returned associates the QoS parameters for the QP.
The Qset is registered with the LAN and an equivalent node is created
in the LAN packet scheduler tree.

Link: https://lore.kernel.org/r/20210602205138.889-7-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:17 -03:00
Mustafa Ismail
a3a06db504 RDMA/irdma: Add privileged UDA queue implementation
Implement privileged UDA queues to handle iWARP connection
packets and receive exceptions.

Link: https://lore.kernel.org/r/20210602205138.889-6-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:17 -03:00
Mustafa Ismail
d1850c005a RDMA/irdma: Add HMC backing store setup functions
HW uses host memory as a backing store for a number of
protocol context objects and queue state tracking.
The Host Memory Cache (HMC) is a component responsible for
managing these objects stored in host memory.

Add the functions and data structures to manage the allocation
of backing pages used by the HMC for the various objects

Link: https://lore.kernel.org/r/20210602205138.889-5-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:17 -03:00
Mustafa Ismail
3f49d68425 RDMA/irdma: Implement HW Admin Queue OPs
The driver posts privileged commands to the HW
Admin Queue (Control QP or CQP) to request administrative
actions from the HW. Implement create/destroy of CQP
and the supporting functions, data structures and headers
to handle the different CQP commands

Link: https://lore.kernel.org/r/20210602205138.889-4-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:17 -03:00
Mustafa Ismail
44d9e52977 RDMA/irdma: Implement device initialization definitions
Implement device initialization routines, interrupt set-up,
and allocate object bit-map tracking structures.
Also, add device specific attributes and register definitions.

Link: https://lore.kernel.org/r/20210602205138.889-3-shiraz.saleem@intel.com
[flexible array transformation]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:16 -03:00
Mustafa Ismail
8498a30e1b RDMA/irdma: Register auxiliary driver and implement private channel OPs
Register auxiliary drivers which can attach to auxiliary RDMA
devices from Intel PCI netdev drivers i40e and ice. Implement the private
channel ops, and register net notifiers.

Link: https://lore.kernel.org/r/20210602205138.889-2-shiraz.saleem@intel.com
[flexible array transformation]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:16 -03:00
Mark Zhang
76039ac909 IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock
During cm_dev deregistration in cm_remove_one(), the cm_device and
cm_ports will be freed, after that they should not be accessed. The
mad_agent needs to be protected as well.

This patch adds a cm_device kref to protect cm_dev and cm_ports, and a
mad_agent_lock spinlock to protect mad_agent.

Link: https://lore.kernel.org/r/501ba7a2ff203dccd0e6755d3f93329772adce52.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:58 -03:00
Mark Zhang
7345201c39 IB/cm: Improve the calling of cm_init_av_for_lap and cm_init_av_by_path
The cm_init_av_for_lap() and cm_init_av_by_path() function calls have the
following issues:

1. Both of them might sleep and should not be called under spinlock.
2. The access of cm_id_priv->av should be under cm_id_priv->lock, which
   means it can't be initialized directly.

This patch splits the calling of 2 functions into two parts: first one
initializes an AV outside of the spinlock, the second one copies AV to
cm_id_priv->av under spinlock.

Fixes: e1444b5a163e ("IB/cm: Fix automatic path migration support")
Link: https://lore.kernel.org/r/038fb8ad932869b4548b0c7708cab7f76af06f18.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:58 -03:00
Mark Zhang
70076a414e IB/cm: Simplify ib_cancel_mad() and ib_modify_mad() calls
The mad_agent parameter is redundant since the struct ib_mad_send_buf
already has a pointer of it.

Link: https://lore.kernel.org/r/0987c784b25f7bfa72f78691f50cff066de587e1.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:58 -03:00
Mark Zhang
3595c398f6 Revert "IB/cm: Mark stale CM id's whenever the mad agent was unregistered"
This reverts commit 9db0ff53cb9b43ed75bacd42a89c1a0ab048b2b0, which wasn't
a full fix and still causes to the following panic:

panic @ time 1605623870.843, thread 0xfffffeb63b552000: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe811a94e000
    time = 1605623870
    cpuid = 9, TSC = 0xb7937acc1b6
    Panic occurred in module kernel loaded at 0xffffffff80200000:Stack: --------------------------------------------------
    kernel:vm_fault+0x19da
    kernel:vm_fault_trap+0x6e
    kernel:trap_pfault+0x1f1
    kernel:trap+0x31e
    kernel:cm_destroy_id+0x38c
    kernel:rdma_destroy_id+0x127
    kernel:sdp_shutdown_task+0x3ae
    kernel:taskqueue_run_locked+0x10b
    kernel:taskqueue_thread_loop+0x87
    kernel:fork_exit+0x83

Link: https://lore.kernel.org/r/4346449a7cdacc7a4eedc89cb1b42d8434ec9814.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:58 -03:00
Jason Gunthorpe
efafae6717 IB/cm: Tidy remaining cm_msg free paths
Now that all the free paths are explicit cm_free_msg() will only be called
for msgs's allocated with cm_alloc_msg(), so we can assume the context is
set. Place it after the allocation function it is paired with for clarity.

Also remove a bogus NULL assignment in one place after a cancel. This does
nothing other than disable completions to become events, but changing the
state already did that.

Link: https://lore.kernel.org/r/082fd3552be0d1a2c19b1c4cefb5f3f0e3e68e82.1622629024.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:57 -03:00
Jason Gunthorpe
c1cf6d9f74 IB/cm: Call the correct message free functions in cm_send_handler()
There are now three destroy functions for the cm_msg, and all places
except the general send completion handler use the correct function.

Fix cm_send_handler() to detect which kind of message is being completed
and destroy it using the correct function with the correct locking.

Link: https://lore.kernel.org/r/62a507195b8db85bb11228d0c6e7fa944204bf12.1622629024.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:57 -03:00
Jason Gunthorpe
4b4e586ebe IB/cm: Split cm_alloc_msg()
This is being used with two quite different flows, one attaches the
message to the priv and the other does not.

Ensure the message attach is consistently done under the spinlock and
ensure that the free on error always detaches the message from the
cm_id_priv, also always under lock.

This makes read/write to the cm_id_priv->msg consistently locked and
consistently NULL'd when the message is freed, even in all error paths.

Link: https://lore.kernel.org/r/f692b8c89eecb34fd82244f317e478bea6c97688.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:57 -03:00
Jason Gunthorpe
96376a4095 IB/cm: Pair cm_alloc_response_msg() with a cm_free_response_msg()
This is not a functional change, but it helps make the purpose of all the
cm_free_msg() calls clearer. In this case a response msg has a NULL
context[0], and is never placed in cm_id_priv->msg.

Link: https://lore.kernel.org/r/5cd53163be7df0a94f0d4ef7294546bc674fb74a.1622629024.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:41:57 -03:00
Leon Romanovsky
f974428872 RDMA/core: Sanitize WQ state received from the userspace
The mlx4 and mlx5 implemented differently the WQ input checks.  Instead of
duplicating mlx4 logic in the mlx5, let's prepare the input in the central
place.

The mlx5 implementation didn't check for validity of state input.  It is
not real bug because our FW checked that, but still worth to fix.

Fixes: f213c0527210 ("IB/uverbs: Add WQ support")
Link: https://lore.kernel.org/r/ac41ad6a81b095b1a8ad453dcf62cf8d3c5da779.1621413310.git.leonro@nvidia.com
Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:20:11 -03:00
Jack Wang
0e8558476f RDMA/rtrs: Avoid Wtautological-constant-out-of-range-compare
drivers/infiniband/ulp/rtrs/rtrs-clt.c:1786:19: warning: result of comparison of
constant 'MAX_SESS_QUEUE_DEPTH' (65536) with expression of type 'u16'
(aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]

To fix it, limit MAX_SESS_QUEUE_DEPTH to u16 max, which is 65535, and
drop the check in rtrs-clt, as it's the type u16 max.

Link: https://lore.kernel.org/r/20210531122835.58329-1-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-31 15:38:08 -03:00
Shiraz Saleem
f4370a85d6 i40e: Register auxiliary devices to provide RDMA
Convert i40e to use the auxiliary bus infrastructure to export
the RDMA functionality of the device to the RDMA driver.
Register i40e client auxiliary RDMA device on the auxiliary bus per
PCIe device function for the new auxiliary rdma driver (irdma) to
attach to.

The global i40e_register_client and i40e_unregister_client symbols
will be obsoleted once irdma replaces i40iw in the kernel
for the X722 device.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28 20:11:13 -07:00
Shiraz Saleem
9ed7533121 i40e: Prep i40e header for aux bus conversion
Add the definitions to the i40e client header file in
preparation to convert i40e to use the new auxiliary bus
infrastructure. This header is shared between the 'i40e'
Intel networking driver providing RDMA support and the
'irdma' driver.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28 20:11:13 -07:00
Dave Ertman
f9f5301e7e ice: Register auxiliary device to provide RDMA
Register ice client auxiliary RDMA device on the auxiliary bus per
PCIe device function for the auxiliary driver (irdma) to attach to.
It allows to realize a single RDMA driver (irdma) capable of working with
multiple netdev drivers over multi-generation Intel HW supporting RDMA.
There is no load ordering dependencies between ice and irdma.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28 20:11:13 -07:00
Dave Ertman
348048e724 ice: Implement iidc operations
Add implementations for supporting iidc operations for device operation
such as allocation of resources and event notifications.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28 20:11:13 -07:00
Dave Ertman
d25a0fc41c ice: Initialize RDMA support
Probe the device's capabilities to see if it supports RDMA. If so, allocate
and reserve resources to support its operation; populate structures with
initial values.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28 20:11:13 -07:00
Dave Ertman
e860fa9b69 iidc: Introduce iidc.h
Introduce a shared header file used by the 'ice' Intel networking driver
providing RDMA support and the 'irdma' driver to provide a private
interface.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28 20:11:13 -07:00
Gioh Kim
7ecd7e290b RDMA/rtrs-clt: Fix memory leak of not-freed sess->stats and stats->pcpu_stats
sess->stats and sess->stats->pcpu_stats objects are freed
when sysfs entry is removed. If something wrong happens and
session is closed before sysfs entry is created,
sess->stats and sess->stats->pcpu_stats objects are not freed.

This patch adds freeing of them at three places:
1. When client uses wrong address and session creation fails.
2. When client fails to create a sysfs entry.
3. When client adds wrong address via sysfs add_path.

Fixes: 215378b838df0 ("RDMA/rtrs: client: sysfs interface functions")
Link: https://lore.kernel.org/r/20210528113018.52290-21-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:59 -03:00
Md Haris Iqbal
5b73b799c2 RDMA/rtrs-clt: Check if the queue_depth has changed during a reconnection
The queue_depth is a module parameter for rtrs_server. It is used on the
client side to determing the queue_depth of the request queue for the RNBD
virtual block device.

During a reconnection event for an already mapped device, in case the
rtrs_server module queue_depth has changed, fail the reconnect attempt.

Also stop further auto reconnection attempts. A manual reconnect via
sysfs has to be triggerred.

Fixes: 6a98d71daea18 ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20210528113018.52290-20-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:59 -03:00
Jack Wang
6bb97a2c1a RDMA/rtrs-srv: Fix memory leak when having multiple sessions
Gioh notice memory leak below
unreferenced object 0xffff8880acda2000 (size 2048):
  comm "kworker/4:1", pid 77, jiffies 4295062871 (age 1270.730s)
  hex dump (first 32 bytes):
    00 20 da ac 80 88 ff ff 00 20 da ac 80 88 ff ff  . ....... ......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000e85d85b5>] rtrs_srv_rdma_cm_handler+0x8e5/0xa90 [rtrs_server]
    [<00000000e31a988a>] cma_ib_req_handler+0xdc5/0x2b50 [rdma_cm]
    [<000000000eb02c5b>] cm_process_work+0x2d/0x100 [ib_cm]
    [<00000000e1650ca9>] cm_req_handler+0x11bc/0x1c40 [ib_cm]
    [<000000009c28818b>] cm_work_handler+0xe65/0x3cf2 [ib_cm]
    [<000000002b53eaa1>] process_one_work+0x4bc/0x980
    [<00000000da3499fb>] worker_thread+0x78/0x5c0
    [<00000000167127a4>] kthread+0x191/0x1e0
    [<0000000060802104>] ret_from_fork+0x3a/0x50
unreferenced object 0xffff88806d595d90 (size 8):
  comm "kworker/4:1H", pid 131, jiffies 4295062972 (age 1269.720s)
  hex dump (first 8 bytes):
    62 6c 61 00 6b 6b 6b a5                          bla.kkk.
  backtrace:
    [<000000004447d253>] kstrdup+0x2e/0x60
    [<0000000047259793>] kobject_set_name_vargs+0x2f/0xb0
    [<00000000c2ee3bc8>] dev_set_name+0xab/0xe0
    [<000000002b6bdfb1>] rtrs_srv_create_sess_files+0x260/0x290 [rtrs_server]
    [<0000000075d87bd7>] rtrs_srv_info_req_done+0x71b/0x960 [rtrs_server]
    [<00000000ccdf1bb5>] __ib_process_cq+0x94/0x100 [ib_core]
    [<00000000cbcb60cb>] ib_cq_poll_work+0x32/0xc0 [ib_core]
    [<000000002b53eaa1>] process_one_work+0x4bc/0x980
    [<00000000da3499fb>] worker_thread+0x78/0x5c0
    [<00000000167127a4>] kthread+0x191/0x1e0
    [<0000000060802104>] ret_from_fork+0x3a/0x50
unreferenced object 0xffff88806d6bb100 (size 256):
  comm "kworker/4:1H", pid 131, jiffies 4295062972 (age 1269.720s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 00 59 4d 86 ff ff ff ff  .........YM.....
  backtrace:
    [<00000000a18a11e4>] device_add+0x74d/0xa00
    [<00000000a915b95f>] rtrs_srv_create_sess_files.cold+0x49/0x1fe [rtrs_server]
    [<0000000075d87bd7>] rtrs_srv_info_req_done+0x71b/0x960 [rtrs_server]
    [<00000000ccdf1bb5>] __ib_process_cq+0x94/0x100 [ib_core]
    [<00000000cbcb60cb>] ib_cq_poll_work+0x32/0xc0 [ib_core]
    [<000000002b53eaa1>] process_one_work+0x4bc/0x980
    [<00000000da3499fb>] worker_thread+0x78/0x5c0
    [<00000000167127a4>] kthread+0x191/0x1e0
    [<0000000060802104>] ret_from_fork+0x3a/0x50

The problem is we increase device refcount by get_device in process_info_req
for each path, but only does put_deice for last path, which lead to
memory leak.

To fix it, it also calls put_device when dev_ref is not 0.

Fixes: e2853c49477d1 ("RDMA/rtrs-srv-sysfs: fix missing put_device")
Link: https://lore.kernel.org/r/20210528113018.52290-19-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Gioh Kim
2371c40354 RDMA/rtrs-srv: Fix memory leak of unfreed rtrs_srv_stats object
When closing a session, currently the rtrs_srv_stats object in the
closing session is freed by kobject release. But if it failed
to create a session by various reasons, it must free the rtrs_srv_stats
object directly because kobject is not created yet.

This problem is found by kmemleak as below:

1. One client machine maps /dev/nullb0 with session name 'bla':
root@test1:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb0" > /sys/devices/virtual/rnbd-client/ctl/map_device

2. Another machine failed to create a session with the same name 'bla':
root@test2:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb1" > /sys/devices/virtual/rnbd-client/ctl/map_device
-bash: echo: write error: Connection reset by peer

3. The kmemleak on server machine reported an error:
unreferenced object 0xffff888033cdc800 (size 128):
  comm "kworker/2:1", pid 83, jiffies 4295086585 (age 2508.680s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000a72903b2>] __alloc_sess+0x1d4/0x1250 [rtrs_server]
    [<00000000d1e5321e>] rtrs_srv_rdma_cm_handler+0xc31/0xde0 [rtrs_server]
    [<00000000bb2f6e7e>] cma_ib_req_handler+0xdc5/0x2b50 [rdma_cm]
    [<00000000e896235d>] cm_process_work+0x2d/0x100 [ib_cm]
    [<00000000b6866c5f>] cm_req_handler+0x11bc/0x1c40 [ib_cm]
    [<000000005f5dd9aa>] cm_work_handler+0xe65/0x3cf2 [ib_cm]
    [<00000000610151e7>] process_one_work+0x4bc/0x980
    [<00000000541e0f77>] worker_thread+0x78/0x5c0
    [<00000000423898ca>] kthread+0x191/0x1e0
    [<000000005a24b239>] ret_from_fork+0x3a/0x50

Fixes: 39c2d639ca183 ("RDMA/rtrs-srv: Set .release function for rtrs srv device during device init")
Link: https://lore.kernel.org/r/20210528113018.52290-18-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Gioh Kim
07c1402729 RDMA/rtrs-srv: Duplicated session name is not allowed
If two clients try to use the same session name, rtrs-server generates a
kernel error that it failed to create the sysfs because the filename
is duplicated.

This patch adds code to check if there already exists the same session
name with the different UUID. If a client tries to add more session,
it sends the UUID and the session name. Therefore it is ok if there is
already same session name with the same UUID. The rtrs-server must fail
only-if there is the same session name with the different UUID.

Link: https://lore.kernel.org/r/20210528113018.52290-17-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Gioh Kim
64bce1ee97 RDMA/rtrs: Do not reset hb_missed_max after re-connection
When re-connecting, it resets hb_missed_max to 0.
Before the first re-connecting, client will trigger re-connection
when it gets hb-ack more than 5 times. But after the first
re-connecting, clients will do re-connection whenever it does
not get hb-ack because hb_missed_max is 0.

There is no need to reset hb_missed_max when re-connecting.
hb_missed_max should be kept until closing the session.

Fixes: c0894b3ea69d3 ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20210528113018.52290-16-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Jack Wang
78df092c38 RDMA/rtrs-srv: convert scnprintf to sysfs_emit
Link: https://lore.kernel.org/r/20210528113018.52290-15-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Md Haris Iqbal
0cdfb3b207 RDMA/rtrs-srv: Replace atomic_t with percpu_ref for ids_inflight
ids_inflight is used to track the inflight IOs. But the use of atomic_t
variable can cause performance drops and can also become a performance
bottleneck.

This commit replaces the use of atomic_t with a percpu_ref structure. The
advantage it offers is, it doesn't check if the reference has fallen to 0,
until the user explicitly signals it to; and that is done by the
percpu_ref_kill() function call. After that, the percpu_ref structure
behaves like an atomic_t and for every put call, checks whether the
reference has fallen to 0 or not.

rtrs_srv_stats_rdma_to_str shows the count of ids_inflight as 0
for user-mode tools not to be confused.

Fixes: 9cb837480424e ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210528113018.52290-14-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Md Haris Iqbal
41db63a7ef RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
When get_next_path_min_inflight is called to select the next path, it
iterates over the list of available rtrs_clt_sess (paths). It then reads
the number of inflight IOs for that path to select one which has the least
inflight IO.

But it may so happen that rtrs_clt_sess (path) is no longer in the
connected state because closing or error recovery paths can change the status
of the rtrs_clt_Sess.

For example, the client sent the heart-beat and did not get the
response, it would change the session status and stop IO processing.
The added checking of this patch can prevent accessing the broken path
and generating duplicated error messages.

It is ok if the status is changed after checking the status because
the error recovery path does not free memory and only tries to
reconnection. And also it is ok if the session is closed after checking
the status because closing the session changes the session status and
flush all IO beforing free memory. If the session is being accessed for
IO processing, the closing session will wait.

Fixes: 6a98d71daea18 ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20210528113018.52290-13-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Guoqing Jiang
7a2e0888b0 RDMA/rtrs-clt: Remove redundant 'break'
It is duplicated with the very next line

Link: https://lore.kernel.org/r/20210528113018.52290-12-jinpu.wang@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Guoqing Jiang
0aedfb695f RDMA/rtrs-srv: Kill __rtrs_srv_change_state
No need since the only user is rtrs_srv_change_state.

Link: https://lore.kernel.org/r/20210528113018.52290-11-jinpu.wang@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Guoqing Jiang
b0c633c482 RDMA/rtrs-clt: Kill rtrs_clt_disconnect_from_sysfs
The function is just a wrapper of rtrs_clt_close_conns, let's call
rtrs_clt_close_conns directly.

Link: https://lore.kernel.org/r/20210528113018.52290-10-jinpu.wang@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Guoqing Jiang
5e82ac7c00 RDMA/rtrs-clt: Kill rtrs_clt_{start,stop}_hb
The two wrappers are not needed since we can call rtrs_{start,stop}_hb
directly.

Link: https://lore.kernel.org/r/20210528113018.52290-9-jinpu.wang@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Dima Stepanov
2d612f0d3d RDMA/rtrs: Use strscpy instead of strlcpy
During checkpatch analyzing the following warning message was found:
  WARNING:STRLCPY: Prefer strscpy over strlcpy - see:
  https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Fix it by using strscpy calls instead of strlcpy.

Link: https://lore.kernel.org/r/20210528113018.52290-8-jinpu.wang@ionos.com
Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Gioh Kim
3f3d0eabc1 RDMA/rtrs: Define MIN_CHUNK_SIZE
Define MIN_CHUNK_SIZE to replace the hard-coding number.
We need 4k for metadata, so MIN_CHUNK_SIZE should be at least 8k.

Link: https://lore.kernel.org/r/20210528113018.52290-7-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00