Commit Graph

948933 Commits

Bob Pearson
5f9e2822d1 RDMA/rxe: Fix style warnings
Fixed several minor checkpatch warnings in existing rxe source.

Link: https://lore.kernel.org/r/20200820224638.3212-3-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:51:08 -03:00
Lang Cheng
e0ef0f68c4 RDMA/hns: Add a check for current state before modifying QP
It should be considered an illegal operation if the ULP attempts to modify
a QP from another state into the current hardware state. Otherwise, the ULP
can modify some fields of the QPC at any time. For example, for a QP in the
RTS state, modifying it from RTR to RTS can change the PSN, which is never
the expected behavior.
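
The added check presumably compares the ULP's claimed current state against
the state the hardware is actually in, along these lines (a sketch only;
the field names are assumptions, not the literal patch):

    /* reject a modify whose claimed current state disagrees with HW */
    if ((attr_mask & IB_QP_CUR_STATE) &&
        attr->cur_qp_state != hr_qp->state)
            return -EINVAL;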

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/1598353674-24270-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:46:07 -03:00
Kamal Heib
b9caebb290 RDMA/usnic: Remove the query_pkey callback
Now that query_pkey() is no longer mandatory in the RDMA core, this callback
can be removed from the usnic provider. The libfabric userspace never
touches the pkey.

Link: https://lore.kernel.org/r/20200820125346.111902-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:02:42 -03:00
Jason Gunthorpe
657360d6c7 RDMA/ucma: Remove closing and the close_wq
Use cancel_work_sync() to ensure that the work is not running, and simply
assign NULL to ctx->cm_id to indicate whether the work ran. Delete the
close_wq since flush_workqueue() is no longer needed.
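
The resulting destroy-side pattern is roughly the following (a sketch of
the idea, not the verbatim code):

    cancel_work_sync(&ctx->close_work);  /* work cannot be running now */
    if (ctx->cm_id) {
            /* non-NULL means the work never ran; clean up here */
            rdma_destroy_id(ctx->cm_id);
            ctx->cm_id = NULL;
    }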

Link: https://lore.kernel.org/r/20200818120526.702120-15-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:16 -03:00
Jason Gunthorpe
a1d33b70db RDMA/ucma: Rework how new connections are passed through event delivery
When a new connection is established the RDMA CM creates a new cm_id and
passes it through to the event handler. However inside the UCMA the new ID
is not assigned a ucma_context until the user retrieves the event from a
syscall.

This creates a weird edge condition where a cm_id's context can continue
to point at the listening_id that created it, plus a number of additional
edge conditions on event-list cleanup related to destroying half-created
IDs.

There is also a race condition in ucma_get_events() where the
cm_id->context is being assigned without holding the handler_mutex.

Simplify all of this by creating the ucma_context inside the event handler
itself, eliminating the edge case of a half-created cm_id. All cm_ids
can be uniformly destroyed via __destroy_id() or via the close_work.

Link: https://lore.kernel.org/r/20200818120526.702120-14-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:16 -03:00
Jason Gunthorpe
310ca1a7dc RDMA/ucma: Narrow file->mut in ucma_event_handler()
Since the backlog is now an atomic, file->mut only protects the
event_list and ctx_list. Narrow its scope to make this clear.

Link: https://lore.kernel.org/r/20200818120526.702120-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:16 -03:00
Jason Gunthorpe
26c15dec49 RDMA/ucma: Change backlog into an atomic
There is no reason to grab the file->mut just to do this inc/dec work. Use
an atomic.
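
The change boils down to the following before/after shape (a sketch, not
the literal diff):

    /* before: plain int, guarded by file->mut */
    mutex_lock(&file->mut);
    ctx->backlog--;
    mutex_unlock(&file->mut);

    /* after: atomic_t, no lock required for the inc/dec */
    atomic_dec(&ctx->backlog);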

Link: https://lore.kernel.org/r/20200818120526.702120-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:16 -03:00
Jason Gunthorpe
38e03d0926 RDMA/ucma: Add missing locking around rdma_leave_multicast()
All entry points to the rdma_cm from a ULP must be single threaded,
even during error unwinds. Add the missing locking.
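
The unwind path gains the same per-ctx lock used for every other rdma_cm
call, roughly:

    mutex_lock(&mc->ctx->mutex);
    rdma_leave_multicast(mc->ctx->cm_id, (struct sockaddr *)&mc->addr);
    mutex_unlock(&mc->ctx->mutex);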

Fixes: 7c11910783 ("RDMA/ucma: Put a lock around every call to the rdma_cm layer")
Link: https://lore.kernel.org/r/20200818120526.702120-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
98837c6c3d RDMA/ucma: Fix locking for ctx->events_reported
This value is locked under file->mut; ensure the lock is held whenever
touching it.

The case in ucma_migrate_id() is a race, while in ucma_free_uctx() it is
already impossible for the write side to run; the movement there is just
for clarity.

Fixes: 88314e4dda ("RDMA/cma: add support for rdma_migrate_id()")
Link: https://lore.kernel.org/r/20200818120526.702120-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
09e328e47a RDMA/ucma: Fix the locking of ctx->file
ctx->file is changed under the file->mut lock by ucma_migrate_id(), which
is impossible to lock correctly. Instead change ctx->file under the
handler_lock and ctx_table lock and revise all places touching ctx->file
to use this locking when reading ctx->file.

Link: https://lore.kernel.org/r/20200818120526.702120-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
308571debc RDMA/ucma: Do not use file->mut to lock destroying
The only reader of destroying is inside a handler under the handler_mutex,
so directly use the handler_mutex when setting it instead of the larger
file->mut.

As the refcount could be zero here, and the cm_id already freed, an
additional refcount grab around the locking is required to touch the
cm_id.
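
A sketch of the idea (assuming the rdma_lock_handler() helper pair
introduced by the rdma_accept() commit below, and ucma's existing ref
helpers; not the literal patch):

    /* the last ref may already be gone; take a temporary one so the
     * cm_id stays valid while we publish 'destroying' under the
     * narrower handler_mutex
     */
    if (refcount_inc_not_zero(&ctx->ref)) {
            rdma_lock_handler(ctx->cm_id);
            ctx->destroying = 1;
            rdma_unlock_handler(ctx->cm_id);
            ucma_put_ctx(ctx);
    }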

Link: https://lore.kernel.org/r/20200818120526.702120-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
d114c6feed RDMA/cma: Add missing locking to rdma_accept()
In almost all cases rdma_accept() is called under the handler_mutex by
ULPs from their handler callbacks. The one exception was ucma, which did
not take the handler_mutex.

To improve the understandability of the locking scheme, obtain the mutex
for ucma as well.

This improves how ucma works by allowing it to directly use the
handler_mutex for some of its internal locking against the handler
callbacks instead of the global file->mut lock.

There does not seem to be a serious bug here, other than that a DISCONNECT
event can be delivered concurrently with accept succeeding.
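
For ucma the call pattern becomes roughly the following (a sketch,
assuming helpers that take and release the cm_id's handler_mutex on
behalf of the ULP):

    rdma_lock_handler(ctx->cm_id);
    ret = rdma_accept(ctx->cm_id, &conn_param);
    rdma_unlock_handler(ctx->cm_id);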

Link: https://lore.kernel.org/r/20200818120526.702120-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
95fe51096b RDMA/ucma: Remove mc_list and rely on xarray
It is not really necessary to keep a linked list of mcs associated with
each context when we can just scan the xarray to find the right entries.

This removes another overloading of file->mut by relying on the xarray
locking for mc instead.
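
Scanning the xarray in place of the private list looks roughly like this
(a sketch; the xarray name is an assumption):

    struct ucma_multicast *mc;
    unsigned long index;

    xa_for_each(&multicast_table, index, mc) {
            if (mc->ctx != ctx)
                    continue;
            /* operate on this ctx's multicast entry */
    }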

Link: https://lore.kernel.org/r/20200818120526.702120-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:14 -03:00
Jason Gunthorpe
620db1a118 RDMA/ucma: Fix error cases around ucma_alloc_ctx()
The store to ctx->cm_id was based on the idea that _ucma_find_context()
would not return the ctx until it was fully set up.

Without locking this doesn't work properly.

Split things so that the xarray entry is allocated as NULL to reserve the
ID, and the cm_id is set and stored only once everything is final.

Along the way this shows that the error unwind in ucma_get_event() when a
new ctx is created is wrong; fix it up.
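
The two-step publish pattern, roughly (a sketch; error handling elided):

    /* reserve an ID early; lookups see NULL and fail until we publish */
    if (xa_alloc(&ctx_table, &ctx->id, NULL, xa_limit_32b, GFP_KERNEL))
            goto err_free;

    /* ... initialize the ctx ... then, once everything is final: */
    ctx->cm_id = cm_id;
    xa_store(&ctx_table, ctx->id, ctx, GFP_KERNEL);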

Link: https://lore.kernel.org/r/20200818120526.702120-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:14 -03:00
Jason Gunthorpe
c07e12d8e9 RDMA/ucma: Consolidate the two destroy flows
ucma_close() is open coding the tail end of ucma_destroy_id(), consolidate
this duplicated code into a function.

Link: https://lore.kernel.org/r/20200818120526.702120-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:14 -03:00
Jason Gunthorpe
07e266a775 RDMA/ucma: Remove unnecessary locking of file->ctx_list in close
When the file_operations release function runs it is no longer possible
for write() to be running concurrently; remove the extra locking around
the ctx_list.

Link: https://lore.kernel.org/r/20200818120526.702120-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:13 -03:00
Jason Gunthorpe
ca2968c1ef RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx()
Both ucma_destroy_id() and ucma_close_id() (triggered from an event via a
wq) can drive the refcount to zero. ucma_get_ctx() was wrongly assuming
that the refcount could only go to zero from ucma_destroy_id(), which also
removes it from the xarray.

Use refcount_inc_not_zero() instead.
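
A sketch of the corrected lookup (simplified from the description above):

    xa_lock(&ctx_table);
    ctx = _ucma_find_context(id, file);
    if (!IS_ERR(ctx) && !refcount_inc_not_zero(&ctx->ref))
            /* raced with a close/destroy dropping the last ref */
            ctx = ERR_PTR(-ENXIO);
    xa_unlock(&ctx_table);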

Link: https://lore.kernel.org/r/20200818120526.702120-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:13 -03:00
Mark Zhang
7c4b1ab9f1 IB/mlx5: Add DCT RoCE LAG support
When DCT QPs work in RoCE LAG mode:
 1. DCT creation is allowed only when it is supported
 2. The "port" of a DCT QP is assigned in a round-robin way

Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00
Mark Zhang
8f3243a047 IB/mlx5: Add tx_affinity support for DCI QP
DCI QP supports tx_affinity as well.

Link: https://lore.kernel.org/r/20200818115245.700581-2-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00
Chuck Lever
8dc105befe RDMA/cm: Add tracepoints to track MAD send operations
Surface the operation of MAD exchanges during connection
establishment. Some samples:

[root@klimt ~]# trace-cmd report -F ib_cma
cpus=4
     kworker/0:4-123   [000]    60.677388: icm_send_rep:         local_id=1965336542 remote_id=1096195961 state=REQ_RCVD lap_state=LAP_UNINIT
   kworker/u8:11-391   [002]    60.678808: icm_send_req:         local_id=1982113758 remote_id=0 state=IDLE lap_state=LAP_UNINIT
     kworker/0:4-123   [000]    60.679652: icm_send_rtu:         local_id=1982113758 remote_id=1079418745 state=REP_RCVD lap_state=LAP_UNINIT
            nfsd-1954  [001]    60.691350: icm_send_rep:         local_id=1998890974 remote_id=1129750393 state=MRA_REQ_SENT lap_state=LAP_UNINIT
            nfsd-1954  [003]    62.017931: icm_send_drep:        local_id=1998890974 remote_id=1129750393 state=TIMEWAIT lap_state=LAP_UNINIT

Link: https://lore.kernel.org/r/159767240197.2968.12048458026453596018.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 19:41:41 -03:00
Chuck Lever
75874b3d50 RDMA/cm: Replace pr_debug() call sites with tracepoints
In the interest of converging on a common instrumentation infrastructure,
modernize the pr_debug() call sites added by commit 119bf81793 ("IB/cm:
Add debug prints to ib_cm"). The new tracepoints appear in a new "ib_cma"
subsystem.

The conversion is somewhat mechanical. Someone more familiar with the
semantics of the recorded information might suggest additional data
capture.

Some benefits include:

- Tracepoints enable "always on" reporting of these errors
- The error records are structured and compact
- Tracepoints provide hooks for eBPF scripts

Sample output:

            nfsd-1954  [003]    62.017901: icm_dreq_skipped:     local_id=1998890974 remote_id=1129750393 state=DREQ_RCVD lap_state=LAP_UNINIT
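
A conversion of this kind replaces each pr_debug() with a tracepoint
defined via TRACE_EVENT() (or DECLARE_EVENT_CLASS()/DEFINE_EVENT() for
families of similar events). A schematic sketch, not the exact event
definition from the patch:

    TRACE_EVENT(icm_dreq_skipped,
            TP_PROTO(u32 local_id, u32 remote_id),
            TP_ARGS(local_id, remote_id),
            TP_STRUCT__entry(
                    __field(u32, local_id)
                    __field(u32, remote_id)
            ),
            TP_fast_assign(
                    __entry->local_id = local_id;
                    __entry->remote_id = remote_id;
            ),
            TP_printk("local_id=%u remote_id=%u",
                      __entry->local_id, __entry->remote_id)
    );

The real events also record the cm state and lap_state, as the sample
output above shows.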

Link: https://lore.kernel.org/r/159767239665.2968.10613294222688696646.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 19:41:41 -03:00
Chuck Lever
b3d03daa7c RDMA/core: Move the rdma_show_ib_cm_event() macro
Refactor: Make it globally available in the utilities header.

Link: https://lore.kernel.org/r/159767239131.2968.9520990257041764685.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 16:01:47 -03:00
Gal Pressman
8d9290a4a8 RDMA/efa: Remove redundant udata check from alloc ucontext response
The alloc ucontext flow is always called with a valid udata, so there's no
need to test whether it's NULL.

While at it, the 'udata->outlen' check is removed as well, since we copy
the minimum of the response size and outlen; in the case of a zero outlen,
zero bytes will be copied.
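
The bounded copy that makes the outlen check redundant looks roughly like
this (a sketch):

    /* copying min(resp size, outlen): with outlen == 0 this copies
     * zero bytes and succeeds, so no separate outlen test is needed
     */
    err = ib_copy_to_udata(udata, &resp,
                           min(sizeof(resp), udata->outlen));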

Link: https://lore.kernel.org/r/20200818110835.54299-1-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 14:45:54 -03:00
Kamal Heib
62cbff3267 RDMA/vmw_pvrdma: Fix kernel-doc documentation
Fix the kernel-doc documentation by matching the function definitions
with their documentation.

Link: https://lore.kernel.org/r/20200820123512.105193-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 14:34:18 -03:00
Mohammad Heib
fd49ddaf7e RDMA/rxe: prevent rxe creation on top of vlan interface
Creating an rxe device on top of a vlan interface will create a
non-functional device that has an empty gid table and can't be used for
rdma cm communication.

This is caused by the logic in
enum_all_gids_of_dev_cb()/is_eth_port_of_netdev(), which only considers
networks connected to "upper devices" of the configured network device,
resulting in an empty set of gids for a vlan interface; attempts to
connect via this rdma device then fail in cm_init_av_for_response because
no gids can be resolved.

Apparently, this behavior was implemented to fit the HW RoCE devices,
which create a RoCE device per port, so RXE must behave the same way and
create an rxe device per real device only.

In order to communicate via a vlan interface, the user must use the gid
index of the vlan address instead of creating rxe over vlan.

Link: https://lore.kernel.org/r/20200811150415.3693-1-goody698@gmail.com
Signed-off-by: Mohammad Heib <goody698@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 14:12:18 -03:00
Håkon Bugge
785167a114 IB/mlx4: Adjust delayed work when a dup is observed
When scheduling delayed work to clean up the cache, if the entry has
already been scheduled for deletion, we adjust the delay.

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-7-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:22 -03:00
Håkon Bugge
227a0e142e IB/mlx4: Add support for REJ due to timeout
A CM REJ packet whose reason is timeout is a special beast: it has
neither a Remote Communication ID nor a Remote Port GID.

Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate namespaces for MAD Transaction IDs
(TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping
in a cache.

This proxying does not handle said REJ packets.

If the active side abandons its connection attempt after having sent a
REQ, it will send a REJ with the reason being timeout. This example can be
provoked by a simple user-verbs program, which ends up doing:

    rdma_connect(cm_id, &conn_param);
    rdma_destroy_id(cm_id);

using the async librdmacm API.

Having dynamic debug prints enabled in the mlx4_ib driver, we will then
see:

mlx4_ib_demux_cm_handler: Couldn't find an entry for pv_cm_id 0x0, attr_id 0x12

The solution is to introduce a radix-tree. When a REQ packet is received
and handled in mlx4_ib_demux_cm_handler(), we know the connecting peer's
para-virtual cm_id and the destination slave. We then insert an entry into
the tree with said information. We also schedule work to remove this entry
from the tree and free it, in order to avoid a memory leak.

When a REJ packet with reason timeout is received, we can look up the
slave in the tree and deliver the packet to the correct slave.

When a duplicate REQ packet is received, the entry is already in the tree.
In this case, we adjust the delayed work in order to avoid a premature
eviction of the entry.

When cleaning up, we simply traverse the tree and modify any delayed work
to use a zero delay. A subsequent flush of the system_wq then ensures all
entries are wiped out.
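
In outline, the bookkeeping looks roughly like this (a heavily simplified
sketch; the entry layout and names are assumptions, not the literal
patch):

    struct rej_tmout_entry {
            int slave;                    /* destination slave for the REJ */
            u32 rem_pv_cm_id;             /* peer's para-virtual cm_id */
            struct delayed_work timeout;  /* evicts and frees the entry */
    };

    /* on REQ: record the mapping and arm the eviction timer */
    radix_tree_insert(&sriov->rej_tmout_tree, rem_pv_cm_id, entry);
    schedule_delayed_work(&entry->timeout, CM_CLEANUP_CACHE_TIMEOUT);

    /* on REJ(timeout): recover which slave should get the MAD */
    entry = radix_tree_lookup(&sriov->rej_tmout_tree, rem_pv_cm_id);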

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-6-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:22 -03:00
Håkon Bugge
7fd1507df7 IB/mlx4: Fix starvation in paravirt mux/demux
The mlx4 driver will proxy MAD packets through the PF driver. A VM or an
instantiated VF will send its MAD packets to the PF driver using
loop-back. The PF driver will be informed by an interrupt, but defer the
handling and polling of CQEs to a worker thread running on an ordered
work-queue.

Consider the following scenario: the VMs, in short proximity in time,
for example due to a network event, send many MAD packets to the PF
driver. Let's say there are K VMs, each sending N packets.

The interrupt from the first VM will start the worker thread, which will
poll N CQEs. A common case here is where the PF driver will multiplex the
packets received from the VMs out on the wire QP.

But before the wire QP has returned a send CQE and associated interrupt,
the other K - 1 VMs have sent their N packets as well.

The PF driver has to multiplex K * N packets out on the wire QP. But the
send-queue on the wire QP has a finite capacity.

So, in this scenario, if K * N is larger than the send-queue capacity of
the wire QP, we will get MAD packets dropped on the floor with this
dynamic debug message:

mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11)

This happens despite the fact that the wire send-queue could have
capacity; the PF driver just isn't aware of it, because the wire send
CQEs have not yet been polled.

We can also have a similar scenario inbound, with a wire recv-queue larger
than the tunnel QP's send-queue. If many remote peers send MAD packets to
the very same VM, the PF driver could wrongly deem the tunnel send-queue
destined to that VM to be full.

This starvation is fixed by introducing separate work queues for the wire
QPs vs. the tunnel QPs.

With this fix, using a dual-ported HCA with 8 VFs instantiated, we could
run cmtime on each of the 18 interfaces towards a similarly configured
peer, each cmtime instance with 800 QPs (14400 QPs in all), without a
single CM packet getting lost.

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-5-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:22 -03:00
Håkon Bugge
0ae207fb91 IB/mlx4: Separate tunnel and wire bufs parameters
Using CX-3 in virtualized mode, MAD packets are proxied through the PF
driver. The feed is N tunnel QPs, and what is received from the VFs is
multiplexed out on the wire QP. Since this is a many-to-one scenario, it
is better to have separate initialization parameters for the two usages.

The number of wire and tunnel bufs is raised to 2K and 512, respectively.
With this set of parameters, a system consisting of eight physical
servers, each with eight VMs, and 14 I/O servers (BM) can run switch
fail-over without seeing:

mlx4_ib_demux_mad: failed sending GSI to slave 3 via tunnel qp (-11)

or

mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11)

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-4-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:21 -03:00
Håkon Bugge
e7d087fce6 IB/mlx4: Add support for MRA
Using CX-3 in virtualized mode, MAD packets are proxied through the PF
driver. However, the handling lacks support for the MRA (Message Receipt
Acknowledgment) packet. With dynamic debug enabled, we see tons of:

mlx4_ib_multiplex_cm_handler: id{slave: 7, sl_cm_id: 0x8fcb45a0} is NULL! attr_id: 0x11

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-3-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:21 -03:00
Håkon Bugge
094619449a IB/mlx4: Add and improve logging
Add a missing check for success after the call to mlx4_ib_send_to_wire()
in mlx4_ib_multiplex_mad().

Amend the existing pr_debug() in mlx4_ib_multiplex_cm_handler() and
mlx4_ib_demux_cm_handler() to include the attr_id during a lookup failure.

Remove two noisy pr_debug() calls in mad.c.

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-2-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:21 -03:00
Jason Gunthorpe
c0f4979e90 RDMA/cm: Remove unused cm_class
Previous commits removed all references to the /sys/class/infiniband_cm/
directory represented by the cm_class symbol. Remove the directory and
cm_class.

Fixes: a1a8e4a85c ("rdma: Delete the ib_ucm module")
Link: https://lore.kernel.org/r/0-v1-90096a98c476+205-remove_cm_leftovers_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:43:07 -03:00
Max Gurtovoy
c97119b6d3 IB/isert: remove duplicated error prints
The isert_post_recv() function prints an error in case of failure, so
there is no need for the callers to add another print.

Link: https://lore.kernel.org/r/20200805121231.166162-2-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:22:05 -03:00
Maor Gottlieb
e6ac9f6006 RDMA/mlx5: Enable sniffer when device is in switchdev mode
To allow the sniffer when the RDMA device is in switchdev mode, don't
set the source port when creating the sniffer rule.

Link: https://lore.kernel.org/r/20200803060214.15328-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:03:32 -03:00
Mark Zhang
c531024bb1 RDMA/mlx5: Add new IB rates support
Support the 56, 25, 100, 200 and 50 Gbps IB rates in the mlx5 driver.

Link: https://lore.kernel.org/r/20200802081712.1993490-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:03:32 -03:00
Gal Pressman
a4e6a1dd57 RDMA/efa: Introduce SRD RNR retry
This patch introduces the ability to configure SRD QPs with the RNR retry
parameter when issuing a modify QP command.

In addition, a capability bit was added to report support to the userspace
library.

Link: https://lore.kernel.org/r/20200731060420.17053-5-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:52:45 -03:00
Gal Pressman
22c50e0660 RDMA/efa: Introduce SRD QP state machine
This precursory patch adds the SRD QP type state machine, which is
currently identical to that of the UD QP type. A following patch will
change the SRD QP state machine to support RNR retry modifications.

Link: https://lore.kernel.org/r/20200731060420.17053-4-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:52:45 -03:00
Gal Pressman
ab67badd1c RDMA/efa: Be consistent with modify QP bitmask
The modify QP bitmask was not consistent with other bitmasks used in the
device interface. Remove the bitmask enum and allow usage with
EFA_GET/SET.

Link: https://lore.kernel.org/r/20200731060420.17053-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:52:44 -03:00
Gal Pressman
34eb009ffe RDMA/efa: Add a generic capability check helper
Instead of adding a new function for each new capability, introduce a
generic helper to query device capabilities.
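
Such a helper typically reduces to one macro over the firmware-reported
capabilities mask; a sketch under that assumption (the field and mask
names are assumed):

    #define EFA_DEV_CAP(dev, cap)                                     \
            ((dev)->dev_attr.device_caps &                            \
             EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_##cap##_MASK)

With this, each new capability only needs a mask definition plus a call
such as EFA_DEV_CAP(dev, RNR_RETRY).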

Link: https://lore.kernel.org/r/20200731060420.17053-2-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:52:44 -03:00
Leon Romanovsky
d6673746d6 RDMA: Remove constant domain argument from flow creation call
The "domain" argument is constant and modern device (mlx5) doesn't support
anything except IB_FLOW_DOMAIN_USER, so delete this extra parameter and
simplify code.

Link: https://lore.kernel.org/r/20200730081235.1581127-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Leon Romanovsky
70c1430fba RDMA/mlx5: Replace open-coded offsetofend() macro
Clean mlx5_ib from open-coded implementations of offsetofend().
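
For reference, offsetofend() is defined in <linux/stddef.h> as:

    #define offsetofend(TYPE, MEMBER) \
            (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))

so each open-coded 'offsetof(...) + sizeof(...)' pair collapses into a
single, self-describing macro call.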

Link: https://lore.kernel.org/r/20200730081235.1581127-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Leon Romanovsky
156f378985 RDMA/mlx5: Simplify multiple else-if cases with switch keyword
Improve the readability of fs.c by converting multiple else-if
constructions into switch statements.
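
The shape of such a conversion (a schematic sketch; the identifiers are
illustrative, not the literal hunk from fs.c):

    /* before */
    if (ft_type == MLX5_IB_FT_RX)
            prio = &dev->flow_db->prios[priority];
    else if (ft_type == MLX5_IB_FT_TX)
            prio = &dev->flow_db->egress_prios[priority];
    else
            return ERR_PTR(-EINVAL);

    /* after */
    switch (ft_type) {
    case MLX5_IB_FT_RX:
            prio = &dev->flow_db->prios[priority];
            break;
    case MLX5_IB_FT_TX:
            prio = &dev->flow_db->egress_prios[priority];
            break;
    default:
            return ERR_PTR(-EINVAL);
    }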

Link: https://lore.kernel.org/r/20200730081235.1581127-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Linus Torvalds
9123e3a74e Linux 5.9-rc1 2020-08-16 13:04:57 -07:00
Linus Torvalds
2cc3c4b3c2 io_uring-5.9-2020-08-15
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl84aLkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgps72D/96/HCiUArhxmRltiwqB9KjemEPaY7nWDkz
 0hrUtTCr/MjqFIzgPwx6pIjdiSP4GbmcMCrBO67E+mLbOdO1hKte2ElysRAsTlLE
 fGrcdrs3is5+QK8aqJJks74NzM7XG6jfrR8ewV9cz6aJRWbgRjCWvSxx/03Iha2B
 t897xuwJg4K30Z83IxPfnu+Xp0dIfmFLUuXQApsZ3bNTuQW3sR4CC/v418+1Wmk4
 kXGbQtxcEsrhCy9OxnNyU6GPEJ3b99ANPbRE8OUNQwaHiejvMOCWpcmaoT6TwKeU
 aku+XcVyoMjxJQk2k0uzr8Ecj5G1FJv4fUHhTZBxcGqkrxhkTjQ520HtrqPlc7uV
 BjyXutZ8yjmeCbvGrsXu8f8ktjHHkntkRDA8hgzW1OpmWYuWKF/2OIjNpmcmtvbj
 XqwBDEKdQW9X4dHoQKsVExtzeT6nNP0dxaeZX8OeB2GGkitP7rCm8k/SOuDPTCLi
 MX/qWpERo4hRfCLjY+4nezxkFMLIF7Jej3tzwuVRshFYsRVQzTPQpbnkmkuwibhi
 ObEwVI+lLkbatnR2wmJwoVKcywH13U68VNJXACyw1GZnPlp2lYylT/MV80y/iELE
 mj4zDklqwfIrnoHEuCkERwgXrYsffhUrFvajmAMnJncUFOI4khYJ+dWBVVlBkyn0
 e7UK1Sd7Jw==
 =T+fx
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few differerent things in here.

  Seems like syzbot got some more io_uring bits wired up, and we got a
  handful of reports and the associated fixes are in here.

  General fixes too, and a lot of them marked for stable.

  Lastly, a bit of fallout from the async buffered reads, where we now
  more easily trigger short reads. Some applications don't really like
  that, so the io_read() code now handles short reads internally, and
  got a cleanup along the way so that it's now easier to read (and
  documented). We're now passing tests that failed before"

* tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
  io_uring: short circuit -EAGAIN for blocking read attempt
  io_uring: sanitize double poll handling
  io_uring: internally retry short reads
  io_uring: retain iov_iter state over io_read/io_write calls
  task_work: only grab task signal lock when needed
  io_uring: enable lookup of links holding inflight files
  io_uring: fail poll arm on queue proc failure
  io_uring: hold 'ctx' reference around task_work queue + execute
  fs: RWF_NOWAIT should imply IOCB_NOIO
  io_uring: defer file table grabbing request cleanup for locked requests
  io_uring: add missing REQ_F_COMP_LOCKED for nested requests
  io_uring: fix recursive completion locking on oveflow flush
  io_uring: use TWA_SIGNAL for task_work uncondtionally
  io_uring: account locked memory before potential error case
  io_uring: set ctx sq/cq entry count earlier
  io_uring: Fix NULL pointer dereference in loop_rw_iter()
  io_uring: add comments on how the async buffered read retry works
  io_uring: io_async_buf_func() need not test page bit
2020-08-16 10:55:12 -07:00
Mike Rapoport
6f6aea7e96 parisc: fix PMD pages allocation by restoring pmd_alloc_one()
Commit 1355c31eeb ("asm-generic: pgalloc: provide generic pmd_alloc_one()
and pmd_free_one()") converted parisc to use the generic version of
pmd_alloc_one(), but it missed the fact that parisc uses order-1 pages for
the PMD.

Restore the original version of pmd_alloc_one() for parisc; just use
GFP_PGTABLE_KERNEL, which implies __GFP_ZERO, instead of GFP_KERNEL and
memset.
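
The restored helper, per the description above, is essentially (a sketch):

    static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
    {
            /* parisc needs an order-1 allocation for the PMD;
             * GFP_PGTABLE_KERNEL implies __GFP_ZERO, so no memset
             */
            return (pmd_t *)__get_free_pages(GFP_PGTABLE_KERNEL, PMD_ORDER);
    }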

Fixes: 1355c31eeb ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()")
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Meelis Roos <mroos@linux.ee>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lkml.kernel.org/r/9f2b5ebd-e4a4-0fa1-6cd3-4b9f6892d1ad@linux.ee
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-16 10:53:13 -07:00
Linus Torvalds
4b6c093e21 block-5.9-2020-08-14
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl83DGUQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplTYEACs5kPPpSVdzVsOeHj0LkfQXiSlli8dbPBo
 NB2+TIyr9NxJgxn8B4x+5/4DZgJaCoHOeyyzOocXQmvWGwWOTkrfX/OSyQOlRB5z
 dpzqF0Huhw31MSQEiwA/8lo3omBmat9cMzMa5PJYPghMGfqyQDzVJk1lIX51a1th
 oE01eBpNNsDK0OTwKrl6Rx2/OuFZnA0P3lQwgPZSLnDM6Hq+xeHTdx2LNSyE2QFv
 GzYl4dFoXg3NReLv9D57b7hE6Dc95NcCDDeU7Y3cE7XPksKMA/TkVYOD20ysJ31l
 9uzscvvcm2UugN2r0d/B35lf6NWmOG24SmkLMKTtExPGHOCQIbDAlSP/QQ4zz9pQ
 2yA+eImpQnRsCzPbGcnBzwEF3yX5+lQYmFWac+0AHDiWEWkb8e3MzNSWPZrsN+cD
 7U7c5Zw6zDEtl/naJccuZZPgQGbZgFJ/P6Wo6l5ywIPtE7wzv4MUe4eUxZhitL9M
 0ZP6WIQd8oNQdNoCYVQDwPdYJYMq7uUQFUo40vaSfntZxVKZQao7cvUHwmzVzNlZ
 v5UazETAx+4Eg6MNwfjKp+kt3rr6Xul7K9Nzn6R/cVacIU349FovUshm7WieoAUu
 niZ40gXltxj7NDwHj3p/dqesW5Nhv/qk6hlVWoi9vdmh8vAVBy/fedQfocvKrFJy
 prCI1h1UOQ==
 =10Pr
 -----END PGP SIGNATURE-----

Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few fixes on the block side of things:

   - Discard granularity fix (Coly)

   - rnbd cleanups (Guoqing)

   - md error handling fix (Dan)

   - md sysfs fix (Junxiao)

   - Fix flush request accounting, which caused an IO slowdown for some
     configurations (Ming)

   - Properly propagate loop flag for partition scanning (Lennart)"

* tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block:
  block: fix double account of flush request's driver tag
  loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
  rnbd: no need to set bi_end_io in rnbd_bio_map_kern
  rnbd: remove rnbd_dev_submit_io
  md-cluster: Fix potential error pointer dereference in resize_bitmaps()
  block: check queue's limits.discard_granularity in __blkdev_issue_discard()
  md: get sysfs entry after redundancy attr group create
2020-08-15 20:36:42 -07:00
Linus Torvalds
d84835b118 A RISC-V Fix for 5.9
I collected a single fix during the merge window: we managed to break the early
 trap setup on !MMU; this patch fixes it.
 
 The power keeps going out here, so I haven't had a chance to give this the
 testing I usually would, but I don't have a Kendryte anyway, so I doubt I'd
 pick up anything subtle even if I were to test.  The patch seems pretty safe
 and it's still early, so I don't see any reason to let it sit around.
 
 It's fairly late, so if it misses the merge window that's not a big deal.  I'll
 definitely have stuff for next week, so I'll just start from whenever this
 lands.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl84Y58THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiVs8D/9blga759Z5w71l3wQ7htZnYuw2bJPw
 9vp+TgRBsdBFpVvVR/81E3deLnEk0WaYt+kg3b1SJRcYCeXpfOGofrLx/u1/pFM4
 Tpl2CTQOTumPBYEQLzUEIBxTNbusHDR4s0H3FVzJ6mwhGQk8YAaeD0Vs7CjxASoq
 ZI4JrHBo042vFV85gLqVV9q2oGjRC/kqqep2Utq7HRL2hov/A9IU1A5V7FylPn9+
 CysVEicue3IalHX3ZuTNctzHGnrnyO33gnbihrxh1c7MvVQdeI4BAFpwPR/hTMwa
 cg3DFnQQ5TYT8BkYssNrikgkzGuPzuYMpntP9CrR4zpWu8hEZJv4h0c+hu3OcLoJ
 zDwAuQu9m/iZ98LPev1IMXrComlUb5jUJgRR9HAazUQhsVlj1jXPmFiyHtKsFJ3m
 zfGyrUklTAm2XRhfbM3kiW9c+qrnc/L9qZaLp4JxgPLj8wBYJBOH/XpaasLbR+Tz
 8XNtJ8yfCOHY8srQI752v1vFcF7V4mFKPUu8XJZuZbs/IX+8dH0b5ki6vdPZ5oiS
 sOlDnjTLY3s7uM8Lh2KE/DKMKyJ160F80pzr6QfD45h5qINOZp+9TntA+KE2hvZ8
 1yJd0R27Pc6HOwgv95q9W93St2pA/oC24kKr5OLHjjf5I3Q4vZiLet3sIw1SFViF
 u/+W+oKP1Lo6qQ==
 =+8KT
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fix from Palmer Dabbelt:
 "I collected a single fix during the merge window: we managed to break
  the early trap setup on !MMU; this fixes it"

* tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Setup exception vector for nommu platform
2020-08-15 18:54:42 -07:00
Linus Torvalds
5bbec3cfe3 Cleanup, SECCOMP_FILTER support, message printing fixes, and other
changes to arch/sh.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJfODUiAAoJELcQ+SIFb8Hau0wH/iPeZyv0EhIwL41OPrWhm5wb
 26MNWPvPjYIpKVpr0HMXiffILv595ntvrH0Ujnh1+e8J2kRj0eT+T91UkoyGSfav
 oWmjgcG3NRK6p9882Oo8Xavjr1cTTclOmmDInR4lpAcfIBXkeq2eX0R1h2IuGdNM
 idGlXhJMkgV+xTlgZy7pYmw5pvFMqL5j7fAUQxm0UoY9kbu8Ac4bOR5WrqtFpkjt
 xTh9141YvSSfpRx9uMzrQLuUYGzGePhnjUGSUf/b1deYG/33lNtzhHr+QMK6BpXr
 zdhFalJP40+m+2tG0nCBpAIZcWiOLGb23in5n/trFx3BGZfUf5EKnhZEGUYeE7Q=
 =XWDn
 -----END PGP SIGNATURE-----

Merge tag 'sh-for-5.9' of git://git.libc.org/linux-sh

Pull arch/sh updates from Rich Felker:
 "Cleanup, SECCOMP_FILTER support, message printing fixes, and other
  changes to arch/sh"

* tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
  sh: landisk: Add missing initialization of sh_io_port_base
  sh: bring syscall_set_return_value in line with other architectures
  sh: Add SECCOMP_FILTER
  sh: Rearrange blocks in entry-common.S
  sh: switch to copy_thread_tls()
  sh: use the generic dma coherent remap allocator
  sh: don't allow non-coherent DMA for NOMMU
  dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
  sh: unexport register_trapped_io and match_trapped_io_handler
  sh: don't include <asm/io_trapped.h> in <asm/io.h>
  sh: move the ioremap implementation out of line
  sh: move ioremap_fixed details out of <asm/io.h>
  sh: remove __KERNEL__ ifdefs from non-UAPI headers
  sh: sort the selects for SUPERH alphabetically
  sh: remove -Werror from Makefiles
  sh: Replace HTTP links with HTTPS ones
  arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
  sh: stacktrace: Remove stacktrace_ops.stack()
  sh: machvec: Modernize printing of kernel messages
  sh: pci: Modernize printing of kernel messages
  ...
2020-08-15 18:50:32 -07:00
Jens Axboe
f91daf565b io_uring: short circuit -EAGAIN for blocking read attempt
One case was missed in the short IO retry handling, and that's hitting
-EAGAIN on a blocking read attempt (e.g. from io-wq context). This is a
problem with sockets that are marked as non-blocking when created; they
don't carry any REQ_F_NOWAIT information to help us terminate them
instead of perpetually retrying.

Fixes: 227c0c9673 ("io_uring: internally retry short reads")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15 15:58:42 -07:00
Jens Axboe
d4e7cd36a9 io_uring: sanitize double poll handling
There's a bit of confusion about the matching pairs of poll vs double poll,
depending on whether the request is a pure poll (IORING_OP_POLL_ADD) or
a poll-driven retry.

Add io_poll_get_double() that returns the double poll waitqueue, if any,
and io_poll_get_single() that returns the original poll waitqueue. With
that, remove the argument to io_poll_remove_double().

Finally ensure that wait->private is cleared once the double poll handler
has run, so that remove knows it's already been seen.

Cc: stable@vger.kernel.org # v5.8
Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com
Fixes: 18bceab101 ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15 11:48:18 -07:00