License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
   SPDX license identifier                              # files
   ----------------------------------------------------|-------
   GPL-2.0                                                11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
   SPDX license identifier                              # files
   ----------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                          930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
   SPDX license identifier                              # files
   ----------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                          270
   GPL-2.0+ WITH Linux-syscall-note                         169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)       21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)       17
   LGPL-2.1+ WITH Linux-syscall-note                         15
   GPL-1.0+ WITH Linux-syscall-note                          14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)       5
   LGPL-2.0+ WITH Linux-syscall-note                          4
   LGPL-2.1 WITH Linux-syscall-note                           3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                 3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)                1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
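The first heuristic in the list above (neither scanner found any license traces) can be sketched in C. This is only an illustration of the rule as described, not the tooling used for the series: the function name `default_spdx_id` is hypothetical, and matching on the substring "/uapi/" is an assumption standing in for the real path test.

```c
#include <string.h>

/* Hypothetical helper: choose the SPDX identifier for a file in which
 * neither scanner found license text. The substring match on "/uapi/"
 * is an assumption, not the check the actual tooling performed.
 */
static const char *default_spdx_id(const char *path)
{
	if (strstr(path, "/uapi/"))
		return "GPL-2.0 WITH Linux-syscall-note";
	return "GPL-2.0";
}
```

Files with existing license text are not covered by this sketch; those went through the scanner comparison and manual review described above.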
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
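The distinction between header and source files mentioned above can be illustrated with a small C sketch. It is not the script Thomas and Greg used. The helper name `spdx_tag_line` is hypothetical, and the sketch assumes the usual kernel convention of a `//` comment in .c files and a `/* */` comment in .h files; every other file type is left unhandled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the SPDX tag line for a C source or
 * header file. Returns the number of characters written, or -1 for
 * file types this sketch does not handle.
 */
static int spdx_tag_line(const char *path, const char *id,
			 char *out, size_t len)
{
	const char *ext = strrchr(path, '.');

	if (!ext)
		return -1;
	if (strcmp(ext, ".c") == 0)
		return snprintf(out, len,
				"// SPDX-License-Identifier: %s", id);
	if (strcmp(ext, ".h") == 0)
		return snprintf(out, len,
				"/* SPDX-License-Identifier: %s */", id);
	return -1;
}
```

For a .c path and the identifier "GPL-2.0", the sketch produces the same first line that the source file below carries.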
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2015, 2017 Oracle.  All rights reserved.
 * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved.
 */

/* Lightweight memory registration using Fast Registration Work
 * Requests (FRWR).
 *
 * FRWR features ordered asynchronous registration and invalidation
 * of arbitrarily-sized memory regions. This is the fastest and safest
 * but most complex memory registration mode.
 */

/* Normal operation
 *
 * A Memory Region is prepared for RDMA Read or Write using a FAST_REG
 * Work Request (frwr_map). When the RDMA operation is finished, this
 * Memory Region is invalidated using a LOCAL_INV Work Request
 * (frwr_unmap_async and frwr_unmap_sync).
 *
 * Typically FAST_REG Work Requests are not signaled, and neither are
 * RDMA Send Work Requests (with the exception of signaling occasionally
 * to prevent provider work queue overflows). This greatly reduces HCA
 * interrupt workload.
 */

/* Transport recovery
 *
 * frwr_map and frwr_unmap_* cannot run at the same time the transport
 * connect worker is running. The connect worker holds the transport
 * send lock, just as ->send_request does. This prevents frwr_map and
 * the connect worker from running concurrently. When a connection is
 * closed, the Receive completion queue is drained before allowing
 * the connect worker to get control. This prevents frwr_unmap and the
 * connect worker from running concurrently.
 *
 * When the underlying transport disconnects, MRs that are in flight
 * are flushed and are likely unusable. Thus all MRs are destroyed.
 * New MRs are created on demand.
 */

#include <linux/sunrpc/rpc_rdma.h>
#include <linux/sunrpc/svc_rdma.h>

#include "xprt_rdma.h"
#include <trace/events/rpcrdma.h>

#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
# define RPCDBG_FACILITY	RPCDBG_TRANS
#endif

/**
 * frwr_release_mr - Destroy one MR
 * @mr: MR allocated by frwr_mr_init
 *
 */
void frwr_release_mr(struct rpcrdma_mr *mr)
{
	int rc;

	rc = ib_dereg_mr(mr->frwr.fr_mr);
	if (rc)
		trace_xprtrdma_frwr_dereg(mr, rc);
	kfree(mr->mr_sg);
	kfree(mr);
}

static void frwr_mr_recycle(struct rpcrdma_mr *mr)
{
	struct rpcrdma_xprt *r_xprt = mr->mr_xprt;

	trace_xprtrdma_mr_recycle(mr);

	if (mr->mr_dir != DMA_NONE) {
		trace_xprtrdma_mr_unmap(mr);
		ib_dma_unmap_sg(r_xprt->rx_ia.ri_id->device,
				mr->mr_sg, mr->mr_nents, mr->mr_dir);
		mr->mr_dir = DMA_NONE;
	}

	spin_lock(&r_xprt->rx_buf.rb_lock);
	list_del(&mr->mr_all);
	r_xprt->rx_stats.mrs_recycled++;
	spin_unlock(&r_xprt->rx_buf.rb_lock);

	frwr_release_mr(mr);
}

/* frwr_reset - Place MRs back on the free list
 * @req: request to reset
 *
 * Used after a failed marshal. For FRWR, this means the MRs
 * don't have to be fully released and recreated.
 *
 * NB: This is safe only as long as none of @req's MRs are
 * involved with an ongoing asynchronous FAST_REG or LOCAL_INV
 * Work Request.
 */
void frwr_reset(struct rpcrdma_req *req)
{
	struct rpcrdma_mr *mr;

	while ((mr = rpcrdma_mr_pop(&req->rl_registered)))
		rpcrdma_mr_put(mr);
}

/**
 * frwr_mr_init - Initialize one MR
 * @r_xprt: controlling transport instance
 * @mr: generic MR to prepare for FRWR
 *
 * Returns zero if successful. Otherwise a negative errno
 * is returned.
 */
int frwr_mr_init(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr *mr)
{
	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
	unsigned int depth = ia->ri_max_frwr_depth;
	struct scatterlist *sg;
	struct ib_mr *frmr;
	int rc;

	frmr = ib_alloc_mr(ia->ri_pd, ia->ri_mrtype, depth);
	if (IS_ERR(frmr))
		goto out_mr_err;

	sg = kcalloc(depth, sizeof(*sg), GFP_NOFS);
	if (!sg)
		goto out_list_err;

	mr->mr_xprt = r_xprt;
	mr->frwr.fr_mr = frmr;
	mr->mr_dir = DMA_NONE;
xprtrdma: Fix list corruption / DMAR errors during MR recovery
The ro_release_mr methods check whether mr->mr_list is empty.
Therefore, be sure to always use list_del_init when removing an MR
linked into a list using that field. Otherwise, when recovering from
transport failures or device removal, list corruption can result, or
MRs can get mapped or unmapped an odd number of times, resulting in
IOMMU-related failures.
In general this fix is appropriate back to v4.8. However, code
changes since then make it impossible to apply this patch directly
to stable kernels. The fix would have to be applied by hand or
reworked for kernels earlier than v4.16.
Backport guidance -- there are several cases:
- When creating an MR, initialize mr_list so that using list_empty
on an as-yet-unused MR is safe.
- When an MR is being handled by the remote invalidation path,
ensure that mr_list is reinitialized when it is removed from
rl_registered.
- When an MR is being handled by rpcrdma_destroy_mrs, it is removed
from mr_all, but it may still be on an rl_registered list. In
that case, the MR needs to be removed from that list before being
released.
- Other cases are covered by using list_del_init in rpcrdma_mr_pop.
Fixes: 9d6b04097882 ('xprtrdma: Place registered MWs on a ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-01 11:37:14 -04:00
	INIT_LIST_HEAD(&mr->mr_list);
	init_completion(&mr->frwr.fr_linv_done);

	sg_init_table(sg, depth);
	mr->mr_sg = sg;
	return 0;

out_mr_err:
	rc = PTR_ERR(frmr);
	trace_xprtrdma_frwr_alloc(mr, rc);
	return rc;

out_list_err:
	ib_dereg_mr(frmr);
	return -ENOMEM;
}

/**
 * frwr_query_device - Prepare a transport for use with FRWR
 * @r_xprt: controlling transport instance
 * @device: RDMA device to query
 *
 * On success, sets:
 *	ep->rep_attr
 *	ep->rep_max_requests
 *	ia->ri_max_rdma_segs
 *
 * And these FRWR-related fields:
 *	ia->ri_max_frwr_depth
 *	ia->ri_mrtype
 *
 * Return values:
 *   On success, returns zero.
 *   %-EINVAL - the device does not support FRWR memory registration
 *   %-ENOMEM - the device is not sufficiently capable for NFS/RDMA
 */
int frwr_query_device(struct rpcrdma_xprt *r_xprt,
		      const struct ib_device *device)
{
	const struct ib_device_attr *attrs = &device->attrs;
	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
	struct rpcrdma_ep *ep = &r_xprt->rx_ep;
	int max_qp_wr, depth, delta;
	unsigned int max_sge;

	if (!(attrs->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) ||
	    attrs->max_fast_reg_page_list_len == 0) {
		pr_err("rpcrdma: 'frwr' mode is not supported by device %s\n",
		       device->name);
		return -EINVAL;
	}

	max_sge = min_t(unsigned int, attrs->max_send_sge,
			RPCRDMA_MAX_SEND_SGES);
	if (max_sge < RPCRDMA_MIN_SEND_SGES) {
		pr_err("rpcrdma: HCA provides only %u send SGEs\n", max_sge);
		return -ENOMEM;
	}
	ep->rep_attr.cap.max_send_sge = max_sge;
	ep->rep_attr.cap.max_recv_sge = 1;

xprtrdma: Support for SG_GAP devices
Some devices (such as the Mellanox CX-4) can register, under a
single R_key, a set of memory regions that are not contiguous. When
this is done, all the segments in a Reply list, say, can then be
invalidated in a single LocalInv Work Request (or via Remote
Invalidation, which can invalidate exactly one R_key when completing
a Receive).
This means a single FastReg WR is used to register, and one or zero
LocalInv WRs can invalidate, the memory involved with RDMA transfers
on behalf of an RPC.
In addition, xprtrdma constructs some Reply chunks from three or
more segments. By registering them with SG_GAP, only one segment
is needed for the Reply chunk, allowing the whole chunk to be
invalidated remotely.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-29 10:52:24 -05:00
	ia->ri_mrtype = IB_MR_TYPE_MEM_REG;
	if (attrs->device_cap_flags & IB_DEVICE_SG_GAPS_REG)
		ia->ri_mrtype = IB_MR_TYPE_SG_GAPS;

	/* Quirk: Some devices advertise a large max_fast_reg_page_list_len
	 * capability, but perform optimally when the MRs are not larger
	 * than a page.
	 */
	if (attrs->max_sge_rd > RPCRDMA_MAX_HDR_SEGS)
		ia->ri_max_frwr_depth = attrs->max_sge_rd;
	else
		ia->ri_max_frwr_depth = attrs->max_fast_reg_page_list_len;
	if (ia->ri_max_frwr_depth > RPCRDMA_MAX_DATA_SEGS)
		ia->ri_max_frwr_depth = RPCRDMA_MAX_DATA_SEGS;

	/* Add room for frwr register and invalidate WRs.
	 * 1. FRWR reg WR for head
	 * 2. FRWR invalidate WR for head
	 * 3. N FRWR reg WRs for pagelist
	 * 4. N FRWR invalidate WRs for pagelist
	 * 5. FRWR reg WR for tail
	 * 6. FRWR invalidate WR for tail
	 * 7. The RDMA_SEND WR
	 */
	depth = 7;

	/* Calculate N if the device max FRWR depth is smaller than
	 * RPCRDMA_MAX_DATA_SEGS.
	 */
	if (ia->ri_max_frwr_depth < RPCRDMA_MAX_DATA_SEGS) {
		delta = RPCRDMA_MAX_DATA_SEGS - ia->ri_max_frwr_depth;
		do {
			depth += 2; /* FRWR reg + invalidate */
			delta -= ia->ri_max_frwr_depth;
		} while (delta > 0);
	}

	max_qp_wr = attrs->max_qp_wr;
	max_qp_wr -= RPCRDMA_BACKWARD_WRS;
	max_qp_wr -= 1;
	if (max_qp_wr < RPCRDMA_MIN_SLOT_TABLE)
		return -ENOMEM;
	if (ep->rep_max_requests > max_qp_wr)
		ep->rep_max_requests = max_qp_wr;
	ep->rep_attr.cap.max_send_wr = ep->rep_max_requests * depth;
	if (ep->rep_attr.cap.max_send_wr > max_qp_wr) {
		ep->rep_max_requests = max_qp_wr / depth;
		if (!ep->rep_max_requests)
			return -ENOMEM;
		ep->rep_attr.cap.max_send_wr = ep->rep_max_requests * depth;
	}
	ep->rep_attr.cap.max_send_wr += RPCRDMA_BACKWARD_WRS;
	ep->rep_attr.cap.max_send_wr += 1; /* for ib_drain_sq */
	ep->rep_attr.cap.max_recv_wr = ep->rep_max_requests;
	ep->rep_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
	ep->rep_attr.cap.max_recv_wr += 1; /* for ib_drain_rq */

	ia->ri_max_rdma_segs =
		DIV_ROUND_UP(RPCRDMA_MAX_DATA_SEGS, ia->ri_max_frwr_depth);
	/* Reply chunks require segments for head and tail buffers */
	ia->ri_max_rdma_segs += 2;
	if (ia->ri_max_rdma_segs > RPCRDMA_MAX_HDR_SEGS)
		ia->ri_max_rdma_segs = RPCRDMA_MAX_HDR_SEGS;

	/* Ensure the underlying device is capable of conveying the
	 * largest r/wsize NFS will ask for. This guarantees that
	 * failing over from one RDMA device to another will not
	 * break NFS I/O.
	 */
	if ((ia->ri_max_rdma_segs * ia->ri_max_frwr_depth) < RPCRDMA_MAX_SEGS)
		return -ENOMEM;

	return 0;
}

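The sizing arithmetic in frwr_query_device can be tried outside the kernel. The following is a userspace sketch, not kernel code: the `SKETCH_*` constants are placeholder values chosen for illustration (the real RPCRDMA_* values are defined in xprt_rdma.h), the function names are hypothetical, and the RPCRDMA_MIN_SLOT_TABLE check is left out.

```c
#include <assert.h>

/* Placeholder values, NOT the kernel's RPCRDMA_* constants */
#define SKETCH_MAX_DATA_SEGS	64
#define SKETCH_MAX_HDR_SEGS	16
#define SKETCH_BACKWARD_WRS	8

/* Send Queue entries one RPC may consume: 7 fixed WRs, plus one
 * reg/invalidate pair for each additional MR the pagelist needs.
 */
static int sketch_wr_depth(int max_frwr_depth)
{
	int depth = 7;
	int delta;

	assert(max_frwr_depth > 0);	/* zero would never terminate */
	if (max_frwr_depth < SKETCH_MAX_DATA_SEGS) {
		delta = SKETCH_MAX_DATA_SEGS - max_frwr_depth;
		do {
			depth += 2;	/* FRWR reg + invalidate */
			delta -= max_frwr_depth;
		} while (delta > 0);
	}
	return depth;
}

/* Clamp the request count so that requests * depth fits in the
 * device's work queue after reserving backchannel and drain WRs.
 */
static int sketch_max_requests(int max_qp_wr, int requested, int depth)
{
	max_qp_wr -= SKETCH_BACKWARD_WRS;
	max_qp_wr -= 1;
	if (requested > max_qp_wr)
		requested = max_qp_wr;
	if (requested * depth > max_qp_wr)
		requested = max_qp_wr / depth;
	return requested;
}

/* Segments per chunk: enough MRs to cover the data, plus head and
 * tail, capped at the header segment limit.
 */
static int sketch_rdma_segs(int max_frwr_depth)
{
	int segs = (SKETCH_MAX_DATA_SEGS + max_frwr_depth - 1) /
		   max_frwr_depth;

	segs += 2;
	if (segs > SKETCH_MAX_HDR_SEGS)
		segs = SKETCH_MAX_HDR_SEGS;
	return segs;
}
```

With these placeholder constants, a device whose FRWR depth is half of SKETCH_MAX_DATA_SEGS needs one extra reg/invalidate pair, so the per-RPC depth grows from 7 to 9, and fewer requests fit in the same work queue.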
/**
 * frwr_map - Register a memory region
 * @r_xprt: controlling transport
 * @seg: memory region co-ordinates
 * @nsegs: number of segments remaining
 * @writing: true when RDMA Write will be used
 * @xid: XID of RPC using the registered memory
 * @mr: MR to fill in
 *
 * Prepare a REG_MR Work Request to register a memory region
 * for remote access via RDMA READ or RDMA WRITE.
 *
 * Returns the next segment or a negative errno pointer.
 * On success, @mr is filled in.
 */
struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
				struct rpcrdma_mr_seg *seg,
				int nsegs, bool writing, __be32 xid,
				struct rpcrdma_mr *mr)
{
	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
	struct ib_reg_wr *reg_wr;
	int i, n, dma_nents;
	struct ib_mr *ibmr;
	u8 key;

	if (nsegs > ia->ri_max_frwr_depth)
		nsegs = ia->ri_max_frwr_depth;
	for (i = 0; i < nsegs;) {
		if (seg->mr_page)
			sg_set_page(&mr->mr_sg[i],
				    seg->mr_page,
				    seg->mr_len,
				    offset_in_page(seg->mr_offset));
		else
			sg_set_buf(&mr->mr_sg[i], seg->mr_offset,
				   seg->mr_len);

		++seg;
		++i;
		if (ia->ri_mrtype == IB_MR_TYPE_SG_GAPS)
			continue;
		if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
		    offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
			break;
	}
	mr->mr_dir = rpcrdma_data_dir(writing);
	mr->mr_nents = i;

	dma_nents = ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, mr->mr_nents,
				  mr->mr_dir);
	if (!dma_nents)
		goto out_dmamap_err;

	ibmr = mr->frwr.fr_mr;
	n = ib_map_mr_sg(ibmr, mr->mr_sg, dma_nents, NULL, PAGE_SIZE);
	if (n != dma_nents)
		goto out_mapmr_err;

	ibmr->iova &= 0x00000000ffffffff;
	ibmr->iova |= ((u64)be32_to_cpu(xid)) << 32;
	key = (u8)(ibmr->rkey & 0x000000FF);
	ib_update_fast_reg_key(ibmr, ++key);

	reg_wr = &mr->frwr.fr_regwr;
	reg_wr->mr = ibmr;
	reg_wr->key = ibmr->rkey;
	reg_wr->access = writing ?
			 IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
			 IB_ACCESS_REMOTE_READ;

	mr->mr_handle = ibmr->rkey;
	mr->mr_length = ibmr->length;
	mr->mr_offset = ibmr->iova;
	trace_xprtrdma_mr_map(mr);

	return seg;

out_dmamap_err:
	mr->mr_dir = DMA_NONE;
	trace_xprtrdma_frwr_sgerr(mr, i);
	return ERR_PTR(-EIO);

out_mapmr_err:
	trace_xprtrdma_frwr_maperr(mr, n);
	return ERR_PTR(-EIO);
}

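Two bit manipulations in frwr_map are easy to misread: the XID is placed in the upper 32 bits of the MR's iova, and the low byte of the rkey is incremented before each registration. The userspace sketch below models both on plain integers. The function names are hypothetical, the XID is assumed to be already in host byte order, and the rkey update is modeled as replacing only the low 8 bits, which is an assumption about what ib_update_fast_reg_key does rather than something stated in this file.

```c
#include <stdint.h>

/* Keep the low 32 bits of the iova, put the host-order XID on top */
static uint64_t sketch_tag_iova(uint64_t iova, uint32_t xid)
{
	iova &= 0x00000000ffffffffULL;
	iova |= (uint64_t)xid << 32;
	return iova;
}

/* Bump the 8-bit key portion of an rkey; it wraps from 0xff to 0x00
 * and leaves the upper 24 bits untouched.
 */
static uint32_t sketch_bump_rkey(uint32_t rkey)
{
	uint8_t key = (uint8_t)(rkey & 0x000000FF);

	++key;
	return (rkey & 0xFFFFFF00) | key;
}
```

Under this model, an rkey reused for a new registration always differs from the previous one in its low byte.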
/**
 * frwr_wc_fastreg - Invoked by RDMA provider for a flushed FastReg WC
 * @cq: completion queue
 * @wc: WCE for a completed FastReg WR
 *
 */
static void frwr_wc_fastreg(struct ib_cq *cq, struct ib_wc *wc)
{
	struct ib_cqe *cqe = wc->wr_cqe;
	struct rpcrdma_frwr *frwr =
		container_of(cqe, struct rpcrdma_frwr, fr_cqe);

	/* WARNING: Only wr_cqe and status are reliable at this point */
	trace_xprtrdma_wc_fastreg(wc, frwr);
	/* The MR will get recycled when the associated req is retransmitted */

	rpcrdma_flush_disconnect(cq, wc);
}

/**
 * frwr_send - post Send WRs containing the RPC Call message
 * @r_xprt: controlling transport instance
 * @req: prepared RPC Call
 *
 * For FRWR, chain any FastReg WRs to the Send WR. Only a
 * single ib_post_send call is needed to register memory
 * and then post the Send WR.
 *
 * Returns the return code from ib_post_send.
 *
 * Caller must hold the transport send lock to ensure that the
 * pointers to the transport's rdma_cm_id and QP are stable.
 */
int frwr_send(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
{
	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
	struct ib_send_wr *post_wr;
	struct rpcrdma_mr *mr;

	post_wr = &req->rl_wr;
	list_for_each_entry(mr, &req->rl_registered, mr_list) {
		struct rpcrdma_frwr *frwr;

		frwr = &mr->frwr;

		frwr->fr_cqe.done = frwr_wc_fastreg;
		frwr->fr_regwr.wr.next = post_wr;
		frwr->fr_regwr.wr.wr_cqe = &frwr->fr_cqe;
		frwr->fr_regwr.wr.num_sge = 0;
		frwr->fr_regwr.wr.opcode = IB_WR_REG_MR;
		frwr->fr_regwr.wr.send_flags = 0;

		post_wr = &frwr->fr_regwr.wr;
	}

	return ib_post_send(ia->ri_id->qp, post_wr, NULL);
}

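The chaining in frwr_send prepends each registration WR to the chain, so the Send WR always ends up last. A minimal userspace sketch of that linked-list pattern follows. `struct sketch_wr` and the `SKETCH_OP_*` values are hypothetical stand-ins for struct ib_send_wr and the IB_WR_* opcodes, and an array replaces the rl_registered list.

```c
#include <stddef.h>

enum { SKETCH_OP_SEND, SKETCH_OP_REG_MR };

/* Hypothetical stand-in for struct ib_send_wr */
struct sketch_wr {
	struct sketch_wr *next;
	int opcode;
};

/* Link each registration WR in front of the current chain head and
 * return the new head, which is what would be posted.
 */
static struct sketch_wr *sketch_chain(struct sketch_wr *send,
				      struct sketch_wr *regs, size_t n)
{
	struct sketch_wr *post_wr = send;
	size_t i;

	for (i = 0; i < n; i++) {
		regs[i].opcode = SKETCH_OP_REG_MR;
		regs[i].next = post_wr;
		post_wr = &regs[i];
	}
	return post_wr;
}
```

Walking the returned chain visits every registration before the Send, which is why a single post is enough to register memory and then transmit.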
/**
 * frwr_reminv - handle a remotely invalidated mr on the @mrs list
 * @rep: Received reply
 * @mrs: list of MRs to check
 *
 */
void frwr_reminv(struct rpcrdma_rep *rep, struct list_head *mrs)
{
	struct rpcrdma_mr *mr;

	list_for_each_entry(mr, mrs, mr_list)
		if (mr->mr_handle == rep->rr_inv_rkey) {
			list_del_init(&mr->mr_list);
			trace_xprtrdma_mr_reminv(mr);
			rpcrdma_mr_put(mr);
			break;	/* only one invalidated MR per RPC */
		}
}

static void __frwr_release_mr(struct ib_wc *wc, struct rpcrdma_mr *mr)
{
	if (wc->status != IB_WC_SUCCESS)
		frwr_mr_recycle(mr);
	else
		rpcrdma_mr_put(mr);
}

/**
 * frwr_wc_localinv - Invoked by RDMA provider for a LOCAL_INV WC
 * @cq: completion queue
 * @wc: WCE for a completed LocalInv WR
 *
 */
static void frwr_wc_localinv(struct ib_cq *cq, struct ib_wc *wc)
{
	struct ib_cqe *cqe = wc->wr_cqe;
	struct rpcrdma_frwr *frwr =
		container_of(cqe, struct rpcrdma_frwr, fr_cqe);
	struct rpcrdma_mr *mr = container_of(frwr, struct rpcrdma_mr, frwr);

	/* WARNING: Only wr_cqe and status are reliable at this point */
	trace_xprtrdma_wc_li(wc, frwr);
	__frwr_release_mr(wc, mr);

	rpcrdma_flush_disconnect(cq, wc);
}

/**
 * frwr_wc_localinv_wake - Invoked by RDMA provider for a LOCAL_INV WC
 * @cq: completion queue
 * @wc: WCE for a completed LocalInv WR
 *
 * Awaken anyone waiting for an MR to finish being fenced.
 */
static void frwr_wc_localinv_wake(struct ib_cq *cq, struct ib_wc *wc)
{
	struct ib_cqe *cqe = wc->wr_cqe;
	struct rpcrdma_frwr *frwr =
		container_of(cqe, struct rpcrdma_frwr, fr_cqe);
	struct rpcrdma_mr *mr = container_of(frwr, struct rpcrdma_mr, frwr);

	/* WARNING: Only wr_cqe and status are reliable at this point */
	trace_xprtrdma_wc_li_wake(wc, frwr);
	__frwr_release_mr(wc, mr);
	complete(&frwr->fr_linv_done);

	rpcrdma_flush_disconnect(cq, wc);
}

/**
 * frwr_unmap_sync - invalidate memory regions that were registered for @req
 * @r_xprt: controlling transport instance
 * @req: rpcrdma_req with a non-empty list of MRs to process
 *
 * Sleeps until it is safe for the host CPU to access the previously mapped
 * memory regions. This guarantees that registered MRs are properly fenced
 * from the server before the RPC consumer accesses the data in them. It
 * also ensures proper Send flow control: waking the next RPC waits until
 * this RPC has relinquished all its Send Queue entries.
 */
void frwr_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
{
	struct ib_send_wr *first, **prev, *last;
	const struct ib_send_wr *bad_wr;
	struct rpcrdma_frwr *frwr;
	struct rpcrdma_mr *mr;
	int rc;

	/* ORDER: Invalidate all of the MRs first
	 *
	 * Chain the LOCAL_INV Work Requests and post them with
	 * a single ib_post_send() call.
	 */
	frwr = NULL;
	prev = &first;
	while ((mr = rpcrdma_mr_pop(&req->rl_registered))) {

		trace_xprtrdma_mr_localinv(mr);
		r_xprt->rx_stats.local_inv_needed++;

		frwr = &mr->frwr;
		frwr->fr_cqe.done = frwr_wc_localinv;
		last = &frwr->fr_invwr;
		last->next = NULL;
		last->wr_cqe = &frwr->fr_cqe;
		last->sg_list = NULL;
		last->num_sge = 0;
		last->opcode = IB_WR_LOCAL_INV;
		last->send_flags = IB_SEND_SIGNALED;
		last->ex.invalidate_rkey = mr->mr_handle;
2015-12-16 17:22:47 -05:00
2016-11-29 10:52:57 -05:00
* prev = last ;
prev = & last - > next ;
2015-12-16 17:22:47 -05:00
}
/* Strong send queue ordering guarantees that when the
* last WR in the chain completes , all WRs in the chain
* are complete .
*/
2017-12-14 20:57:47 -05:00
frwr - > fr_cqe . done = frwr_wc_localinv_wake ;
reinit_completion ( & frwr - > fr_linv_done ) ;
2016-11-29 10:52:16 -05:00
2015-12-16 17:22:47 -05:00
/* Transport disconnect drains the receive CQ before it
* replaces the QP . The RPC reply handler won ' t call us
* unless ri_id - > qp is a valid pointer .
*/
2017-06-08 11:52:28 -04:00
bad_wr = NULL ;
2019-06-19 10:32:59 -04:00
rc = ib_post_send ( r_xprt - > rx_ia . ri_id - > qp , first , & bad_wr ) ;
2015-12-16 17:22:47 -05:00
2019-06-19 10:32:59 -04:00
/* The final LOCAL_INV WR in the chain is supposed to
* do the wake . If it was never posted , the wake will
* not happen , so don ' t wait in that case .
2015-12-16 17:22:47 -05:00
*/
2019-06-19 10:32:59 -04:00
if ( bad_wr ! = first )
wait_for_completion ( & frwr - > fr_linv_done ) ;
if ( ! rc )
return ;
2015-03-30 14:34:48 -04:00
2019-06-19 10:32:59 -04:00
/* Recycle MRs in the LOCAL_INV chain that did not get posted.
2016-05-02 14:42:12 -04:00
*/
2019-10-09 13:07:21 -04:00
trace_xprtrdma_post_linv ( req , rc ) ;
2017-06-08 11:52:28 -04:00
while ( bad_wr ) {
2017-12-14 20:57:47 -05:00
frwr = container_of ( bad_wr , struct rpcrdma_frwr ,
fr_invwr ) ;
2017-12-14 20:57:55 -05:00
mr = container_of ( frwr , struct rpcrdma_mr , frwr ) ;
2017-06-08 11:52:28 -04:00
bad_wr = bad_wr - > next ;
2018-10-01 14:25:25 -04:00
2018-12-19 10:58:19 -05:00
list_del_init ( & mr - > mr_list ) ;
2019-10-17 14:31:09 -04:00
frwr_mr_recycle ( mr ) ;
2016-05-02 14:42:12 -04:00
}
2015-12-16 17:22:47 -05:00
}
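
/*
 * Illustration only, not part of the transport: a minimal userspace sketch
 * of the chaining idiom used above, where a pointer to the tail's next field
 * (prev) links each Work Request in order, and of walking the unposted tail
 * starting at bad_wr after a partial post. struct demo_wr, demo_chain(),
 * demo_post() and demo_count_unposted() are hypothetical stand-ins; in
 * particular demo_post() only imitates the convention that a failed post
 * reports the first WR that was not accepted. Unlike the function above,
 * the sketch initializes first to NULL so that an empty list is well defined.
 */

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ib_send_wr: only the link matters. */
struct demo_wr {
	struct demo_wr *next;
};

/* Link wrs[0..n-1] in order through a pointer to the tail's next field.
 * Returns the head of the chain, or NULL when n is zero.
 */
static struct demo_wr *demo_chain(struct demo_wr *wrs, size_t n)
{
	struct demo_wr *first = NULL, **prev = &first;
	size_t i;

	for (i = 0; i < n; i++) {
		wrs[i].next = NULL;
		*prev = &wrs[i];
		prev = &wrs[i].next;
	}
	return first;
}

/* Hypothetical post routine that accepts at most @limit WRs. On failure
 * it returns -1 and points *bad_wr at the first WR it did not accept.
 */
static int demo_post(struct demo_wr *first, size_t limit,
		     struct demo_wr **bad_wr)
{
	struct demo_wr *wr;

	for (wr = first; wr; wr = wr->next) {
		if (limit == 0) {
			*bad_wr = wr;
			return -1;
		}
		limit--;
	}
	return 0;
}

/* Count the WRs from @bad_wr to the end of the chain (the unposted tail). */
static size_t demo_count_unposted(const struct demo_wr *bad_wr)
{
	size_t n = 0;

	for (; bad_wr; bad_wr = bad_wr->next)
		n++;
	return n;
}
```

/*
 * When bad_wr equals the head of the chain, nothing was accepted, which
 * corresponds to the case above where no completion will ever fire and
 * waiting would block forever.
 */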

/**
 * frwr_wc_localinv_done - Invoked by RDMA provider for a signaled LOCAL_INV WC
 * @cq: completion queue
 * @wc: WCE for a completed LocalInv WR
 *
 */
static void frwr_wc_localinv_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct ib_cqe *cqe = wc->wr_cqe;
	struct rpcrdma_frwr *frwr =
		container_of(cqe, struct rpcrdma_frwr, fr_cqe);
	struct rpcrdma_mr *mr = container_of(frwr, struct rpcrdma_mr, frwr);
	struct rpcrdma_rep *rep = mr->mr_req->rl_reply;

	/* WARNING: Only wr_cqe and status are reliable at this point */
	trace_xprtrdma_wc_li_done(wc, frwr);
	__frwr_release_mr(wc, mr);

	/* Ensure @rep is generated before __frwr_release_mr */
	smp_rmb();
	rpcrdma_complete_rqst(rep);

	rpcrdma_flush_disconnect(cq, wc);
}

/**
 * frwr_unmap_async - invalidate memory regions that were registered for @req
 * @r_xprt: controlling transport instance
 * @req: rpcrdma_req with a non-empty list of MRs to process
 *
 * This guarantees that registered MRs are properly fenced from the
 * server before the RPC consumer accesses the data in them. It also
 * ensures proper Send flow control: waking the next RPC waits until
 * this RPC has relinquished all its Send Queue entries.
 */
void frwr_unmap_async(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
{
	struct ib_send_wr *first, *last, **prev;
	const struct ib_send_wr *bad_wr;
	struct rpcrdma_frwr *frwr;
	struct rpcrdma_mr *mr;
	int rc;

	/* Chain the LOCAL_INV Work Requests and post them with
	 * a single ib_post_send() call.
	 */
	frwr = NULL;
	prev = &first;
	while ((mr = rpcrdma_mr_pop(&req->rl_registered))) {
		trace_xprtrdma_mr_localinv(mr);
		r_xprt->rx_stats.local_inv_needed++;

		frwr = &mr->frwr;
		frwr->fr_cqe.done = frwr_wc_localinv;
		last = &frwr->fr_invwr;
		last->next = NULL;
		last->wr_cqe = &frwr->fr_cqe;
		last->sg_list = NULL;
		last->num_sge = 0;
		last->opcode = IB_WR_LOCAL_INV;
		last->send_flags = IB_SEND_SIGNALED;
		last->ex.invalidate_rkey = mr->mr_handle;

		*prev = last;
		prev = &last->next;
	}

	/* Strong send queue ordering guarantees that when the
	 * last WR in the chain completes, all WRs in the chain
	 * are complete. The last completion will wake up the
	 * RPC waiter.
	 */
	frwr->fr_cqe.done = frwr_wc_localinv_done;

	/* Transport disconnect drains the receive CQ before it
	 * replaces the QP. The RPC reply handler won't call us
	 * unless ri_id->qp is a valid pointer.
	 */
	bad_wr = NULL;
	rc = ib_post_send(r_xprt->rx_ia.ri_id->qp, first, &bad_wr);
	if (!rc)
		return;

	/* Recycle MRs in the LOCAL_INV chain that did not get posted.
	 */
	trace_xprtrdma_post_linv(req, rc);
	while (bad_wr) {
		frwr = container_of(bad_wr, struct rpcrdma_frwr, fr_invwr);
		mr = container_of(frwr, struct rpcrdma_mr, frwr);
		bad_wr = bad_wr->next;

		frwr_mr_recycle(mr);
	}

	/* The final LOCAL_INV WR in the chain is supposed to
	 * do the wake. If it was never posted, the wake will
	 * not happen, so wake here in that case.
	 */
	rpcrdma_complete_rqst(req->rl_reply);
}