Go to file
Chuck Lever ae72950abf xprtrdma: Add data structure to manage RDMA Send arguments
Problem statement:

Recently Sagi Grimberg <sagi@grimberg.me> observed that kernel RDMA-
enabled storage initiators don't handle delayed Send completion
correctly. If Send completion is delayed beyond the end of a ULP
transaction, the ULP may release resources that are still being used
by the HCA to complete a long-running Send operation.

This is a common design trait amongst our initiators. Most Send
operations are faster than the ULP transaction they are part of.
Waiting for a completion for these is typically unnecessary.

Infrequently, a network partition or some other problem crops up
where an ordering problem can occur. In NFS parlance, the RPC Reply
arrives and completes the RPC, but the HCA is still retrying the
Send WR that conveyed the RPC Call. In this case, the HCA can try
to use memory that has been invalidated or DMA unmapped, and the
connection is lost. If that memory has been re-used for something
else (possibly not related to NFS), and the Send retransmission
exposes that data on the wire.

Thus we cannot assume that it is safe to release Send-related
resources just because a ULP reply has arrived.

After some analysis, we have determined that the completion
housekeeping will not be difficult for xprtrdma:

 - Inline Send buffers are registered via the local DMA key, and
   are already left DMA mapped for the lifetime of a transport
   connection, thus no additional handling is necessary for those
 - Gathered Sends involving page cache pages _will_ need to
   DMA unmap those pages after the Send completes. But like
   inline send buffers, they are registered via the local DMA key,
   and thus will not need to be invalidated

In addition, RPC completion will need to wait for Send completion
in the latter case. However, nearly always, the Send that conveys
the RPC Call will have completed long before the RPC Reply
arrives, and thus no additional latency will be accrued.

Design notes:

In this patch, the rpcrdma_sendctx object is introduced, and a
lock-free circular queue is added to manage a set of them per
transport.

The RPC client's send path already prevents sending more than one
RPC Call at the same time. This allows us to treat the consumer
side of the queue (rpcrdma_sendctx_get_locked) as if there is a
single consumer thread.

The producer side of the queue (rpcrdma_sendctx_put_locked) is
invoked only from the Send completion handler, which is a single
thread of execution (soft IRQ).

The only care that needs to be taken is with the tail index, which
is shared between the producer and consumer. Only the producer
updates the tail index. The consumer compares the head with the
tail to ensure that the a sendctx that is in use is never handed
out again (or, expressed more conventionally, the queue is empty).

When the sendctx queue empties completely, there are enough Sends
outstanding that posting more Send operations can result in a Send
Queue overflow. In this case, the ULP is told to wait and try again.
This introduces strong Send Queue accounting to xprtrdma.

As a final touch, Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
suggested a mechanism that does not require signaling every Send.
We signal once every N Sends, and perform SGE unmapping of N Send
operations during that one completion.

Reported-by: Sagi Grimberg <sagi@grimberg.me>
Suggested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17 13:47:56 -05:00
arch Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-10-14 15:26:38 -04:00
block bio_copy_user_iov(): don't ignore ->iov_offset 2017-10-10 23:55:14 -04:00
certs modsign: add markers to endif-statements in certs/Makefile 2017-07-14 11:01:37 +10:00
crypto crypto: shash - Fix zero-length shash ahash digest crash 2017-10-11 00:34:07 +08:00
Documentation mm, swap: use page-cluster as max window of VMA based swap readahead 2017-10-13 16:18:33 -07:00
drivers Char/Misc driver fixes for 4.14-rc5 2017-10-15 07:50:38 -04:00
firmware firmware: Restore support for built-in firmware 2017-09-16 10:58:48 -07:00
fs NFS: remove special-case revalidate in nfs_opendir() 2017-10-16 13:51:27 -04:00
include NFS: Create NFS_ACCESS_* flags 2017-10-16 13:51:27 -04:00
init Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:54:01 -07:00
ipc fix a typo in put_compat_shm_info() 2017-09-25 20:41:46 -04:00
kernel Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-10-14 15:20:38 -04:00
lib Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-10-14 15:14:20 -04:00
mm mm, swap: use page-cluster as max window of VMA based swap readahead 2017-10-13 16:18:33 -07:00
net xprtrdma: Add data structure to manage RDMA Send arguments 2017-11-17 13:47:56 -05:00
samples media updates for v4.14-rc1 2017-09-07 12:53:14 -07:00
scripts scripts/kallsyms.c: ignore symbol type 'n' 2017-10-13 16:18:32 -07:00
security lsm: fix smack_inode_removexattr and xattr_getsecurity memleak 2017-10-04 18:03:15 +11:00
sound ALSA: caiaq: Fix stray URB at probe error path 2017-10-11 17:01:18 +02:00
tools Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-10-14 15:16:49 -04:00
usr ramfs: clarify help text that compression applies to ramfs as well as legacy ramdisk. 2017-07-06 16:24:30 -07:00
virt Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD" 2017-09-19 08:37:17 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: Add support to generate LLVM assembly files 2017-04-25 08:13:52 +09:00
.mailmap Update James Hogan's email address 2017-10-04 17:11:53 -07:00
COPYING
CREDITS selinux/stable-4.14 PR 20170831 2017-09-12 13:21:00 -07:00
Kbuild kbuild: Consolidate header generation from ASM offset information 2017-04-13 05:43:37 +09:00
Kconfig
MAINTAINERS Another latent bug related to PCID, an out-of-bounds access, 2017-10-12 10:42:03 -07:00
Makefile Linux 4.14-rc5 2017-10-15 21:01:12 -04:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.