67 Commits

Author SHA1 Message Date
Chuck Lever
dbc17acd5d svcrdma: trace undersized Write chunks
Clean up: Replace a dprintk call site.

This is the last remaining dprintk call site in svc_rdma_rw.c, so
remove dprintk infrastructure as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18 10:21:21 -04:00
Chuck Lever
9d20063892 svcrdma: Trace page overruns when constructing RDMA Reads
Clean up: Replace a dprintk call site with a tracepoint.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18 10:21:21 -04:00
Chuck Lever
f4e53e1ce3 svcrdma: Clean up handling of get_rw_ctx errors
Clean up: Replace two dprintk call sites with a tracepoint.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18 10:21:21 -04:00
Chuck Lever
2abfbe7e3a svcrdma: Clean up the tracing for rw_ctx_init errors
- De-duplicate code
- Rename the tracepoint with "_err" to allow enabling via glob
- Report the sg_cnt for the failing rw_ctx
- Fix a dumb signage issue

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18 10:21:21 -04:00
Linus Torvalds
29a47f456d NFS client bugfixes for Linux 5.7
Highlights include:
 
 Stable fixes
 - fix handling of backchannel binding in BIND_CONN_TO_SESSION
 
 Bugfixes
 - Fix a credential use-after-free issue in pnfs_roc()
 - Fix potential posix_acl refcnt leak in nfs3_set_acl
 - defer slow parts of rpc_free_client() to a workqueue
 - Fix an Oopsable race in __nfs_list_for_each_server()
 - Fix trace point use-after-free race
 - Regression: the RDMA client no longer responds to server disconnect requests
 - Fix return values of xdr_stream_encode_item_{present, absent}
 - _pnfs_return_layout() must always wait for layoutreturn completion
 
 Cleanups
 - Remove unreachable error conditions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl6tczsACgkQZwvnipYK
 APKHWg//QGx2Tolj5dh2jBHa47A5/SYnJxCZAA0/fWdwRtFkW3HyyGne1jU86do2
 SMAVpBpri1WJPt5d3DH66gu4l4UxG1h84s7QP4lGfSa85EmtLh+LoZQCZRqYoDOo
 JAMzWctELu1TUpaa1N5Dhg/qMtMy6ulRMWgzTLqB9a/pQa3onugTK6W7xiut2prj
 PBfFq7N9XXmPboSeGV9bR4L8XKSbTCLEt3U1F2zAGU7UUINvDfpjEXq7BHYCewKL
 ObPW6EWZksyna16H8i/xGWoKgE4JFVjMwQAP7UdDBi+FW9RI6UpTBoR6z9N748j0
 jEocDbI21wgnwmtrVTbzsYm6ttHl4D4egoNxn7m5zjxTU4Ba/RQG2aaHUGFOYpJj
 1FI1f6V1Y5v4mJajdsEH+pGW/4vK/4YMR+7YHJ/hYU/WiXjLf7onIIifdWt4SQdo
 lvZbGcx6IAHYUA4lI7hkcvrK4bbqAnPLFq28nlUWEID5q5D+nA1ZR9iN0FToviDy
 FYyhQzyfD1kt98SV1DjWUqvDDd6IB64iDZTXGmtWvj6c2nbezGiFffvtzUL5LFxY
 QfI8lkpmUyt1EiWlZWhtOh4zsiM5yMZkJB/3RJv3RMmswizSSAHdgCKWhdLpX0bl
 TG1L8yEmcTc5ANS37EhlpcBNbfYw7oIF/OXuReTSRoMQl5hxjfY=
 =w0zk
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - fix handling of backchannel binding in BIND_CONN_TO_SESSION

  Bugfixes:
   - Fix a credential use-after-free issue in pnfs_roc()
   - Fix potential posix_acl refcnt leak in nfs3_set_acl
   - defer slow parts of rpc_free_client() to a workqueue
   - Fix an Oopsable race in __nfs_list_for_each_server()
   - Fix trace point use-after-free race
   - Regression: the RDMA client no longer responds to server disconnect
     requests
   - Fix return values of xdr_stream_encode_item_{present, absent}
   - _pnfs_return_layout() must always wait for layoutreturn completion

  Cleanups:
   - Remove unreachable error conditions"

* tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a race in __nfs_list_for_each_server()
  NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
  SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
  NFSv4: Remove unreachable error condition due to rpc_run_task()
  SUNRPC: Remove unreachable error condition
  xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
  xprtrdma: Fix trace point use-after-free race
  xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
  nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
  NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
  NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
2020-05-02 11:24:01 -07:00
Chuck Lever
bdb2ce8281 xprtrdma: Fix trace point use-after-free race
It's not safe to use resources pointed to by the @send_wr of
ib_post_send() _after_ that function returns. Those resources are
typically freed by the Send completion handler, which can run before
ib_post_send() returns.

Thus the trace points currently around ib_post_send() in the
client's RPC/RDMA transport are a hazard, even when they are
disabled. Rearrange them so that they touch the Work Request only
_before_ ib_post_send() is invoked.

Fixes: ab03eff58eb5 ("xprtrdma: Add trace points in RPC Call transmit paths")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-04-20 10:44:01 -04:00
Chuck Lever
e28b4fc652 svcrdma: Fix trace point use-after-free race
I hit this while testing nfsd-5.7 with kernel memory debugging
enabled on my server:

Mar 30 13:21:45 klimt kernel: BUG: unable to handle page fault for address: ffff8887e6c279a8
Mar 30 13:21:45 klimt kernel: #PF: supervisor read access in kernel mode
Mar 30 13:21:45 klimt kernel: #PF: error_code(0x0000) - not-present page
Mar 30 13:21:45 klimt kernel: PGD 3601067 P4D 3601067 PUD 87c519067 PMD 87c3e2067 PTE 800ffff8193d8060
Mar 30 13:21:45 klimt kernel: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
Mar 30 13:21:45 klimt kernel: CPU: 2 PID: 1933 Comm: nfsd Not tainted 5.6.0-rc6-00040-g881e87a3c6f9 #1591
Mar 30 13:21:45 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Mar 30 13:21:45 klimt kernel: RIP: 0010:svc_rdma_post_chunk_ctxt+0xab/0x284 [rpcrdma]
Mar 30 13:21:45 klimt kernel: Code: c1 83 34 02 00 00 29 d0 85 c0 7e 72 48 8b bb a0 02 00 00 48 8d 54 24 08 4c 89 e6 48 8b 07 48 8b 40 20 e8 5a 5c 2b e1 41 89 c6 <8b> 45 20 89 44 24 04 8b 05 02 e9 01 00 85 c0 7e 33 e9 5e 01 00 00
Mar 30 13:21:45 klimt kernel: RSP: 0018:ffffc90000dfbdd8 EFLAGS: 00010286
Mar 30 13:21:45 klimt kernel: RAX: 0000000000000000 RBX: ffff8887db8db400 RCX: 0000000000000030
Mar 30 13:21:45 klimt kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000246
Mar 30 13:21:45 klimt kernel: RBP: ffff8887e6c27988 R08: 0000000000000000 R09: 0000000000000004
Mar 30 13:21:45 klimt kernel: R10: ffffc90000dfbdd8 R11: 00c068ef00000000 R12: ffff8887eb4e4a80
Mar 30 13:21:45 klimt kernel: R13: ffff8887db8db634 R14: 0000000000000000 R15: ffff8887fc931000
Mar 30 13:21:45 klimt kernel: FS:  0000000000000000(0000) GS:ffff88885bd00000(0000) knlGS:0000000000000000
Mar 30 13:21:45 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 CR3: 000000081b72e002 CR4: 00000000001606e0
Mar 30 13:21:45 klimt kernel: Call Trace:
Mar 30 13:21:45 klimt kernel: ? svc_rdma_vec_to_sg+0x7f/0x7f [rpcrdma]
Mar 30 13:21:45 klimt kernel: svc_rdma_send_write_chunk+0x59/0xce [rpcrdma]
Mar 30 13:21:45 klimt kernel: svc_rdma_sendto+0xf9/0x3ae [rpcrdma]
Mar 30 13:21:45 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd]
Mar 30 13:21:45 klimt kernel: svc_send+0x105/0x1e3 [sunrpc]
Mar 30 13:21:45 klimt kernel: nfsd+0xf2/0x149 [nfsd]
Mar 30 13:21:45 klimt kernel: kthread+0xf6/0xfb
Mar 30 13:21:45 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74
Mar 30 13:21:45 klimt kernel: ret_from_fork+0x3a/0x50
Mar 30 13:21:45 klimt kernel: Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd pcspkr rpcrdma i2c_i801 rdma_ucm lpc_ich mfd_core ib_iser rdma_cm iw_cm ib_cm mei_me raid0 libiscsi mei sg scsi_transport_iscsi ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sd_mod sr_mod cdrom mlx4_core crc32c_intel igb nvme i2c_algo_bit ahci i2c_core libahci nvme_core dca libata t10_pi qedr dm_mirror dm_region_hash dm_log dm_mod dax qede qed crc8 ib_uverbs ib_core
Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8
Mar 30 13:21:45 klimt kernel: ---[ end trace 87971d2ad3429424 ]---

It's absolutely not safe to use resources pointed to by the @send_wr
argument of ib_post_send() _after_ that function returns. Those
resources are typically freed by the Send completion handler, which
can run before ib_post_send() returns.

Thus the trace points currently around ib_post_send() in the
server's RPC/RDMA transport are a hazard, even when they are
disabled. Rearrange them so that they touch the Work Request only
_before_ ib_post_send() is invoked.

Fixes: bd2abef33394 ("svcrdma: Trace key RDMA API events")
Fixes: 4201c7464753 ("svcrdma: Introduce svc_rdma_send_ctxt")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-04-17 12:40:38 -04:00
Linus Torvalds
04de788e61 NFS client updates for Linux 5.7
Highlights include:
 
 Stable fixes:
 - Fix a page leak in nfs_destroy_unlinked_subrequests()
 - Fix use-after-free issues in nfs_pageio_add_request()
 - Fix new mount code constant_table array definitions
 - finish_automount() requires us to hold 2 refs to the mount record
 
 Features:
 - Improve the accuracy of telldir/seekdir by using 64-bit cookies when
   possible.
 - Allow one RDMA active connection and several zombie connections to
   prevent blocking if the remote server is unresponsive.
 - Limit the size of the NFS access cache by default
 - Reduce the number of references to credentials that are taken by NFS
 - pNFS files and flexfiles drivers now support per-layout segment
   COMMIT lists.
 - Enable partial-file layout segments in the pNFS/flexfiles driver.
 - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type
 - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
   the DS using the layouterror mechanism.
 
 Bugfixes and cleanups:
 - SUNRPC: Fix krb5p regressions
 - Don't specify NFS version in "UDP not supported" error
 - nfsroot: set tcp as the default transport protocol
 - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
 - alloc_nfs_open_context() must use the file cred when available
 - Fix locking when dereferencing the delegation cred
 - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails
 - Various clean ups of the NFS O_DIRECT commit code
 - Clean up RDMA connect/disconnect
 - Replace zero-length arrays with C99-style flexible arrays
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl6LhhsACgkQZwvnipYK
 APIOJxAAiQOgmIg1CV4mrlcVhkwy09N5JAia6AENtoTmwm08nAYg5Y8REb9uX46a
 /MJsM2WG8hBCgI6eYmRY8LTr4Ft9rTQEJM9DRMuwQREXwMWwBhUv/QakCeqY1lHE
 lyB1z4hj5XKeUoN/OcfALC/GXFFf56A0UyN05nMzeCkBTdd3+qu+hW8Ge1wkAXcr
 f0pyLbzdFZlJuTmI4tr8F93g9p3ezuFBuEroT7XPIVJylAdZVumHqnOnz/Mvb99x
 rNTsX2dc44GhSAfRnTzPumU3MT6BOLvUzNH1xzdiqKzJrbOnG8WjFodrGr3JWpfp
 HkeyYQxJ+Hnfb2LiZBjvMQE8M7kVMZ1jVbrGJEbCxfSqgTly8lOHboqAeKsFaReK
 LStnusizdA1LHQVZxPdvn+oL49RDxnzm9dY+DkrXK1qT0GE+icN1CyTyLLfkSCp8
 tYvZSJ/qPk5BNZegqH1nBqXkMDkOJ4eEA7+luXDmajRkdRrZ3IWY2M1DpMEoueJ2
 j/zoj/NFr1oErU4o7PV9oolA1Euhn1L3wIDuzsbVtjySmbXJNQTtaVVRFpGw3SsZ
 7rbqi4BB0SzOooNhQ4q8mLNi4qT7bl/3D04eL8UVzEM73plexhQ8XiOEz/VrIRX7
 L9viXH49g4DHQ0rZIaWefxFueqpgbNvQwnlLZl2uQotG9hwhTts=
 =YUcP
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - Fix a page leak in nfs_destroy_unlinked_subrequests()

   - Fix use-after-free issues in nfs_pageio_add_request()

   - Fix new mount code constant_table array definitions

   - finish_automount() requires us to hold 2 refs to the mount record

  Features:
   - Improve the accuracy of telldir/seekdir by using 64-bit cookies
     when possible.

   - Allow one RDMA active connection and several zombie connections to
     prevent blocking if the remote server is unresponsive.

   - Limit the size of the NFS access cache by default

   - Reduce the number of references to credentials that are taken by
     NFS

   - pNFS files and flexfiles drivers now support per-layout segment
     COMMIT lists.

   - Enable partial-file layout segments in the pNFS/flexfiles driver.

   - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type

   - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
     the DS using the layouterror mechanism.

  Bugfixes and cleanups:
   - SUNRPC: Fix krb5p regressions

   - Don't specify NFS version in "UDP not supported" error

   - nfsroot: set tcp as the default transport protocol

   - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()

   - alloc_nfs_open_context() must use the file cred when available

   - Fix locking when dereferencing the delegation cred

   - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails

   - Various clean ups of the NFS O_DIRECT commit code

   - Clean up RDMA connect/disconnect

   - Replace zero-length arrays with C99-style flexible arrays"

* tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
  NFS: Clean up process of marking inode stale.
  SUNRPC: Don't start a timer on an already queued rpc task
  NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
  NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
  NFS: Beware when dereferencing the delegation cred
  NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
  NFS: finish_automount() requires us to hold 2 refs to the mount record
  NFS: Fix a few constant_table array definitions
  NFS: Try to join page groups before an O_DIRECT retransmission
  NFS: Refactor nfs_lock_and_join_requests()
  NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
  NFS: Clean up nfs_lock_and_join_requests()
  NFS: Remove the redundant function nfs_pgio_has_mirroring()
  NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
  NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
  NFS: Fix use-after-free issues in nfs_pageio_add_request()
  NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
  NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
  NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
  pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
  ...
2020-04-07 13:51:39 -07:00
Chuck Lever
e28ce90083 xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt
Change the rpcrdma_xprt_disconnect() function so that it no longer
waits for the DISCONNECTED event.  This prevents blocking if the
remote is unresponsive.

In rpcrdma_xprt_disconnect(), the transport's rpcrdma_ep is
detached. Upon return from rpcrdma_xprt_disconnect(), the transport
(r_xprt) is ready immediately for a new connection.

The RDMA_CM_DEVICE_REMOVAL and RDMA_CM_DISCONNECTED events are now
handled almost identically.

However, because the lifetimes of rpcrdma_xprt structures and
rpcrdma_ep structures are now independent, creating an rpcrdma_ep
needs to take a module ref count. The ep now owns most of the
hardware resources for a transport.

Also, a kref is needed to ensure that rpcrdma_ep sticks around
long enough for the cm_event_handler to finish.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
745b734c9b xprtrdma: Extract sockaddr from struct rdma_cm_id
rpcrdma_cm_event_handler() is always passed an @id pointer that is
valid. However, in a subsequent patch, we won't be able to extract
an r_xprt in every case. So instead of using the r_xprt's
presentation address strings, extract them from struct rdma_cm_id.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
93aa8e0a9d xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_ep
I eventually want to allocate rpcrdma_ep separately from struct
rpcrdma_xprt so that on occasion there can be more than one ep per
xprt.

The new struct rpcrdma_ep will contain all the fields currently in
rpcrdma_ia and in rpcrdma_ep. This is all the device and CM settings
for the connection, in addition to per-connection settings
negotiated with the remote.

Take this opportunity to rename the existing ep fields from rep_* to
re_* to disambiguate these from struct rpcrdma_rep.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
d6ccebf956 xprtrdma: Disconnect on flushed completion
Completion errors after a disconnect often occur much sooner than a
CM_DISCONNECT event. Use this to try to detect connection loss more
quickly.

Note that other kernel ULPs do take care to disconnect explicitly
when a WR is flushed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
81fe0c57f4 xprtrdma: Invoke rpcrdma_ia_open in the connect worker
Move rdma_cm_id creation into rpcrdma_ep_create() so that it is now
responsible for allocating all per-connection hardware resources.

With this clean-up, all three arms of the switch statement in
rpcrdma_ep_connect are exactly the same now, thus the switch can be
removed.

Because device removal behaves a little differently than
disconnection, there is a little more work to be done before
rpcrdma_ep_destroy() can release the connection's rdma_cm_id. So
it is not quite symmetrical with rpcrdma_ep_create() yet.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Chuck Lever
62a89501a3 xprtrdma: Enhance MR-related trace points
Two changes:
- Show the number of SG entries that were mapped. This helps debug
  DMA-related problems.
- Record the MR's resource ID instead of its memory address. This
  groups each MR with its associated rdma-tool output, and reduces
  needless exposure of memory addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Chuck Lever
0dabe948f2 svcrdma: Avoid DMA mapping small RPC Replies
On some platforms, DMA mapping part of a page is more costly than
copying bytes. Indeed, not involving the I/O MMU can help the
RPC/RDMA transport scale better for tiny I/Os across more RDMA
devices. This is because interaction with the I/O MMU is eliminated
for each of these small I/Os. Without the explicit unmapping, the
NIC no longer needs to do a costly internal TLB shoot down for
buffers that are just a handful of bytes.

Since pull-up is now a more a frequent operation, I've introduced a
trace point in the pull-up path. It can be used for debugging or
user-space tools that count pull-up frequency.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
a406c563e8 svcrdma: Rename svcrdma_encode trace points in send routines
These trace points are misnamed:

	trace_svcrdma_encode_wseg
	trace_svcrdma_encode_write
	trace_svcrdma_encode_reply
	trace_svcrdma_encode_rseg
	trace_svcrdma_encode_read
	trace_svcrdma_encode_pzr

Because they actually trace posting on the Send Queue. Let's rename
them so that I can add trace points in the chunk list encoders that
actually do trace chunk list encoding events.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
e604aad2ca svcrdma: Use struct xdr_stream to decode ingress transport headers
The logic that checks incoming network headers has to be scrupulous.

De-duplicate: replace open-coded buffer overflow checks with the use
of xdr_stream helpers that are used most everywhere else XDR
decoding is done.

One minor change to the sanity checks: instead of checking the
length of individual segments, cap the length of the whole chunk
to be sure it can fit in the set of pages available in rq_pages.
This should be a better test of whether the server can handle the
chunks in each request.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
2426ddfdf1 svcrdma: Remove svcrdma_cm_event() trace point
Clean up. This trace point is no longer needed because the RDMA/core
CMA code has an equivalent trace point that was added by commit
ed999f820a6c ("RDMA/cma: Add trace points in RDMA Connection
Manager").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
cb586decbb xprtrdma: Make sendctx queue lifetime the same as connection lifetime
The size of the sendctx queue depends on the value stored in
ia->ri_max_send_sges. This value is determined by querying the
underlying device.

Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called
in the connect worker rather than at transport set-up time. The
underlying device will not have been chosen device set-up time.

The sendctx queue will thus have to be created after the underlying
device has been chosen via address and route resolution; in other
words, in the connect worker.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Linus Torvalds
911d137ab0 This is a relatively quiet cycle for nfsd, mainly various bugfixes.
Possibly most interesting is Trond's fixes for some callback races that
 were due to my incomplete understanding of rpc client shutdown.
 Unfortunately at the last minute I've started noticing a new
 intermittent failure to send callbacks.  As the logic seems basically
 correct, I'm leaving Trond's patches in for now, and hope to find a fix
 in the next week so I don't have to revert those patches.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl3r3AAVHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+rjkP/3L6DZs0Uv0BYbGq5Gmit0uoPSQk
 8BT7oQhbagCh+ULRYWCnK6cz82wejR4Gzq4PLyl5x5Vcc5x+bLoPI9YgiRZlIbZu
 ZvSg93E6SITLfq5xRlDC0MlIVZkI+HoIfyYgv1aYiWvQ3834bcx4DxVm9h7cNpT3
 x37anEFi1lv3n9fct3obOrs3AvCS76XyA6VVhcSLJ77amKQ+O7LI0crqUc6cuX2i
 CkTwTSDwyCrzkx3dZ2xDPDTbLecxw+Ce4adaby5v3GEQo3TOCmEWX92D3dvzfMmv
 ICU07FsVOILnIT/fmC91b1+JWVRLjUUBw5EPmDduwSP/yw4YnIEODFEP/wAUAmMJ
 vJ9hi9c1rThQ9n8h08RIwA2snhnpXRxKCWhpIRY6WM8DhHL9Y9AuVPYTKxhQOjPK
 l3wbOGcMW63NrTOPHHN7hTB0vDLgPKIXYVIrMvZTd/P7CghDDEbhT1gDvx/IL3Uq
 WrHKbJtK7rbx9i2bh5f6fH0DRrv7lxbD0ffunRRa3twPAe6zsG9WPjsbZZraZzEg
 O7/o3wZu2N7MpL5bXPfzB+5ylOTxvNWew07NJjA4BIOfwin3bw/71YfB0Vnoairv
 PhmbN2Dj4/t82ld0JU5GJWojpUfH4ARXM2Li9WO99wzx+KrxScsqGPnRMFe9dC7b
 Q7ltP1p0gUbkJ88Z
 =b2zA
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "This is a relatively quiet cycle for nfsd, mainly various bugfixes.

  Possibly most interesting is Trond's fixes for some callback races
  that were due to my incomplete understanding of rpc client shutdown.
  Unfortunately at the last minute I've started noticing a new
  intermittent failure to send callbacks. As the logic seems basically
  correct, I'm leaving Trond's patches in for now, and hope to find a
  fix in the next week so I don't have to revert those patches"

* tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
  nfsd: depend on CRYPTO_MD5 for legacy client tracking
  NFSD fixing possible null pointer derefering in copy offload
  nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
  nfsd: Ensure CLONE persists data and metadata changes to the target file
  SUNRPC: Fix backchannel latency metrics
  nfsd: restore NFSv3 ACL support
  nfsd: v4 support requires CRYPTO_SHA256
  nfsd: Fix cld_net->cn_tfm initialization
  lockd: remove __KERNEL__ ifdefs
  sunrpc: remove __KERNEL__ ifdefs
  race in exportfs_decode_fh()
  nfsd: Drop LIST_HEAD where the variable it declares is never used.
  nfsd: document callback_wq serialization of callback code
  nfsd: mark cb path down on unknown errors
  nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
  nfsd: minor 4.1 callback cleanup
  SUNRPC: Fix svcauth_gss_proxy_init()
  SUNRPC: Trace gssproxy upcall results
  sunrpc: fix crash when cache_head become valid before update
  nfsd: remove private bin2hex implementation
  ...
2019-12-07 16:56:00 -08:00
Chuck Lever
a52c23b8b2 xprtrdma: Replace dprintk in xprt_rdma_set_port
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24 10:30:40 -04:00
Chuck Lever
f54c870d32 xprtrdma: Replace dprintk() in rpcrdma_update_connect_private()
Clean up: Use a single trace point to record each connection's
negotiated inline thresholds and the computed maximum byte size
of transport headers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24 10:30:40 -04:00
Chuck Lever
d4957f01d2 xprtrdma: Refine trace_xprtrdma_fixup
Slightly reduce overhead and display more useful information.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24 10:30:40 -04:00
Chuck Lever
7b020f17bb xprtrdma: Report the computed connect delay
For debugging, the op_connect trace point should report the computed
connect delay. We can then ensure that the delay is computed at the
proper times, for example.

As a further clean-up, remove a few low-value "heartbeat" trace
points in the connect path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24 10:30:40 -04:00
Chuck Lever
614f3c96d7 xprtrdma: Pull up sometimes
On some platforms, DMA mapping part of a page is more costly than
copying bytes. Restore the pull-up code and use that when we
think it's going to be faster. The heuristic for now is to pull-up
when the size of the RPC message body fits in the buffer underlying
the head iovec.

Indeed, not involving the I/O MMU can help the RPC/RDMA transport
scale better for tiny I/Os across more RDMA devices. This is because
interaction with the I/O MMU is eliminated, as is handling a Send
completion, for each of these small I/Os. Without the explicit
unmapping, the NIC no longer needs to do a costly internal TLB shoot
down for buffers that are just a handful of bytes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24 10:30:40 -04:00
Chuck Lever
dc15c3d5f1 xprtrdma: Move the rpcrdma_sendctx::sc_wr field
Clean up: This field is not needed in the Send completion handler,
so it can be moved to struct rpcrdma_req to reduce the size of
struct rpcrdma_sendctx, and to reduce the amount of memory that
is sloshed between the sending process and the Send completion
process.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24 10:30:40 -04:00
Chuck Lever
4b93dab36f xprtrdma: Add unique trace points for posting Local Invalidate WRs
When adding frwr_unmap_async way back when, I re-used the existing
trace_xprtrdma_post_send() trace point to record the return code
of ib_post_send.

Unfortunately there are some cases where re-using that trace point
causes a crash. Instead, construct a trace point specific to posting
Local Invalidate WRs that will always be safe to use in that context,
and will act as a trace log eye-catcher for Local Invalidation.

Fixes: 847568942f93 ("xprtrdma: Remove fr_state")
Fixes: d8099feda483 ("xprtrdma: Reduce context switching due ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Bill Baker <bill.baker@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24 10:30:39 -04:00
Chuck Lever
832b2cb955 svcrdma: Improve DMA mapping trace points
Capture the total size of Sends, the size of DMA map and the
matching DMA unmap to ensure operation is correct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-10-08 16:01:33 -04:00
Chuck Lever
6dc6ec9e04 xprtrdma: Cache free MRs in each rpcrdma_req
Instead of a globally-contended MR free list, cache MRs in each
rpcrdma_req as they are released. This means acquiring and releasing
an MR will be lock-free in the common case, even outside the
transport send lock.

The original idea of per-rpcrdma_req MR free lists was suggested by
Shirley Ma <shirley.ma@oracle.com> several years ago. I just now
figured out how to make that idea work with on-demand MR allocation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-21 11:06:24 -04:00
Chuck Lever
3b39f52a02 xprtrdma: Move rpcrdma_mr_get out of frwr_map
Refactor: Retrieve an MR and handle error recovery entirely in
rpc_rdma.c, as this is not a device-specific function.

Note that since commit 89f90fe1ad8b ("SUNRPC: Allow calls to
xprt_transmit() to drain the entire transmit queue"), the
xprt_transmit function handles the cond_resched. The transport no
longer has to do this itself.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20 16:23:35 -04:00
Chuck Lever
2dfdcd88cf xprtrdma: Rename CQE field in Receive trace points
Make the field name the same for all trace points that handle
pointers to struct rpcrdma_rep. That makes it easy to grep for
matching rep points in trace output.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-20 13:48:43 -04:00
Chuck Lever
675dd90ad0 xprtrdma: Modernize ops->connect
Adapt and apply changes that were made to the TCP socket connect
code. See the following commits for details on the purpose of
these changes:

Commit 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
Commit 3851f1cdb2b8 ("SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout")
Commit 02910177aede ("SUNRPC: Fix reconnection timeouts")

Some common transport code is moved to xprt.c to satisfy the code
duplication police.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09 10:30:25 -04:00
Chuck Lever
d8099feda4 xprtrdma: Reduce context switching due to Local Invalidation
Since commit ba69cd122ece ("xprtrdma: Remove support for FMR memory
registration"), FRWR is the only supported memory registration mode.

We can take advantage of the asynchronous nature of FRWR's LOCAL_INV
Work Requests to get rid of the completion wait by having the
LOCAL_INV completion handler take care of DMA unmapping MRs and
waking the upper layer RPC waiter.

This eliminates two context switches when local invalidation is
necessary. As a side benefit, we will no longer need the per-xprt
deferred completion work queue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09 10:30:25 -04:00
Chuck Lever
847568942f xprtrdma: Remove fr_state
Now that both the Send and Receive completions are handled in
process context, it is safe to DMA unmap and return MRs to the
free or recycle lists directly in the completion handlers.

Doing this means rpcrdma_frwr no longer needs to track the state of
each MR, meaning that a VALID or FLUSHED MR can no longer appear on
an xprt's MR free list. Thus there is no longer a need to track the
MR's registration state in rpcrdma_frwr.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09 10:30:25 -04:00
Chuck Lever
05eb06d866 xprtrdma: Fix occasional transport deadlock
Under high I/O workloads, I've noticed that an RPC/RDMA transport
occasionally deadlocks (IOPS goes to zero, and doesn't recover).
Diagnosis shows that the sendctx queue is empty, but when sendctxs
are returned to the queue, the xprt_write_space wake-up never
occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy.

I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented
via an atomic bit. Just one of those is sufficient. Removing
EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock
un-reproducible.

Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and
is therefore removed.

Unfortunately this patch does not apply cleanly to stable. If
needed, someone will have to port it and test it.

Fixes: 2fad659209d5 ("xprtrdma: Wait on empty sendctx queue")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09 10:30:16 -04:00
Chuck Lever
1310051c72 xprtrdma: Replace use of xdr_stream_pos in rpcrdma_marshal_req
This is a latent bug. xdr_stream_pos works by subtracting
xdr_stream::nwords from xdr_buf::len. But xdr_stream::nwords is not
initialized by xdr_init_encode().

It works today only because all fields in rpcrdma_req::rl_stream
are initialized to zero by rpcrdma_req_create, making the
subtraction in xdr_stream_pos always a no-op.

I found this issue via code inspection. It was introduced by commit
39f4cd9e9982 ("xprtrdma: Harden chunk list encoding against send
buffer overflow"), but the code has changed enough since then that
this fix can't be automatically applied to stable.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09 10:30:11 -04:00
Chuck Lever
10694ac92c xprtrdma: Fix a BUG when tracing is enabled with NFSv4.1 on RDMA
A backchannel reply does not set task->tk_client.

Fixes: 0c77668ddb4e ("SUNRPC: Introduce trace points in ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-02 16:29:21 -04:00
Chuck Lever
17e4c443c0 xprtrdma: Trace marshaling failures
Record an event when rpcrdma_marshal_req returns a non-zero return
value to help track down why an xprt close might have occurred.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25 15:19:56 -04:00
Chuck Lever
0c77668ddb SUNRPC: Introduce trace points in rpc_auth_gss.ko
Add infrastructure for trace points in the RPC_AUTH_GSS kernel
module, and add a few sample trace points. These report exceptional
or unexpected events, and observe the assignment of GSS sequence
numbers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14 09:20:40 -05:00
Chuck Lever
07e10308ee xprtrdma: Prevent leak of rpcrdma_rep objects
If a reply has been processed but the RPC is later retransmitted
anyway, the req->rl_reply field still contains the only pointer to
the old rpcrdma rep. When the next reply comes in, the reply handler
will stomp on the rl_reply field, leaking the old rep.

A trace event is added to capture such leaks.

This problem seems to be worsened by the restructuring of the RPC
Call path in v4.20. Fully addressing this issue will require at
least a re-architecture of the disconnect logic, which is not
appropriate during -rc.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:19 -05:00
Chuck Lever
53b2c1cb9b xprtrdma: Trace mapping, alloc, and dereg failures
These are rare, but can be helpful at tracking down DMAR and other
problems.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:18 -05:00
Chuck Lever
395069fc37 xprtrdma: Add trace points for calls to transport switch methods
Name them "trace_xprtrdma_op_*" so they can be easily enabled as a
group. No trace point is added where the generic layer already has
observability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:18 -05:00
Chuck Lever
aba1183179 xprtrdma: Clean up of xprtrdma chunk trace points
The chunk-related trace points capture nearly the same information
as the MR-related trace points.

Also, rename them so globbing can be used to enable or disable
these trace points more easily.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:18 -05:00
Chuck Lever
6ceea36890 xprtrdma: Refactor Receive accounting
Clean up: Divide the work cleanly:

- rpcrdma_wc_receive is responsible only for RDMA Receives
- rpcrdma_reply_handler is responsible only for RPC Replies
- the posted send and receive counts both belong in rpcrdma_ep

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:16 -05:00
Chuck Lever
470443e0b3 xprtrdma: Squelch a sparse warning
linux/include/trace/events/rpcrdma.h:501:1: warning: expression using sizeof bool
linux/include/trace/events/rpcrdma.h:501:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)

Fixes: ab03eff58eb5 ("xprtrdma: Add trace points in RPC Call ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03 14:35:17 -04:00
Chuck Lever
f9521d53e8 xprtrdma: Rename rpcrdma_qp_async_error_upcall
Clean up: Use a function name that is consistent with the RDMA core
API and with other consumers. Because this is a function that is
invoked from outside the rpcrdma.ko module, add an appropriate
documenting comment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-03 09:14:57 -04:00
Chuck Lever
ae38288eb7 xprtrdma: Rename rpcrdma_conn_upcall
Clean up: Use a function name that is consistent with the RDMA core
API and with other consumers. Because this is a function that is
invoked from outside the rpcrdma.ko module, add an appropriate
documenting comment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02 16:14:05 -04:00
Chuck Lever
d379eaa838 xprtrdma: Name MR trace events consistently
Clean up the names of trace events related to MRs so that it's
easy to enable these with a glob.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02 15:55:36 -04:00
Chuck Lever
61da886bf7 xprtrdma: Explicitly resetting MRs is no longer necessary
When a memory operation fails, the MR's driver state might not match
its hardware state. The only reliable recourse is to dereg the MR.
This is done in ->ro_recover_mr, which then attempts to allocate a
fresh MR to replace the released MR.

Since commit e2ac236c0b651 ("xprtrdma: Allocate MRs on demand"),
xprtrdma dynamically allocates MRs. It can add more MRs whenever
they are needed.

That makes it possible to simply release an MR when a memory
operation fails, instead of "recovering" it. It will automatically
be replaced by the on-demand MR allocator.

This commit is a little larger than I wanted, but it replaces
->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the
rb_stale_mrs list with a generic work queue.

Since MRs are no longer orphaned, the mrs_orphaned metric is no
longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-10-02 15:48:12 -04:00
Linus Torvalds
0725d4e1b8 NFS client updates for Linux 4.18
Highlights include:
 
 Stable fixes:
 - Fix a 1-byte stack overflow in nfs_idmap_read_and_verify_message
 - Fix a hang due to incorrect error returns in rpcrdma_convert_iovs()
 - Revert an incorrect change to the NFSv4.1 callback channel
 - Fix a bug in the NFSv4.1 sequence error handling
 
 Features and optimisations:
 - Support for piggybacking a LAYOUTGET operation to the OPEN compound
 - RDMA performance enhancements to deal with transport congestion
 - Add proper SPDX tags for NetApp-contributed RDMA source
 - Do not request delegated file attributes (size+change) from the server
 - Optimise away a GETATTR in the lookup revalidate code when doing NFSv4 OPEN
 - Optimise away unnecessary lookups for rename targets
 - Misc performance improvements when freeing NFSv4 delegations
 
 Bugfixes and cleanups:
 - Try to fail quickly if proto=rdma
 - Clean up RDMA receive trace points
 - Fix sillyrename to return the delegation when appropriate
 - Misc attribute revalidation fixes
 - Immediately clear the pNFS layout on a file when the server returns ESTALE
 - Return NFS4ERR_DELAY when delegation/layout recalls fail due to igrab()
 - Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbH8gIAAoJEA4mA3inWBJcpzYQAJYY3ykt9oLQgm/2b/D/weDe
 6890M9W5nIeuZq5soWSpYsZTxqIFbGV4laG/eCTW1gUN1TitSZsoOp7kqhRHXOjq
 Rv3ZvjlZsP2qv2SnzsEmhJsynfyB46d19smSTJhgQ8dnXhaZv04Wsd4krLHx0z6p
 uUUis5Q1m+vL7HsFPp3iUareO/DFKeSkw2cQ2V5ksTIEiAzX7GC+Ex/KKWf82nrJ
 hm7+Nq7rLf1QHJkQvsc3fYCMR4gIzEwUu6F8RyxCoAVgD6O90Hx6NbxnINaHDD4N
 U0nRP5LwCyN9hbPWvwcH7Sn4ePDTos2yj2tFO5NP9btTLDVLFSGYZ2a74d9PRdAf
 9jn6f6juSDwI7T6NXvkHzzkJG6Or9ABAUZo+yX5JoD6lmgOcPUJpLRy6fu7UxAuN
 a5OZ7d9edYpOi0Kys8sDSIlLlxZtFkvybOMVuI3dSHsI+c0g39w8oarpqT2wXWMs
 /ZtFz0FCreHhKkNtz7Z49z1UQHDv/XYM0WkcO+eaeK58RLIEE0pZHoMvPKP63lkI
 nbbgHvBRAu38Jtvvu65Hpb/VpBcqNGM5hjN1cfW/BOqAPKW23s4vWVj+/1silfW/
 uw0MkNrDC9endoALp/YMCcMwPvEw9Awt9y4KjMgfVgSnKwXd0HaSZ2zE6aJU3Wry
 Fy2Tv0e0OH3z9Bi/LNuJ
 =YWSl
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Fix a 1-byte stack overflow in nfs_idmap_read_and_verify_message

   - Fix a hang due to incorrect error returns in rpcrdma_convert_iovs()

   - Revert an incorrect change to the NFSv4.1 callback channel

   - Fix a bug in the NFSv4.1 sequence error handling

  Features and optimisations:

   - Support for piggybacking a LAYOUTGET operation to the OPEN compound

   - RDMA performance enhancements to deal with transport congestion

   - Add proper SPDX tags for NetApp-contributed RDMA source

   - Do not request delegated file attributes (size+change) from the
     server

   - Optimise away a GETATTR in the lookup revalidate code when doing
     NFSv4 OPEN

   - Optimise away unnecessary lookups for rename targets

   - Misc performance improvements when freeing NFSv4 delegations

  Bugfixes and cleanups:

   - Try to fail quickly if proto=rdma

   - Clean up RDMA receive trace points

   - Fix sillyrename to return the delegation when appropriate

   - Misc attribute revalidation fixes

   - Immediately clear the pNFS layout on a file when the server returns
     ESTALE

   - Return NFS4ERR_DELAY when delegation/layout recalls fail due to
     igrab()

   - Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY"

* tag 'nfs-for-4.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (80 commits)
  skip LAYOUTRETURN if layout is invalid
  NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY
  NFSv4: Fix a typo in nfs41_sequence_process
  NFSv4: Revert commit 5f83d86cf531d ("NFSv4.x: Fix wraparound issues..")
  NFSv4: Return NFS4ERR_DELAY when a layout recall fails due to igrab()
  NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab()
  NFSv4.0: Remove transport protocol name from non-UCS client ID
  NFSv4.0: Remove cl_ipaddr from non-UCS client ID
  NFSv4: Fix a compiler warning when CONFIG_NFS_V4_1 is undefined
  NFS: Filter cache invalidation when holding a delegation
  NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes()
  NFS: Improve caching while holding a delegation
  NFS: Fix attribute revalidation
  NFS: fix up nfs_setattr_update_inode
  NFSv4: Ensure the inode is clean when we set a delegation
  NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access
  NFSv4: Don't ask for delegated attributes when adding a hard link
  NFSv4: Don't ask for delegated attributes when revalidating the inode
  NFS: Pass the inode down to the getattr() callback
  NFSv4: Don't request size+change attribute if they are delegated to us
  ...
2018-06-12 10:09:03 -07:00