iv/linux

Author	SHA1	Message	Date
Chuck Lever	e3eded5e81	svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom() This, to me, seems less cluttered and less redundant. I was hoping it could help reduce lock contention on the dto_q lock by reducing the size of the critical section, but alas, the only improvement is readability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-31 15:58:48 -04:00
Chuck Lever	5533c4f4b9	svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg These fields are no longer used. The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on x86_64, down from 2440 bytes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-31 15:57:48 -04:00
Chuck Lever	9af723be86	svcrdma: Remove sc_read_complete_q Now that svc_rdma_recvfrom() waits for Read completion, sc_read_complete_q is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-31 15:57:48 -04:00
Chuck Lever	7d81ee8722	svcrdma: Single-stage RDMA Read Currently the generic RPC server layer calls svc_rdma_recvfrom() twice to retrieve an RPC message that uses Read chunks. I'm not exactly sure why this design was chosen originally. Instead, let's wait for the Read chunk completion inline in the first call to svc_rdma_recvfrom(). The goal is to eliminate some page allocator churn. rdma_read_complete() replaces pages in the second svc_rqst by calling put_page() repeatedly while the upper layer waits for the request to be constructed, which adds unnecessary NFS WRITE round-trip latency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com>	2021-03-31 15:57:39 -04:00
Chuck Lever	82011c80b3	SUNRPC: Move svc_xprt_received() call sites Currently, XPT_BUSY is not cleared until xpo_recvfrom returns. That effectively blocks the receipt and handling of the next RPC message until the current one has been taken off the transport. This strict ordering is a requirement for socket transports. For our kernel RPC/RDMA transport implementation, however, dequeuing an ingress message is nothing more than a list_del(). The transport can safely be marked un-busy as soon as that is done. To keep the changes simpler, this patch just moves the svc_xprt_received() call site from svc_handle_xprt() into the transports, so that the actual optimization can be done in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	cc93ce9529	svcrdma: Retain the page backing rq_res.head[0].iov_base svc_rdma_sendto() now waits for the NIC hardware to finish with the pages backing rq_res. We still have to release the page array in some cases, but now it's always safe to immediately re-use the page backing rq_res's head buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	579900670a	svcrdma: Remove unused sc_pages field Clean up. This significantly reduces the size of struct svc_rdma_send_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	2a1e4f21d8	svcrdma: Normalize Send page handling Currently svc_rdma_sendto() migrates xdr_buf pages into a separate page list and NULLs out a bunch of entries in rq_pages while the pages are under I/O. The Send completion handler then frees those pages later. Instead, let's wait for the Send completion, then handle page releasing in the nfsd thread. I'd like to avoid the cost of 250+ put_page() calls in the Send completion handler, which is single-threaded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	e844d307d4	svcrdma: Add a "deferred close" helper Refactor a bit of commonly used logic so that every site that wants a close deferred to an nfsd thread does all the right things (set_bit(XPT_CLOSE) then enqueue). Also, once XPT_CLOSE is set on a transport, it is never cleared. If XPT_CLOSE is already set, then the close is already being handled and the enqueue can be skipped. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	c558d47596	svcrdma: Maintain a Receive water mark Post more Receives when the number of pending Receives drops below a water mark. The batch mechanism is disabled if the underlying device cannot support a reasonably-sized Receive Queue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	7b748c30cc	svcrdma: Use svc_rdma_refresh_recvs() in wc_receive Replace svc_rdma_post_recv() with the new batch receive mechanism. For the moment it is posting just a single Receive WR at a time, so no change in behavior is expected. Since svc_rdma_wc_receive() was the last call site for svc_rdma_post_recv(), it is removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	77f0a2aa5c	svcrdma: Add a batch Receive posting mechanism Introduce a server-side mechanism similar to commit `e340c2d6ef` ("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive WRs in batch. Its first consumer is svc_rdma_post_recvs(), which posts the initial set of Receive WRs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 13:22:13 -04:00
Chuck Lever	c6b7ed8f94	svcrdma: Remove stale comment for svc_rdma_wc_receive() xprt pinning was removed in commit `365e9992b9` ("svcrdma: Remove transport reference counting"), but this comment was not updated to reflect that change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:05 -04:00
Chuck Lever	270f25edcc	svcrdma: Provide an explanatory comment in CMA event handler Clean up: explain why svc_xprt_enqueue() is invoked in the event handler even though no xpt_flags bits are toggled here. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:05 -04:00
Chuck Lever	072db263e1	svcrdma: RPCDBG_FACILITY is no longer used Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:05 -04:00
Chuck Lever	bade4be69a	svcrdma: Revert "svcrdma: Reduce Receive doorbell rate" I tested commit `43042b90ca` ("svcrdma: Reduce Receive doorbell rate") with mlx4 (IB) and software iWARP and didn't find any issues. However, I recently got my hardware iWARP setup back on line (FastLinQ) and it's crashing hard on this commit (confirmed via bisect). The failure mode is complex. - After a connection is established, the first Receive completes normally. - But the second and third Receives have garbage in their Receive buffers. The server responds with ERR_VERS as a result. - When the client tears down the connection to retry, a couple of posted Receives flush twice, and that corrupts the recv_ctxt free list. - __svc_rdma_free then faults or loops infinitely while destroying the xprt's recv_ctxts. Since `43042b90ca` ("svcrdma: Reduce Receive doorbell rate") does not fix a bug but is a scalability enhancement, it's safe and appropriate to revert it while working on a replacement. Fixes: `43042b90ca` ("svcrdma: Reduce Receive doorbell rate") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-11 15:26:07 -05:00
Timo Rothenpieler	6820bf7786	svcrdma: disable timeouts on rdma backchannel This brings it in line with the regular tcp backchannel, which also has all those timeouts disabled. Prevents the backchannel from timing out, which leaves some async operations, like server-side copying, stuck indefinitely on the client side. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> Fixes: `5d252f90a8` ("svcrdma: Add class for RDMA backwards direction transport") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-06 16:41:48 -05:00
Linus Torvalds	1c9077cdec	Merge tag 'nfs-for-5.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS Client Updates from Anna Schumaker: "New Features: - Support for eager writes, and the write=eager and write=wait mount options - Other Bugfixes and Cleanups: - Fix typos in some comments - Fix up fall-through warnings for Clang - Cleanups to the NFS readpage codepath - Remove FMR support in rpcrdma_convert_iovs() - Various other cleanups to xprtrdma - Fix xprtrdma pad optimization for servers that don't support RFC 8797 - Improvements to rpcrdma tracepoints - Fix up nfs4_bitmask_adjust() - Optimize sparse writes past the end of files" * tag 'nfs-for-5.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFS: Support the '-owrite=' option in /proc/self/mounts and mountinfo NFS: Set the stable writes flag when initialising the super block NFS: Add mount options supporting eager writes NFS: Add support for eager writes NFS: 'flags' field should be unsigned in struct nfs_server NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache NFS: Always clear an invalid mapping when attempting a buffered write NFS: Optimise sparse writes past the end of file NFS: Fix documenting comment for nfs_revalidate_file_size() NFSv4: Fixes for nfs4_bitmask_adjust() xprtrdma: Clean up rpcrdma_prepare_readch() rpcrdma: Capture bytes received in Receive completion tracepoints xprtrdma: Pad optimization, revisited rpcrdma: Fix comments about reverse-direction operation xprtrdma: Refactor invocations of offset_in_page() xprtrdma: Simplify rpcrdma_convert_kvec() and frwr_map() xprtrdma: Remove FMR support in rpcrdma_convert_iovs() NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async() NFS: Call readpage_async_filler() from nfs_readpage_async() NFS: Refactor nfs_readpage() and nfs_readpage_async() to use nfs_readdesc ...	2021-02-26 09:17:24 -08:00
Linus Torvalds	7c70f3a748	Merge tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Here are a few additional NFSD commits for the merge window: Optimization: - Cork the socket while there are queued replies Fixes: - DRC shutdown ordering - svc_rdma_accept() lockdep splat" * tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Further clean up svc_tcp_sendmsg() SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg() SUNRPC: Use TCP_CORK to optimise send performance on the server svcrdma: Hold private mutex while invoking rdma_accept() nfsd: register pernet ops last, unregister first	2021-02-22 13:29:55 -08:00
Chuck Lever	0ac24c320c	svcrdma: Hold private mutex while invoking rdma_accept() RDMA core mutex locking was restructured by commit `d114c6feed` ("RDMA/cma: Add missing locking to rdma_accept()") [Aug 2020]. When lock debugging is enabled, the RPC/RDMA server trips over the new lockdep assertion in rdma_accept() because it doesn't call rdma_accept() from its CM event handler. As a temporary fix, have svc_rdma_accept() take the handler_mutex explicitly. In the meantime, let's consider how to restructure the RPC/RDMA transport to invoke rdma_accept() from the proper context. Calls to svc_rdma_accept() are serialized with calls to svc_rdma_free() by the generic RPC server layer. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-rdma/20210209154014.GO4247@nvidia.com/ Fixes: `d114c6feed` ("RDMA/cma: Add missing locking to rdma_accept()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-02-15 10:45:00 -05:00
Chuck Lever	586a0787ce	xprtrdma: Clean up rpcrdma_prepare_readch() Since commit `9ed5af268e` ("SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS client passes payload data to the transport with the padding in xdr->pages instead of in the send buffer's tail kvec. There's no need for the extra logic to advance the base of the tail kvec because the upper layer no longer places XDR padding there. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2021-02-05 15:54:03 -05:00
Chuck Lever	2324fbedc2	xprtrdma: Pad optimization, revisited The NetApp Linux team discovered that with NFS/RDMA servers that do not support RFC 8797, the Linux client is forming NFSv4.x WRITE requests incorrectly. In this case, the Linux NFS client disables implicit chunk round-up for odd-length Read and Write chunks. The goal was to support old servers that needed that padding to be sent explicitly by clients. In that case the Linux NFS client included the tail kvec in the Read chunk, since the tail contains any needed padding. That meant a separate memory registration was needed for the tail kvec, adding to the cost of forming such requests. To avoid that cost for a mere 3 bytes of zeroes that are always ignored by receivers, we try to use implicit roundup when possible. For NFSv4.x, the tail kvec also sometimes contains a trailing GETATTR operation. The Linux NFS client unintentionally includes that GETATTR operation in the Read chunk as well as inline. The fix is simply to /never/ include the tail kvec when forming a data payload Read chunk. The padding is thus now always present. Note that since commit `9ed5af268e` ("SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS client passes payload data to the transport with the padding in xdr->pages instead of in the send buffer's tail kvec. So now the Linux NFS client appends XDR padding to all odd-sized Read chunks. This shouldn't be a problem because: - RFC 8166-compliant servers are supposed to work with or without that XDR padding in Read chunks. - Since the padding is now in the same memory region as the data payload, a separate memory registration is not needed. In addition, the link layer extends data in RDMA Read responses to 4-byte boundaries anyway. Thus there is now no savings when the padding is not included. Because older kernels include the payload's XDR padding in the tail kvec, a fix there will be more complicated. Thus backporting this patch is not recommended. Reported-by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2021-02-05 11:16:56 -05:00
Chuck Lever	84dff5eb86	rpcrdma: Fix comments about reverse-direction operation During the final stages of publication of RFC 8167, reviewers requested that we use the term "reverse direction" rather than "backwards direction". Update comments to reflect this preference. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2021-02-05 11:16:56 -05:00
Chuck Lever	67b16625d1	xprtrdma: Refactor invocations of offset_in_page() Clean up so that offset_in_page() is invoked less often in the most common case, which is mapping xdr->pages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2021-02-05 11:16:56 -05:00
Chuck Lever	54e6aec57c	xprtrdma: Simplify rpcrdma_convert_kvec() and frwr_map() Clean up. Remove a conditional branch from the SGL set-up loop in frwr_map(): Instead of using either sg_set_page() or sg_set_buf(), initialize the mr_page field properly when rpcrdma_convert_kvec() converts the kvec to an SGL entry. frwr_map() can then invoke sg_set_page() unconditionally. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2021-02-05 11:16:55 -05:00
Chuck Lever	9929f4adce	xprtrdma: Remove FMR support in rpcrdma_convert_iovs() Support for FMR was removed by commit `ba69cd122e` ("xprtrdma: Remove support for FMR memory registration") [Dec 2018]. That means the buffer-splitting behavior of rpcrdma_convert_kvec(), added by commit `821c791a0b` ("xprtrdma: Segment head and tail XDR buffers on page boundaries") [Mar 2016], is no longer necessary. FRWR memory registration handles this case with aplomb. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2021-02-05 11:16:55 -05:00
Chuck Lever	dd2d055b27	svcrdma: DMA-sync the receive buffer in svc_rdma_recvfrom() The Receive completion handler doesn't look at the contents of the Receive buffer. The DMA sync isn't terribly expensive but it's one less thing that needs to be done by the Receive completion handler, which is single-threaded (per svc_xprt). This helps scalability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-01-25 09:36:28 -05:00
Chuck Lever	43042b90ca	svcrdma: Reduce Receive doorbell rate This is similar to commit `e340c2d6ef` ("xprtrdma: Reduce the doorbell rate (Receive)") which added Receive batching to the client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:28 -05:00
Chuck Lever	c6226ff9a6	svcrdma: Deprecate stat variables that are no longer used Clean up. We are not permitted to remove old proc files. Instead, convert these variables to stubs that are only ever allowed to display a value of zero. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:28 -05:00
Chuck Lever	1e7e557316	svcrdma: Restore read and write stats Now that we have an efficient mechanism to update these two stats, let's start maintaining them again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:28 -05:00
Chuck Lever	22df5a2246	svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter Avoid the overhead of a memory bus lock cycle for counting a value that is hardly ever used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:28 -05:00
Chuck Lever	df971cd853	svcrdma: Convert rdma_stat_recv to a per-CPU counter Receives are frequent events. Avoid the overhead of a memory bus lock cycle for counting a value that is hardly ever used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:28 -05:00
Chuck Lever	59a00257c6	svcrdma: Refactor svc_rdma_init() and svc_rdma_clean_up() Setting up the proc variables is about to get more complicated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:28 -05:00
Linus Torvalds	74f602dc96	Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - NFSv3: Add emulation of lookupp() to improve open_by_filehandle() support - A series of patches to improve readdir performance, particularly with large directories - Basic support for using NFS/RDMA with the pNFS files and flexfiles drivers - Micro-optimisations for RDMA - RDMA tracing improvements Bugfixes: - Fix a long standing bug with xs_read_xdr_buf() when receiving partial pages (Dan Aloni) - Various fixes for getxattr and listxattr, when used over non-TCP transports - Fixes for containerised NFS from Sargun Dhillon - switch nfsiod to be an UNBOUND workqueue (Neil Brown) - READDIR should not ask for security label information if there is no LSM policy (Olga Kornievskaia) - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay) - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code - A couple of fixes for pnfs/flexfiles read failover Cleanups: - Various cleanups for the SUNRPC xdr code in conjunction with the READ_PLUS fixes" * tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits) NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read() pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read NFSv4/pnfs: Add tracing for the deviceid cache fs/lockd: convert comma to semicolon NFSv4.2: fix error return on memory allocation failure NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer NFSv4.2: decode_read_plus_hole() needs to check the extent offset NFSv4.2: decode_read_plus_data() must skip padding after data segment NFSv4.2: Ensure we always reset the result->count in decode_read_plus() SUNRPC: When expanding the buffer, we may need grow the sparse pages SUNRPC: Cleanup - constify a number of xdr_buf helpers SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field SUNRPC: _copy_to/from_pages() now check for zero length SUNRPC: Cleanup xdr_shrink_bufhead() SUNRPC: Fix xdr_expand_hole() SUNRPC: Fixes for xdr_align_data() SUNRPC: _shift_data_left/right_pages should check the shift length ...	2020-12-17 12:15:03 -08:00
Trond Myklebust	edffb84cc8	Merge tag 'nfs-rdma-for-5.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs into linux-next NFSoRDmA Client updates for Linux 5.11 Cleanups and improvements: - Remove use of raw kernel memory addresses in tracepoints - Replace dprintk() call sites in ERR_CHUNK path - Trace unmap sync calls - Optimize MR DMA-unmapping Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-15 20:08:41 -05:00
Chuck Lever	15261b9126	xprtrdma: Fix XDRBUF_SPARSE_PAGES support Olga K. observed that rpcrdma_marsh_req() allocates sparse pages only when it has determined that a Reply chunk is necessary. There are plenty of cases where no Reply chunk is needed, but the XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in rpcrdma_inline_fixup() when it tries to copy parts of the received Reply into a missing page. To avoid crashing, handle sparse page allocation up front. Until XATTR support was added, this issue did not appear often because the only SPARSE_PAGES consumer always expected a reply large enough to always require a Reply chunk. Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	d5aa6b22e2	SUNRPC: xprt_load_transport() needs to support the netid "rdma6" According to RFC5666, the correct netid for an IPv6 addressed RDMA transport is "rdma6", which we've supported as a mount option since Linux-4.7. The problem is when we try to load the module "xprtrdma6", that will fail, since there is no modulealias of that name. Fixes: `181342c5eb` ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:52 -05:00
Chuck Lever	d7cc739726	svcrdma: support multiple Read chunks per RPC An efficient way to handle multiple Read chunks is to post them all together and then take a single completion. This is also how the code is already structured: when the Read completion fires, all portions of the incoming RPC message are available to be assembled. The difficult problem is setting up the Read sink buffers so that the server pulls the client's data into place, making subsequent pull-up unnecessary. There are several cases: * No Read chunks. No-op. * One data item Read chunk. This is the fast case, where the inline part of the RPC-over-RDMA message becomes the head and tail, and the data item chunk is placed in buf->pages. * A Position-zero Read chunk. Treated like TCP: the Read chunk is pulled into contiguous pages. + A Position-zero Read chunk with data item chunks. Treated like TCP: all of the Read chunks are pulled into contiguous pages. + Multiple data item chunks. Treated like TCP: the inline part is copied and the data item chunks are pulled into contiguous pages. The "*" cases are already supported. This patch adds support for the "+" cases. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	d96962e6d0	svcrdma: Use the new parsed chunk list when pulling Read chunks As a pre-requisite for handling multiple Read chunks in each Read list, convert svc_rdma_recv_read_chunk() to use the new parsed Read chunk list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	bafe9c27d5	svcrdma: Rename info::ri_chunklen I'm about to change the purpose of ri_chunklen: Instead of tracking the number of bytes in one Read chunk, it will track the total number of bytes in the Read list. Rename it for clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	b704be09dc	svcrdma: Clean up chunk tracepoints We already have trace_svcrdma_decode_rseg(), which records each ingress Read segment. Instead of reporting those again when they are about to be posted as RDMA Reads, let's fire one tracepoint before posting each type of chunk. So we'll get: nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=0 position=0 192@0x013ca9ebfae14000:0xb0010b05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=1 position=0 7688@0x013ca9ebf914e000:0xb0010a05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=2 position=0 28@0x013ca9ebfae15000:0xb0010905 nfsd-1998 [002] 321.666622: svcrdma_decode_rqst: cq.id=4 cid=42 xid=0x013ca9eb vers=1 credits=128 proc=RDMA_NOMSG hdrlen=100 nfsd-1998 [002] 321.666642: svcrdma_post_read_chunk: cq.id=3 cid=112 sqecount=3 kworker/2:1H-221 [002] 321.673949: svcrdma_wc_read: cq.id=3 cid=112 status=SUCCESS (0/0x0) Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	7954c8503b	svcrdma: Remove chunk list pointers Clean up: These pointers are no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	41bc163ffe	svcrdma: Support multiple Write chunks in svc_rdma_send_reply_chunk Refactor svc_rdma_send_reply_chunk() so that it Sends only the parts of rq_res that do not contain a result payload. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	2371bcc056	svcrdma: Support multiple Write chunks in svc_rdma_map_reply_msg() Refactor: svc_rdma_map_reply_msg() is restructured to DMA map only the parts of rq_res that do not contain a result payload. This change has been tested to confirm that it does not cause a regression in the no Write chunk and single Write chunk cases. Multiple Write chunks have not been tested. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	9d0b09d5ef	svcrdma: Support multiple write chunks when pulling up When counting the number of SGEs needed to construct a Send request, do not count result payloads. And, when copying the Reply message into the pull-up buffer, result payloads are not to be copied to the Send buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	6911f3e10c	svcrdma: Use parsed chunk lists to encode Reply transport headers Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing the egress RPC Reply transport header, use the new parsed Write list and Reply chunk, which are version-agnostic and already XDR-decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	7a1cbfa180	svcrdma: Use parsed chunk lists to construct RDMA Writes Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing RDMA Writes, use the new parsed chunk lists for the Write list and Reply chunk, which are version-agnostic and already XDR-decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	58b2e0fefa	svcrdma: Use parsed chunk lists to detect reverse direction replies Refactor: Don't duplicate header decoding smarts here. Instead, use the new parsed chunk lists. Note that the XID sanity test is also removed. The XID is already looked up by the cb handler, and is rejected if it's not recognized. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	eb3de6a49d	svcrdma: Use parsed chunk lists to derive the inv_rkey Refactor: Don't duplicate header decoding smarts here. Instead, use the new parsed chunk lists. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	78147ca8b4	svcrdma: Add a "parsed chunk list" data structure This simple data structure binds the location of each data payload inside of an RPC message to the chunk that will be used to push it to or pull it from the client. There are several benefits to this small additional overhead: * It enables support for more than one chunk in incoming Read and Write lists. * It translates the version-specific on-the-wire format into a generic in-memory structure, enabling support for multiple versions of the RPC/RDMA transport protocol. * It enables the server to re-organize a chunk list if it needs to adjust where Read chunk data lands in server memory without altering the contents of the XDR-encoded Receive buffer. Construction of these lists is done while sanity checking each incoming RPC/RDMA header. Subsequent patches will make use of the generated data structures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
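
The "parsed chunk list" idea in the last entry above can be illustrated with a small userspace sketch. This is not the kernel's code: every name here (`demo_segment`, `demo_chunk`, `demo_chunk_list`, and the `demo_pcl_*` helpers) is hypothetical, the per-chunk segment cap is arbitrary, and `calloc()` stands in for whatever allocator the real transport uses. It assumes only what the commit messages and RFC 8166 state: Read segments that share a Position belong to the same chunk, and the decoded list is a generic in-memory form that is independent of the on-the-wire layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_SEGS 8	/* arbitrary cap, chosen only for this sketch */

/* One RDMA segment as decoded from the transport header (hypothetical). */
struct demo_segment {
	uint32_t handle;	/* remote key for the client's memory */
	uint32_t length;	/* bytes covered by this segment */
	uint64_t offset;	/* remote address */
};

/* One chunk: the segments that together carry a single payload. */
struct demo_chunk {
	struct demo_chunk *next;
	uint32_t position;	/* XDR offset of the payload in the message */
	uint32_t total_len;	/* sum of the segment lengths */
	uint32_t segcount;
	struct demo_segment segs[DEMO_MAX_SEGS];
};

/* The parsed list itself, kept sorted by ascending position. */
struct demo_chunk_list {
	struct demo_chunk *first;
	unsigned int count;
};

static void demo_pcl_init(struct demo_chunk_list *pcl)
{
	pcl->first = NULL;
	pcl->count = 0;
}

/* Return the chunk at @position, inserting a new one in sorted order
 * if none exists yet. Returns NULL if allocation fails. */
static struct demo_chunk *
demo_pcl_lookup_or_add(struct demo_chunk_list *pcl, uint32_t position)
{
	struct demo_chunk **link = &pcl->first;
	struct demo_chunk *chunk;

	while (*link && (*link)->position < position)
		link = &(*link)->next;
	if (*link && (*link)->position == position)
		return *link;

	chunk = calloc(1, sizeof(*chunk));
	if (!chunk)
		return NULL;
	chunk->position = position;
	chunk->next = *link;
	*link = chunk;
	pcl->count++;
	return chunk;
}

/* Record one decoded Read segment. Returns 0 on success, -1 on failure. */
static int demo_pcl_add_read_segment(struct demo_chunk_list *pcl,
				     uint32_t position, uint32_t handle,
				     uint32_t length, uint64_t offset)
{
	struct demo_chunk *chunk;
	struct demo_segment *seg;

	assert(pcl != NULL);
	chunk = demo_pcl_lookup_or_add(pcl, position);
	if (!chunk || chunk->segcount == DEMO_MAX_SEGS)
		return -1;

	seg = &chunk->segs[chunk->segcount++];
	seg->handle = handle;
	seg->length = length;
	seg->offset = offset;
	chunk->total_len += length;
	return 0;
}

/* Release every chunk and reset the list to empty. */
static void demo_pcl_free(struct demo_chunk_list *pcl)
{
	while (pcl->first) {
		struct demo_chunk *chunk = pcl->first;

		pcl->first = chunk->next;
		free(chunk);
	}
	pcl->count = 0;
}
```

Feeding in the three `position=0` segments from the svcrdma_decode_rseg trace quoted earlier in this log (192, 7688, and 28 bytes) yields a single chunk with three segments and a total length of 7908 bytes. A segment at a different position creates a second chunk, which is the multi-chunk case the later patches in the series build on.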

1146 Commits