NFS client updates for Linux 4.9


Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Stable bugfixes:
    - sunrpc: fix write space race causing stalls
   - NFS: Fix inode corruption in nfs_prime_dcache()
   - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
   - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
   - NFSv4: Open state recovery must account for file permission changes
   - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

  Features:
   - Add support for tracking multiple layout types with an ordered list
   - Add support for using multiple backchannel threads on the client
   - Add support for pNFS file layout session trunking
   - Delay xprtrdma use of DMA API (for device driver removal)
   - Add support for xprtrdma remote invalidation
   - Add support for larger xprtrdma inline thresholds
   - Use a scatter/gather list for sending xprtrdma RPC calls
   - Add support for the CB_NOTIFY_LOCK callback
   - Improve hashing sunrpc auth_creds by using both uid and gid

  Bugfixes:
   - Fix xprtrdma use of DMA API
   - Validate filenames before adding to the dcache
   - Fix corruption of xdr->nwords in xdr_copy_to_scratch
   - Fix setting buffer length in xdr_set_next_buffer()
   - Don't deadlock the state manager on the SEQUENCE status flags
   - Various delegation and stateid related fixes
   - Retry operations if an interrupted slot receives EREMOTEIO
   - Make nfs boot time y2038 safe"

* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
  NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
  fs: nfs: Make nfs boot time y2038 safe
  sunrpc: replace generic auth_cred hash with auth-specific function
  sunrpc: add RPCSEC_GSS hash_cred() function
  sunrpc: add auth_unix hash_cred() function
  sunrpc: add generic_auth hash_cred() function
  sunrpc: add hash_cred() function to rpc_authops struct
  Retry operation on EREMOTEIO on an interrupted slot
  pNFS: Fix atime updates on pNFS clients
  sunrpc: queue work on system_power_efficient_wq
  NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
  NFSv4: If recovery failed for a specific open stateid, then don't retry
  NFSv4: Fix retry issues with nfs41_test/free_stateid
  NFSv4: Open state recovery must account for file permission changes
  NFSv4: Mark the lock and open stateids as invalid after freeing them
  NFSv4: Don't test open_stateid unless it is set
  NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
  NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
  NFSv4: Fix a race when updating an open_stateid
  NFSv4: Fix a race in nfs_inode_reclaim_delegation()
  ...
Committed by Linus Torvalds on 2016-10-13 21:28:20 -07:00 (commit c4a86165d1)
58 changed files with 2217 additions and 986 deletions


@@ -2470,6 +2470,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	nfsrootdebug	[NFS] enable nfsroot debugging messages.
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
+	nfs.callback_nr_threads=
+			[NFSv4] set the total number of threads that the
+			NFS client will assign to service NFSv4 callback
+			requests.
+
 	nfs.callback_tcpport=
 			[NFS] set the TCP port on which the NFSv4 callback
 			channel should listen.
@@ -2493,6 +2498,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			of returning the full 64-bit number.
 			The default is to return 64-bit inode numbers.
 
+	nfs.max_session_cb_slots=
+			[NFSv4.1] Sets the maximum number of session
+			slots the client will assign to the callback
+			channel. This determines the maximum number of
+			callbacks the client will process in parallel for
+			a particular server.
+
 	nfs.max_session_slots=
 			[NFSv4.1] Sets the maximum number of session slots
 			the client will attempt to negotiate with the server.
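The two parameters documented above use the standard dotted module-parameter syntax on the kernel command line. A hypothetical boot line (the values 4 and 16 are arbitrary, chosen only for illustration):

```
vmlinuz root=/dev/nfs nfs.callback_nr_threads=4 nfs.max_session_cb_slots=16
```

Since these are module parameters of the nfs module, they should also appear under /sys/module/nfs/parameters/ once loaded; that is an assumption about the usual module_param plumbing rather than something stated in this patch.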


@@ -76,7 +76,7 @@ static void nfs_dns_cache_revisit(struct cache_deferred_req *d, int toomany)
 	dreq = container_of(d, struct nfs_cache_defer_req, deferred_req);
 
-	complete_all(&dreq->completion);
+	complete(&dreq->completion);
 	nfs_cache_defer_req_put(dreq);
 }


@@ -31,8 +31,6 @@
 struct nfs_callback_data {
 	unsigned int users;
 	struct svc_serv *serv;
-	struct svc_rqst *rqst;
-	struct task_struct *task;
 };
 
 static struct nfs_callback_data nfs_callback_info[NFS4_MAX_MINOR_VERSION + 1];
@@ -89,15 +87,6 @@ nfs4_callback_svc(void *vrqstp)
 	return 0;
 }
 
-/*
- * Prepare to bring up the NFSv4 callback service
- */
-static struct svc_rqst *
-nfs4_callback_up(struct svc_serv *serv)
-{
-	return svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
-}
-
 #if defined(CONFIG_NFS_V4_1)
 /*
  * The callback service for NFSv4.1 callbacks
@@ -139,29 +128,6 @@ nfs41_callback_svc(void *vrqstp)
 	return 0;
 }
 
-/*
- * Bring up the NFSv4.1 callback service
- */
-static struct svc_rqst *
-nfs41_callback_up(struct svc_serv *serv)
-{
-	struct svc_rqst *rqstp;
-
-	INIT_LIST_HEAD(&serv->sv_cb_list);
-	spin_lock_init(&serv->sv_cb_lock);
-	init_waitqueue_head(&serv->sv_cb_waitq);
-	rqstp = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
-	dprintk("--> %s return %d\n", __func__, PTR_ERR_OR_ZERO(rqstp));
-	return rqstp;
-}
-
-static void nfs_minorversion_callback_svc_setup(struct svc_serv *serv,
-		struct svc_rqst **rqstpp, int (**callback_svc)(void *vrqstp))
-{
-	*rqstpp = nfs41_callback_up(serv);
-	*callback_svc = nfs41_callback_svc;
-}
-
 static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
 		struct svc_serv *serv)
 {
@@ -173,13 +139,6 @@ static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
 		xprt->bc_serv = serv;
 }
 #else
-static void nfs_minorversion_callback_svc_setup(struct svc_serv *serv,
-		struct svc_rqst **rqstpp, int (**callback_svc)(void *vrqstp))
-{
-	*rqstpp = ERR_PTR(-ENOTSUPP);
-	*callback_svc = ERR_PTR(-ENOTSUPP);
-}
-
 static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
 		struct svc_serv *serv)
 {
@@ -189,45 +148,22 @@ static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
 static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt,
 		struct svc_serv *serv)
 {
-	struct svc_rqst *rqstp;
-	int (*callback_svc)(void *vrqstp);
-	struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
+	int nrservs = nfs_callback_nr_threads;
 	int ret;
 
 	nfs_callback_bc_serv(minorversion, xprt, serv);
 
-	if (cb_info->task)
+	if (nrservs < NFS4_MIN_NR_CALLBACK_THREADS)
+		nrservs = NFS4_MIN_NR_CALLBACK_THREADS;
+
+	if (serv->sv_nrthreads-1 == nrservs)
 		return 0;
 
-	switch (minorversion) {
-	case 0:
-		/* v4.0 callback setup */
-		rqstp = nfs4_callback_up(serv);
-		callback_svc = nfs4_callback_svc;
-		break;
-	default:
-		nfs_minorversion_callback_svc_setup(serv,
-				&rqstp, &callback_svc);
-	}
-
-	if (IS_ERR(rqstp))
-		return PTR_ERR(rqstp);
-
-	svc_sock_update_bufs(serv);
-
-	cb_info->serv = serv;
-	cb_info->rqst = rqstp;
-	cb_info->task = kthread_create(callback_svc, cb_info->rqst,
-				"nfsv4.%u-svc", minorversion);
-	if (IS_ERR(cb_info->task)) {
-		ret = PTR_ERR(cb_info->task);
-		svc_exit_thread(cb_info->rqst);
-		cb_info->rqst = NULL;
-		cb_info->task = NULL;
+	ret = serv->sv_ops->svo_setup(serv, NULL, nrservs);
+	if (ret) {
+		serv->sv_ops->svo_setup(serv, NULL, 0);
 		return ret;
 	}
-	rqstp->rq_task = cb_info->task;
-	wake_up_process(cb_info->task);
 	dprintk("nfs_callback_up: service started\n");
 	return 0;
 }
@@ -281,19 +217,41 @@ err_bind:
 	return ret;
 }
 
-static struct svc_serv_ops nfs_cb_sv_ops = {
+static struct svc_serv_ops nfs40_cb_sv_ops = {
+	.svo_function = nfs4_callback_svc,
 	.svo_enqueue_xprt = svc_xprt_do_enqueue,
+	.svo_setup = svc_set_num_threads,
+	.svo_module = THIS_MODULE,
 };
+
+#if defined(CONFIG_NFS_V4_1)
+static struct svc_serv_ops nfs41_cb_sv_ops = {
+	.svo_function = nfs41_callback_svc,
+	.svo_enqueue_xprt = svc_xprt_do_enqueue,
+	.svo_setup = svc_set_num_threads,
+	.svo_module = THIS_MODULE,
+};
+
+struct svc_serv_ops *nfs4_cb_sv_ops[] = {
+	[0] = &nfs40_cb_sv_ops,
+	[1] = &nfs41_cb_sv_ops,
+};
+#else
+struct svc_serv_ops *nfs4_cb_sv_ops[] = {
+	[0] = &nfs40_cb_sv_ops,
+	[1] = NULL,
+};
+#endif
 
 static struct svc_serv *nfs_callback_create_svc(int minorversion)
 {
 	struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
 	struct svc_serv *serv;
+	struct svc_serv_ops *sv_ops;
 
 	/*
 	 * Check whether we're already up and running.
 	 */
-	if (cb_info->task) {
+	if (cb_info->serv) {
 		/*
 		 * Note: increase service usage, because later in case of error
 		 * svc_destroy() will be called.
@@ -302,6 +260,17 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
 		return cb_info->serv;
 	}
 
+	switch (minorversion) {
+	case 0:
+		sv_ops = nfs4_cb_sv_ops[0];
+		break;
+	default:
+		sv_ops = nfs4_cb_sv_ops[1];
+	}
+
+	if (sv_ops == NULL)
+		return ERR_PTR(-ENOTSUPP);
+
 	/*
 	 * Sanity check: if there's no task,
 	 * we should be the first user ...
@@ -310,11 +279,12 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
 		printk(KERN_WARNING "nfs_callback_create_svc: no kthread, %d users??\n",
 			cb_info->users);
 
-	serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, &nfs_cb_sv_ops);
+	serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops);
 	if (!serv) {
 		printk(KERN_ERR "nfs_callback_create_svc: create service failed\n");
 		return ERR_PTR(-ENOMEM);
 	}
+	cb_info->serv = serv;
 	/* As there is only one thread we need to over-ride the
 	 * default maximum of 80 connections
 	 */
@@ -357,6 +327,8 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
 	 * thread exits.
 	 */
 err_net:
+	if (!cb_info->users)
+		cb_info->serv = NULL;
 	svc_destroy(serv);
 err_create:
 	mutex_unlock(&nfs_callback_mutex);
@@ -374,18 +346,18 @@ err_start:
 void nfs_callback_down(int minorversion, struct net *net)
 {
 	struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
+	struct svc_serv *serv;
 
 	mutex_lock(&nfs_callback_mutex);
-	nfs_callback_down_net(minorversion, cb_info->serv, net);
+	serv = cb_info->serv;
+	nfs_callback_down_net(minorversion, serv, net);
 	cb_info->users--;
-	if (cb_info->users == 0 && cb_info->task != NULL) {
-		kthread_stop(cb_info->task);
-		dprintk("nfs_callback_down: service stopped\n");
-		svc_exit_thread(cb_info->rqst);
+	if (cb_info->users == 0) {
+		svc_get(serv);
+		serv->sv_ops->svo_setup(serv, NULL, 0);
+		svc_destroy(serv);
 		dprintk("nfs_callback_down: service destroyed\n");
 		cb_info->serv = NULL;
-		cb_info->rqst = NULL;
-		cb_info->task = NULL;
 	}
 	mutex_unlock(&nfs_callback_mutex);
 }


@@ -179,6 +179,15 @@ extern __be32 nfs4_callback_devicenotify(
 	struct cb_devicenotifyargs *args,
 	void *dummy, struct cb_process_state *cps);
 
+struct cb_notify_lock_args {
+	struct nfs_fh		cbnl_fh;
+	struct nfs_lowner	cbnl_owner;
+	bool			cbnl_valid;
+};
+
+extern __be32 nfs4_callback_notify_lock(struct cb_notify_lock_args *args,
+					 void *dummy,
+					 struct cb_process_state *cps);
 #endif /* CONFIG_NFS_V4_1 */
 
 extern int check_gss_callback_principal(struct nfs_client *, struct svc_rqst *);
 extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args,
@@ -198,6 +207,9 @@ extern void nfs_callback_down(int minorversion, struct net *net);
 #define NFS41_BC_MIN_CALLBACKS 1
 #define NFS41_BC_MAX_CALLBACKS 1
 
+#define NFS4_MIN_NR_CALLBACK_THREADS 1
+
 extern unsigned int nfs_callback_set_tcpport;
+extern unsigned short nfs_callback_nr_threads;
 
 #endif /* __LINUX_FS_NFS_CALLBACK_H */


@@ -628,4 +628,20 @@ out:
 	dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
 	return status;
 }
+
+__be32 nfs4_callback_notify_lock(struct cb_notify_lock_args *args, void *dummy,
+				 struct cb_process_state *cps)
+{
+	if (!cps->clp) /* set in cb_sequence */
+		return htonl(NFS4ERR_OP_NOT_IN_SESSION);
+
+	dprintk_rcu("NFS: CB_NOTIFY_LOCK request from %s\n",
+		rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));
+
+	/* Don't wake anybody if the string looked bogus */
+	if (args->cbnl_valid)
+		__wake_up(&cps->clp->cl_lock_waitq, TASK_NORMAL, 0, args);
+
+	return htonl(NFS4_OK);
+}
 #endif /* CONFIG_NFS_V4_1 */


@@ -35,6 +35,7 @@
 					 (1 + 3) * 4) // seqid, 3 slotids
 #define CB_OP_RECALLANY_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
 #define CB_OP_RECALLSLOT_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
+#define CB_OP_NOTIFY_LOCK_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
 #endif /* CONFIG_NFS_V4_1 */
 
 #define NFSDBG_FACILITY NFSDBG_CALLBACK
@@ -72,7 +73,7 @@ static int nfs4_encode_void(struct svc_rqst *rqstp, __be32 *p, void *dummy)
 	return xdr_ressize_check(rqstp, p);
 }
 
-static __be32 *read_buf(struct xdr_stream *xdr, int nbytes)
+static __be32 *read_buf(struct xdr_stream *xdr, size_t nbytes)
 {
 	__be32 *p;
 
@@ -534,6 +535,49 @@ static __be32 decode_recallslot_args(struct svc_rqst *rqstp,
 	return 0;
 }
 
+static __be32 decode_lockowner(struct xdr_stream *xdr, struct cb_notify_lock_args *args)
+{
+	__be32 *p;
+	unsigned int len;
+
+	p = read_buf(xdr, 12);
+	if (unlikely(p == NULL))
+		return htonl(NFS4ERR_BADXDR);
+
+	p = xdr_decode_hyper(p, &args->cbnl_owner.clientid);
+	len = be32_to_cpu(*p);
+
+	p = read_buf(xdr, len);
+	if (unlikely(p == NULL))
+		return htonl(NFS4ERR_BADXDR);
+
+	/* Only try to decode if the length is right */
+	if (len == 20) {
+		p += 2;	/* skip "lock id:" */
+		args->cbnl_owner.s_dev = be32_to_cpu(*p++);
+		xdr_decode_hyper(p, &args->cbnl_owner.id);
+		args->cbnl_valid = true;
+	} else {
+		args->cbnl_owner.s_dev = 0;
+		args->cbnl_owner.id = 0;
+		args->cbnl_valid = false;
+	}
+
+	return 0;
+}
+
+static __be32 decode_notify_lock_args(struct svc_rqst *rqstp, struct xdr_stream *xdr, struct cb_notify_lock_args *args)
+{
+	__be32 status;
+
+	status = decode_fh(xdr, &args->cbnl_fh);
+	if (unlikely(status != 0))
+		goto out;
+	status = decode_lockowner(xdr, args);
+out:
+	dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
+	return status;
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 static __be32 encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
@@ -746,6 +790,7 @@ preprocess_nfs41_op(int nop, unsigned int op_nr, struct callback_op **op)
 	case OP_CB_RECALL_SLOT:
 	case OP_CB_LAYOUTRECALL:
 	case OP_CB_NOTIFY_DEVICEID:
+	case OP_CB_NOTIFY_LOCK:
 		*op = &callback_ops[op_nr];
 		break;
@@ -753,7 +798,6 @@ preprocess_nfs41_op(int nop, unsigned int op_nr, struct callback_op **op)
 	case OP_CB_PUSH_DELEG:
 	case OP_CB_RECALLABLE_OBJ_AVAIL:
 	case OP_CB_WANTS_CANCELLED:
-	case OP_CB_NOTIFY_LOCK:
 		return htonl(NFS4ERR_NOTSUPP);
 
 	default:
@@ -1006,6 +1050,11 @@ static struct callback_op callback_ops[] = {
 		.decode_args = (callback_decode_arg_t)decode_recallslot_args,
 		.res_maxsize = CB_OP_RECALLSLOT_RES_MAXSZ,
 	},
+	[OP_CB_NOTIFY_LOCK] = {
+		.process_op = (callback_process_op_t)nfs4_callback_notify_lock,
+		.decode_args = (callback_decode_arg_t)decode_notify_lock_args,
+		.res_maxsize = CB_OP_NOTIFY_LOCK_RES_MAXSZ,
+	},
 #endif /* CONFIG_NFS_V4_1 */
 };


@@ -313,7 +313,10 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
 			continue;
 		/* Match the full socket address */
 		if (!rpc_cmp_addr_port(sap, clap))
-			continue;
+			/* Match all xprt_switch full socket addresses */
+			if (!rpc_clnt_xprt_switch_has_addr(clp->cl_rpcclient,
+							   sap))
+				continue;
 
 		atomic_inc(&clp->cl_count);
 		return clp;
@@ -785,7 +788,8 @@ int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, struct nfs
 	}
 
 	fsinfo.fattr = fattr;
-	fsinfo.layouttype = 0;
+	fsinfo.nlayouttypes = 0;
+	memset(fsinfo.layouttype, 0, sizeof(fsinfo.layouttype));
 	error = clp->rpc_ops->fsinfo(server, mntfh, &fsinfo);
 	if (error < 0)
 		goto out_error;
@@ -1078,7 +1082,7 @@ void nfs_clients_init(struct net *net)
 	idr_init(&nn->cb_ident_idr);
 #endif
 	spin_lock_init(&nn->nfs_client_lock);
-	nn->boot_time = CURRENT_TIME;
+	nn->boot_time = ktime_get_real();
}
#ifdef CONFIG_PROC_FS #ifdef CONFIG_PROC_FS


@@ -41,6 +41,17 @@ void nfs_mark_delegation_referenced(struct nfs_delegation *delegation)
 	set_bit(NFS_DELEGATION_REFERENCED, &delegation->flags);
 }
 
+static bool
+nfs4_is_valid_delegation(const struct nfs_delegation *delegation,
+		fmode_t flags)
+{
+	if (delegation != NULL && (delegation->type & flags) == flags &&
+	    !test_bit(NFS_DELEGATION_REVOKED, &delegation->flags) &&
+	    !test_bit(NFS_DELEGATION_RETURNING, &delegation->flags))
+		return true;
+	return false;
+}
+
 static int
 nfs4_do_check_delegation(struct inode *inode, fmode_t flags, bool mark)
 {
@@ -50,8 +61,7 @@ nfs4_do_check_delegation(struct inode *inode, fmode_t flags, bool mark)
 	flags &= FMODE_READ|FMODE_WRITE;
 	rcu_read_lock();
 	delegation = rcu_dereference(NFS_I(inode)->delegation);
-	if (delegation != NULL && (delegation->type & flags) == flags &&
-	    !test_bit(NFS_DELEGATION_RETURNING, &delegation->flags)) {
+	if (nfs4_is_valid_delegation(delegation, flags)) {
 		if (mark)
 			nfs_mark_delegation_referenced(delegation);
 		ret = 1;
@@ -185,15 +195,13 @@ void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred,
 			rcu_read_unlock();
 			put_rpccred(oldcred);
 			trace_nfs4_reclaim_delegation(inode, res->delegation_type);
-		} else {
-			/* We appear to have raced with a delegation return. */
-			spin_unlock(&delegation->lock);
-			rcu_read_unlock();
-			nfs_inode_set_delegation(inode, cred, res);
+			return;
 		}
-	} else {
-		rcu_read_unlock();
+		/* We appear to have raced with a delegation return. */
+		spin_unlock(&delegation->lock);
 	}
+	rcu_read_unlock();
+	nfs_inode_set_delegation(inode, cred, res);
 }
 
 static int nfs_do_return_delegation(struct inode *inode, struct nfs_delegation *delegation, int issync)
@@ -642,28 +650,49 @@ static void nfs_client_mark_return_unused_delegation_types(struct nfs_client *cl
 	rcu_read_unlock();
 }
 
-static void nfs_revoke_delegation(struct inode *inode)
+static void nfs_mark_delegation_revoked(struct nfs_server *server,
+		struct nfs_delegation *delegation)
 {
-	struct nfs_delegation *delegation;
-	rcu_read_lock();
-	delegation = rcu_dereference(NFS_I(inode)->delegation);
-	if (delegation != NULL) {
-		set_bit(NFS_DELEGATION_REVOKED, &delegation->flags);
-		nfs_mark_return_delegation(NFS_SERVER(inode), delegation);
-	}
-	rcu_read_unlock();
+	set_bit(NFS_DELEGATION_REVOKED, &delegation->flags);
+	delegation->stateid.type = NFS4_INVALID_STATEID_TYPE;
+	nfs_mark_return_delegation(server, delegation);
 }
 
-void nfs_remove_bad_delegation(struct inode *inode)
+static bool nfs_revoke_delegation(struct inode *inode,
+		const nfs4_stateid *stateid)
+{
+	struct nfs_delegation *delegation;
+	nfs4_stateid tmp;
+	bool ret = false;
+
+	rcu_read_lock();
+	delegation = rcu_dereference(NFS_I(inode)->delegation);
+	if (delegation == NULL)
+		goto out;
+	if (stateid == NULL) {
+		nfs4_stateid_copy(&tmp, &delegation->stateid);
+		stateid = &tmp;
+	} else if (!nfs4_stateid_match(stateid, &delegation->stateid))
+		goto out;
+	nfs_mark_delegation_revoked(NFS_SERVER(inode), delegation);
+	ret = true;
+out:
+	rcu_read_unlock();
+	if (ret)
+		nfs_inode_find_state_and_recover(inode, stateid);
+	return ret;
+}
+
+void nfs_remove_bad_delegation(struct inode *inode,
+		const nfs4_stateid *stateid)
 {
 	struct nfs_delegation *delegation;
 
-	nfs_revoke_delegation(inode);
+	if (!nfs_revoke_delegation(inode, stateid))
+		return;
 	delegation = nfs_inode_detach_delegation(inode);
-	if (delegation) {
-		nfs_inode_find_state_and_recover(inode, &delegation->stateid);
+	if (delegation)
 		nfs_free_delegation(delegation);
-	}
 }
 EXPORT_SYMBOL_GPL(nfs_remove_bad_delegation);
@@ -786,8 +815,15 @@ static void nfs_delegation_mark_reclaim_server(struct nfs_server *server)
 {
 	struct nfs_delegation *delegation;
 
-	list_for_each_entry_rcu(delegation, &server->delegations, super_list)
+	list_for_each_entry_rcu(delegation, &server->delegations, super_list) {
+		/*
+		 * If the delegation may have been admin revoked, then we
+		 * cannot reclaim it.
+		 */
+		if (test_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags))
+			continue;
 		set_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags);
+	}
 }
 
 /**
@@ -851,6 +887,141 @@ restart:
 	rcu_read_unlock();
 }
 
+static inline bool nfs4_server_rebooted(const struct nfs_client *clp)
+{
+	return (clp->cl_state & (BIT(NFS4CLNT_CHECK_LEASE) |
+				BIT(NFS4CLNT_LEASE_EXPIRED) |
+				BIT(NFS4CLNT_SESSION_RESET))) != 0;
+}
+
+static void nfs_mark_test_expired_delegation(struct nfs_server *server,
+		struct nfs_delegation *delegation)
+{
+	if (delegation->stateid.type == NFS4_INVALID_STATEID_TYPE)
+		return;
+	clear_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags);
+	set_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags);
+	set_bit(NFS4CLNT_DELEGATION_EXPIRED, &server->nfs_client->cl_state);
+}
+
+static void nfs_inode_mark_test_expired_delegation(struct nfs_server *server,
+		struct inode *inode)
+{
+	struct nfs_delegation *delegation;
+
+	rcu_read_lock();
+	delegation = rcu_dereference(NFS_I(inode)->delegation);
+	if (delegation)
+		nfs_mark_test_expired_delegation(server, delegation);
+	rcu_read_unlock();
+}
+
+static void nfs_delegation_mark_test_expired_server(struct nfs_server *server)
+{
+	struct nfs_delegation *delegation;
+
+	list_for_each_entry_rcu(delegation, &server->delegations, super_list)
+		nfs_mark_test_expired_delegation(server, delegation);
+}
+
+/**
+ * nfs_mark_test_expired_all_delegations - mark all delegations for testing
+ * @clp: nfs_client to process
+ *
+ * Iterates through all the delegations associated with this server and
+ * marks them as needing to be checked for validity.
+ */
+void nfs_mark_test_expired_all_delegations(struct nfs_client *clp)
+{
+	struct nfs_server *server;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link)
+		nfs_delegation_mark_test_expired_server(server);
+	rcu_read_unlock();
+}
+
+/**
+ * nfs_reap_expired_delegations - reap expired delegations
+ * @clp: nfs_client to process
+ *
+ * Iterates through all the delegations associated with this server and
+ * checks if they may have been revoked. This function is usually
+ * expected to be called in cases where the server may have lost its
+ * lease.
+ */
+void nfs_reap_expired_delegations(struct nfs_client *clp)
+{
+	const struct nfs4_minor_version_ops *ops = clp->cl_mvops;
+	struct nfs_delegation *delegation;
+	struct nfs_server *server;
+	struct inode *inode;
+	struct rpc_cred *cred;
+	nfs4_stateid stateid;
+
+restart:
+	rcu_read_lock();
+	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
+		list_for_each_entry_rcu(delegation, &server->delegations,
+				super_list) {
+			if (test_bit(NFS_DELEGATION_RETURNING,
+						&delegation->flags))
+				continue;
+			if (test_bit(NFS_DELEGATION_TEST_EXPIRED,
+						&delegation->flags) == 0)
+				continue;
+			if (!nfs_sb_active(server->super))
+				continue;
+			inode = nfs_delegation_grab_inode(delegation);
+			if (inode == NULL) {
+				rcu_read_unlock();
+				nfs_sb_deactive(server->super);
+				goto restart;
+			}
+			cred = get_rpccred_rcu(delegation->cred);
+			nfs4_stateid_copy(&stateid, &delegation->stateid);
+			clear_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags);
+			rcu_read_unlock();
+			if (cred != NULL &&
+			    ops->test_and_free_expired(server, &stateid, cred) < 0) {
+				nfs_revoke_delegation(inode, &stateid);
+				nfs_inode_find_state_and_recover(inode, &stateid);
+			}
+			put_rpccred(cred);
+			if (nfs4_server_rebooted(clp)) {
+				nfs_inode_mark_test_expired_delegation(server, inode);
+				iput(inode);
+				nfs_sb_deactive(server->super);
+				return;
+			}
+			iput(inode);
+			nfs_sb_deactive(server->super);
+			goto restart;
+		}
+	}
+	rcu_read_unlock();
+}
+
+void nfs_inode_find_delegation_state_and_recover(struct inode *inode,
+		const nfs4_stateid *stateid)
+{
+	struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+	struct nfs_delegation *delegation;
+	bool found = false;
+
+	rcu_read_lock();
+	delegation = rcu_dereference(NFS_I(inode)->delegation);
+	if (delegation &&
+	    nfs4_stateid_match_other(&delegation->stateid, stateid)) {
+		nfs_mark_test_expired_delegation(NFS_SERVER(inode), delegation);
+		found = true;
+	}
+	rcu_read_unlock();
+	if (found)
+		nfs4_schedule_state_manager(clp);
+}
+
 /**
  * nfs_delegations_present - check for existence of delegations
  * @clp: client state handle
@@ -893,7 +1064,7 @@ bool nfs4_copy_delegation_stateid(struct inode *inode, fmode_t flags,
 	flags &= FMODE_READ|FMODE_WRITE;
 	rcu_read_lock();
 	delegation = rcu_dereference(nfsi->delegation);
-	ret = (delegation != NULL && (delegation->type & flags) == flags);
+	ret = nfs4_is_valid_delegation(delegation, flags);
 	if (ret) {
 		nfs4_stateid_copy(dst, &delegation->stateid);
 		nfs_mark_delegation_referenced(delegation);


@@ -32,6 +32,7 @@ enum {
 	NFS_DELEGATION_REFERENCED,
 	NFS_DELEGATION_RETURNING,
 	NFS_DELEGATION_REVOKED,
+	NFS_DELEGATION_TEST_EXPIRED,
 };
 
 int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct nfs_openres *res);
@@ -47,11 +48,14 @@ void nfs_expire_unused_delegation_types(struct nfs_client *clp, fmode_t flags);
 void nfs_expire_unreferenced_delegations(struct nfs_client *clp);
 int nfs_client_return_marked_delegations(struct nfs_client *clp);
 int nfs_delegations_present(struct nfs_client *clp);
-void nfs_remove_bad_delegation(struct inode *inode);
+void nfs_remove_bad_delegation(struct inode *inode, const nfs4_stateid *stateid);
 void nfs_delegation_mark_reclaim(struct nfs_client *clp);
 void nfs_delegation_reap_unclaimed(struct nfs_client *clp);
+void nfs_mark_test_expired_all_delegations(struct nfs_client *clp);
+void nfs_reap_expired_delegations(struct nfs_client *clp);
 
 /* NFSv4 delegation-related procedures */
 int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4_stateid *stateid, int issync);
 int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid, fmode_t type);
@@ -62,6 +66,8 @@ void nfs_mark_delegation_referenced(struct nfs_delegation *delegation);
 int nfs4_have_delegation(struct inode *inode, fmode_t flags);
 int nfs4_check_delegation(struct inode *inode, fmode_t flags);
 bool nfs4_delegation_flush_on_close(const struct inode *inode);
+void nfs_inode_find_delegation_state_and_recover(struct inode *inode,
+		const nfs4_stateid *stateid);
 
 #endif


@@ -435,11 +435,11 @@ int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
 		return 0;
 	nfsi = NFS_I(inode);
-	if (entry->fattr->fileid == nfsi->fileid)
-		return 1;
-	if (nfs_compare_fh(entry->fh, &nfsi->fh) == 0)
-		return 1;
-	return 0;
+	if (entry->fattr->fileid != nfsi->fileid)
+		return 0;
+	if (entry->fh->size && nfs_compare_fh(entry->fh, &nfsi->fh) != 0)
+		return 0;
+	return 1;
 }
 static
@@ -496,6 +496,14 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
 		return;
 	if (!(entry->fattr->valid & NFS_ATTR_FATTR_FSID))
 		return;
+	if (filename.len == 0)
+		return;
+	/* Validate that the name doesn't contain any illegal '\0' */
+	if (strnlen(filename.name, filename.len) != filename.len)
+		return;
+	/* ...or '/' */
+	if (strnchr(filename.name, filename.len, '/'))
+		return;
 	if (filename.name[0] == '.') {
 		if (filename.len == 1)
 			return;
@@ -517,6 +525,8 @@ again:
 				&entry->fattr->fsid))
 		goto out;
 	if (nfs_same_file(dentry, entry)) {
+		if (!entry->fh->size)
+			goto out;
 		nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 		status = nfs_refresh_inode(d_inode(dentry), entry->fattr);
 		if (!status)
@@ -529,6 +539,10 @@ again:
 			goto again;
 		}
 	}
+	if (!entry->fh->size) {
+		d_lookup_done(dentry);
+		goto out;
+	}
 	inode = nfs_fhget(dentry->d_sb, entry->fh, entry->fattr, entry->label);
 	alias = d_splice_alias(inode, dentry);
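The nfs_prime_dcache() hunk above rejects a readdir entry's name before it can reach the dcache if the name is empty, contains an embedded NUL, or contains a '/'. A minimal userspace sketch of the same checks, assuming memchr() as a stand-in for the kernel's strnlen()/strnchr() on a counted, non-NUL-terminated buffer (valid_readdir_name is a hypothetical name):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirror of the checks nfs_prime_dcache() now performs: the name is a
 * counted buffer, not a NUL-terminated string, so '\0' and '/' must be
 * searched for within the first len bytes only. */
static bool valid_readdir_name(const char *name, size_t len)
{
        if (len == 0)
                return false;           /* empty names are invalid */
        if (memchr(name, '\0', len))
                return false;           /* embedded '\0' */
        if (memchr(name, '/', len))
                return false;           /* '/' cannot appear in a component */
        return true;
}
```

The point of doing this before the dcache lookup is that a malicious or buggy server can no longer plant path-separator or NUL-bearing names into client dentries.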


@@ -387,7 +387,7 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq)
 		dreq->iocb->ki_complete(dreq->iocb, res, 0);
 	}
-	complete_all(&dreq->completion);
+	complete(&dreq->completion);
 	nfs_direct_req_release(dreq);
 }


@@ -520,7 +520,9 @@ const struct address_space_operations nfs_file_aops = {
 	.invalidatepage = nfs_invalidate_page,
 	.releasepage = nfs_release_page,
 	.direct_IO = nfs_direct_IO,
+#ifdef CONFIG_MIGRATION
 	.migratepage = nfs_migrate_page,
+#endif
 	.launder_page = nfs_launder_page,
 	.is_dirty_writeback = nfs_check_dirty_writeback,
 	.error_remove_page = generic_error_remove_page,
@@ -685,11 +687,6 @@ out_noconflict:
 	goto out;
 }
-static int do_vfs_lock(struct file *file, struct file_lock *fl)
-{
-	return locks_lock_file_wait(file, fl);
-}
 static int
 do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 {
@@ -722,7 +719,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 	if (!is_local)
 		status = NFS_PROTO(inode)->lock(filp, cmd, fl);
 	else
-		status = do_vfs_lock(filp, fl);
+		status = locks_lock_file_wait(filp, fl);
 	return status;
 }
@@ -747,7 +744,7 @@ do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 	if (!is_local)
 		status = NFS_PROTO(inode)->lock(filp, cmd, fl);
 	else
-		status = do_vfs_lock(filp, fl);
+		status = locks_lock_file_wait(filp, fl);
 	if (status < 0)
 		goto out;


@@ -1080,7 +1080,7 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
 	case -NFS4ERR_BAD_STATEID:
 		if (state == NULL)
 			break;
-		nfs_remove_bad_delegation(state->inode);
+		nfs_remove_bad_delegation(state->inode, NULL);
 	case -NFS4ERR_OPENMODE:
 		if (state == NULL)
 			break;


@@ -534,12 +534,9 @@ void nfs_clear_pnfs_ds_commit_verifiers(struct pnfs_ds_commit_info *cinfo)
 }
 #endif
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
 		struct page *, struct page *, enum migrate_mode);
-#else
-#define nfs_migrate_page NULL
 #endif
 static inline int
@@ -562,7 +559,6 @@ void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
 extern ssize_t nfs_dreq_bytes_left(struct nfs_direct_req *dreq);
 /* nfs4proc.c */
-extern void __nfs4_read_done_cb(struct nfs_pgio_header *);
 extern struct nfs_client *nfs4_init_client(struct nfs_client *clp,
 			   const struct nfs_client_initdata *);
 extern int nfs40_walk_client_list(struct nfs_client *clp,
@@ -571,6 +567,9 @@ extern int nfs40_walk_client_list(struct nfs_client *clp,
 extern int nfs41_walk_client_list(struct nfs_client *clp,
 			struct nfs_client **result,
 			struct rpc_cred *cred);
+extern int nfs4_test_session_trunk(struct rpc_clnt *,
+			struct rpc_xprt *,
+			void *);
 static inline struct inode *nfs_igrab_and_active(struct inode *inode)
 {


@@ -29,7 +29,7 @@ struct nfs_net {
 	int cb_users[NFS4_MAX_MINOR_VERSION + 1];
 #endif
 	spinlock_t nfs_client_lock;
-	struct timespec boot_time;
+	ktime_t boot_time;
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry *proc_nfsfs;
 #endif


@@ -443,6 +443,7 @@ int nfs42_proc_layoutstats_generic(struct nfs_server *server,
 	task = rpc_run_task(&task_setup);
 	if (IS_ERR(task))
 		return PTR_ERR(task);
+	rpc_put_task(task);
 	return 0;
 }


@@ -39,6 +39,7 @@ enum nfs4_client_state {
 	NFS4CLNT_BIND_CONN_TO_SESSION,
 	NFS4CLNT_MOVED,
 	NFS4CLNT_LEASE_MOVED,
+	NFS4CLNT_DELEGATION_EXPIRED,
 };
 #define NFS4_RENEW_TIMEOUT	0x01
@@ -57,8 +58,11 @@ struct nfs4_minor_version_ops {
 			struct nfs_fsinfo *);
 	void	(*free_lock_state)(struct nfs_server *,
 			struct nfs4_lock_state *);
+	int	(*test_and_free_expired)(struct nfs_server *,
+			nfs4_stateid *, struct rpc_cred *);
 	struct nfs_seqid *
 		(*alloc_seqid)(struct nfs_seqid_counter *, gfp_t);
+	int	(*session_trunk)(struct rpc_clnt *, struct rpc_xprt *, void *);
 	const struct rpc_call_ops *call_sync_ops;
 	const struct nfs4_state_recovery_ops *reboot_recovery_ops;
 	const struct nfs4_state_recovery_ops *nograce_recovery_ops;
@@ -156,6 +160,7 @@ enum {
 	NFS_STATE_RECLAIM_NOGRACE,	/* OPEN stateid needs to recover state */
 	NFS_STATE_POSIX_LOCKS,		/* Posix locks are supported */
 	NFS_STATE_RECOVERY_FAILED,	/* OPEN stateid state recovery failed */
+	NFS_STATE_MAY_NOTIFY_LOCK,	/* server may CB_NOTIFY_LOCK */
 };
 struct nfs4_state {
@@ -203,6 +208,11 @@ struct nfs4_state_recovery_ops {
 		struct rpc_cred *);
 };
+struct nfs4_add_xprt_data {
+	struct nfs_client	*clp;
+	struct rpc_cred		*cred;
+};
 struct nfs4_state_maintenance_ops {
 	int (*sched_state_renewal)(struct nfs_client *, struct rpc_cred *, unsigned);
 	struct rpc_cred * (*get_state_renewal_cred_locked)(struct nfs_client *);
@@ -278,6 +288,8 @@ extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
 		struct nfs_fsinfo *fsinfo);
 extern int nfs4_proc_layoutcommit(struct nfs4_layoutcommit_data *data,
 		bool sync);
+extern int nfs4_detect_session_trunking(struct nfs_client *clp,
+		struct nfs41_exchange_id_res *res, struct rpc_xprt *xprt);
 static inline bool
 is_ds_only_client(struct nfs_client *clp)
@@ -439,7 +451,7 @@ extern void nfs4_schedule_path_down_recovery(struct nfs_client *clp);
 extern int nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
 extern int nfs4_schedule_migration_recovery(const struct nfs_server *);
 extern void nfs4_schedule_lease_moved_recovery(struct nfs_client *);
-extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags);
+extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags, bool);
 extern void nfs41_handle_server_scope(struct nfs_client *,
 				struct nfs41_server_scope **);
 extern void nfs4_put_lock_state(struct nfs4_lock_state *lsp);
@@ -471,6 +483,7 @@ extern struct nfs_subversion nfs_v4;
 struct dentry *nfs4_try_mount(int, const char *, struct nfs_mount_info *, struct nfs_subversion *);
 extern bool nfs4_disable_idmapping;
 extern unsigned short max_session_slots;
+extern unsigned short max_session_cb_slots;
 extern unsigned short send_implementation_id;
 extern bool recover_lost_locks;


@@ -199,6 +199,9 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init)
 	clp->cl_minorversion = cl_init->minorversion;
 	clp->cl_mvops = nfs_v4_minor_ops[cl_init->minorversion];
 	clp->cl_mig_gen = 1;
+#if IS_ENABLED(CONFIG_NFS_V4_1)
+	init_waitqueue_head(&clp->cl_lock_waitq);
+#endif
 	return clp;
 error:
@@ -562,15 +565,15 @@ out:
 /*
  * Returns true if the client IDs match
  */
-static bool nfs4_match_clientids(struct nfs_client *a, struct nfs_client *b)
+static bool nfs4_match_clientids(u64 a, u64 b)
 {
-	if (a->cl_clientid != b->cl_clientid) {
+	if (a != b) {
 		dprintk("NFS: --> %s client ID %llx does not match %llx\n",
-			__func__, a->cl_clientid, b->cl_clientid);
+			__func__, a, b);
 		return false;
 	}
 	dprintk("NFS: --> %s client ID %llx matches %llx\n",
-		__func__, a->cl_clientid, b->cl_clientid);
+		__func__, a, b);
 	return true;
 }
@@ -578,17 +581,15 @@ static bool nfs4_match_clientids(struct nfs_client *a, struct nfs_client *b)
  * Returns true if the server major ids match
  */
 static bool
-nfs4_check_clientid_trunking(struct nfs_client *a, struct nfs_client *b)
+nfs4_check_serverowner_major_id(struct nfs41_server_owner *o1,
+				struct nfs41_server_owner *o2)
 {
-	struct nfs41_server_owner *o1 = a->cl_serverowner;
-	struct nfs41_server_owner *o2 = b->cl_serverowner;
 	if (o1->major_id_sz != o2->major_id_sz)
 		goto out_major_mismatch;
 	if (memcmp(o1->major_id, o2->major_id, o1->major_id_sz) != 0)
 		goto out_major_mismatch;
-	dprintk("NFS: --> %s server owners match\n", __func__);
+	dprintk("NFS: --> %s server owner major IDs match\n", __func__);
 	return true;
 out_major_mismatch:
@@ -597,6 +598,100 @@ out_major_mismatch:
 	return false;
 }
+/*
+ * Returns true if server minor ids match
+ */
+static bool
+nfs4_check_serverowner_minor_id(struct nfs41_server_owner *o1,
+				struct nfs41_server_owner *o2)
+{
+	/* Check eir_server_owner so_minor_id */
+	if (o1->minor_id != o2->minor_id)
+		goto out_minor_mismatch;
+	dprintk("NFS: --> %s server owner minor IDs match\n", __func__);
+	return true;
+out_minor_mismatch:
+	dprintk("NFS: --> %s server owner minor IDs do not match\n", __func__);
+	return false;
+}
+/*
+ * Returns true if the server scopes match
+ */
+static bool
+nfs4_check_server_scope(struct nfs41_server_scope *s1,
+			struct nfs41_server_scope *s2)
+{
+	if (s1->server_scope_sz != s2->server_scope_sz)
+		goto out_scope_mismatch;
+	if (memcmp(s1->server_scope, s2->server_scope,
+		   s1->server_scope_sz) != 0)
+		goto out_scope_mismatch;
+	dprintk("NFS: --> %s server scopes match\n", __func__);
+	return true;
+out_scope_mismatch:
+	dprintk("NFS: --> %s server scopes do not match\n",
+		__func__);
+	return false;
+}
+/**
+ * nfs4_detect_session_trunking - Checks for session trunking.
+ *
+ * Called after a successful EXCHANGE_ID on a multi-addr connection.
+ * Upon success, add the transport.
+ *
+ * @clp: original mount nfs_client
+ * @res: result structure from an exchange_id using the original mount
+ *       nfs_client with a new multi_addr transport
+ *
+ * Returns zero on success, otherwise -EINVAL
+ *
+ * Note: since the exchange_id for the new multi_addr transport uses the
+ * same nfs_client from the original mount, the cl_owner_id is reused,
+ * so eir_clientowner is the same.
+ */
+int nfs4_detect_session_trunking(struct nfs_client *clp,
+				 struct nfs41_exchange_id_res *res,
+				 struct rpc_xprt *xprt)
+{
+	/* Check eir_clientid */
+	if (!nfs4_match_clientids(clp->cl_clientid, res->clientid))
+		goto out_err;
+	/* Check eir_server_owner so_major_id */
+	if (!nfs4_check_serverowner_major_id(clp->cl_serverowner,
+					     res->server_owner))
+		goto out_err;
+	/* Check eir_server_owner so_minor_id */
+	if (!nfs4_check_serverowner_minor_id(clp->cl_serverowner,
+					     res->server_owner))
+		goto out_err;
+	/* Check eir_server_scope */
+	if (!nfs4_check_server_scope(clp->cl_serverscope, res->server_scope))
+		goto out_err;
+	/* Session trunking passed, add the xprt */
+	rpc_clnt_xprt_switch_add_xprt(clp->cl_rpcclient, xprt);
+	pr_info("NFS:  %s: Session trunking succeeded for %s\n",
+		clp->cl_hostname,
+		xprt->address_strings[RPC_DISPLAY_ADDR]);
+	return 0;
+out_err:
+	pr_info("NFS:  %s: Session trunking failed for %s\n", clp->cl_hostname,
+		xprt->address_strings[RPC_DISPLAY_ADDR]);
+	return -EINVAL;
+}
 /**
  * nfs41_walk_client_list - Find nfs_client that matches a client/server owner
  *
@@ -650,7 +745,7 @@ int nfs41_walk_client_list(struct nfs_client *new,
 		if (pos->cl_cons_state != NFS_CS_READY)
 			continue;
-		if (!nfs4_match_clientids(pos, new))
+		if (!nfs4_match_clientids(pos->cl_clientid, new->cl_clientid))
 			continue;
 		/*
@@ -658,7 +753,8 @@ int nfs41_walk_client_list(struct nfs_client *new,
 		 * client id trunking. In either case, we want to fall back
 		 * to using the existing nfs_client.
 		 */
-		if (!nfs4_check_clientid_trunking(pos, new))
+		if (!nfs4_check_serverowner_major_id(pos->cl_serverowner,
+						     new->cl_serverowner))
 			continue;
 		/* Unlike NFSv4.0, we know that NFSv4.1 always uses the

(one file diff suppressed because it is too large)


@@ -9,6 +9,7 @@
 /* maximum number of slots to use */
 #define NFS4_DEF_SLOT_TABLE_SIZE (64U)
+#define NFS4_DEF_CB_SLOT_TABLE_SIZE (1U)
 #define NFS4_MAX_SLOT_TABLE (1024U)
 #define NFS4_NO_SLOT ((u32)-1)
@@ -22,6 +23,7 @@ struct nfs4_slot {
 	u32		slot_nr;
 	u32		seq_nr;
 	unsigned int	interrupted : 1,
+			privileged : 1,
 			seq_done : 1;
 };


@@ -991,6 +991,8 @@ int nfs4_select_rw_stateid(struct nfs4_state *state,
 {
 	int ret;
+	if (!nfs4_valid_open_stateid(state))
+		return -EIO;
 	if (cred != NULL)
 		*cred = NULL;
 	ret = nfs4_copy_lock_stateid(dst, state, lockowner);
@@ -1303,6 +1305,8 @@ void nfs4_schedule_path_down_recovery(struct nfs_client *clp)
 static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
 {
+	if (!nfs4_valid_open_stateid(state))
+		return 0;
 	set_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
 	/* Don't recover state that expired before the reboot */
 	if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags)) {
@@ -1316,6 +1320,8 @@ static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_st
 int nfs4_state_mark_reclaim_nograce(struct nfs_client *clp, struct nfs4_state *state)
 {
+	if (!nfs4_valid_open_stateid(state))
+		return 0;
 	set_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags);
 	clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
 	set_bit(NFS_OWNER_RECLAIM_NOGRACE, &state->owner->so_flags);
@@ -1327,9 +1333,8 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
 {
 	struct nfs_client *clp = server->nfs_client;
-	if (!nfs4_valid_open_stateid(state))
+	if (!nfs4_state_mark_reclaim_nograce(clp, state))
 		return -EBADF;
-	nfs4_state_mark_reclaim_nograce(clp, state);
 	dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
 			clp->cl_hostname);
 	nfs4_schedule_state_manager(clp);
@@ -1337,6 +1342,35 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
 }
 EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
+static struct nfs4_lock_state *
+nfs_state_find_lock_state_by_stateid(struct nfs4_state *state,
+		const nfs4_stateid *stateid)
+{
+	struct nfs4_lock_state *pos;
+	list_for_each_entry(pos, &state->lock_states, ls_locks) {
+		if (!test_bit(NFS_LOCK_INITIALIZED, &pos->ls_flags))
+			continue;
+		if (nfs4_stateid_match_other(&pos->ls_stateid, stateid))
+			return pos;
+	}
+	return NULL;
+}
+static bool nfs_state_lock_state_matches_stateid(struct nfs4_state *state,
+		const nfs4_stateid *stateid)
+{
+	bool found = false;
+	if (test_bit(LK_STATE_IN_USE, &state->flags)) {
+		spin_lock(&state->state_lock);
+		if (nfs_state_find_lock_state_by_stateid(state, stateid))
+			found = true;
+		spin_unlock(&state->state_lock);
+	}
+	return found;
+}
 void nfs_inode_find_state_and_recover(struct inode *inode,
 		const nfs4_stateid *stateid)
 {
@@ -1351,14 +1385,18 @@ void nfs_inode_find_state_and_recover(struct inode *inode,
 		state = ctx->state;
 		if (state == NULL)
 			continue;
-		if (!test_bit(NFS_DELEGATED_STATE, &state->flags))
+		if (nfs4_stateid_match_other(&state->stateid, stateid) &&
+		    nfs4_state_mark_reclaim_nograce(clp, state)) {
+			found = true;
 			continue;
-		if (!nfs4_stateid_match(&state->stateid, stateid))
-			continue;
-		nfs4_state_mark_reclaim_nograce(clp, state);
-		found = true;
+		}
+		if (nfs_state_lock_state_matches_stateid(state, stateid) &&
+		    nfs4_state_mark_reclaim_nograce(clp, state))
+			found = true;
 	}
 	spin_unlock(&inode->i_lock);
+	nfs_inode_find_delegation_state_and_recover(inode, stateid);
 	if (found)
 		nfs4_schedule_state_manager(clp);
 }
@@ -1498,6 +1536,9 @@ restart:
 				__func__, status);
 		case -ENOENT:
 		case -ENOMEM:
+		case -EACCES:
+		case -EROFS:
+		case -EIO:
 		case -ESTALE:
 			/* Open state on this file cannot be recovered */
 			nfs4_state_mark_recovery_failed(state, status);
@@ -1656,15 +1697,9 @@ static void nfs4_state_end_reclaim_reboot(struct nfs_client *clp)
 	put_rpccred(cred);
 }
-static void nfs_delegation_clear_all(struct nfs_client *clp)
-{
-	nfs_delegation_mark_reclaim(clp);
-	nfs_delegation_reap_unclaimed(clp);
-}
 static void nfs4_state_start_reclaim_nograce(struct nfs_client *clp)
 {
-	nfs_delegation_clear_all(clp);
+	nfs_mark_test_expired_all_delegations(clp);
 	nfs4_state_mark_reclaim_helper(clp, nfs4_state_mark_reclaim_nograce);
 }
@@ -2195,7 +2230,7 @@ static void nfs41_handle_all_state_revoked(struct nfs_client *clp)
 static void nfs41_handle_some_state_revoked(struct nfs_client *clp)
 {
-	nfs4_state_mark_reclaim_helper(clp, nfs4_state_mark_reclaim_nograce);
+	nfs4_state_start_reclaim_nograce(clp);
 	nfs4_schedule_state_manager(clp);
 	dprintk("%s: state revoked on server %s\n", __func__, clp->cl_hostname);
@@ -2227,13 +2262,22 @@ static void nfs41_handle_cb_path_down(struct nfs_client *clp)
 	nfs4_schedule_state_manager(clp);
 }
-void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags)
+void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags,
+		bool recovery)
 {
 	if (!flags)
 		return;
 	dprintk("%s: \"%s\" (client ID %llx) flags=0x%08x\n",
 		__func__, clp->cl_hostname, clp->cl_clientid, flags);
+	/*
+	 * If we're called from the state manager thread, then assume we're
+	 * already handling the RECLAIM_NEEDED and/or STATE_REVOKED.
+	 * Those flags are expected to remain set until we're done
+	 * recovering (see RFC5661, section 18.46.3).
+	 */
+	if (recovery)
+		goto out_recovery;
 	if (flags & SEQ4_STATUS_RESTART_RECLAIM_NEEDED)
 		nfs41_handle_server_reboot(clp);
@@ -2246,6 +2290,7 @@ void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags)
 		nfs4_schedule_lease_moved_recovery(clp);
 	if (flags & SEQ4_STATUS_RECALLABLE_STATE_REVOKED)
 		nfs41_handle_recallable_state_revoked(clp);
+out_recovery:
 	if (flags & SEQ4_STATUS_BACKCHANNEL_FAULT)
 		nfs41_handle_backchannel_fault(clp);
 	else if (flags & (SEQ4_STATUS_CB_PATH_DOWN |
@@ -2410,6 +2455,13 @@ static void nfs4_state_manager(struct nfs_client *clp)
 			nfs4_state_end_reclaim_reboot(clp);
 		}
+		/* Detect expired delegations... */
+		if (test_and_clear_bit(NFS4CLNT_DELEGATION_EXPIRED, &clp->cl_state)) {
+			section = "detect expired delegations";
+			nfs_reap_expired_delegations(clp);
+			continue;
+		}
 		/* Now recover expired state... */
 		if (test_and_clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state)) {
 			section = "reclaim nograce";


@@ -1850,7 +1850,7 @@ static void encode_create_session(struct xdr_stream *xdr,
 	*p++ = cpu_to_be32(RPC_AUTH_UNIX);	/* auth_sys */
 	/* authsys_parms rfc1831 */
-	*p++ = cpu_to_be32(nn->boot_time.tv_nsec);	/* stamp */
+	*p++ = cpu_to_be32(ktime_to_ns(nn->boot_time));	/* stamp */
 	p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
 	*p++ = cpu_to_be32(0);			/* UID */
 	*p++ = cpu_to_be32(0);			/* GID */
@@ -4725,34 +4725,37 @@ static int decode_getfattr(struct xdr_stream *xdr, struct nfs_fattr *fattr,
 }
 /*
- * Decode potentially multiple layout types. Currently we only support
- * one layout driver per file system.
+ * Decode potentially multiple layout types.
  */
-static int decode_first_pnfs_layout_type(struct xdr_stream *xdr,
-					 uint32_t *layouttype)
+static int decode_pnfs_layout_types(struct xdr_stream *xdr,
+				    struct nfs_fsinfo *fsinfo)
 {
 	__be32 *p;
-	int num;
+	uint32_t i;
 	p = xdr_inline_decode(xdr, 4);
 	if (unlikely(!p))
 		goto out_overflow;
-	num = be32_to_cpup(p);
+	fsinfo->nlayouttypes = be32_to_cpup(p);
 	/* pNFS is not supported by the underlying file system */
-	if (num == 0) {
-		*layouttype = 0;
+	if (fsinfo->nlayouttypes == 0)
 		return 0;
-	}
-	if (num > 1)
-		printk(KERN_INFO "NFS: %s: Warning: Multiple pNFS layout "
-			"drivers per filesystem not supported\n", __func__);
 	/* Decode and set first layout type, move xdr->p past unused types */
-	p = xdr_inline_decode(xdr, num * 4);
+	p = xdr_inline_decode(xdr, fsinfo->nlayouttypes * 4);
 	if (unlikely(!p))
 		goto out_overflow;
-	*layouttype = be32_to_cpup(p);
+	/* If we get too many, then just cap it at the max */
+	if (fsinfo->nlayouttypes > NFS_MAX_LAYOUT_TYPES) {
+		printk(KERN_INFO "NFS: %s: Warning: Too many (%u) pNFS layout types\n",
+			__func__, fsinfo->nlayouttypes);
+		fsinfo->nlayouttypes = NFS_MAX_LAYOUT_TYPES;
+	}
+	for (i = 0; i < fsinfo->nlayouttypes; ++i)
+		fsinfo->layouttype[i] = be32_to_cpup(p++);
 	return 0;
 out_overflow:
 	print_overflow_msg(__func__, xdr);
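The decode_pnfs_layout_types() change above reads an XDR count-prefixed array of 32-bit layout IDs, consuming every advertised entry from the stream but storing at most NFS_MAX_LAYOUT_TYPES of them. A self-contained sketch of that decode shape, assuming manual big-endian reads in place of xdr_inline_decode() (MAX_LAYOUT_TYPES and decode_layout_types are illustrative names, not kernel symbols):

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYOUT_TYPES 8   /* stand-in for NFS_MAX_LAYOUT_TYPES */

static uint32_t get_be32(const uint8_t *p)
{
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode "<count><id>...<id>": reject a truncated stream (the analogue
 * of the out_overflow path), then cap how many IDs are kept while the
 * whole array is still considered consumed. Returns 0 on success, -1
 * if the buffer is shorter than the encoded array claims. */
static int decode_layout_types(const uint8_t *buf, size_t buflen,
                               uint32_t *out, uint32_t *nkept)
{
        if (buflen < 4)
                return -1;
        uint32_t n = get_be32(buf);
        if ((buflen - 4) / 4 < n)
                return -1;              /* array runs past the buffer */
        *nkept = n > MAX_LAYOUT_TYPES ? MAX_LAYOUT_TYPES : n;
        for (uint32_t i = 0; i < *nkept; i++)
                out[i] = get_be32(buf + 4 + (size_t)i * 4);
        return 0;
}
```

Capping after validating the full array length matters: the stream position must advance over all n entries even when only the first few are kept, otherwise the following attribute would be decoded at the wrong offset.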
@@ -4764,7 +4767,7 @@ out_overflow:
  * Note we must ensure that layouttype is set in any non-error case.
  */
 static int decode_attr_pnfstype(struct xdr_stream *xdr, uint32_t *bitmap,
-				uint32_t *layouttype)
+				struct nfs_fsinfo *fsinfo)
 {
 	int status = 0;
@@ -4772,10 +4775,9 @@ static int decode_attr_pnfstype(struct xdr_stream *xdr, uint32_t *bitmap,
 	if (unlikely(bitmap[1] & (FATTR4_WORD1_FS_LAYOUT_TYPES - 1U)))
 		return -EIO;
 	if (bitmap[1] & FATTR4_WORD1_FS_LAYOUT_TYPES) {
-		status = decode_first_pnfs_layout_type(xdr, layouttype);
+		status = decode_pnfs_layout_types(xdr, fsinfo);
 		bitmap[1] &= ~FATTR4_WORD1_FS_LAYOUT_TYPES;
-	} else
-		*layouttype = 0;
+	}
 	return status;
 }
@@ -4856,7 +4858,7 @@ static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
 	status = decode_attr_time_delta(xdr, bitmap, &fsinfo->time_delta);
 	if (status != 0)
 		goto xdr_error;
-	status = decode_attr_pnfstype(xdr, bitmap, &fsinfo->layouttype);
+	status = decode_attr_pnfstype(xdr, bitmap, fsinfo);
 	if (status != 0)
 		goto xdr_error;


@@ -30,6 +30,7 @@
 #include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
 #include <linux/module.h>
+#include <linux/sort.h>
 #include "internal.h"
 #include "pnfs.h"
 #include "iostat.h"
@@ -98,36 +99,80 @@ unset_pnfs_layoutdriver(struct nfs_server *nfss)
 	nfss->pnfs_curr_ld = NULL;
 }
+/*
+ * When the server sends a list of layout types, we choose one in the order
+ * given in the list below.
+ *
+ * FIXME: should this list be configurable in some fashion? module param?
+ *	  mount option? something else?
+ */
+static const u32 ld_prefs[] = {
+	LAYOUT_SCSI,
+	LAYOUT_BLOCK_VOLUME,
+	LAYOUT_OSD2_OBJECTS,
+	LAYOUT_FLEX_FILES,
+	LAYOUT_NFSV4_1_FILES,
+	0
+};
+static int
+ld_cmp(const void *e1, const void *e2)
+{
+	u32 ld1 = *((u32 *)e1);
+	u32 ld2 = *((u32 *)e2);
+	int i;
+
+	for (i = 0; ld_prefs[i] != 0; i++) {
+		if (ld1 == ld_prefs[i])
+			return -1;
+		if (ld2 == ld_prefs[i])
+			return 1;
+	}
+	return 0;
+}
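The ld_prefs[]/ld_cmp pair above turns layout-driver selection into a sort problem: whichever type appears earlier in the preference table sorts to the front of the server's advertised list. A userspace sketch of the same idea with qsort(), assuming illustrative LAYOUT_* values (stand-ins, not taken from the kernel headers):

```c
#include <stdlib.h>

/* Illustrative stand-ins for the kernel's enum pnfs_layouttype values. */
enum {
        LAYOUT_NFSV4_1_FILES = 1,
        LAYOUT_OSD2_OBJECTS  = 2,
        LAYOUT_BLOCK_VOLUME  = 3,
        LAYOUT_FLEX_FILES    = 4,
        LAYOUT_SCSI          = 5,
};

/* Preference table: earlier entries win, 0 terminates the list. */
static const unsigned ld_prefs[] = {
        LAYOUT_SCSI, LAYOUT_BLOCK_VOLUME, LAYOUT_OSD2_OBJECTS,
        LAYOUT_FLEX_FILES, LAYOUT_NFSV4_1_FILES, 0
};

/* Same shape as the kernel's ld_cmp(): whichever argument appears
 * first in ld_prefs[] sorts earlier; unknown types compare equal. */
static int ld_cmp(const void *e1, const void *e2)
{
        unsigned ld1 = *(const unsigned *)e1;
        unsigned ld2 = *(const unsigned *)e2;
        for (int i = 0; ld_prefs[i] != 0; i++) {
                if (ld1 == ld_prefs[i])
                        return -1;
                if (ld2 == ld_prefs[i])
                        return 1;
        }
        return 0;
}
```

After sorting with this comparator, set_pnfs_layoutdriver() can walk the array front to back and take the first type whose driver module is available, falling through to the less-preferred ones only when module lookup fails.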
 /*
  * Try to set the server's pnfs module to the pnfs layout type specified by id.
  * Currently only one pNFS layout driver per filesystem is supported.
  *
- * @id layout type. Zero (illegal layout type) indicates pNFS not in use.
+ * @ids array of layout types supported by MDS.
  */
 void
 set_pnfs_layoutdriver(struct nfs_server *server, const struct nfs_fh *mntfh,
-		      u32 id)
+		      struct nfs_fsinfo *fsinfo)
 {
 	struct pnfs_layoutdriver_type *ld_type = NULL;
+	u32 id;
+	int i;
-	if (id == 0)
-		goto out_no_driver;
 	if (!(server->nfs_client->cl_exchange_flags &
		 (EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS))) {
-		printk(KERN_ERR "NFS: %s: id %u cl_exchange_flags 0x%x\n",
-			__func__, id, server->nfs_client->cl_exchange_flags);
+		printk(KERN_ERR "NFS: %s: cl_exchange_flags 0x%x\n",
+			__func__, server->nfs_client->cl_exchange_flags);
 		goto out_no_driver;
 	}
-	ld_type = find_pnfs_driver(id);
-	if (!ld_type) {
-		request_module("%s-%u", LAYOUT_NFSV4_1_MODULE_PREFIX, id);
+	sort(fsinfo->layouttype, fsinfo->nlayouttypes,
+	     sizeof(*fsinfo->layouttype), ld_cmp, NULL);
+	for (i = 0; i < fsinfo->nlayouttypes; i++) {
+		id = fsinfo->layouttype[i];
 		ld_type = find_pnfs_driver(id);
 		if (!ld_type) {
-			dprintk("%s: No pNFS module found for %u.\n",
-				__func__, id);
-			goto out_no_driver;
+			request_module("%s-%u", LAYOUT_NFSV4_1_MODULE_PREFIX,
+					id);
+			ld_type = find_pnfs_driver(id);
 		}
+		if (ld_type)
+			break;
 	}
+	if (!ld_type) {
+		dprintk("%s: No pNFS module found!\n", __func__);
+		goto out_no_driver;
+	}
 	server->pnfs_curr_ld = ld_type;
 	if (ld_type->set_layoutdriver
 	    && ld_type->set_layoutdriver(server, mntfh)) {
@@ -2185,10 +2230,8 @@ static void pnfs_ld_handle_read_error(struct nfs_pgio_header *hdr)
  */
 void pnfs_ld_read_done(struct nfs_pgio_header *hdr)
 {
-	if (likely(!hdr->pnfs_error)) {
-		__nfs4_read_done_cb(hdr);
+	if (likely(!hdr->pnfs_error))
 		hdr->mds_ops->rpc_call_done(&hdr->task, hdr);
-	}
 	trace_nfs4_pnfs_read(hdr, hdr->pnfs_error);
if (unlikely(hdr->pnfs_error)) if (unlikely(hdr->pnfs_error))
pnfs_ld_handle_read_error(hdr); pnfs_ld_handle_read_error(hdr);

View File

@@ -236,7 +236,7 @@ void pnfs_get_layout_hdr(struct pnfs_layout_hdr *lo);
 void pnfs_put_lseg(struct pnfs_layout_segment *lseg);
 void pnfs_put_lseg_locked(struct pnfs_layout_segment *lseg);
 
-void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *, u32);
+void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *, struct nfs_fsinfo *);
 void unset_pnfs_layoutdriver(struct nfs_server *);
 void pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *, struct nfs_page *);
 int pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc);
@@ -657,7 +657,8 @@ pnfs_wait_on_layoutreturn(struct inode *ino, struct rpc_task *task)
 }
 
 static inline void set_pnfs_layoutdriver(struct nfs_server *s,
-					 const struct nfs_fh *mntfh, u32 id)
+					 const struct nfs_fh *mntfh,
+					 struct nfs_fsinfo *fsinfo)
 {
 }

@@ -690,13 +690,50 @@ static int _nfs4_pnfs_v4_ds_connect(struct nfs_server *mds_srv,
 		dprintk("%s: DS %s: trying address %s\n",
 			__func__, ds->ds_remotestr, da->da_remotestr);
 
-		clp = nfs4_set_ds_client(mds_srv,
-				(struct sockaddr *)&da->da_addr,
-				da->da_addrlen, IPPROTO_TCP,
-				timeo, retrans, minor_version,
-				au_flavor);
-		if (!IS_ERR(clp))
-			break;
+		if (!IS_ERR(clp) && clp->cl_mvops->session_trunk) {
+			struct xprt_create xprt_args = {
+				.ident = XPRT_TRANSPORT_TCP,
+				.net = clp->cl_net,
+				.dstaddr = (struct sockaddr *)&da->da_addr,
+				.addrlen = da->da_addrlen,
+				.servername = clp->cl_hostname,
+			};
+			struct nfs4_add_xprt_data xprtdata = {
+				.clp = clp,
+				.cred = nfs4_get_clid_cred(clp),
+			};
+			struct rpc_add_xprt_test rpcdata = {
+				.add_xprt_test = clp->cl_mvops->session_trunk,
+				.data = &xprtdata,
+			};
+
+			/*
+			 * Test this address for session trunking and
+			 * add as an alias
+			 */
+			rpc_clnt_add_xprt(clp->cl_rpcclient, &xprt_args,
+					  rpc_clnt_setup_test_and_add_xprt,
+					  &rpcdata);
+			if (xprtdata.cred)
+				put_rpccred(xprtdata.cred);
+		} else {
+			clp = nfs4_set_ds_client(mds_srv,
+					(struct sockaddr *)&da->da_addr,
+					da->da_addrlen, IPPROTO_TCP,
+					timeo, retrans, minor_version,
+					au_flavor);
+			if (IS_ERR(clp))
+				continue;
+
+			status = nfs4_init_ds_session(clp,
+					mds_srv->nfs_client->cl_lease_time);
+			if (status) {
+				nfs_put_client(clp);
+				clp = ERR_PTR(-EIO);
+				continue;
+			}
+		}
 	}
 
 	if (IS_ERR(clp)) {
@@ -704,18 +741,11 @@ static int _nfs4_pnfs_v4_ds_connect(struct nfs_server *mds_srv,
 		goto out;
 	}
 
-	status = nfs4_init_ds_session(clp, mds_srv->nfs_client->cl_lease_time);
-	if (status)
-		goto out_put;
-
 	smp_wmb();
 	ds->ds_clp = clp;
 	dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
 out:
 	return status;
-out_put:
-	nfs_put_client(clp);
-	goto out;
 }
 
 /*

@@ -2848,19 +2848,23 @@ out_invalid_transport_udp:
  * NFS client for backwards compatibility
  */
 unsigned int nfs_callback_set_tcpport;
+unsigned short nfs_callback_nr_threads;
 /* Default cache timeout is 10 minutes */
 unsigned int nfs_idmap_cache_timeout = 600;
 /* Turn off NFSv4 uid/gid mapping when using AUTH_SYS */
 bool nfs4_disable_idmapping = true;
 unsigned short max_session_slots = NFS4_DEF_SLOT_TABLE_SIZE;
+unsigned short max_session_cb_slots = NFS4_DEF_CB_SLOT_TABLE_SIZE;
 unsigned short send_implementation_id = 1;
 char nfs4_client_id_uniquifier[NFS4_CLIENT_ID_UNIQ_LEN] = "";
 bool recover_lost_locks = false;
 
+EXPORT_SYMBOL_GPL(nfs_callback_nr_threads);
 EXPORT_SYMBOL_GPL(nfs_callback_set_tcpport);
 EXPORT_SYMBOL_GPL(nfs_idmap_cache_timeout);
 EXPORT_SYMBOL_GPL(nfs4_disable_idmapping);
 EXPORT_SYMBOL_GPL(max_session_slots);
+EXPORT_SYMBOL_GPL(max_session_cb_slots);
 EXPORT_SYMBOL_GPL(send_implementation_id);
 EXPORT_SYMBOL_GPL(nfs4_client_id_uniquifier);
 EXPORT_SYMBOL_GPL(recover_lost_locks);
@@ -2887,6 +2891,9 @@ static const struct kernel_param_ops param_ops_portnr = {
 #define param_check_portnr(name, p) __param_check(name, p, unsigned int);
 
 module_param_named(callback_tcpport, nfs_callback_set_tcpport, portnr, 0644);
+module_param_named(callback_nr_threads, nfs_callback_nr_threads, ushort, 0644);
+MODULE_PARM_DESC(callback_nr_threads, "Number of threads that will be "
+		"assigned to the NFSv4 callback channels.");
 module_param(nfs_idmap_cache_timeout, int, 0644);
 module_param(nfs4_disable_idmapping, bool, 0644);
 module_param_string(nfs4_unique_id, nfs4_client_id_uniquifier,
@@ -2896,6 +2903,9 @@ MODULE_PARM_DESC(nfs4_disable_idmapping,
 module_param(max_session_slots, ushort, 0644);
 MODULE_PARM_DESC(max_session_slots, "Maximum number of outstanding NFSv4.1 "
 		"requests the client will negotiate");
+module_param(max_session_cb_slots, ushort, 0644);
+MODULE_PARM_DESC(max_session_cb_slots, "Maximum number of parallel NFSv4.1 "
+		"callbacks the client will process for a given server");
 module_param(send_implementation_id, ushort, 0644);
 MODULE_PARM_DESC(send_implementation_id,
 		"Send implementation ID with NFSv4.1 exchange_id");

@@ -67,6 +67,7 @@ struct nfs4_stateid_struct {
 		NFS4_DELEGATION_STATEID_TYPE,
 		NFS4_LAYOUT_STATEID_TYPE,
 		NFS4_PNFS_DS_STATEID_TYPE,
+		NFS4_REVOKED_STATEID_TYPE,
 	} type;
 };

@@ -103,6 +103,9 @@ struct nfs_client {
 #define NFS_SP4_MACH_CRED_WRITE 5 /* WRITE */
 #define NFS_SP4_MACH_CRED_COMMIT 6 /* COMMIT */
 #define NFS_SP4_MACH_CRED_PNFS_CLEANUP 7 /* LAYOUTRETURN */
+#if IS_ENABLED(CONFIG_NFS_V4_1)
+	wait_queue_head_t cl_lock_waitq;
+#endif /* CONFIG_NFS_V4_1 */
 #endif /* CONFIG_NFS_V4 */
 
 	/* Our own IP address, as a null-terminated string.

@@ -124,6 +124,11 @@ struct nfs_fattr {
 		| NFS_ATTR_FATTR_SPACE_USED \
 		| NFS_ATTR_FATTR_V4_SECURITY_LABEL)
 
+/*
+ * Maximal number of supported layout drivers.
+ */
+#define NFS_MAX_LAYOUT_TYPES 8
+
 /*
  * Info on the file system
  */
@@ -139,7 +144,8 @@ struct nfs_fsinfo {
 	__u64 maxfilesize;
 	struct timespec time_delta; /* server time granularity */
 	__u32 lease_time; /* in seconds */
-	__u32 layouttype; /* supported pnfs layout driver */
+	__u32 nlayouttypes; /* number of layouttypes */
+	__u32 layouttype[NFS_MAX_LAYOUT_TYPES]; /* supported pnfs layout driver */
 	__u32 blksize; /* preferred pnfs io block size */
 	__u32 clone_blksize; /* granularity of a CLONE operation */
 };

@@ -131,6 +131,7 @@ struct rpc_authops {
 	struct rpc_auth * (*create)(struct rpc_auth_create_args *, struct rpc_clnt *);
 	void (*destroy)(struct rpc_auth *);
+	int (*hash_cred)(struct auth_cred *, unsigned int);
 	struct rpc_cred * (*lookup_cred)(struct rpc_auth *, struct auth_cred *, int);
 	struct rpc_cred * (*crcreate)(struct rpc_auth*, struct auth_cred *, int, gfp_t);
 	int (*list_pseudoflavors)(rpc_authflavor_t *, int);

@@ -125,6 +125,13 @@ struct rpc_create_args {
 	struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */
 };
 
+struct rpc_add_xprt_test {
+	int (*add_xprt_test)(struct rpc_clnt *,
+		struct rpc_xprt *,
+		void *calldata);
+	void *data;
+};
+
 /* Values for "flags" field */
 #define RPC_CLNT_CREATE_HARDRTRY (1UL << 0)
 #define RPC_CLNT_CREATE_AUTOBIND (1UL << 2)
@@ -198,6 +205,16 @@ int rpc_clnt_add_xprt(struct rpc_clnt *, struct xprt_create *,
 void rpc_cap_max_reconnect_timeout(struct rpc_clnt *clnt,
 			unsigned long timeo);
 
+int rpc_clnt_setup_test_and_add_xprt(struct rpc_clnt *,
+		struct rpc_xprt_switch *,
+		struct rpc_xprt *,
+		void *);
+
 const char *rpc_proc_name(const struct rpc_task *task);
+void rpc_clnt_xprt_switch_put(struct rpc_clnt *);
+void rpc_clnt_xprt_switch_add_xprt(struct rpc_clnt *, struct rpc_xprt *);
+bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt,
+			const struct sockaddr *sap);
 #endif /* __KERNEL__ */
 #endif /* _LINUX_SUNRPC_CLNT_H */

@@ -46,6 +46,10 @@
 #define RPCRDMA_VERSION 1
 #define rpcrdma_version cpu_to_be32(RPCRDMA_VERSION)
 
+enum {
+	RPCRDMA_V1_DEF_INLINE_SIZE = 1024,
+};
+
 struct rpcrdma_segment {
 	__be32 rs_handle;	/* Registered memory handle */
 	__be32 rs_length;	/* Length of the chunk in bytes */

@@ -239,8 +239,8 @@ struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
 				void *);
 void rpc_wake_up_status(struct rpc_wait_queue *, int);
 void rpc_delay(struct rpc_task *, unsigned long);
-void * rpc_malloc(struct rpc_task *, size_t);
-void rpc_free(void *);
+int rpc_malloc(struct rpc_task *);
+void rpc_free(struct rpc_task *);
 int rpciod_up(void);
 void rpciod_down(void);
 int __rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *);

@@ -67,6 +67,18 @@ struct xdr_buf {
 			len;		/* Length of XDR encoded message */
 };
 
+static inline void
+xdr_buf_init(struct xdr_buf *buf, void *start, size_t len)
+{
+	buf->head[0].iov_base = start;
+	buf->head[0].iov_len = len;
+	buf->tail[0].iov_len = 0;
+	buf->page_len = 0;
+	buf->flags = 0;
+	buf->len = 0;
+	buf->buflen = len;
+}
+
 /*
  * pre-xdr'ed macros.
  */

@@ -83,9 +83,11 @@ struct rpc_rqst {
 	void (*rq_release_snd_buf)(struct rpc_rqst *); /* release rq_enc_pages */
 	struct list_head rq_list;
 
-	__u32 * rq_buffer;	/* XDR encode buffer */
-	size_t rq_callsize,
-		rq_rcvsize;
+	void *rq_xprtdata;	/* Per-xprt private data */
+	void *rq_buffer;	/* Call XDR encode buffer */
+	size_t rq_callsize;
+	void *rq_rbuffer;	/* Reply XDR decode buffer */
+	size_t rq_rcvsize;
 	size_t rq_xmit_bytes_sent;	/* total bytes sent */
 	size_t rq_reply_bytes_recvd;	/* total reply bytes */
 					/* received */
@@ -127,8 +129,8 @@ struct rpc_xprt_ops {
 	void (*rpcbind)(struct rpc_task *task);
 	void (*set_port)(struct rpc_xprt *xprt, unsigned short port);
 	void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
-	void * (*buf_alloc)(struct rpc_task *task, size_t size);
-	void (*buf_free)(void *buffer);
+	int (*buf_alloc)(struct rpc_task *task);
+	void (*buf_free)(struct rpc_task *task);
 	int (*send_request)(struct rpc_task *task);
 	void (*set_retrans_timeout)(struct rpc_task *task);
 	void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task);

@@ -66,4 +66,6 @@ extern struct rpc_xprt *xprt_iter_xprt(struct rpc_xprt_iter *xpi);
 extern struct rpc_xprt *xprt_iter_get_xprt(struct rpc_xprt_iter *xpi);
 extern struct rpc_xprt *xprt_iter_get_next(struct rpc_xprt_iter *xpi);
 
+extern bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps,
+		const struct sockaddr *sap);
 #endif

@@ -53,8 +53,8 @@
 #define RPCRDMA_MAX_SLOT_TABLE	(256U)
 
 #define RPCRDMA_MIN_INLINE	(1024)	/* min inline thresh */
-#define RPCRDMA_DEF_INLINE	(1024)	/* default inline thresh */
-#define RPCRDMA_MAX_INLINE	(3068)	/* max inline thresh */
+#define RPCRDMA_DEF_INLINE	(4096)	/* default inline thresh */
+#define RPCRDMA_MAX_INLINE	(65536)	/* max inline thresh */
 
 /* Memory registration strategies, by number.
  * This is part of a kernel / user space API. Do not remove. */

@@ -551,7 +551,7 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
 			*entry, *new;
 	unsigned int nr;
 
-	nr = hash_long(from_kuid(&init_user_ns, acred->uid), cache->hashbits);
+	nr = auth->au_ops->hash_cred(acred, cache->hashbits);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(entry, &cache->hashtable[nr], cr_hash) {

@@ -78,6 +78,14 @@ static struct rpc_cred *generic_bind_cred(struct rpc_task *task,
 	return auth->au_ops->lookup_cred(auth, acred, lookupflags);
 }
 
+static int
+generic_hash_cred(struct auth_cred *acred, unsigned int hashbits)
+{
+	return hash_64(from_kgid(&init_user_ns, acred->gid) |
+		((u64)from_kuid(&init_user_ns, acred->uid) <<
+			(sizeof(gid_t) * 8)), hashbits);
+}
+
 /*
  * Lookup generic creds for current process
  */
@@ -258,6 +266,7 @@ generic_key_timeout(struct rpc_auth *auth, struct rpc_cred *cred)
 static const struct rpc_authops generic_auth_ops = {
 	.owner = THIS_MODULE,
 	.au_name = "Generic",
+	.hash_cred = generic_hash_cred,
 	.lookup_cred = generic_lookup_cred,
 	.crcreate = generic_create_cred,
 	.key_timeout = generic_key_timeout,

@@ -1298,6 +1298,12 @@ gss_destroy_cred(struct rpc_cred *cred)
 	gss_destroy_nullcred(cred);
 }
 
+static int
+gss_hash_cred(struct auth_cred *acred, unsigned int hashbits)
+{
+	return hash_64(from_kuid(&init_user_ns, acred->uid), hashbits);
+}
+
 /*
  * Lookup RPCSEC_GSS cred for the current process
  */
@@ -1982,6 +1988,7 @@ static const struct rpc_authops authgss_ops = {
 	.au_name	= "RPCSEC_GSS",
 	.create		= gss_create,
 	.destroy	= gss_destroy,
+	.hash_cred	= gss_hash_cred,
 	.lookup_cred	= gss_lookup_cred,
 	.crcreate	= gss_create_cred,
 	.list_pseudoflavors = gss_mech_list_pseudoflavors,

@@ -46,6 +46,14 @@ unx_destroy(struct rpc_auth *auth)
 	rpcauth_clear_credcache(auth->au_credcache);
 }
 
+static int
+unx_hash_cred(struct auth_cred *acred, unsigned int hashbits)
+{
+	return hash_64(from_kgid(&init_user_ns, acred->gid) |
+		((u64)from_kuid(&init_user_ns, acred->uid) <<
+			(sizeof(gid_t) * 8)), hashbits);
+}
+
 /*
  * Lookup AUTH_UNIX creds for current process
  */
@@ -220,6 +228,7 @@ const struct rpc_authops authunix_ops = {
 	.au_name	= "UNIX",
 	.create		= unx_create,
 	.destroy	= unx_destroy,
+	.hash_cred	= unx_hash_cred,
 	.lookup_cred	= unx_lookup_cred,
 	.crcreate	= unx_create_cred,
 };

@@ -76,13 +76,7 @@ static int xprt_alloc_xdr_buf(struct xdr_buf *buf, gfp_t gfp_flags)
 	page = alloc_page(gfp_flags);
 	if (page == NULL)
 		return -ENOMEM;
-	buf->head[0].iov_base = page_address(page);
-	buf->head[0].iov_len = PAGE_SIZE;
-	buf->tail[0].iov_base = NULL;
-	buf->tail[0].iov_len = 0;
-	buf->page_len = 0;
-	buf->len = 0;
-	buf->buflen = PAGE_SIZE;
+	xdr_buf_init(buf, page_address(page), PAGE_SIZE);
 	return 0;
 }

@@ -353,7 +353,7 @@ void sunrpc_init_cache_detail(struct cache_detail *cd)
 	spin_unlock(&cache_list_lock);
 
 	/* start the cleaning process */
-	schedule_delayed_work(&cache_cleaner, 0);
+	queue_delayed_work(system_power_efficient_wq, &cache_cleaner, 0);
 }
 EXPORT_SYMBOL_GPL(sunrpc_init_cache_detail);
 
@@ -476,7 +476,8 @@ static void do_cache_clean(struct work_struct *work)
 		delay = 0;
 
 	if (delay)
-		schedule_delayed_work(&cache_cleaner, delay);
+		queue_delayed_work(system_power_efficient_wq,
+				   &cache_cleaner, delay);
 }

@@ -184,7 +184,6 @@ static int __rpc_clnt_handle_event(struct rpc_clnt *clnt, unsigned long event,
 				   struct super_block *sb)
 {
 	struct dentry *dentry;
-	int err = 0;
 
 	switch (event) {
 	case RPC_PIPEFS_MOUNT:
@@ -201,7 +200,7 @@ static int __rpc_clnt_handle_event(struct rpc_clnt *clnt, unsigned long event,
 		printk(KERN_ERR "%s: unknown event: %ld\n", __func__, event);
 		return -ENOTSUPP;
 	}
-	return err;
+	return 0;
 }
 
 static int __rpc_pipefs_event(struct rpc_clnt *clnt, unsigned long event,
@@ -988,7 +987,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 {
 	if (clnt != NULL) {
-		rpc_task_release_client(task);
 		if (task->tk_xprt == NULL)
 			task->tk_xprt = xprt_iter_get_next(&clnt->cl_xpi);
 		task->tk_client = clnt;
@@ -1693,6 +1691,7 @@ call_allocate(struct rpc_task *task)
 	struct rpc_rqst *req = task->tk_rqstp;
 	struct rpc_xprt *xprt = req->rq_xprt;
 	struct rpc_procinfo *proc = task->tk_msg.rpc_proc;
+	int status;
 
 	dprint_status(task);
 
@@ -1718,11 +1717,14 @@ call_allocate(struct rpc_task *task)
 	req->rq_rcvsize = RPC_REPHDRSIZE + slack + proc->p_replen;
 	req->rq_rcvsize <<= 2;
 
-	req->rq_buffer = xprt->ops->buf_alloc(task,
-					req->rq_callsize + req->rq_rcvsize);
-	if (req->rq_buffer != NULL)
-		return;
+	status = xprt->ops->buf_alloc(task);
 	xprt_inject_disconnect(xprt);
+	if (status == 0)
+		return;
+	if (status != -ENOMEM) {
+		rpc_exit(task, status);
+		return;
+	}
 
 	dprintk("RPC: %5u rpc_buffer allocation failed\n", task->tk_pid);
 
@@ -1748,18 +1750,6 @@ rpc_task_force_reencode(struct rpc_task *task)
 	task->tk_rqstp->rq_bytes_sent = 0;
 }
 
-static inline void
-rpc_xdr_buf_init(struct xdr_buf *buf, void *start, size_t len)
-{
-	buf->head[0].iov_base = start;
-	buf->head[0].iov_len = len;
-	buf->tail[0].iov_len = 0;
-	buf->page_len = 0;
-	buf->flags = 0;
-	buf->len = 0;
-	buf->buflen = len;
-}
-
 /*
  * 3. Encode arguments of an RPC call
  */
@@ -1772,12 +1762,12 @@ rpc_xdr_encode(struct rpc_task *task)
 
 	dprint_status(task);
 
-	rpc_xdr_buf_init(&req->rq_snd_buf,
-			 req->rq_buffer,
-			 req->rq_callsize);
-	rpc_xdr_buf_init(&req->rq_rcv_buf,
-			 (char *)req->rq_buffer + req->rq_callsize,
-			 req->rq_rcvsize);
+	xdr_buf_init(&req->rq_snd_buf,
+		     req->rq_buffer,
+		     req->rq_callsize);
+	xdr_buf_init(&req->rq_rcv_buf,
+		     req->rq_rbuffer,
+		     req->rq_rcvsize);
 
 	p = rpc_encode_header(task);
 	if (p == NULL) {
@@ -2615,6 +2605,70 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
 }
 EXPORT_SYMBOL_GPL(rpc_clnt_test_and_add_xprt);
 
+/**
+ * rpc_clnt_setup_test_and_add_xprt()
+ *
+ * This is an rpc_clnt_add_xprt setup() function which returns 1 so:
+ *   1) caller of the test function must dereference the rpc_xprt_switch
+ *      and the rpc_xprt.
+ *   2) test function must call rpc_xprt_switch_add_xprt, usually in
+ *      the rpc_call_done routine.
+ *
+ * Upon success (return of 1), the test function adds the new
+ * transport to the rpc_clnt xprt switch
+ *
+ * @clnt: struct rpc_clnt to get the new transport
+ * @xps: the rpc_xprt_switch to hold the new transport
+ * @xprt: the rpc_xprt to test
+ * @data: a struct rpc_add_xprt_test pointer that holds the test function
+ *        and test function call data
+ */
+int rpc_clnt_setup_test_and_add_xprt(struct rpc_clnt *clnt,
+				     struct rpc_xprt_switch *xps,
+				     struct rpc_xprt *xprt,
+				     void *data)
+{
+	struct rpc_cred *cred;
+	struct rpc_task *task;
+	struct rpc_add_xprt_test *xtest = (struct rpc_add_xprt_test *)data;
+	int status = -EADDRINUSE;
+
+	xprt = xprt_get(xprt);
+	xprt_switch_get(xps);
+
+	if (rpc_xprt_switch_has_addr(xps, (struct sockaddr *)&xprt->addr))
+		goto out_err;
+
+	/* Test the connection */
+	cred = authnull_ops.lookup_cred(NULL, NULL, 0);
+	task = rpc_call_null_helper(clnt, xprt, cred,
+				    RPC_TASK_SOFT | RPC_TASK_SOFTCONN,
+				    NULL, NULL);
+	put_rpccred(cred);
+	if (IS_ERR(task)) {
+		status = PTR_ERR(task);
+		goto out_err;
+	}
+	status = task->tk_status;
+	rpc_put_task(task);
+
+	if (status < 0)
+		goto out_err;
+
+	/* rpc_xprt_switch and rpc_xprt are dereferenced by add_xprt_test() */
+	xtest->add_xprt_test(clnt, xprt, xtest->data);
+
+	/* so that rpc_clnt_add_xprt does not call rpc_xprt_switch_add_xprt */
+	return 1;
+out_err:
+	xprt_put(xprt);
+	xprt_switch_put(xps);
+	pr_info("RPC:   rpc_clnt_test_xprt failed: %d addr %s not added\n",
+		status, xprt->address_strings[RPC_DISPLAY_ADDR]);
+	return status;
+}
+EXPORT_SYMBOL_GPL(rpc_clnt_setup_test_and_add_xprt);
+
 /**
  * rpc_clnt_add_xprt - Add a new transport to a rpc_clnt
  * @clnt: pointer to struct rpc_clnt
@@ -2697,6 +2751,34 @@ rpc_cap_max_reconnect_timeout(struct rpc_clnt *clnt, unsigned long timeo)
 }
 EXPORT_SYMBOL_GPL(rpc_cap_max_reconnect_timeout);
 
+void rpc_clnt_xprt_switch_put(struct rpc_clnt *clnt)
+{
+	xprt_switch_put(rcu_dereference(clnt->cl_xpi.xpi_xpswitch));
+}
+EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_put);
+
+void rpc_clnt_xprt_switch_add_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt)
+{
+	rpc_xprt_switch_add_xprt(rcu_dereference(clnt->cl_xpi.xpi_xpswitch),
+				 xprt);
+}
+EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_add_xprt);
+
+bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt,
+				   const struct sockaddr *sap)
+{
+	struct rpc_xprt_switch *xps;
+	bool ret;
+
+	xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch);
+
+	rcu_read_lock();
+	ret = rpc_xprt_switch_has_addr(xps, sap);
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_has_addr);
+
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 static void rpc_show_header(void)
 {

@@ -849,14 +849,17 @@ static void rpc_async_schedule(struct work_struct *work)
 }
 
 /**
- * rpc_malloc - allocate an RPC buffer
- * @task: RPC task that will use this buffer
- * @size: requested byte size
+ * rpc_malloc - allocate RPC buffer resources
+ * @task: RPC task
+ *
+ * A single memory region is allocated, which is split between the
+ * RPC call and RPC reply that this task is being used for. When
+ * this RPC is retired, the memory is released by calling rpc_free.
  *
  * To prevent rpciod from hanging, this allocator never sleeps,
- * returning NULL and suppressing warning if the request cannot be serviced
- * immediately.
- * The caller can arrange to sleep in a way that is safe for rpciod.
+ * returning -ENOMEM and suppressing warning if the request cannot
+ * be serviced immediately. The caller can arrange to sleep in a
+ * way that is safe for rpciod.
  *
  * Most requests are 'small' (under 2KiB) and can be serviced from a
  * mempool, ensuring that NFS reads and writes can always proceed,
@@ -865,8 +868,10 @@ static void rpc_async_schedule(struct work_struct *work)
  * In order to avoid memory starvation triggering more writebacks of
  * NFS requests, we avoid using GFP_KERNEL.
  */
-void *rpc_malloc(struct rpc_task *task, size_t size)
+int rpc_malloc(struct rpc_task *task)
 {
+	struct rpc_rqst *rqst = task->tk_rqstp;
+	size_t size = rqst->rq_callsize + rqst->rq_rcvsize;
 	struct rpc_buffer *buf;
 	gfp_t gfp = GFP_NOIO | __GFP_NOWARN;
 
@@ -880,28 +885,28 @@ void *rpc_malloc(struct rpc_task *task, size_t size)
 	buf = kmalloc(size, gfp);
 	if (!buf)
-		return NULL;
+		return -ENOMEM;
 
 	buf->len = size;
 	dprintk("RPC: %5u allocated buffer of size %zu at %p\n",
 		task->tk_pid, size, buf);
-	return &buf->data;
+	rqst->rq_buffer = buf->data;
+	rqst->rq_rbuffer = (char *)rqst->rq_buffer + rqst->rq_callsize;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(rpc_malloc);
 
 /**
- * rpc_free - free buffer allocated via rpc_malloc
- * @buffer: buffer to free
+ * rpc_free - free RPC buffer resources allocated via rpc_malloc
+ * @task: RPC task
  *
  */
-void rpc_free(void *buffer)
+void rpc_free(struct rpc_task *task)
 {
+	void *buffer = task->tk_rqstp->rq_buffer;
 	size_t size;
 	struct rpc_buffer *buf;
 
-	if (!buffer)
-		return;
-
 	buf = container_of(buffer, struct rpc_buffer, data);
 	size = buf->len;

@@ -401,6 +401,21 @@ int svc_bind(struct svc_serv *serv, struct net *net)
 }
 EXPORT_SYMBOL_GPL(svc_bind);
 
+#if defined(CONFIG_SUNRPC_BACKCHANNEL)
+static void
+__svc_init_bc(struct svc_serv *serv)
+{
+	INIT_LIST_HEAD(&serv->sv_cb_list);
+	spin_lock_init(&serv->sv_cb_lock);
+	init_waitqueue_head(&serv->sv_cb_waitq);
+}
+#else
+static void
+__svc_init_bc(struct svc_serv *serv)
+{
+}
+#endif
+
 /*
  * Create an RPC service
  */
@@ -443,6 +458,8 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
 	init_timer(&serv->sv_temptimer);
 	spin_lock_init(&serv->sv_lock);
 
+	__svc_init_bc(serv);
+
 	serv->sv_nrpools = npools;
 	serv->sv_pools =
 		kcalloc(serv->sv_nrpools, sizeof(struct svc_pool),

@@ -767,7 +767,7 @@ static void xdr_set_next_page(struct xdr_stream *xdr)
 
 	newbase -= xdr->buf->page_base;
 	if (xdr_set_page_base(xdr, newbase, PAGE_SIZE) < 0)
-		xdr_set_iov(xdr, xdr->buf->tail, xdr->buf->len);
+		xdr_set_iov(xdr, xdr->buf->tail, xdr->nwords << 2);
 }
 
 static bool xdr_set_next_buffer(struct xdr_stream *xdr)
@@ -776,7 +776,7 @@ static bool xdr_set_next_buffer(struct xdr_stream *xdr)
 		xdr_set_next_page(xdr);
 	else if (xdr->iov == xdr->buf->head) {
 		if (xdr_set_page_base(xdr, 0, PAGE_SIZE) < 0)
-			xdr_set_iov(xdr, xdr->buf->tail, xdr->buf->len);
+			xdr_set_iov(xdr, xdr->buf->tail, xdr->nwords << 2);
 	}
 	return xdr->p != xdr->end;
 }
@@ -859,12 +859,15 @@ EXPORT_SYMBOL_GPL(xdr_set_scratch_buffer);
 static __be32 *xdr_copy_to_scratch(struct xdr_stream *xdr, size_t nbytes)
 {
 	__be32 *p;
-	void *cpdest = xdr->scratch.iov_base;
+	char *cpdest = xdr->scratch.iov_base;
 	size_t cplen = (char *)xdr->end - (char *)xdr->p;
 
 	if (nbytes > xdr->scratch.iov_len)
 		return NULL;
-	memcpy(cpdest, xdr->p, cplen);
+	p = __xdr_inline_decode(xdr, cplen);
+	if (p == NULL)
+		return NULL;
+	memcpy(cpdest, p, cplen);
 	cpdest += cplen;
 	nbytes -= cplen;
 	if (!xdr_set_next_buffer(xdr))

--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c

@@ -1295,7 +1295,7 @@ void xprt_release(struct rpc_task *task)
 		xprt_schedule_autodisconnect(xprt);
 	spin_unlock_bh(&xprt->transport_lock);
 	if (req->rq_buffer)
-		xprt->ops->buf_free(req->rq_buffer);
+		xprt->ops->buf_free(task);
 	xprt_inject_disconnect(xprt);
 	if (req->rq_cred != NULL)
 		put_rpccred(req->rq_cred);

--- a/net/sunrpc/xprtmultipath.c
+++ b/net/sunrpc/xprtmultipath.c

@@ -15,6 +15,7 @@
 #include <asm/cmpxchg.h>
 #include <linux/spinlock.h>
 #include <linux/sunrpc/xprt.h>
+#include <linux/sunrpc/addr.h>
 #include <linux/sunrpc/xprtmultipath.h>
 
 typedef struct rpc_xprt *(*xprt_switch_find_xprt_t)(struct list_head *head,
@@ -49,7 +50,8 @@ void rpc_xprt_switch_add_xprt(struct rpc_xprt_switch *xps,
 	if (xprt == NULL)
 		return;
 	spin_lock(&xps->xps_lock);
-	if (xps->xps_net == xprt->xprt_net || xps->xps_net == NULL)
+	if ((xps->xps_net == xprt->xprt_net || xps->xps_net == NULL) &&
+	    !rpc_xprt_switch_has_addr(xps, (struct sockaddr *)&xprt->addr))
 		xprt_switch_add_xprt_locked(xps, xprt);
 	spin_unlock(&xps->xps_lock);
 }
@@ -232,6 +234,26 @@ struct rpc_xprt *xprt_iter_current_entry(struct rpc_xprt_iter *xpi)
 	return xprt_switch_find_current_entry(head, xpi->xpi_cursor);
 }
 
+bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps,
+			      const struct sockaddr *sap)
+{
+	struct list_head *head;
+	struct rpc_xprt *pos;
+
+	if (xps == NULL || sap == NULL)
+		return false;
+
+	head = &xps->xps_xprt_list;
+	list_for_each_entry_rcu(pos, head, xprt_switch) {
+		if (rpc_cmp_addr_port(sap, (struct sockaddr *)&pos->addr)) {
+			pr_info("RPC: addr %s already in xprt switch\n",
+				pos->address_strings[RPC_DISPLAY_ADDR]);
+			return true;
+		}
+	}
+	return false;
+}
+
 static
 struct rpc_xprt *xprt_switch_find_next_entry(struct list_head *head,
 		const struct rpc_xprt *cur)

--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c

@@ -27,7 +27,7 @@ static void rpcrdma_bc_free_rqst(struct rpcrdma_xprt *r_xprt,
 	list_del(&req->rl_all);
 	spin_unlock(&buf->rb_reqslock);
 
-	rpcrdma_destroy_req(&r_xprt->rx_ia, req);
+	rpcrdma_destroy_req(req);
 
 	kfree(rqst);
 }
@@ -35,10 +35,8 @@ static void rpcrdma_bc_free_rqst(struct rpcrdma_xprt *r_xprt,
 static int rpcrdma_bc_setup_rqst(struct rpcrdma_xprt *r_xprt,
 				 struct rpc_rqst *rqst)
 {
-	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	struct rpcrdma_regbuf *rb;
 	struct rpcrdma_req *req;
-	struct xdr_buf *buf;
 	size_t size;
 
 	req = rpcrdma_create_req(r_xprt);
@@ -46,30 +44,19 @@ static int rpcrdma_bc_setup_rqst(struct rpcrdma_xprt *r_xprt,
 		return PTR_ERR(req);
 	req->rl_backchannel = true;
 
-	size = RPCRDMA_INLINE_WRITE_THRESHOLD(rqst);
-	rb = rpcrdma_alloc_regbuf(ia, size, GFP_KERNEL);
+	rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE,
+				  DMA_TO_DEVICE, GFP_KERNEL);
 	if (IS_ERR(rb))
 		goto out_fail;
 	req->rl_rdmabuf = rb;
 
-	size += RPCRDMA_INLINE_READ_THRESHOLD(rqst);
-	rb = rpcrdma_alloc_regbuf(ia, size, GFP_KERNEL);
+	size = r_xprt->rx_data.inline_rsize;
+	rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, GFP_KERNEL);
 	if (IS_ERR(rb))
 		goto out_fail;
-	rb->rg_owner = req;
 	req->rl_sendbuf = rb;
-
-	/* so that rpcr_to_rdmar works when receiving a request */
-	rqst->rq_buffer = (void *)req->rl_sendbuf->rg_base;
-
-	buf = &rqst->rq_snd_buf;
-	buf->head[0].iov_base = rqst->rq_buffer;
-	buf->head[0].iov_len = 0;
-	buf->tail[0].iov_base = NULL;
-	buf->tail[0].iov_len = 0;
-	buf->page_len = 0;
-	buf->len = 0;
-	buf->buflen = size;
+	xdr_buf_init(&rqst->rq_snd_buf, rb->rg_base, size);
+	rpcrdma_set_xprtdata(rqst, req);
 
 	return 0;
 
 out_fail:
@@ -219,7 +206,6 @@ int rpcrdma_bc_marshal_reply(struct rpc_rqst *rqst)
 	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
 	struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
 	struct rpcrdma_msg *headerp;
-	size_t rpclen;
 
 	headerp = rdmab_to_msg(req->rl_rdmabuf);
 	headerp->rm_xid = rqst->rq_xid;
@@ -231,26 +217,9 @@ int rpcrdma_bc_marshal_reply(struct rpc_rqst *rqst)
 	headerp->rm_body.rm_chunks[1] = xdr_zero;
 	headerp->rm_body.rm_chunks[2] = xdr_zero;
 
-	rpclen = rqst->rq_svec[0].iov_len;
-
-#ifdef RPCRDMA_BACKCHANNEL_DEBUG
-	pr_info("RPC: %s: rpclen %zd headerp 0x%p lkey 0x%x\n",
-		__func__, rpclen, headerp, rdmab_lkey(req->rl_rdmabuf));
-	pr_info("RPC: %s: RPC/RDMA: %*ph\n",
-		__func__, (int)RPCRDMA_HDRLEN_MIN, headerp);
-	pr_info("RPC: %s: RPC: %*ph\n",
-		__func__, (int)rpclen, rqst->rq_svec[0].iov_base);
-#endif
-
-	req->rl_send_iov[0].addr = rdmab_addr(req->rl_rdmabuf);
-	req->rl_send_iov[0].length = RPCRDMA_HDRLEN_MIN;
-	req->rl_send_iov[0].lkey = rdmab_lkey(req->rl_rdmabuf);
-
-	req->rl_send_iov[1].addr = rdmab_addr(req->rl_sendbuf);
-	req->rl_send_iov[1].length = rpclen;
-	req->rl_send_iov[1].lkey = rdmab_lkey(req->rl_sendbuf);
-
-	req->rl_niovs = 2;
+	if (!rpcrdma_prepare_send_sges(&r_xprt->rx_ia, req, RPCRDMA_HDRLEN_MIN,
+				       &rqst->rq_snd_buf, rpcrdma_noch))
+		return -EIO;
 	return 0;
 }
@@ -402,7 +371,7 @@ out_overflow:
 
 out_short:
 	pr_warn("RPC/RDMA short backward direction call\n");
 
-	if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, &r_xprt->rx_ep, rep))
+	if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, rep))
 		xprt_disconnect_done(xprt);
 	else
 		pr_warn("RPC: %s: reposting rep %p\n",

--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c

@@ -160,9 +160,8 @@ static int
 fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
 	    struct rpcrdma_create_data_internal *cdata)
 {
-	rpcrdma_set_max_header_sizes(ia, cdata, max_t(unsigned int, 1,
-						      RPCRDMA_MAX_DATA_SEGS /
-						      RPCRDMA_MAX_FMR_SGES));
+	ia->ri_max_segs = max_t(unsigned int, 1, RPCRDMA_MAX_DATA_SEGS /
+				RPCRDMA_MAX_FMR_SGES);
 	return 0;
 }
@@ -274,6 +273,7 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 	 */
 	list_for_each_entry(mw, &req->rl_registered, mw_list)
 		list_add_tail(&mw->fmr.fm_mr->list, &unmap_list);
+	r_xprt->rx_stats.local_inv_needed++;
 	rc = ib_unmap_fmr(&unmap_list);
 	if (rc)
 		goto out_reset;
@@ -331,4 +331,5 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
 	.ro_init_mr			= fmr_op_init_mr,
 	.ro_release_mr			= fmr_op_release_mr,
 	.ro_displayname			= "fmr",
+	.ro_send_w_inv_ok		= 0,
 };

--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c

@@ -67,6 +67,8 @@
  * pending send queue WRs before the transport is reconnected.
  */
 
+#include <linux/sunrpc/rpc_rdma.h>
+
 #include "xprt_rdma.h"
 
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
@@ -161,7 +163,7 @@ __frwr_reset_mr(struct rpcrdma_ia *ia, struct rpcrdma_mw *r)
 		return PTR_ERR(f->fr_mr);
 	}
 
-	dprintk("RPC: %s: recovered FRMR %p\n", __func__, r);
+	dprintk("RPC: %s: recovered FRMR %p\n", __func__, f);
 	f->fr_state = FRMR_IS_INVALID;
 	return 0;
 }
@@ -242,9 +244,8 @@ frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
 			       depth;
 	}
 
-	rpcrdma_set_max_header_sizes(ia, cdata, max_t(unsigned int, 1,
-						      RPCRDMA_MAX_DATA_SEGS /
-						      ia->ri_max_frmr_depth));
+	ia->ri_max_segs = max_t(unsigned int, 1, RPCRDMA_MAX_DATA_SEGS /
+				ia->ri_max_frmr_depth);
 	return 0;
 }
@@ -329,7 +330,7 @@ frwr_wc_localinv_wake(struct ib_cq *cq, struct ib_wc *wc)
 	frmr = container_of(cqe, struct rpcrdma_frmr, fr_cqe);
 	if (wc->status != IB_WC_SUCCESS)
 		__frwr_sendcompletion_flush(wc, frmr, "localinv");
-	complete_all(&frmr->fr_linv_done);
+	complete(&frmr->fr_linv_done);
 }
 
 /* Post a REG_MR Work Request to register a memory region
@@ -396,7 +397,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
 		goto out_mapmr_err;
 
 	dprintk("RPC: %s: Using frmr %p to map %u segments (%u bytes)\n",
-		__func__, mw, mw->mw_nents, mr->length);
+		__func__, frmr, mw->mw_nents, mr->length);
 
 	key = (u8)(mr->rkey & 0x000000FF);
 	ib_update_fast_reg_key(mr, ++key);
@@ -449,6 +450,8 @@ __frwr_prepare_linv_wr(struct rpcrdma_mw *mw)
 	struct rpcrdma_frmr *f = &mw->frmr;
 	struct ib_send_wr *invalidate_wr;
 
+	dprintk("RPC: %s: invalidating frmr %p\n", __func__, f);
+
 	f->fr_state = FRMR_IS_INVALID;
 	invalidate_wr = &f->fr_invwr;
@@ -472,6 +475,7 @@ static void
 frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 {
 	struct ib_send_wr *invalidate_wrs, *pos, *prev, *bad_wr;
+	struct rpcrdma_rep *rep = req->rl_reply;
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	struct rpcrdma_mw *mw, *tmp;
 	struct rpcrdma_frmr *f;
@@ -487,6 +491,12 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 	f = NULL;
 	invalidate_wrs = pos = prev = NULL;
 	list_for_each_entry(mw, &req->rl_registered, mw_list) {
+		if ((rep->rr_wc_flags & IB_WC_WITH_INVALIDATE) &&
+		    (mw->mw_handle == rep->rr_inv_rkey)) {
+			mw->frmr.fr_state = FRMR_IS_INVALID;
+			continue;
+		}
+
 		pos = __frwr_prepare_linv_wr(mw);
 
 		if (!invalidate_wrs)
@@ -496,6 +506,8 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 		prev = pos;
 		f = &mw->frmr;
 	}
+	if (!f)
+		goto unmap;
 
 	/* Strong send queue ordering guarantees that when the
 	 * last WR in the chain completes, all WRs in the chain
@@ -510,6 +522,7 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 	 * replaces the QP. The RPC reply handler won't call us
 	 * unless ri_id->qp is a valid pointer.
 	 */
+	r_xprt->rx_stats.local_inv_needed++;
 	rc = ib_post_send(ia->ri_id->qp, invalidate_wrs, &bad_wr);
 	if (rc)
 		goto reset_mrs;
@@ -521,6 +534,8 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 	 */
 unmap:
 	list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
+		dprintk("RPC: %s: unmapping frmr %p\n",
+			__func__, &mw->frmr);
 		list_del_init(&mw->mw_list);
 		ib_dma_unmap_sg(ia->ri_device,
 				mw->mw_sg, mw->mw_nents, mw->mw_dir);
@@ -576,4 +591,5 @@ const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
 	.ro_init_mr			= frwr_op_init_mr,
 	.ro_release_mr			= frwr_op_release_mr,
 	.ro_displayname			= "frwr",
+	.ro_send_w_inv_ok		= RPCRDMA_CMP_F_SND_W_INV_OK,
 };

--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c

@@ -53,14 +53,6 @@
 # define RPCDBG_FACILITY	RPCDBG_TRANS
 #endif
 
-enum rpcrdma_chunktype {
-	rpcrdma_noch = 0,
-	rpcrdma_readch,
-	rpcrdma_areadch,
-	rpcrdma_writech,
-	rpcrdma_replych
-};
-
 static const char transfertypes[][12] = {
 	"inline",	/* no chunks */
 	"read list",	/* some argument via rdma read */
@@ -118,10 +110,12 @@ static unsigned int rpcrdma_max_reply_header_size(unsigned int maxsegs)
 	return size;
 }
 
-void rpcrdma_set_max_header_sizes(struct rpcrdma_ia *ia,
-				  struct rpcrdma_create_data_internal *cdata,
-				  unsigned int maxsegs)
+void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *r_xprt)
 {
+	struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
+	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+	unsigned int maxsegs = ia->ri_max_segs;
+
 	ia->ri_max_inline_write = cdata->inline_wsize -
 				  rpcrdma_max_call_header_size(maxsegs);
 	ia->ri_max_inline_read = cdata->inline_rsize -
@@ -155,42 +149,6 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	return rqst->rq_rcv_buf.buflen <= ia->ri_max_inline_read;
 }
 
-static int
-rpcrdma_tail_pullup(struct xdr_buf *buf)
-{
-	size_t tlen = buf->tail[0].iov_len;
-	size_t skip = tlen & 3;
-
-	/* Do not include the tail if it is only an XDR pad */
-	if (tlen < 4)
-		return 0;
-
-	/* xdr_write_pages() adds a pad at the beginning of the tail
-	 * if the content in "buf->pages" is unaligned. Force the
-	 * tail's actual content to land at the next XDR position
-	 * after the head instead.
-	 */
-	if (skip) {
-		unsigned char *src, *dst;
-		unsigned int count;
-
-		src = buf->tail[0].iov_base;
-		dst = buf->head[0].iov_base;
-		dst += buf->head[0].iov_len;
-
-		src += skip;
-		tlen -= skip;
-
-		dprintk("RPC: %s: skip=%zu, memmove(%p, %p, %zu)\n",
-			__func__, skip, dst, src, tlen);
-
-		for (count = tlen; count; count--)
-			*dst++ = *src++;
-	}
-
-	return tlen;
-}
-
 /* Split "vec" on page boundaries into segments. FMR registers pages,
  * not a byte range. Other modes coalesce these segments into a single
  * MR when they can.
@@ -229,7 +187,8 @@ rpcrdma_convert_kvec(struct kvec *vec, struct rpcrdma_mr_seg *seg, int n)
 static int
 rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
-	enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg)
+	enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg,
+	bool reminv_expected)
 {
 	int len, n, p, page_base;
 	struct page **ppages;
@@ -271,6 +230,13 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
 	if (type == rpcrdma_readch)
 		return n;
 
+	/* When encoding the Write list, some servers need to see an extra
+	 * segment for odd-length Write chunks. The upper layer provides
+	 * space in the tail iovec for this purpose.
+	 */
+	if (type == rpcrdma_writech && reminv_expected)
+		return n;
+
 	if (xdrbuf->tail[0].iov_len) {
 		/* the rpcrdma protocol allows us to omit any trailing
 		 * xdr pad bytes, saving the server an RDMA operation. */
@@ -327,7 +293,7 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,
 	if (rtype == rpcrdma_areadch)
 		pos = 0;
 	seg = req->rl_segments;
-	nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg);
+	nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg, false);
 	if (nsegs < 0)
 		return ERR_PTR(nsegs);
@@ -391,7 +357,8 @@ rpcrdma_encode_write_list(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
 	seg = req->rl_segments;
 	nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf,
 				     rqst->rq_rcv_buf.head[0].iov_len,
-				     wtype, seg);
+				     wtype, seg,
+				     r_xprt->rx_ia.ri_reminv_expected);
 	if (nsegs < 0)
 		return ERR_PTR(nsegs);
@@ -456,7 +423,8 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,
 	}
 
 	seg = req->rl_segments;
-	nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg);
+	nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg,
+				     r_xprt->rx_ia.ri_reminv_expected);
 	if (nsegs < 0)
 		return ERR_PTR(nsegs);
@@ -491,74 +459,184 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,
 	return iptr;
 }
 
-/*
- * Copy write data inline.
- * This function is used for "small" requests. Data which is passed
- * to RPC via iovecs (or page list) is copied directly into the
- * pre-registered memory buffer for this request. For small amounts
- * of data, this is efficient. The cutoff value is tunable.
- */
-static void rpcrdma_inline_pullup(struct rpc_rqst *rqst)
-{
-	int i, npages, curlen;
-	int copy_len;
-	unsigned char *srcp, *destp;
-	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(rqst->rq_xprt);
-	int page_base;
-	struct page **ppages;
-
-	destp = rqst->rq_svec[0].iov_base;
-	curlen = rqst->rq_svec[0].iov_len;
-	destp += curlen;
-
-	dprintk("RPC: %s: destp 0x%p len %d hdrlen %d\n",
-		__func__, destp, rqst->rq_slen, curlen);
-
-	copy_len = rqst->rq_snd_buf.page_len;
-
-	if (rqst->rq_snd_buf.tail[0].iov_len) {
-		curlen = rqst->rq_snd_buf.tail[0].iov_len;
-		if (destp + copy_len != rqst->rq_snd_buf.tail[0].iov_base) {
-			memmove(destp + copy_len,
-				rqst->rq_snd_buf.tail[0].iov_base, curlen);
-			r_xprt->rx_stats.pullup_copy_count += curlen;
-		}
-		dprintk("RPC: %s: tail destp 0x%p len %d\n",
-			__func__, destp + copy_len, curlen);
-		rqst->rq_svec[0].iov_len += curlen;
-	}
-	r_xprt->rx_stats.pullup_copy_count += copy_len;
-
-	page_base = rqst->rq_snd_buf.page_base;
-	ppages = rqst->rq_snd_buf.pages + (page_base >> PAGE_SHIFT);
-	page_base &= ~PAGE_MASK;
-	npages = PAGE_ALIGN(page_base+copy_len) >> PAGE_SHIFT;
-	for (i = 0; copy_len && i < npages; i++) {
-		curlen = PAGE_SIZE - page_base;
-		if (curlen > copy_len)
-			curlen = copy_len;
-		dprintk("RPC: %s: page %d destp 0x%p len %d curlen %d\n",
-			__func__, i, destp, copy_len, curlen);
-		srcp = kmap_atomic(ppages[i]);
-		memcpy(destp, srcp+page_base, curlen);
-		kunmap_atomic(srcp);
-		rqst->rq_svec[0].iov_len += curlen;
-		destp += curlen;
-		copy_len -= curlen;
-		page_base = 0;
-	}
-	/* header now contains entire send message */
-}
+/* Prepare the RPC-over-RDMA header SGE.
+ */
+static bool
+rpcrdma_prepare_hdr_sge(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
+			u32 len)
+{
+	struct rpcrdma_regbuf *rb = req->rl_rdmabuf;
+	struct ib_sge *sge = &req->rl_send_sge[0];
+
+	if (unlikely(!rpcrdma_regbuf_is_mapped(rb))) {
+		if (!__rpcrdma_dma_map_regbuf(ia, rb))
+			return false;
+		sge->addr = rdmab_addr(rb);
+		sge->lkey = rdmab_lkey(rb);
+	}
+	sge->length = len;
+
+	ib_dma_sync_single_for_device(ia->ri_device, sge->addr,
+				      sge->length, DMA_TO_DEVICE);
+	req->rl_send_wr.num_sge++;
+	return true;
+}
+
+/* Prepare the Send SGEs. The head and tail iovec, and each entry
+ * in the page list, gets its own SGE.
+ */
+static bool
+rpcrdma_prepare_msg_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
+			 struct xdr_buf *xdr, enum rpcrdma_chunktype rtype)
+{
+	unsigned int sge_no, page_base, len, remaining;
+	struct rpcrdma_regbuf *rb = req->rl_sendbuf;
+	struct ib_device *device = ia->ri_device;
+	struct ib_sge *sge = req->rl_send_sge;
+	u32 lkey = ia->ri_pd->local_dma_lkey;
+	struct page *page, **ppages;
+
+	/* The head iovec is straightforward, as it is already
+	 * DMA-mapped. Sync the content that has changed.
+	 */
+	if (!rpcrdma_dma_map_regbuf(ia, rb))
+		return false;
+	sge_no = 1;
+	sge[sge_no].addr = rdmab_addr(rb);
+	sge[sge_no].length = xdr->head[0].iov_len;
+	sge[sge_no].lkey = rdmab_lkey(rb);
+	ib_dma_sync_single_for_device(device, sge[sge_no].addr,
+				      sge[sge_no].length, DMA_TO_DEVICE);
+
+	/* If there is a Read chunk, the page list is being handled
+	 * via explicit RDMA, and thus is skipped here. However, the
+	 * tail iovec may include an XDR pad for the page list, as
+	 * well as additional content, and may not reside in the
+	 * same page as the head iovec.
+	 */
+	if (rtype == rpcrdma_readch) {
+		len = xdr->tail[0].iov_len;
+
+		/* Do not include the tail if it is only an XDR pad */
+		if (len < 4)
+			goto out;
+
+		page = virt_to_page(xdr->tail[0].iov_base);
+		page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
+
+		/* If the content in the page list is an odd length,
+		 * xdr_write_pages() has added a pad at the beginning
+		 * of the tail iovec. Force the tail's non-pad content
+		 * to land at the next XDR position in the Send message.
+		 */
+		page_base += len & 3;
+		len -= len & 3;
+		goto map_tail;
+	}
+
+	/* If there is a page list present, temporarily DMA map
+	 * and prepare an SGE for each page to be sent.
+	 */
+	if (xdr->page_len) {
+		ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
+		page_base = xdr->page_base & ~PAGE_MASK;
+		remaining = xdr->page_len;
+		while (remaining) {
+			sge_no++;
+			if (sge_no > RPCRDMA_MAX_SEND_SGES - 2)
+				goto out_mapping_overflow;
+			len = min_t(u32, PAGE_SIZE - page_base, remaining);
+			sge[sge_no].addr = ib_dma_map_page(device, *ppages,
+							   page_base, len,
+							   DMA_TO_DEVICE);
+			if (ib_dma_mapping_error(device, sge[sge_no].addr))
+				goto out_mapping_err;
+			sge[sge_no].length = len;
+			sge[sge_no].lkey = lkey;
+
+			req->rl_mapped_sges++;
+			ppages++;
+			remaining -= len;
+			page_base = 0;
+		}
+	}
+
+	/* The tail iovec is not always constructed in the same
+	 * page where the head iovec resides (see, for example,
+	 * gss_wrap_req_priv). To neatly accommodate that case,
+	 * DMA map it separately.
+	 */
+	if (xdr->tail[0].iov_len) {
+		page = virt_to_page(xdr->tail[0].iov_base);
+		page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
+		len = xdr->tail[0].iov_len;
+
+map_tail:
+		sge_no++;
+		sge[sge_no].addr = ib_dma_map_page(device, page,
+						   page_base, len,
+						   DMA_TO_DEVICE);
+		if (ib_dma_mapping_error(device, sge[sge_no].addr))
+			goto out_mapping_err;
+		sge[sge_no].length = len;
+		sge[sge_no].lkey = lkey;
+		req->rl_mapped_sges++;
+	}
+
+out:
+	req->rl_send_wr.num_sge = sge_no + 1;
+	return true;
+
+out_mapping_overflow:
+	pr_err("rpcrdma: too many Send SGEs (%u)\n", sge_no);
+	return false;
+
+out_mapping_err:
+	pr_err("rpcrdma: Send mapping error\n");
+	return false;
+}
+
+bool
+rpcrdma_prepare_send_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
+			  u32 hdrlen, struct xdr_buf *xdr,
+			  enum rpcrdma_chunktype rtype)
+{
+	req->rl_send_wr.num_sge = 0;
+	req->rl_mapped_sges = 0;
+
+	if (!rpcrdma_prepare_hdr_sge(ia, req, hdrlen))
+		goto out_map;
+
+	if (rtype != rpcrdma_areadch)
+		if (!rpcrdma_prepare_msg_sges(ia, req, xdr, rtype))
+			goto out_map;
+
+	return true;
+
+out_map:
+	pr_err("rpcrdma: failed to DMA map a Send buffer\n");
+	return false;
+}
+
+void
+rpcrdma_unmap_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
+{
+	struct ib_device *device = ia->ri_device;
+	struct ib_sge *sge;
+	int count;
+
+	sge = &req->rl_send_sge[2];
+	for (count = req->rl_mapped_sges; count--; sge++)
+		ib_dma_unmap_page(device, sge->addr, sge->length,
+				  DMA_TO_DEVICE);
+	req->rl_mapped_sges = 0;
+}
 /*
  * Marshal a request: the primary job of this routine is to choose
  * the transfer modes. See comments below.
  *
- * Prepares up to two IOVs per Call message:
- *
- *  [0] -- RPC RDMA header
- *  [1] -- the RPC header/data
- *
  * Returns zero on success, otherwise a negative errno.
  */
@@ -626,12 +704,11 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
 	 */
 	if (rpcrdma_args_inline(r_xprt, rqst)) {
 		rtype = rpcrdma_noch;
-		rpcrdma_inline_pullup(rqst);
-		rpclen = rqst->rq_svec[0].iov_len;
+		rpclen = rqst->rq_snd_buf.len;
 	} else if (ddp_allowed && rqst->rq_snd_buf.flags & XDRBUF_WRITE) {
 		rtype = rpcrdma_readch;
-		rpclen = rqst->rq_svec[0].iov_len;
-		rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf);
+		rpclen = rqst->rq_snd_buf.head[0].iov_len +
+			 rqst->rq_snd_buf.tail[0].iov_len;
 	} else {
 		r_xprt->rx_stats.nomsg_call_count++;
 		headerp->rm_type = htonl(RDMA_NOMSG);
@@ -673,34 +750,18 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
 		goto out_unmap;
 	hdrlen = (unsigned char *)iptr - (unsigned char *)headerp;
 
-	if (hdrlen + rpclen > RPCRDMA_INLINE_WRITE_THRESHOLD(rqst))
-		goto out_overflow;
-
 	dprintk("RPC: %5u %s: %s/%s: hdrlen %zd rpclen %zd\n",
 		rqst->rq_task->tk_pid, __func__,
 		transfertypes[rtype], transfertypes[wtype],
 		hdrlen, rpclen);
 
-	req->rl_send_iov[0].addr = rdmab_addr(req->rl_rdmabuf);
-	req->rl_send_iov[0].length = hdrlen;
-	req->rl_send_iov[0].lkey = rdmab_lkey(req->rl_rdmabuf);
-
-	req->rl_niovs = 1;
-	if (rtype == rpcrdma_areadch)
-		return 0;
-
-	req->rl_send_iov[1].addr = rdmab_addr(req->rl_sendbuf);
-	req->rl_send_iov[1].length = rpclen;
-	req->rl_send_iov[1].lkey = rdmab_lkey(req->rl_sendbuf);
-	req->rl_niovs = 2;
+	if (!rpcrdma_prepare_send_sges(&r_xprt->rx_ia, req, hdrlen,
+				       &rqst->rq_snd_buf, rtype)) {
+		iptr = ERR_PTR(-EIO);
+		goto out_unmap;
+	}
 	return 0;
 
-out_overflow:
-	pr_err("rpcrdma: send overflow: hdrlen %zd rpclen %zu %s/%s\n",
-	       hdrlen, rpclen, transfertypes[rtype], transfertypes[wtype]);
-	iptr = ERR_PTR(-EIO);
-
 out_unmap:
 	r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
 	return PTR_ERR(iptr);
@@ -916,8 +977,10 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
  * allowed to timeout, to discover the errors at that time.
  */
 void
-rpcrdma_reply_handler(struct rpcrdma_rep *rep)
+rpcrdma_reply_handler(struct work_struct *work)
 {
+	struct rpcrdma_rep *rep =
+			container_of(work, struct rpcrdma_rep, rr_work);
 	struct rpcrdma_msg *headerp;
 	struct rpcrdma_req *req;
 	struct rpc_rqst *rqst;
@@ -1132,6 +1195,6 @@ out_duplicate:
 repost:
 	r_xprt->rx_stats.bad_reply_count++;
-	if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, &r_xprt->rx_ep, rep))
+	if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, rep))
 		rpcrdma_recv_buffer_put(rep);
 }

--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c

@@ -159,33 +159,34 @@ out_unmap:
 /* Server-side transport endpoint wants a whole page for its send
  * buffer. The client RPC code constructs the RPC header in this
  * buffer before it invokes ->send_request.
+ *
+ * Returns NULL if there was a temporary allocation failure.
  */
-static void *
-xprt_rdma_bc_allocate(struct rpc_task *task, size_t size)
+static int
+xprt_rdma_bc_allocate(struct rpc_task *task)
 {
 	struct rpc_rqst *rqst = task->tk_rqstp;
 	struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
+	size_t size = rqst->rq_callsize;
 	struct svcxprt_rdma *rdma;
 	struct page *page;
 
 	rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
 
-	/* Prevent an infinite loop: try to make this case work */
-	if (size > PAGE_SIZE)
+	if (size > PAGE_SIZE) {
 		WARN_ONCE(1, "svcrdma: large bc buffer request (size %zu)\n",
 			  size);
+		return -EINVAL;
+	}
 
 	page = alloc_page(RPCRDMA_DEF_GFP);
 	if (!page)
-		return NULL;
+		return -ENOMEM;
 
-	return page_address(page);
+	rqst->rq_buffer = page_address(page);
+	return 0;
 }
 
 static void
-xprt_rdma_bc_free(void *buffer)
+xprt_rdma_bc_free(struct rpc_task *task)
 {
 	/* No-op: ctxt and page have already been freed. */
 }

--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c

@ -97,7 +97,7 @@ static struct ctl_table xr_tunables_table[] = {
.data = &xprt_rdma_max_inline_read, .data = &xprt_rdma_max_inline_read,
.maxlen = sizeof(unsigned int), .maxlen = sizeof(unsigned int),
.mode = 0644, .mode = 0644,
.proc_handler = proc_dointvec, .proc_handler = proc_dointvec_minmax,
.extra1 = &min_inline_size, .extra1 = &min_inline_size,
.extra2 = &max_inline_size, .extra2 = &max_inline_size,
}, },
@ -106,7 +106,7 @@ static struct ctl_table xr_tunables_table[] = {
.data = &xprt_rdma_max_inline_write, .data = &xprt_rdma_max_inline_write,
.maxlen = sizeof(unsigned int), .maxlen = sizeof(unsigned int),
.mode = 0644, .mode = 0644,
.proc_handler = proc_dointvec, .proc_handler = proc_dointvec_minmax,
.extra1 = &min_inline_size, .extra1 = &min_inline_size,
.extra2 = &max_inline_size, .extra2 = &max_inline_size,
}, },
@@ -477,115 +477,152 @@ xprt_rdma_connect(struct rpc_xprt *xprt, struct rpc_task *task)
 	}
 }
 
-/*
- * The RDMA allocate/free functions need the task structure as a place
- * to hide the struct rpcrdma_req, which is necessary for the actual send/recv
- * sequence.
- *
- * The RPC layer allocates both send and receive buffers in the same call
- * (rq_send_buf and rq_rcv_buf are both part of a single contiguous buffer).
- * We may register rq_rcv_buf when using reply chunks.
+/* Allocate a fixed-size buffer in which to construct and send the
+ * RPC-over-RDMA header for this request.
  */
-static void *
-xprt_rdma_allocate(struct rpc_task *task, size_t size)
+static bool
+rpcrdma_get_rdmabuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
+		    gfp_t flags)
 {
-	struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt;
-	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+	size_t size = RPCRDMA_HDRBUF_SIZE;
 	struct rpcrdma_regbuf *rb;
+
+	if (req->rl_rdmabuf)
+		return true;
+
+	rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
+	if (IS_ERR(rb))
+		return false;
+
+	r_xprt->rx_stats.hardway_register_count += size;
+	req->rl_rdmabuf = rb;
+	return true;
+}
+
+static bool
+rpcrdma_get_sendbuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
+		    size_t size, gfp_t flags)
+{
+	struct rpcrdma_regbuf *rb;
+
+	if (req->rl_sendbuf && rdmab_length(req->rl_sendbuf) >= size)
+		return true;
+
+	rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
+	if (IS_ERR(rb))
+		return false;
+
+	rpcrdma_free_regbuf(req->rl_sendbuf);
+	r_xprt->rx_stats.hardway_register_count += size;
+	req->rl_sendbuf = rb;
+	return true;
+}
+
+/* The rq_rcv_buf is used only if a Reply chunk is necessary.
+ * The decision to use a Reply chunk is made later in
+ * rpcrdma_marshal_req. This buffer is registered at that time.
+ *
+ * Otherwise, the associated RPC Reply arrives in a separate
+ * Receive buffer, arbitrarily chosen by the HCA. The buffer
+ * allocated here for the RPC Reply is not utilized in that
+ * case. See rpcrdma_inline_fixup.
+ *
+ * A regbuf is used here to remember the buffer size.
+ */
+static bool
+rpcrdma_get_recvbuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
+		    size_t size, gfp_t flags)
+{
+	struct rpcrdma_regbuf *rb;
+
+	if (req->rl_recvbuf && rdmab_length(req->rl_recvbuf) >= size)
+		return true;
+
+	rb = rpcrdma_alloc_regbuf(size, DMA_NONE, flags);
+	if (IS_ERR(rb))
+		return false;
+
+	rpcrdma_free_regbuf(req->rl_recvbuf);
+	r_xprt->rx_stats.hardway_register_count += size;
+	req->rl_recvbuf = rb;
+	return true;
+}
+
+/**
+ * xprt_rdma_allocate - allocate transport resources for an RPC
+ * @task: RPC task
+ *
+ * Return values:
+ *        0:	Success; rq_buffer points to RPC buffer to use
+ *   ENOMEM:	Out of memory, call again later
+ *      EIO:	A permanent error occurred, do not retry
+ *
+ * The RDMA allocate/free functions need the task structure as a place
+ * to hide the struct rpcrdma_req, which is necessary for the actual
+ * send/recv sequence.
+ *
+ * xprt_rdma_allocate provides buffers that are already mapped for
+ * DMA, and a local DMA lkey is provided for each.
+ */
+static int
+xprt_rdma_allocate(struct rpc_task *task)
+{
+	struct rpc_rqst *rqst = task->tk_rqstp;
+	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(rqst->rq_xprt);
 	struct rpcrdma_req *req;
-	size_t min_size;
 	gfp_t flags;
 
 	req = rpcrdma_buffer_get(&r_xprt->rx_buf);
 	if (req == NULL)
-		return NULL;
+		return -ENOMEM;
 
 	flags = RPCRDMA_DEF_GFP;
 	if (RPC_IS_SWAPPER(task))
 		flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN;
 
-	if (req->rl_rdmabuf == NULL)
-		goto out_rdmabuf;
-	if (req->rl_sendbuf == NULL)
-		goto out_sendbuf;
-	if (size > req->rl_sendbuf->rg_size)
-		goto out_sendbuf;
+	if (!rpcrdma_get_rdmabuf(r_xprt, req, flags))
+		goto out_fail;
+	if (!rpcrdma_get_sendbuf(r_xprt, req, rqst->rq_callsize, flags))
+		goto out_fail;
+	if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
+		goto out_fail;
+
+	dprintk("RPC: %5u %s: send size = %zd, recv size = %zd, req = %p\n",
+		task->tk_pid, __func__, rqst->rq_callsize,
+		rqst->rq_rcvsize, req);
 
-out:
-	dprintk("RPC:       %s: size %zd, request 0x%p\n", __func__, size, req);
 	req->rl_connect_cookie = 0;	/* our reserved value */
-	req->rl_task = task;
-	return req->rl_sendbuf->rg_base;
-
-out_rdmabuf:
-	min_size = RPCRDMA_INLINE_WRITE_THRESHOLD(task->tk_rqstp);
-	rb = rpcrdma_alloc_regbuf(&r_xprt->rx_ia, min_size, flags);
-	if (IS_ERR(rb))
-		goto out_fail;
-	req->rl_rdmabuf = rb;
-
-out_sendbuf:
-	/* XDR encoding and RPC/RDMA marshaling of this request has not
-	 * yet occurred. Thus a lower bound is needed to prevent buffer
-	 * overrun during marshaling.
-	 *
-	 * RPC/RDMA marshaling may choose to send payload bearing ops
-	 * inline, if the result is smaller than the inline threshold.
-	 * The value of the "size" argument accounts for header
-	 * requirements but not for the payload in these cases.
-	 *
-	 * Likewise, allocate enough space to receive a reply up to the
-	 * size of the inline threshold.
-	 *
-	 * It's unlikely that both the send header and the received
-	 * reply will be large, but slush is provided here to allow
-	 * flexibility when marshaling.
-	 */
-	min_size = RPCRDMA_INLINE_READ_THRESHOLD(task->tk_rqstp);
-	min_size += RPCRDMA_INLINE_WRITE_THRESHOLD(task->tk_rqstp);
-	if (size < min_size)
-		size = min_size;
-	rb = rpcrdma_alloc_regbuf(&r_xprt->rx_ia, size, flags);
-	if (IS_ERR(rb))
-		goto out_fail;
-	rb->rg_owner = req;
-
-	r_xprt->rx_stats.hardway_register_count += size;
-	rpcrdma_free_regbuf(&r_xprt->rx_ia, req->rl_sendbuf);
-	req->rl_sendbuf = rb;
-	goto out;
+	rpcrdma_set_xprtdata(rqst, req);
+	rqst->rq_buffer = req->rl_sendbuf->rg_base;
+	rqst->rq_rbuffer = req->rl_recvbuf->rg_base;
+	return 0;
 
 out_fail:
 	rpcrdma_buffer_put(req);
-	return NULL;
+	return -ENOMEM;
 }
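The three rpcrdma_get_*buf() helpers in this hunk share one grow-or-reuse pattern: keep the cached regbuf when it is already large enough, otherwise free it and allocate a bigger one, charging the new size to the "hardway" counter. A minimal user-space sketch of that pattern; `struct slot` and `get_sendbuf` are hypothetical stand-ins for the kernel types, with plain malloc/free in place of regbuf allocation:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rpcrdma_regbuf. */
struct regbuf {
    size_t len;
    char data[];
};

/* Hypothetical owner, modeling req->rl_sendbuf and the stats counter. */
struct slot {
    struct regbuf *sendbuf;
    unsigned long hardway_count; /* models rx_stats.hardway_register_count */
};

/* Reuse the cached buffer when it is big enough; otherwise replace it,
 * as rpcrdma_get_sendbuf() does. Returns 0 on success, -1 on ENOMEM. */
static int get_sendbuf(struct slot *s, size_t size)
{
    struct regbuf *rb;

    if (s->sendbuf && s->sendbuf->len >= size)
        return 0;           /* fast path: no allocation at all */

    rb = malloc(sizeof(*rb) + size);
    if (!rb)
        return -1;
    rb->len = size;

    free(s->sendbuf);       /* free(NULL) is a no-op */
    s->hardway_count += size;
    s->sendbuf = rb;
    return 0;
}
```

The counter only moves on the slow path, so steady-state RPCs with stable buffer sizes never reallocate.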
-/*
- * This function returns all RDMA resources to the pool.
+/**
+ * xprt_rdma_free - release resources allocated by xprt_rdma_allocate
+ * @task: RPC task
+ *
+ * Caller guarantees rqst->rq_buffer is non-NULL.
  */
 static void
-xprt_rdma_free(void *buffer)
+xprt_rdma_free(struct rpc_task *task)
 {
-	struct rpcrdma_req *req;
-	struct rpcrdma_xprt *r_xprt;
-	struct rpcrdma_regbuf *rb;
-
-	if (buffer == NULL)
-		return;
+	struct rpc_rqst *rqst = task->tk_rqstp;
+	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(rqst->rq_xprt);
+	struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
+	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 
-	rb = container_of(buffer, struct rpcrdma_regbuf, rg_base[0]);
-	req = rb->rg_owner;
 	if (req->rl_backchannel)
 		return;
 
-	r_xprt = container_of(req->rl_buffer, struct rpcrdma_xprt, rx_buf);
 	dprintk("RPC:       %s: called on 0x%p\n", __func__, req->rl_reply);
 
-	r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req,
-					    !RPC_IS_ASYNC(req->rl_task));
+	ia->ri_ops->ro_unmap_safe(r_xprt, req, !RPC_IS_ASYNC(task));
+	rpcrdma_unmap_sges(ia, req);
 	rpcrdma_buffer_put(req);
 }
@@ -685,10 +722,11 @@ void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
 		   r_xprt->rx_stats.failed_marshal_count,
 		   r_xprt->rx_stats.bad_reply_count,
 		   r_xprt->rx_stats.nomsg_call_count);
-	seq_printf(seq, "%lu %lu %lu\n",
+	seq_printf(seq, "%lu %lu %lu %lu\n",
 		   r_xprt->rx_stats.mrs_recovered,
 		   r_xprt->rx_stats.mrs_orphaned,
-		   r_xprt->rx_stats.mrs_allocated);
+		   r_xprt->rx_stats.mrs_allocated,
+		   r_xprt->rx_stats.local_inv_needed);
 }
 
 static int


@@ -129,15 +129,6 @@ rpcrdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
 			  wc->status, wc->vendor_err);
 }
 
-static void
-rpcrdma_receive_worker(struct work_struct *work)
-{
-	struct rpcrdma_rep *rep =
-			container_of(work, struct rpcrdma_rep, rr_work);
-
-	rpcrdma_reply_handler(rep);
-}
-
 /* Perform basic sanity checking to avoid using garbage
  * to update the credit grant value.
  */
@@ -161,13 +152,13 @@ rpcrdma_update_granted_credits(struct rpcrdma_rep *rep)
 }
 
 /**
- * rpcrdma_receive_wc - Invoked by RDMA provider for each polled Receive WC
+ * rpcrdma_wc_receive - Invoked by RDMA provider for each polled Receive WC
  * @cq:	completion queue (ignored)
  * @wc:	completed WR
  *
  */
 static void
-rpcrdma_receive_wc(struct ib_cq *cq, struct ib_wc *wc)
+rpcrdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 {
 	struct ib_cqe *cqe = wc->wr_cqe;
 	struct rpcrdma_rep *rep = container_of(cqe, struct rpcrdma_rep,
@@ -185,6 +176,9 @@ rpcrdma_receive_wc(struct ib_cq *cq, struct ib_wc *wc)
 		__func__, rep, wc->byte_len);
 
 	rep->rr_len = wc->byte_len;
+	rep->rr_wc_flags = wc->wc_flags;
+	rep->rr_inv_rkey = wc->ex.invalidate_rkey;
+
 	ib_dma_sync_single_for_cpu(rep->rr_device,
 				   rdmab_addr(rep->rr_rdmabuf),
 				   rep->rr_len, DMA_FROM_DEVICE);
@@ -204,6 +198,36 @@ out_fail:
 	goto out_schedule;
 }
+static void
+rpcrdma_update_connect_private(struct rpcrdma_xprt *r_xprt,
+			       struct rdma_conn_param *param)
+{
+	struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
+	const struct rpcrdma_connect_private *pmsg = param->private_data;
+	unsigned int rsize, wsize;
+
+	/* Default settings for RPC-over-RDMA Version One */
+	r_xprt->rx_ia.ri_reminv_expected = false;
+	rsize = RPCRDMA_V1_DEF_INLINE_SIZE;
+	wsize = RPCRDMA_V1_DEF_INLINE_SIZE;
+
+	if (pmsg &&
+	    pmsg->cp_magic == rpcrdma_cmp_magic &&
+	    pmsg->cp_version == RPCRDMA_CMP_VERSION) {
+		r_xprt->rx_ia.ri_reminv_expected = true;
+		rsize = rpcrdma_decode_buffer_size(pmsg->cp_send_size);
+		wsize = rpcrdma_decode_buffer_size(pmsg->cp_recv_size);
+	}
+
+	if (rsize < cdata->inline_rsize)
+		cdata->inline_rsize = rsize;
+	if (wsize < cdata->inline_wsize)
+		cdata->inline_wsize = wsize;
+	pr_info("rpcrdma: max send %u, max recv %u\n",
+		cdata->inline_wsize, cdata->inline_rsize);
+
+	rpcrdma_set_max_header_sizes(r_xprt);
+}
+
 static int
 rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
 {
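rpcrdma_update_connect_private() only trusts the peer's CM private data when both the magic cookie and the version field match, and otherwise falls back to RPC-over-RDMA Version One defaults. A small user-space sketch of that validate-then-decode pattern; the constants, struct layout, and function names here are illustrative, not the actual wire format:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; not the real on-the-wire values. */
#define CMP_MAGIC          0xf6ab0e18u
#define CMP_VERSION        1
#define V1_DEF_INLINE_SIZE 1024u

/* Hypothetical mirror of the CM private message: magic and version
 * guard the remaining fields, as in rpcrdma_update_connect_private(). */
struct cm_private {
    uint32_t cp_magic;
    uint8_t  cp_version;
    uint8_t  cp_send_size;  /* encoded: size = 1024 << cp_send_size */
};

static unsigned int decode_buffer_size(uint8_t val)
{
    return 1024u << val;
}

/* Return the peer's advertised size, or the Version One default when
 * the private data is absent or unrecognized. */
static unsigned int peer_recv_size(const struct cm_private *pmsg)
{
    if (pmsg &&
        pmsg->cp_magic == CMP_MAGIC &&
        pmsg->cp_version == CMP_VERSION)
        return decode_buffer_size(pmsg->cp_send_size);
    return V1_DEF_INLINE_SIZE;
}
```

The point of the guard is graceful interop: an older peer that sends no (or foreign) private data simply gets the conservative defaults rather than a connection failure.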
@@ -244,6 +268,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
 			" (%d initiator)\n",
 			__func__, attr->max_dest_rd_atomic,
 			attr->max_rd_atomic);
+		rpcrdma_update_connect_private(xprt, &event->param.conn);
 		goto connected;
 	case RDMA_CM_EVENT_CONNECT_ERROR:
 		connstate = -ENOTCONN;
@@ -454,11 +479,12 @@ int
 rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 		  struct rpcrdma_create_data_internal *cdata)
 {
+	struct rpcrdma_connect_private *pmsg = &ep->rep_cm_private;
 	struct ib_cq *sendcq, *recvcq;
 	unsigned int max_qp_wr;
 	int rc;
 
-	if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_IOVS) {
+	if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_SEND_SGES) {
 		dprintk("RPC:       %s: insufficient sge's available\n",
 			__func__);
 		return -ENOMEM;
@@ -487,7 +513,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	ep->rep_attr.cap.max_recv_wr = cdata->max_requests;
 	ep->rep_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
 	ep->rep_attr.cap.max_recv_wr += 1;	/* drain cqe */
-	ep->rep_attr.cap.max_send_sge = RPCRDMA_MAX_IOVS;
+	ep->rep_attr.cap.max_send_sge = RPCRDMA_MAX_SEND_SGES;
 	ep->rep_attr.cap.max_recv_sge = 1;
 	ep->rep_attr.cap.max_inline_data = 0;
 	ep->rep_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
@@ -536,9 +562,14 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	/* Initialize cma parameters */
 	memset(&ep->rep_remote_cma, 0, sizeof(ep->rep_remote_cma));
 
-	/* RPC/RDMA does not use private data */
-	ep->rep_remote_cma.private_data = NULL;
-	ep->rep_remote_cma.private_data_len = 0;
+	/* Prepare RDMA-CM private message */
+	pmsg->cp_magic = rpcrdma_cmp_magic;
+	pmsg->cp_version = RPCRDMA_CMP_VERSION;
+	pmsg->cp_flags |= ia->ri_ops->ro_send_w_inv_ok;
+	pmsg->cp_send_size = rpcrdma_encode_buffer_size(cdata->inline_wsize);
+	pmsg->cp_recv_size = rpcrdma_encode_buffer_size(cdata->inline_rsize);
+	ep->rep_remote_cma.private_data = pmsg;
+	ep->rep_remote_cma.private_data_len = sizeof(*pmsg);
 
 	/* Client offers RDMA Read but does not initiate */
 	ep->rep_remote_cma.initiator_depth = 0;
@@ -849,6 +880,10 @@ rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
 	req->rl_cqe.done = rpcrdma_wc_send;
 	req->rl_buffer = &r_xprt->rx_buf;
 	INIT_LIST_HEAD(&req->rl_registered);
+	req->rl_send_wr.next = NULL;
+	req->rl_send_wr.wr_cqe = &req->rl_cqe;
+	req->rl_send_wr.sg_list = req->rl_send_sge;
+	req->rl_send_wr.opcode = IB_WR_SEND;
 	return req;
 }
 
@@ -865,17 +900,21 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
 	if (rep == NULL)
 		goto out;
 
-	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize,
-					       GFP_KERNEL);
+	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(cdata->inline_rsize,
+					       DMA_FROM_DEVICE, GFP_KERNEL);
 	if (IS_ERR(rep->rr_rdmabuf)) {
 		rc = PTR_ERR(rep->rr_rdmabuf);
 		goto out_free;
 	}
 
 	rep->rr_device = ia->ri_device;
-	rep->rr_cqe.done = rpcrdma_receive_wc;
+	rep->rr_cqe.done = rpcrdma_wc_receive;
 	rep->rr_rxprt = r_xprt;
-	INIT_WORK(&rep->rr_work, rpcrdma_receive_worker);
+	INIT_WORK(&rep->rr_work, rpcrdma_reply_handler);
+	rep->rr_recv_wr.next = NULL;
+	rep->rr_recv_wr.wr_cqe = &rep->rr_cqe;
+	rep->rr_recv_wr.sg_list = &rep->rr_rdmabuf->rg_iov;
+	rep->rr_recv_wr.num_sge = 1;
 	return rep;
 
 out_free:
@@ -966,17 +1005,18 @@ rpcrdma_buffer_get_rep_locked(struct rpcrdma_buffer *buf)
 }
 
 static void
-rpcrdma_destroy_rep(struct rpcrdma_ia *ia, struct rpcrdma_rep *rep)
+rpcrdma_destroy_rep(struct rpcrdma_rep *rep)
 {
-	rpcrdma_free_regbuf(ia, rep->rr_rdmabuf);
+	rpcrdma_free_regbuf(rep->rr_rdmabuf);
 	kfree(rep);
 }
 
 void
-rpcrdma_destroy_req(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
+rpcrdma_destroy_req(struct rpcrdma_req *req)
 {
-	rpcrdma_free_regbuf(ia, req->rl_sendbuf);
-	rpcrdma_free_regbuf(ia, req->rl_rdmabuf);
+	rpcrdma_free_regbuf(req->rl_recvbuf);
+	rpcrdma_free_regbuf(req->rl_sendbuf);
+	rpcrdma_free_regbuf(req->rl_rdmabuf);
 	kfree(req);
 }
 
@@ -1009,15 +1049,13 @@ rpcrdma_destroy_mrs(struct rpcrdma_buffer *buf)
 void
 rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
 {
-	struct rpcrdma_ia *ia = rdmab_to_ia(buf);
-
 	cancel_delayed_work_sync(&buf->rb_recovery_worker);
 
 	while (!list_empty(&buf->rb_recv_bufs)) {
 		struct rpcrdma_rep *rep;
 
 		rep = rpcrdma_buffer_get_rep_locked(buf);
-		rpcrdma_destroy_rep(ia, rep);
+		rpcrdma_destroy_rep(rep);
 	}
 
 	buf->rb_send_count = 0;
@@ -1030,7 +1068,7 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
 		list_del(&req->rl_all);
 
 		spin_unlock(&buf->rb_reqslock);
-		rpcrdma_destroy_req(ia, req);
+		rpcrdma_destroy_req(req);
 		spin_lock(&buf->rb_reqslock);
 	}
 	spin_unlock(&buf->rb_reqslock);
@@ -1129,7 +1167,7 @@ rpcrdma_buffer_put(struct rpcrdma_req *req)
 	struct rpcrdma_buffer *buffers = req->rl_buffer;
 	struct rpcrdma_rep *rep = req->rl_reply;
 
-	req->rl_niovs = 0;
+	req->rl_send_wr.num_sge = 0;
 	req->rl_reply = NULL;
 
 	spin_lock(&buffers->rb_lock);
@@ -1171,70 +1209,81 @@ rpcrdma_recv_buffer_put(struct rpcrdma_rep *rep)
 	spin_unlock(&buffers->rb_lock);
 }
-/*
- * Wrappers for internal-use kmalloc memory registration, used by buffer code.
- */
-
 /**
- * rpcrdma_alloc_regbuf - kmalloc and register memory for SEND/RECV buffers
- * @ia: controlling rpcrdma_ia
+ * rpcrdma_alloc_regbuf - allocate and DMA-map memory for SEND/RECV buffers
  * @size: size of buffer to be allocated, in bytes
+ * @direction: direction of data movement
  * @flags: GFP flags
  *
- * Returns pointer to private header of an area of internally
- * registered memory, or an ERR_PTR. The registered buffer follows
- * the end of the private header.
+ * Returns an ERR_PTR, or a pointer to a regbuf, a buffer that
+ * can be persistently DMA-mapped for I/O.
  *
  * xprtrdma uses a regbuf for posting an outgoing RDMA SEND, or for
- * receiving the payload of RDMA RECV operations. regbufs are not
- * used for RDMA READ/WRITE operations, thus are registered only for
- * LOCAL access.
+ * receiving the payload of RDMA RECV operations. During Long Calls
+ * or Replies they may be registered externally via ro_map.
  */
 struct rpcrdma_regbuf *
-rpcrdma_alloc_regbuf(struct rpcrdma_ia *ia, size_t size, gfp_t flags)
+rpcrdma_alloc_regbuf(size_t size, enum dma_data_direction direction,
+		     gfp_t flags)
 {
 	struct rpcrdma_regbuf *rb;
-	struct ib_sge *iov;
 
 	rb = kmalloc(sizeof(*rb) + size, flags);
 	if (rb == NULL)
-		goto out;
+		return ERR_PTR(-ENOMEM);
 
-	iov = &rb->rg_iov;
-	iov->addr = ib_dma_map_single(ia->ri_device,
-				      (void *)rb->rg_base, size,
-				      DMA_BIDIRECTIONAL);
-	if (ib_dma_mapping_error(ia->ri_device, iov->addr))
-		goto out_free;
-
-	iov->length = size;
-	iov->lkey = ia->ri_pd->local_dma_lkey;
-	rb->rg_size = size;
-	rb->rg_owner = NULL;
+	rb->rg_device = NULL;
+	rb->rg_direction = direction;
+	rb->rg_iov.length = size;
+
 	return rb;
+}
 
-out_free:
-	kfree(rb);
-out:
-	return ERR_PTR(-ENOMEM);
+/**
+ * __rpcrdma_map_regbuf - DMA-map a regbuf
+ * @ia: controlling rpcrdma_ia
+ * @rb: regbuf to be mapped
+ */
+bool
+__rpcrdma_dma_map_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
+{
+	if (rb->rg_direction == DMA_NONE)
+		return false;
+
+	rb->rg_iov.addr = ib_dma_map_single(ia->ri_device,
+					    (void *)rb->rg_base,
+					    rdmab_length(rb),
+					    rb->rg_direction);
+	if (ib_dma_mapping_error(ia->ri_device, rdmab_addr(rb)))
+		return false;
+
+	rb->rg_device = ia->ri_device;
+	rb->rg_iov.lkey = ia->ri_pd->local_dma_lkey;
+	return true;
+}
+
+static void
+rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb)
+{
+	if (!rpcrdma_regbuf_is_mapped(rb))
+		return;
+
+	ib_dma_unmap_single(rb->rg_device, rdmab_addr(rb),
+			    rdmab_length(rb), rb->rg_direction);
+	rb->rg_device = NULL;
 }
 
 /**
  * rpcrdma_free_regbuf - deregister and free registered buffer
- * @ia: controlling rpcrdma_ia
  * @rb: regbuf to be deregistered and freed
  */
 void
-rpcrdma_free_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
+rpcrdma_free_regbuf(struct rpcrdma_regbuf *rb)
 {
-	struct ib_sge *iov;
-
 	if (!rb)
 		return;
 
-	iov = &rb->rg_iov;
-	ib_dma_unmap_single(ia->ri_device,
-			    iov->addr, iov->length, DMA_BIDIRECTIONAL);
+	rpcrdma_dma_unmap_regbuf(rb);
 	kfree(rb);
 }
@@ -1248,39 +1297,28 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
 		struct rpcrdma_ep *ep,
 		struct rpcrdma_req *req)
 {
-	struct ib_device *device = ia->ri_device;
-	struct ib_send_wr send_wr, *send_wr_fail;
-	struct rpcrdma_rep *rep = req->rl_reply;
-	struct ib_sge *iov = req->rl_send_iov;
-	int i, rc;
+	struct ib_send_wr *send_wr = &req->rl_send_wr;
+	struct ib_send_wr *send_wr_fail;
+	int rc;
 
-	if (rep) {
-		rc = rpcrdma_ep_post_recv(ia, ep, rep);
+	if (req->rl_reply) {
+		rc = rpcrdma_ep_post_recv(ia, req->rl_reply);
 		if (rc)
 			return rc;
 		req->rl_reply = NULL;
 	}
 
-	send_wr.next = NULL;
-	send_wr.wr_cqe = &req->rl_cqe;
-	send_wr.sg_list = iov;
-	send_wr.num_sge = req->rl_niovs;
-	send_wr.opcode = IB_WR_SEND;
-
-	for (i = 0; i < send_wr.num_sge; i++)
-		ib_dma_sync_single_for_device(device, iov[i].addr,
-					      iov[i].length, DMA_TO_DEVICE);
 	dprintk("RPC:       %s: posting %d s/g entries\n",
-		__func__, send_wr.num_sge);
+		__func__, send_wr->num_sge);
 
 	if (DECR_CQCOUNT(ep) > 0)
-		send_wr.send_flags = 0;
+		send_wr->send_flags = 0;
 	else { /* Provider must take a send completion every now and then */
 		INIT_CQCOUNT(ep);
-		send_wr.send_flags = IB_SEND_SIGNALED;
+		send_wr->send_flags = IB_SEND_SIGNALED;
 	}
 
-	rc = ib_post_send(ia->ri_id->qp, &send_wr, &send_wr_fail);
+	rc = ib_post_send(ia->ri_id->qp, send_wr, &send_wr_fail);
 	if (rc)
 		goto out_postsend_err;
 	return 0;
@@ -1290,32 +1328,24 @@ out_postsend_err:
 	return -ENOTCONN;
 }
-/*
- * (Re)post a receive buffer.
- */
 int
 rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
-		     struct rpcrdma_ep *ep,
 		     struct rpcrdma_rep *rep)
 {
-	struct ib_recv_wr recv_wr, *recv_wr_fail;
+	struct ib_recv_wr *recv_wr_fail;
 	int rc;
 
-	recv_wr.next = NULL;
-	recv_wr.wr_cqe = &rep->rr_cqe;
-	recv_wr.sg_list = &rep->rr_rdmabuf->rg_iov;
-	recv_wr.num_sge = 1;
-
-	ib_dma_sync_single_for_cpu(ia->ri_device,
-				   rdmab_addr(rep->rr_rdmabuf),
-				   rdmab_length(rep->rr_rdmabuf),
-				   DMA_BIDIRECTIONAL);
-
-	rc = ib_post_recv(ia->ri_id->qp, &recv_wr, &recv_wr_fail);
+	if (!rpcrdma_dma_map_regbuf(ia, rep->rr_rdmabuf))
+		goto out_map;
+	rc = ib_post_recv(ia->ri_id->qp, &rep->rr_recv_wr, &recv_wr_fail);
 	if (rc)
 		goto out_postrecv;
 	return 0;
 
+out_map:
+	pr_err("rpcrdma: failed to DMA map the Receive buffer\n");
+	return -EIO;
+
 out_postrecv:
 	pr_err("rpcrdma: ib_post_recv returned %i\n", rc);
 	return -ENOTCONN;
@@ -1333,7 +1363,6 @@ rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *r_xprt, unsigned int count)
 {
 	struct rpcrdma_buffer *buffers = &r_xprt->rx_buf;
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
-	struct rpcrdma_ep *ep = &r_xprt->rx_ep;
 	struct rpcrdma_rep *rep;
 	int rc;
 
@@ -1344,7 +1373,7 @@ rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *r_xprt, unsigned int count)
 		rep = rpcrdma_buffer_get_rep_locked(buffers);
 		spin_unlock(&buffers->rb_lock);
 
-		rc = rpcrdma_ep_post_recv(ia, ep, rep);
+		rc = rpcrdma_ep_post_recv(ia, rep);
 		if (rc)
 			goto out_rc;
 	}


@@ -70,9 +70,11 @@ struct rpcrdma_ia {
 	struct ib_pd		*ri_pd;
 	struct completion	ri_done;
 	int			ri_async_rc;
+	unsigned int		ri_max_segs;
 	unsigned int		ri_max_frmr_depth;
 	unsigned int		ri_max_inline_write;
 	unsigned int		ri_max_inline_read;
+	bool			ri_reminv_expected;
 	struct ib_qp_attr	ri_qp_attr;
 	struct ib_qp_init_attr	ri_qp_init_attr;
 };
@@ -87,6 +89,7 @@ struct rpcrdma_ep {
 	int			rep_connected;
 	struct ib_qp_init_attr	rep_attr;
 	wait_queue_head_t	rep_connect_wait;
+	struct rpcrdma_connect_private	rep_cm_private;
 	struct rdma_conn_param	rep_remote_cma;
 	struct sockaddr_storage	rep_remote_addr;
 	struct delayed_work	rep_connect_worker;
@@ -112,9 +115,9 @@ struct rpcrdma_ep {
  */
 struct rpcrdma_regbuf {
-	size_t			rg_size;
-	struct rpcrdma_req	*rg_owner;
 	struct ib_sge		rg_iov;
+	struct ib_device	*rg_device;
+	enum dma_data_direction	rg_direction;
 	__be32			rg_base[0] __attribute__ ((aligned(256)));
 };
@@ -162,7 +165,10 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
  * The smallest inline threshold is 1024 bytes, ensuring that
  * at least 750 bytes are available for RPC messages.
  */
-#define RPCRDMA_MAX_HDR_SEGS (8)
+enum {
+	RPCRDMA_MAX_HDR_SEGS = 8,
+	RPCRDMA_HDRBUF_SIZE = 256,
+};
 
 /*
  * struct rpcrdma_rep -- this structure encapsulates state required to recv
@@ -182,10 +188,13 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
 struct rpcrdma_rep {
 	struct ib_cqe		rr_cqe;
 	unsigned int		rr_len;
+	int			rr_wc_flags;
+	u32			rr_inv_rkey;
 	struct ib_device	*rr_device;
 	struct rpcrdma_xprt	*rr_rxprt;
 	struct work_struct	rr_work;
 	struct list_head	rr_list;
+	struct ib_recv_wr	rr_recv_wr;
 	struct rpcrdma_regbuf	*rr_rdmabuf;
 };
@@ -276,19 +285,30 @@ struct rpcrdma_mr_seg {		/* chunk descriptors */
 	char		*mr_offset;	/* kva if no page, else offset */
 };
 
-#define RPCRDMA_MAX_IOVS	(2)
+/* Reserve enough Send SGEs to send a maximum size inline request:
+ * - RPC-over-RDMA header
+ * - xdr_buf head iovec
+ * - RPCRDMA_MAX_INLINE bytes, possibly unaligned, in pages
+ * - xdr_buf tail iovec
+ */
+enum {
+	RPCRDMA_MAX_SEND_PAGES = PAGE_SIZE + RPCRDMA_MAX_INLINE - 1,
+	RPCRDMA_MAX_PAGE_SGES = (RPCRDMA_MAX_SEND_PAGES >> PAGE_SHIFT) + 1,
+	RPCRDMA_MAX_SEND_SGES = 1 + 1 + RPCRDMA_MAX_PAGE_SGES + 1,
+};
 
 struct rpcrdma_buffer;
 struct rpcrdma_req {
 	struct list_head	rl_free;
-	unsigned int		rl_niovs;
+	unsigned int		rl_mapped_sges;
 	unsigned int		rl_connect_cookie;
-	struct rpc_task		*rl_task;
 	struct rpcrdma_buffer	*rl_buffer;
-	struct rpcrdma_rep	*rl_reply;/* holder for reply buffer */
-	struct ib_sge		rl_send_iov[RPCRDMA_MAX_IOVS];
-	struct rpcrdma_regbuf	*rl_rdmabuf;
-	struct rpcrdma_regbuf	*rl_sendbuf;
+	struct rpcrdma_rep	*rl_reply;
+	struct ib_send_wr	rl_send_wr;
+	struct ib_sge		rl_send_sge[RPCRDMA_MAX_SEND_SGES];
+	struct rpcrdma_regbuf	*rl_rdmabuf;	/* xprt header */
+	struct rpcrdma_regbuf	*rl_sendbuf;	/* rq_snd_buf */
+	struct rpcrdma_regbuf	*rl_recvbuf;	/* rq_rcv_buf */
 	struct ib_cqe		rl_cqe;
 	struct list_head	rl_all;
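The RPCRDMA_MAX_SEND_SGES arithmetic above can be checked with concrete numbers. Assuming a 4KB page and an RPCRDMA_MAX_INLINE of 4096 bytes (both illustrative; the real inline maximum depends on the negotiated threshold), an unaligned inline payload spans at most two pages, so the send path needs five SGEs in the worst case:

```c
#include <assert.h>

/* Illustrative values mirroring the kernel's formula; the _X suffix
 * marks them as assumptions, not the kernel's actual constants. */
enum {
    PAGE_SHIFT_X = 12,
    PAGE_SIZE_X  = 1 << PAGE_SHIFT_X,   /* 4096 */
    MAX_INLINE_X = 4096,

    /* Worst case: a payload of MAX_INLINE bytes starting at the last
     * byte of a page straddles one extra page. */
    MAX_SEND_PAGES = PAGE_SIZE_X + MAX_INLINE_X - 1,          /* 8191 */
    MAX_PAGE_SGES  = (MAX_SEND_PAGES >> PAGE_SHIFT_X) + 1,    /* 2 */

    /* header + xdr_buf head iovec + page list + xdr_buf tail iovec */
    MAX_SEND_SGES  = 1 + 1 + MAX_PAGE_SGES + 1,               /* 5 */
};
```

This is why the transport now refuses devices whose `max_sge` attribute is below RPCRDMA_MAX_SEND_SGES: a smaller SGE budget could not carry a maximum-size inline request without copying.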
@@ -298,14 +318,16 @@ struct rpcrdma_req {
 	struct rpcrdma_mr_seg	rl_segments[RPCRDMA_MAX_SEGS];
 };
 
+static inline void
+rpcrdma_set_xprtdata(struct rpc_rqst *rqst, struct rpcrdma_req *req)
+{
+	rqst->rq_xprtdata = req;
+}
+
 static inline struct rpcrdma_req *
 rpcr_to_rdmar(struct rpc_rqst *rqst)
 {
-	void *buffer = rqst->rq_buffer;
-	struct rpcrdma_regbuf *rb;
-
-	rb = container_of(buffer, struct rpcrdma_regbuf, rg_base);
-	return rb->rg_owner;
+	return rqst->rq_xprtdata;
 }
 
 /*
@@ -356,15 +378,6 @@ struct rpcrdma_create_data_internal {
 	unsigned int	padding;	/* non-rdma write header padding */
 };
 
-#define RPCRDMA_INLINE_READ_THRESHOLD(rq) \
-	(rpcx_to_rdmad(rq->rq_xprt).inline_rsize)
-
-#define RPCRDMA_INLINE_WRITE_THRESHOLD(rq)\
-	(rpcx_to_rdmad(rq->rq_xprt).inline_wsize)
-
-#define RPCRDMA_INLINE_PAD_VALUE(rq)\
-	rpcx_to_rdmad(rq->rq_xprt).padding
-
 /*
  * Statistics for RPCRDMA
  */
@@ -386,6 +399,7 @@ struct rpcrdma_stats {
 	unsigned long		mrs_recovered;
 	unsigned long		mrs_orphaned;
 	unsigned long		mrs_allocated;
+	unsigned long		local_inv_needed;
 };
 
 /*
@@ -409,6 +423,7 @@ struct rpcrdma_memreg_ops {
 					  struct rpcrdma_mw *);
 	void		(*ro_release_mr)(struct rpcrdma_mw *);
 	const char	*ro_displayname;
+	const int	ro_send_w_inv_ok;
 };
 
 extern const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops;
@@ -461,15 +476,14 @@ void rpcrdma_ep_disconnect(struct rpcrdma_ep *, struct rpcrdma_ia *);
 int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
 		    struct rpcrdma_req *);
-int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_ep *,
-			 struct rpcrdma_rep *);
+int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_rep *);
 
 /*
  * Buffer calls - xprtrdma/verbs.c
  */
 struct rpcrdma_req *rpcrdma_create_req(struct rpcrdma_xprt *);
 struct rpcrdma_rep *rpcrdma_create_rep(struct rpcrdma_xprt *);
-void rpcrdma_destroy_req(struct rpcrdma_ia *, struct rpcrdma_req *);
+void rpcrdma_destroy_req(struct rpcrdma_req *);
 int rpcrdma_buffer_create(struct rpcrdma_xprt *);
 void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);
 
@@ -482,10 +496,24 @@ void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);
 void rpcrdma_defer_mr_recovery(struct rpcrdma_mw *);
 
-struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
-					    size_t, gfp_t);
-void rpcrdma_free_regbuf(struct rpcrdma_ia *,
-			 struct rpcrdma_regbuf *);
+struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(size_t, enum dma_data_direction,
+					    gfp_t);
+bool __rpcrdma_dma_map_regbuf(struct rpcrdma_ia *, struct rpcrdma_regbuf *);
+void rpcrdma_free_regbuf(struct rpcrdma_regbuf *);
+
+static inline bool
+rpcrdma_regbuf_is_mapped(struct rpcrdma_regbuf *rb)
+{
+	return rb->rg_device != NULL;
+}
+
+static inline bool
+rpcrdma_dma_map_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
+{
+	if (likely(rpcrdma_regbuf_is_mapped(rb)))
+		return true;
+	return __rpcrdma_dma_map_regbuf(ia, rb);
+}
 
 int rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *, unsigned int);
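rpcrdma_dma_map_regbuf() is a memoized slow path: rg_device doubles as the "already mapped" flag, so callers can invoke the wrapper on every post and only the first call pays for the DMA mapping. A user-space sketch of the same idiom, with the mapping simulated and a counter added for the test (all names here are stand-ins for the kernel types):

```c
#include <assert.h>
#include <stddef.h>

struct device;                  /* opaque, like struct ib_device here */

/* Hypothetical regbuf: rg_device != NULL means "mapped". */
struct regbuf {
    struct device *rg_device;
    unsigned long  rg_addr;
};

static int map_count;           /* counts slow-path mappings for the test */

static int regbuf_is_mapped(const struct regbuf *rb)
{
    return rb->rg_device != NULL;
}

/* Slow path: perform the (simulated) DMA mapping exactly once. */
static int __dma_map_regbuf(struct device *dev, struct regbuf *rb)
{
    map_count++;
    rb->rg_addr = 0x1000;       /* pretend bus address */
    rb->rg_device = dev;        /* publishing this marks rb as mapped */
    return 1;
}

/* Fast path: safe to call on every post. */
static int dma_map_regbuf(struct device *dev, struct regbuf *rb)
{
    if (regbuf_is_mapped(rb))
        return 1;
    return __dma_map_regbuf(dev, rb);
}
```

Deferring the mapping to first use is what lets the transport delay all DMA API calls until a connection exists, which in turn makes device-driver removal safe.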
@@ -507,15 +535,25 @@ rpcrdma_data_dir(bool writing)
  */
 void rpcrdma_connect_worker(struct work_struct *);
 void rpcrdma_conn_func(struct rpcrdma_ep *);
-void rpcrdma_reply_handler(struct rpcrdma_rep *);
+void rpcrdma_reply_handler(struct work_struct *);
 
 /*
  * RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
  */
+
+enum rpcrdma_chunktype {
+	rpcrdma_noch = 0,
+	rpcrdma_readch,
+	rpcrdma_areadch,
+	rpcrdma_writech,
+	rpcrdma_replych
+};
+
+bool rpcrdma_prepare_send_sges(struct rpcrdma_ia *, struct rpcrdma_req *,
+			       u32, struct xdr_buf *, enum rpcrdma_chunktype);
+void rpcrdma_unmap_sges(struct rpcrdma_ia *, struct rpcrdma_req *);
 int rpcrdma_marshal_req(struct rpc_rqst *);
-void rpcrdma_set_max_header_sizes(struct rpcrdma_ia *,
-				  struct rpcrdma_create_data_internal *,
-				  unsigned int);
+void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *);
 
 /* RPC/RDMA module init - xprtrdma/transport.c
  */


@@ -473,7 +473,16 @@ static int xs_nospace(struct rpc_task *task)
 	spin_unlock_bh(&xprt->transport_lock);
 
 	/* Race breaker in case memory is freed before above code is called */
-	sk->sk_write_space(sk);
+	if (ret == -EAGAIN) {
+		struct socket_wq *wq;
+
+		rcu_read_lock();
+		wq = rcu_dereference(sk->sk_wq);
+		set_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags);
+		rcu_read_unlock();
+
+		sk->sk_write_space(sk);
+	}
 	return ret;
 }
@@ -2533,35 +2542,38 @@ static void xs_tcp_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
  * we allocate pages instead doing a kmalloc like rpc_malloc is because we want
  * to use the server side send routines.
  */
-static void *bc_malloc(struct rpc_task *task, size_t size)
+static int bc_malloc(struct rpc_task *task)
 {
+	struct rpc_rqst *rqst = task->tk_rqstp;
+	size_t size = rqst->rq_callsize;
 	struct page *page;
 	struct rpc_buffer *buf;
 
-	WARN_ON_ONCE(size > PAGE_SIZE - sizeof(struct rpc_buffer));
-	if (size > PAGE_SIZE - sizeof(struct rpc_buffer))
-		return NULL;
+	if (size > PAGE_SIZE - sizeof(struct rpc_buffer)) {
+		WARN_ONCE(1, "xprtsock: large bc buffer request (size %zu)\n",
+			  size);
+		return -EINVAL;
+	}
 
 	page = alloc_page(GFP_KERNEL);
 	if (!page)
-		return NULL;
+		return -ENOMEM;
 
 	buf = page_address(page);
 	buf->len = PAGE_SIZE;
 
-	return buf->data;
+	rqst->rq_buffer = buf->data;
+	return 0;
 }
 
 /*
  * Free the space allocated in the bc_alloc routine
  */
-static void bc_free(void *buffer)
+static void bc_free(struct rpc_task *task)
 {
+	void *buffer = task->tk_rqstp->rq_buffer;
 	struct rpc_buffer *buf;
 
-	if (!buffer)
-		return;
-
 	buf = container_of(buffer, struct rpc_buffer, data);
 	free_page((unsigned long)buf);
 }
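bc_malloc()/bc_free() rely on struct rpc_buffer sitting at the very start of the page: the caller is handed buf->data, and the free path recovers the header from that pointer with container_of(). A user-space sketch of the layout trick, modeling alloc_page() with malloc() (names and sizes are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors struct rpc_buffer: a small header at the start of the page,
 * with the caller's buffer immediately after it. */
struct rpc_buffer {
    size_t len;
    char data[];
};

/* Hand the caller buf->data, like bc_malloc() returning into
 * rqst->rq_buffer. alloc_page() is modeled with malloc() here. */
static void *buffer_alloc(size_t page_size)
{
    struct rpc_buffer *buf = malloc(page_size);

    if (!buf)
        return NULL;
    buf->len = page_size;
    return buf->data;
}

/* Recover the header from the data pointer, as bc_free() does with
 * container_of(buffer, struct rpc_buffer, data). */
static void buffer_free(void *buffer)
{
    struct rpc_buffer *buf;

    buf = (struct rpc_buffer *)((char *)buffer -
                                offsetof(struct rpc_buffer, data));
    free(buf);
}
```

Because the header lives inside the same page, freeing needs no lookup table: pointer arithmetic alone gets back to the allocation that must be released.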