/*
 * Copyright (c) 2005 Topspin Communications.  All rights reserved.
 * Copyright (c) 2005, 2006, 2007 Cisco Systems.  All rights reserved.
 * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
 * Copyright (c) 2006 Mellanox Technologies.  All rights reserved.
 *
 * This software is available to you under a choice of one of two
 * licenses.  You may choose to be licensed under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree, or the
 * OpenIB.org BSD license below:
 *
 *     Redistribution and use in source and binary forms, with or
 *     without modification, are permitted provided that the following
 *     conditions are met:
 *
 *      - Redistributions of source code must retain the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer.
 *
 *      - Redistributions in binary form must reproduce the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer in the documentation and/or other materials
 *        provided with the distribution.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/sched.h>

#include <linux/uaccess.h>

#include <rdma/uverbs_types.h>
#include <rdma/uverbs_std_types.h>
#include "rdma_core.h"

#include "uverbs.h"
#include "core_priv.h"
static struct ib_uverbs_completion_event_file *
ib_uverbs_lookup_comp_file(int fd, struct ib_ucontext *context)
{
	struct ib_uobject *uobj = uobj_get_read(UVERBS_OBJECT_COMP_CHANNEL,
						fd, context);
	struct ib_uobject_file *uobj_file;

	if (IS_ERR(uobj))
		return (void *)uobj;

	uverbs_uobject_get(uobj);
	uobj_put_read(uobj);

	uobj_file = container_of(uobj, struct ib_uobject_file, uobj);
	return container_of(uobj_file, struct ib_uverbs_completion_event_file,
			    uobj_file);
}
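`ib_uverbs_lookup_comp_file()` recovers the outer completion-event-file object from an embedded `struct ib_uobject` by applying `container_of()` twice: once to reach the enclosing `ib_uobject_file`, and once more to reach the structure that embeds that. The userspace sketch below shows the same two-step walk, assuming a simplified `container_of` built on `offsetof` (the kernel macro additionally type-checks its argument). The structs `inner`, `middle` and `outer` and the helper `outer_from_inner()` are hypothetical stand-ins, not kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member's address by the
 * member's offset. The kernel version also checks the pointer type. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for ib_uobject, ib_uobject_file and
 * ib_uverbs_completion_event_file. */
struct inner  { int id; };
struct middle { long pad; struct inner uobj; };
struct outer  { int flags; struct middle uobj_file; };

/* Walk outward from the innermost embedded object, one level at a time. */
static struct outer *outer_from_inner(struct inner *in)
{
	struct middle *mid;

	assert(in != NULL);	/* container_of() on NULL yields a bogus pointer */
	mid = container_of(in, struct middle, uobj);
	return container_of(mid, struct outer, uobj_file);
}
```

Because each step only subtracts a compile-time offset, no lookup table or back-pointer is needed; the cost is that the inner pointer must genuinely be embedded in the assumed outer type.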
ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
			      struct ib_device *ib_dev,
			      const char __user *buf,
			      int in_len, int out_len)
{
	struct ib_uverbs_get_context cmd;
	struct ib_uverbs_get_context_resp resp;
	struct ib_udata udata;
	struct ib_ucontext *ucontext;
	struct file *filp;
	struct ib_rdmacg_object cg_obj;
	int ret;

	if (out_len < sizeof resp)
		return -ENOSPC;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	mutex_lock(&file->mutex);

	if (file->ucontext) {
		ret = -EINVAL;
		goto err;
	}

	ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
			     u64_to_user_ptr(cmd.response) + sizeof(resp),
			     in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
			     out_len - sizeof(resp));

	ret = ib_rdmacg_try_charge(&cg_obj, ib_dev, RDMACG_RESOURCE_HCA_HANDLE);
	if (ret)
		goto err;

	ucontext = ib_dev->alloc_ucontext(ib_dev, &udata);
	if (IS_ERR(ucontext)) {
		ret = PTR_ERR(ucontext);
		goto err_alloc;
	}

	ucontext->device = ib_dev;
	ucontext->cg_obj = cg_obj;
	/* ufile is required when some objects are released */
	ucontext->ufile = file;
	uverbs_initialize_ucontext(ucontext);

	rcu_read_lock();
	ucontext->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
	rcu_read_unlock();
	ucontext->closing = 0;

#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
	ucontext->umem_tree = RB_ROOT_CACHED;
	init_rwsem(&ucontext->umem_rwsem);
	ucontext->odp_mrs_count = 0;
	INIT_LIST_HEAD(&ucontext->no_private_counters);

	if (!(ib_dev->attrs.device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
		ucontext->invalidate_range = NULL;
#endif

	resp.num_comp_vectors = file->device->num_comp_vectors;

	ret = get_unused_fd_flags(O_CLOEXEC);
	if (ret < 0)
		goto err_free;
	resp.async_fd = ret;

	filp = ib_uverbs_alloc_async_event_file(file, ib_dev);
	if (IS_ERR(filp)) {
		ret = PTR_ERR(filp);
		goto err_fd;
	}

	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp)) {
		ret = -EFAULT;
		goto err_file;
	}

	file->ucontext = ucontext;

	fd_install(resp.async_fd, filp);

	mutex_unlock(&file->mutex);

	return in_len;

err_file:
	ib_uverbs_free_async_event_file(file);
	fput(filp);

err_fd:
	put_unused_fd(resp.async_fd);

err_free:
	put_pid(ucontext->tgid);
	ib_dev->dealloc_ucontext(ucontext);

err_alloc:
	ib_rdmacg_uncharge(&cg_obj, ib_dev, RDMACG_RESOURCE_HCA_HANDLE);

err:
	mutex_unlock(&file->mutex);
	return ret;
}
static void copy_query_dev_fields(struct ib_uverbs_file *file,
				  struct ib_device *ib_dev,
				  struct ib_uverbs_query_device_resp *resp,
				  struct ib_device_attr *attr)
{
	resp->fw_ver = attr->fw_ver;
	resp->node_guid = ib_dev->node_guid;
	resp->sys_image_guid = attr->sys_image_guid;
	resp->max_mr_size = attr->max_mr_size;
	resp->page_size_cap = attr->page_size_cap;
	resp->vendor_id = attr->vendor_id;
	resp->vendor_part_id = attr->vendor_part_id;
	resp->hw_ver = attr->hw_ver;
	resp->max_qp = attr->max_qp;
	resp->max_qp_wr = attr->max_qp_wr;
	resp->device_cap_flags = lower_32_bits(attr->device_cap_flags);
	resp->max_sge = attr->max_sge;
	resp->max_sge_rd = attr->max_sge_rd;
	resp->max_cq = attr->max_cq;
	resp->max_cqe = attr->max_cqe;
	resp->max_mr = attr->max_mr;
	resp->max_pd = attr->max_pd;
	resp->max_qp_rd_atom = attr->max_qp_rd_atom;
	resp->max_ee_rd_atom = attr->max_ee_rd_atom;
	resp->max_res_rd_atom = attr->max_res_rd_atom;
	resp->max_qp_init_rd_atom = attr->max_qp_init_rd_atom;
	resp->max_ee_init_rd_atom = attr->max_ee_init_rd_atom;
	resp->atomic_cap = attr->atomic_cap;
	resp->max_ee = attr->max_ee;
	resp->max_rdd = attr->max_rdd;
	resp->max_mw = attr->max_mw;
	resp->max_raw_ipv6_qp = attr->max_raw_ipv6_qp;
	resp->max_raw_ethy_qp = attr->max_raw_ethy_qp;
	resp->max_mcast_grp = attr->max_mcast_grp;
	resp->max_mcast_qp_attach = attr->max_mcast_qp_attach;
	resp->max_total_mcast_qp_attach = attr->max_total_mcast_qp_attach;
	resp->max_ah = attr->max_ah;
	resp->max_fmr = attr->max_fmr;
	resp->max_map_per_fmr = attr->max_map_per_fmr;
	resp->max_srq = attr->max_srq;
	resp->max_srq_wr = attr->max_srq_wr;
	resp->max_srq_sge = attr->max_srq_sge;
	resp->max_pkeys = attr->max_pkeys;
	resp->local_ca_ack_delay = attr->local_ca_ack_delay;
	resp->phys_port_cnt = ib_dev->phys_port_cnt;
}
ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
			       struct ib_device *ib_dev,
			       const char __user *buf,
			       int in_len, int out_len)
{
	struct ib_uverbs_query_device cmd;
	struct ib_uverbs_query_device_resp resp;

	if (out_len < sizeof resp)
		return -ENOSPC;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	memset(&resp, 0, sizeof resp);
	copy_query_dev_fields(file, ib_dev, &resp, &ib_dev->attrs);

	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp))
		return -EFAULT;

	return in_len;
}
ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
			     struct ib_device *ib_dev,
			     const char __user *buf,
			     int in_len, int out_len)
{
	struct ib_uverbs_query_port cmd;
	struct ib_uverbs_query_port_resp resp;
	struct ib_port_attr attr;
	int ret;

	if (out_len < sizeof resp)
		return -ENOSPC;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	ret = ib_query_port(ib_dev, cmd.port_num, &attr);
	if (ret)
		return ret;

	memset(&resp, 0, sizeof resp);

	resp.state = attr.state;
	resp.max_mtu = attr.max_mtu;
	resp.active_mtu = attr.active_mtu;
	resp.gid_tbl_len = attr.gid_tbl_len;
	resp.port_cap_flags = attr.port_cap_flags;
	resp.max_msg_sz = attr.max_msg_sz;
	resp.bad_pkey_cntr = attr.bad_pkey_cntr;
	resp.qkey_viol_cntr = attr.qkey_viol_cntr;
	resp.pkey_tbl_len = attr.pkey_tbl_len;

	if (rdma_cap_opa_ah(ib_dev, cmd.port_num)) {
		resp.lid = OPA_TO_IB_UCAST_LID(attr.lid);
		resp.sm_lid = OPA_TO_IB_UCAST_LID(attr.sm_lid);
	} else {
		resp.lid = ib_lid_cpu16(attr.lid);
		resp.sm_lid = ib_lid_cpu16(attr.sm_lid);
	}
	resp.lmc = attr.lmc;
	resp.max_vl_num = attr.max_vl_num;
	resp.sm_sl = attr.sm_sl;
	resp.subnet_timeout = attr.subnet_timeout;
	resp.init_type_reply = attr.init_type_reply;
	resp.active_width = attr.active_width;
	resp.active_speed = attr.active_speed;
	resp.phys_state = attr.phys_state;
	resp.link_layer = rdma_port_get_link_layer(ib_dev,
						   cmd.port_num);

	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp))
		return -EFAULT;

	return in_len;
}
ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
			   struct ib_device *ib_dev,
			   const char __user *buf,
			   int in_len, int out_len)
{
	struct ib_uverbs_alloc_pd cmd;
	struct ib_uverbs_alloc_pd_resp resp;
	struct ib_udata udata;
	struct ib_uobject *uobj;
	struct ib_pd *pd;
	int ret;

	if (out_len < sizeof resp)
		return -ENOSPC;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
			     u64_to_user_ptr(cmd.response) + sizeof(resp),
			     in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
			     out_len - sizeof(resp));

	uobj = uobj_alloc(UVERBS_OBJECT_PD, file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);

	pd = ib_dev->alloc_pd(ib_dev, file->ucontext, &udata);
	if (IS_ERR(pd)) {
		ret = PTR_ERR(pd);
		goto err;
	}

	pd->device = ib_dev;
	pd->uobject = uobj;
	pd->__internal_mr = NULL;
	atomic_set(&pd->usecnt, 0);

	uobj->object = pd;
	memset(&resp, 0, sizeof resp);
	resp.pd_handle = uobj->id;
	pd->res.type = RDMA_RESTRACK_PD;
	rdma_restrack_add(&pd->res);

	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp)) {
		ret = -EFAULT;
		goto err_copy;
	}

	uobj_alloc_commit(uobj);

	return in_len;

err_copy:
	ib_dealloc_pd(pd);

err:
	uobj_alloc_abort(uobj);
	return ret;
}
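Handlers such as `ib_uverbs_alloc_pd()` pass the driver, through `ib_uverbs_init_udata()`, only the bytes that follow the fixed command and response structures. The extra subtraction of `sizeof(struct ib_uverbs_cmd_hdr)` suggests that `in_len` still counts the command header even though `buf` already points past it; that reading is an inference from the expression, not something this file states. The sketch below reproduces only the arithmetic, with hypothetical structs `cmd_hdr`, `alloc_cmd` and `alloc_resp` standing in for the ABI types, and asserts the minimum sizes that it assumes the caller guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ib_uverbs_cmd_hdr and a command/response pair. */
struct cmd_hdr    { unsigned int command; unsigned short in_words, out_words; };
struct alloc_cmd  { unsigned long long response; };
struct alloc_resp { unsigned int handle; };

struct udata_lens {
	size_t inlen;	/* driver-private input bytes */
	size_t outlen;	/* driver-private output bytes */
};

/* Mirrors the length arguments passed to ib_uverbs_init_udata(). */
static struct udata_lens driver_udata_lens(size_t in_len, size_t out_len)
{
	struct udata_lens l;

	/* Without these, the size_t subtractions below would wrap around. */
	assert(in_len >= sizeof(struct alloc_cmd) + sizeof(struct cmd_hdr));
	assert(out_len >= sizeof(struct alloc_resp));

	l.inlen = in_len - sizeof(struct alloc_cmd) - sizeof(struct cmd_hdr);
	l.outlen = out_len - sizeof(struct alloc_resp);
	return l;
}
```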
ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
			     struct ib_device *ib_dev,
			     const char __user *buf,
			     int in_len, int out_len)
{
	struct ib_uverbs_dealloc_pd cmd;
	struct ib_uobject *uobj;
	int ret;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	uobj = uobj_get_write(UVERBS_OBJECT_PD, cmd.pd_handle,
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);

	ret = uobj_remove_commit(uobj);

	return ret ?: in_len;
}
struct xrcd_table_entry {
	struct rb_node node;
	struct ib_xrcd *xrcd;
	struct inode *inode;
};

static int xrcd_table_insert(struct ib_uverbs_device *dev,
			     struct inode *inode,
			     struct ib_xrcd *xrcd)
{
	struct xrcd_table_entry *entry, *scan;
	struct rb_node **p = &dev->xrcd_tree.rb_node;
	struct rb_node *parent = NULL;

	entry = kmalloc(sizeof *entry, GFP_KERNEL);
	if (!entry)
		return -ENOMEM;

	entry->xrcd = xrcd;
	entry->inode = inode;

	while (*p) {
		parent = *p;
		scan = rb_entry(parent, struct xrcd_table_entry, node);

		if (inode < scan->inode) {
			p = &(*p)->rb_left;
		} else if (inode > scan->inode) {
			p = &(*p)->rb_right;
		} else {
			kfree(entry);
			return -EEXIST;
		}
	}

	rb_link_node(&entry->node, parent, p);
	rb_insert_color(&entry->node, &dev->xrcd_tree);
	igrab(inode);
	return 0;
}
static struct xrcd_table_entry *xrcd_table_search(struct ib_uverbs_device *dev,
						  struct inode *inode)
{
	struct xrcd_table_entry *entry;
	struct rb_node *p = dev->xrcd_tree.rb_node;

	while (p) {
		entry = rb_entry(p, struct xrcd_table_entry, node);

		if (inode < entry->inode)
			p = p->rb_left;
		else if (inode > entry->inode)
			p = p->rb_right;
		else
			return entry;
	}

	return NULL;
}

static struct ib_xrcd *find_xrcd(struct ib_uverbs_device *dev, struct inode *inode)
{
	struct xrcd_table_entry *entry;

	entry = xrcd_table_search(dev, inode);
	if (!entry)
		return NULL;

	return entry->xrcd;
}

static void xrcd_table_delete(struct ib_uverbs_device *dev,
			      struct inode *inode)
{
	struct xrcd_table_entry *entry;

	entry = xrcd_table_search(dev, inode);
	if (entry) {
		iput(inode);
		rb_erase(&entry->node, &dev->xrcd_tree);
		kfree(entry);
	}
}
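`xrcd_table_insert()` and `xrcd_table_search()` key the per-device XRCD table on the inode pointer itself: the descent goes left or right by pointer comparison, and an equal key on insert is rejected with `-EEXIST`. The sketch below keeps that descent but swaps in a plain, unbalanced binary search tree (it omits the red-black rebalancing that `rb_insert_color()` performs), compares keys as `uintptr_t`, and checks for a duplicate before allocating rather than freeing afterwards. `struct entry`, `table_insert()` and `table_search()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct entry {
	struct entry *left, *right;
	const void *key;	/* stands in for struct inode * */
	int value;		/* stands in for struct ib_xrcd * */
};

/* Insert key -> value. Returns 0, -ENOMEM, or -EEXIST if key is present. */
static int table_insert(struct entry **root, const void *key, int value)
{
	struct entry **p = root;
	struct entry *e;

	assert(root != NULL);
	while (*p) {
		uintptr_t k = (uintptr_t)key, cur = (uintptr_t)(*p)->key;

		if (k < cur)
			p = &(*p)->left;
		else if (k > cur)
			p = &(*p)->right;
		else
			return -EEXIST;
	}

	e = calloc(1, sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->key = key;
	e->value = value;
	*p = e;		/* link into the empty slot the descent stopped at */
	return 0;
}

/* Return the entry for key, or NULL if it is not in the tree. */
static struct entry *table_search(struct entry *n, const void *key)
{
	while (n) {
		uintptr_t k = (uintptr_t)key, cur = (uintptr_t)n->key;

		if (k < cur)
			n = n->left;
		else if (k > cur)
			n = n->right;
		else
			return n;
	}
	return NULL;
}
```

Tracking the link slot (`struct entry **p`) rather than the parent node is the same idea that `rb_link_node()` relies on: the loop ends holding the exact pointer to overwrite.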
ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
			    struct ib_device *ib_dev,
			    const char __user *buf, int in_len,
			    int out_len)
{
	struct ib_uverbs_open_xrcd cmd;
	struct ib_uverbs_open_xrcd_resp resp;
	struct ib_udata udata;
	struct ib_uxrcd_object *obj;
	struct ib_xrcd *xrcd = NULL;
	struct fd f = {NULL, 0};
	struct inode *inode = NULL;
	int ret = 0;
	int new_xrcd = 0;

	if (out_len < sizeof resp)
		return -ENOSPC;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
			     u64_to_user_ptr(cmd.response) + sizeof(resp),
			     in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
			     out_len - sizeof(resp));

	mutex_lock(&file->device->xrcd_tree_mutex);

	if (cmd.fd != -1) {
		/* search for file descriptor */
		f = fdget(cmd.fd);
		if (!f.file) {
			ret = -EBADF;
			goto err_tree_mutex_unlock;
		}

		inode = file_inode(f.file);
		xrcd = find_xrcd(file->device, inode);
		if (!xrcd && !(cmd.oflags & O_CREAT)) {
			/* no XRCD is associated with this file yet; O_CREAT is required */
			ret = -EAGAIN;
			goto err_tree_mutex_unlock;
		}

		if (xrcd && cmd.oflags & O_EXCL) {
			ret = -EINVAL;
			goto err_tree_mutex_unlock;
		}
	}

	obj = (struct ib_uxrcd_object *)uobj_alloc(UVERBS_OBJECT_XRCD,
						   file->ucontext);
	if (IS_ERR(obj)) {
		ret = PTR_ERR(obj);
		goto err_tree_mutex_unlock;
	}

	if (!xrcd) {
		xrcd = ib_dev->alloc_xrcd(ib_dev, file->ucontext, &udata);
		if (IS_ERR(xrcd)) {
			ret = PTR_ERR(xrcd);
			goto err;
		}

		xrcd->inode = inode;
		xrcd->device = ib_dev;
		atomic_set(&xrcd->usecnt, 0);
		mutex_init(&xrcd->tgt_qp_mutex);
		INIT_LIST_HEAD(&xrcd->tgt_qp_list);
		new_xrcd = 1;
	}

	atomic_set(&obj->refcnt, 0);
	obj->uobject.object = xrcd;
	memset(&resp, 0, sizeof resp);
	resp.xrcd_handle = obj->uobject.id;

	if (inode) {
		if (new_xrcd) {
			/* create new inode/xrcd table entry */
			ret = xrcd_table_insert(file->device, inode, xrcd);
			if (ret)
				goto err_dealloc_xrcd;
		}
		atomic_inc(&xrcd->usecnt);
	}

	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp)) {
		ret = -EFAULT;
		goto err_copy;
	}

	if (f.file)
		fdput(f);

	mutex_unlock(&file->device->xrcd_tree_mutex);

	uobj_alloc_commit(&obj->uobject);

	return in_len;

err_copy:
	if (inode) {
		if (new_xrcd)
			xrcd_table_delete(file->device, inode);
		atomic_dec(&xrcd->usecnt);
	}

err_dealloc_xrcd:
	ib_dealloc_xrcd(xrcd);

err:
	uobj_alloc_abort(&obj->uobject);

err_tree_mutex_unlock:
	if (f.file)
		fdput(f);

	mutex_unlock(&file->device->xrcd_tree_mutex);

	return ret;
}
ssize_t ib_uverbs_close_xrcd(struct ib_uverbs_file *file,
			     struct ib_device *ib_dev,
			     const char __user *buf, int in_len,
			     int out_len)
{
	struct ib_uverbs_close_xrcd cmd;
	struct ib_uobject *uobj;
	int ret = 0;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	uobj = uobj_get_write(UVERBS_OBJECT_XRCD, cmd.xrcd_handle,
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);

	ret = uobj_remove_commit(uobj);
	return ret ?: in_len;
}
int ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev,
			   struct ib_xrcd *xrcd,
			   enum rdma_remove_reason why)
{
	struct inode *inode;
	int ret;

	inode = xrcd->inode;
	if (inode && !atomic_dec_and_test(&xrcd->usecnt))
		return 0;

	ret = ib_dealloc_xrcd(xrcd);

	if (why == RDMA_REMOVE_DESTROY && ret)
		atomic_inc(&xrcd->usecnt);
	else if (inode)
		xrcd_table_delete(dev, inode);

	return ret;
}
ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
			 struct ib_device *ib_dev,
			 const char __user *buf, int in_len,
			 int out_len)
{
	struct ib_uverbs_reg_mr cmd;
	struct ib_uverbs_reg_mr_resp resp;
	struct ib_udata udata;
	struct ib_uobject *uobj;
	struct ib_pd *pd;
	struct ib_mr *mr;
	int ret;

	if (out_len < sizeof resp)
		return -ENOSPC;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
			     u64_to_user_ptr(cmd.response) + sizeof(resp),
			     in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
			     out_len - sizeof(resp));

	if ((cmd.start & ~PAGE_MASK) != (cmd.hca_va & ~PAGE_MASK))
		return -EINVAL;

	ret = ib_check_mr_access(cmd.access_flags);
	if (ret)
		return ret;

	uobj = uobj_alloc(UVERBS_OBJECT_MR, file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);

	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, file->ucontext);
	if (!pd) {
		ret = -EINVAL;
		goto err_free;
	}

	if (cmd.access_flags & IB_ACCESS_ON_DEMAND) {
		if (!(pd->device->attrs.device_cap_flags &
		      IB_DEVICE_ON_DEMAND_PAGING)) {
			pr_debug("ODP support not available\n");
			ret = -EINVAL;
			goto err_put;
		}
	}

	mr = pd->device->reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va,
				     cmd.access_flags, &udata);
	if (IS_ERR(mr)) {
		ret = PTR_ERR(mr);
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex. This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.
Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.
This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention. However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.
Surprisingly, these changes even shrink the object code:
add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-18 07:44:49 +04:00
		goto err_put;
2005-07-08 04:57:13 +04:00
	}
	mr->device = pd->device;
	mr->pd = pd;
2007-03-05 03:15:11 +03:00
	mr->uobject = uobj;
2005-07-08 04:57:13 +04:00
	atomic_inc(&pd->usecnt);
2018-03-02 00:58:13 +03:00
	mr->res.type = RDMA_RESTRACK_MR;
	rdma_restrack_add(&mr->res);
2005-07-08 04:57:13 +04:00
2007-03-05 03:15:11 +03:00
	uobj->object = mr;
2005-07-08 04:57:13 +04:00
2006-06-18 07:44:49 +04:00
	memset(&resp, 0, sizeof resp);
	resp.lkey = mr->lkey;
	resp.rkey = mr->rkey;
2007-03-05 03:15:11 +03:00
	resp.mr_handle = uobj->id;
2005-07-08 04:57:13 +04:00
2017-09-07 00:34:26 +03:00
	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp)) {
2005-07-08 04:57:13 +04:00
		ret = -EFAULT;
2006-06-18 07:44:49 +04:00
		goto err_copy;
2005-07-08 04:57:13 +04:00
	}
2017-04-04 13:31:44 +03:00
	uobj_put_obj_read(pd);
2005-09-28 02:07:25 +04:00
2017-04-04 13:31:44 +03:00
	uobj_alloc_commit(uobj);
2005-07-08 04:57:13 +04:00
	return in_len;
2006-06-18 07:44:49 +04:00
err_copy:
2005-07-08 04:57:13 +04:00
	ib_dereg_mr(mr);
2006-06-18 07:44:49 +04:00
err_put:
2017-04-04 13:31:44 +03:00
	uobj_put_obj_read(pd);
2005-07-08 04:57:13 +04:00
err_free:
2017-04-04 13:31:44 +03:00
	uobj_alloc_abort(uobj);
2005-07-08 04:57:13 +04:00
	return ret;
}
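The commit message quoted inside `ib_uverbs_reg_mr()` describes two changes: each user object gets its own reference count, and the table lock is held only long enough to take a reference during a lookup. A minimal user-space model of that idea follows. It assumes a fixed-size array in place of the kernel's idr and a C11 `atomic_flag` spinlock in place of the kernel spinlock. `struct handle_table`, `table_get()`, `table_remove()` and `obj_put()` are hypothetical names, and the sketch does not reproduce the kernel's two-step deletion.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLES 16 /* hypothetical table size */

/* Hypothetical user object carrying its own reference count. */
struct user_obj {
	atomic_int ref;
};

/* The spinlock protects only the slot array, never the objects themselves. */
struct handle_table {
	atomic_flag lock;
	struct user_obj *slot[MAX_HANDLES];
};

static void table_lock(struct handle_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		; /* spin until the holder releases the flag */
}

static void table_unlock(struct handle_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static void table_init(struct handle_table *t)
{
	atomic_flag_clear(&t->lock);
	for (size_t i = 0; i < MAX_HANDLES; i++)
		t->slot[i] = NULL;
}

/* Find a handle and take a reference; the lock covers a single slot access. */
static struct user_obj *table_get(struct handle_table *t, int id)
{
	struct user_obj *obj;

	if (id < 0 || id >= MAX_HANDLES)
		return NULL;
	table_lock(t);
	obj = t->slot[id];
	if (obj)
		atomic_fetch_add(&obj->ref, 1);
	table_unlock(t);
	return obj; /* the caller uses obj without holding the table lock */
}

/* Unpublish a handle; references already handed out remain valid. */
static struct user_obj *table_remove(struct handle_table *t, int id)
{
	struct user_obj *obj;

	if (id < 0 || id >= MAX_HANDLES)
		return NULL;
	table_lock(t);
	obj = t->slot[id];
	t->slot[id] = NULL;
	table_unlock(t);
	return obj;
}

/* Drop a reference; returns true if it was the last one, so the caller may free obj. */
static bool obj_put(struct user_obj *obj)
{
	return atomic_fetch_sub(&obj->ref, 1) == 1;
}
```

Because the lock only guards the slot lookup and the reference increment, other threads can look up different handles, or even the same one, while a caller is still working on an object it already holds.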
2014-07-31 12:01:28 +04:00
ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
			   struct ib_device *ib_dev,
2014-07-31 12:01:28 +04:00
			   const char __user *buf, int in_len,
			   int out_len)
{
	struct ib_uverbs_rereg_mr cmd;
	struct ib_uverbs_rereg_mr_resp resp;
	struct ib_udata udata;
	struct ib_pd *pd = NULL;
	struct ib_mr *mr;
	struct ib_pd *old_pd;
	int ret;
	struct ib_uobject *uobj;
	if (out_len < sizeof(resp))
		return -ENOSPC;
	if (copy_from_user(&cmd, buf, sizeof(cmd)))
		return -EFAULT;
2017-09-07 00:34:26 +03:00
	ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
			     u64_to_user_ptr(cmd.response) + sizeof(resp),
2017-06-27 17:04:42 +03:00
			     in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
			     out_len - sizeof(resp));
2014-07-31 12:01:28 +04:00
	if (cmd.flags & ~IB_MR_REREG_SUPPORTED || !cmd.flags)
		return -EINVAL;
	if ((cmd.flags & IB_MR_REREG_TRANS) &&
	    (!cmd.start || !cmd.hca_va || 0 >= cmd.length ||
	     (cmd.start & ~PAGE_MASK) != (cmd.hca_va & ~PAGE_MASK)))
		return -EINVAL;
2018-03-19 16:02:33 +03:00
	uobj = uobj_get_write(UVERBS_OBJECT_MR, cmd.mr_handle,
2017-04-04 13:31:44 +03:00
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);
2014-07-31 12:01:28 +04:00
	mr = uobj->object;
	if (cmd.flags & IB_MR_REREG_ACCESS) {
		ret = ib_check_mr_access(cmd.access_flags);
		if (ret)
			goto put_uobjs;
	}
	if (cmd.flags & IB_MR_REREG_PD) {
2018-03-19 16:02:33 +03:00
		pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, file->ucontext);
2014-07-31 12:01:28 +04:00
		if (!pd) {
			ret = -EINVAL;
			goto put_uobjs;
		}
	}
	old_pd = mr->pd;
	ret = mr->device->rereg_user_mr(mr, cmd.flags, cmd.start,
					cmd.length, cmd.hca_va,
					cmd.access_flags, pd, &udata);
	if (!ret) {
		if (cmd.flags & IB_MR_REREG_PD) {
			atomic_inc(&pd->usecnt);
			mr->pd = pd;
			atomic_dec(&old_pd->usecnt);
		}
	} else {
		goto put_uobj_pd;
	}
	memset(&resp, 0, sizeof(resp));
	resp.lkey = mr->lkey;
	resp.rkey = mr->rkey;
2017-09-07 00:34:26 +03:00
	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof(resp)))
2014-07-31 12:01:28 +04:00
		ret = -EFAULT;
	else
		ret = in_len;
put_uobj_pd:
	if (cmd.flags & IB_MR_REREG_PD)
2017-04-04 13:31:44 +03:00
		uobj_put_obj_read(pd);
2014-07-31 12:01:28 +04:00
put_uobjs:
2017-04-04 13:31:44 +03:00
	uobj_put_write(uobj);
2014-07-31 12:01:28 +04:00
	return ret;
}
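`ib_uverbs_rereg_mr()` rejects unknown or empty flag sets. When the translation is being changed, it also requires a non-zero start address, I/O virtual address and length, and both addresses must have the same offset within a page. `ib_uverbs_reg_mr()` applies the same page-offset test. A standalone sketch of those checks follows, assuming 4 KiB pages. `check_rereg()` and the `REREG_*` constants are hypothetical stand-ins modelled on the `IB_MR_REREG_*` flags, not the kernel definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel uses the architecture's PAGE_MASK. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical flag bits modelled on the IB_MR_REREG_* values. */
enum {
	REREG_TRANS     = 1 << 0,
	REREG_PD        = 1 << 1,
	REREG_ACCESS    = 1 << 2,
	REREG_SUPPORTED = (REREG_ACCESS << 1) - 1, /* all known bits */
};

/* Returns 0 if the request passes the sanity checks, -EINVAL otherwise. */
static int check_rereg(uint32_t flags, uint64_t start, uint64_t length,
		       uint64_t hca_va)
{
	/* Reject unknown bits and a request that changes nothing. */
	if ((flags & ~REREG_SUPPORTED) || !flags)
		return -EINVAL;

	/* A translation change needs a non-empty range whose page offsets agree. */
	if ((flags & REREG_TRANS) &&
	    (!start || !hca_va || length == 0 ||
	     (start & ~SKETCH_PAGE_MASK) != (hca_va & ~SKETCH_PAGE_MASK)))
		return -EINVAL;

	return 0;
}
```

The page-offset rule exists because the device maps the user buffer page by page. The byte at `start` must land at the same position within its page as `hca_va`.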
2005-07-08 04:57:13 +04:00
ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
			   struct ib_device *ib_dev,
2005-07-08 04:57:13 +04:00
			   const char __user *buf, int in_len,
			   int out_len)
{
	struct ib_uverbs_dereg_mr cmd;
2006-06-18 07:44:49 +04:00
	struct ib_uobject *uobj;
2005-07-08 04:57:13 +04:00
	int ret = -EINVAL;
	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;
2018-03-19 16:02:33 +03:00
	uobj = uobj_get_write(UVERBS_OBJECT_MR, cmd.mr_handle,
2017-04-04 13:31:44 +03:00
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);
2006-06-18 07:44:49 +04:00
2017-04-04 13:31:44 +03:00
	ret = uobj_remove_commit(uobj);
2005-07-08 04:57:13 +04:00
2017-04-04 13:31:44 +03:00
	return ret ?: in_len;
2005-07-08 04:57:13 +04:00
}
2013-02-06 20:19:13 +04:00
ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
			   struct ib_device *ib_dev,
			   const char __user *buf, int in_len,
			   int out_len)
2013-02-06 20:19:13 +04:00
{
	struct ib_uverbs_alloc_mw cmd;
	struct ib_uverbs_alloc_mw_resp resp;
	struct ib_uobject *uobj;
	struct ib_pd *pd;
	struct ib_mw *mw;
2016-02-29 19:05:29 +03:00
	struct ib_udata udata;
2013-02-06 20:19:13 +04:00
	int ret;
	if (out_len < sizeof(resp))
		return -ENOSPC;
	if (copy_from_user(&cmd, buf, sizeof(cmd)))
		return -EFAULT;
2018-03-19 16:02:33 +03:00
	uobj = uobj_alloc(UVERBS_OBJECT_MW, file->ucontext);
2017-04-04 13:31:44 +03:00
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);
2013-02-06 20:19:13 +04:00
2018-03-19 16:02:33 +03:00
	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, file->ucontext);
2013-02-06 20:19:13 +04:00
	if (!pd) {
		ret = -EINVAL;
		goto err_free;
	}
2017-09-07 00:34:26 +03:00
	ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
			     u64_to_user_ptr(cmd.response) + sizeof(resp),
2016-02-29 19:05:29 +03:00
			     in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
			     out_len - sizeof(resp));
	mw = pd->device->alloc_mw(pd, cmd.mw_type, &udata);
2013-02-06 20:19:13 +04:00
	if (IS_ERR(mw)) {
		ret = PTR_ERR(mw);
		goto err_put;
	}
	mw->device = pd->device;
	mw->pd = pd;
	mw->uobject = uobj;
	atomic_inc(&pd->usecnt);
	uobj->object = mw;
	memset(&resp, 0, sizeof(resp));
	resp.rkey = mw->rkey;
	resp.mw_handle = uobj->id;
2017-09-07 00:34:26 +03:00
	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof(resp))) {
2013-02-06 20:19:13 +04:00
		ret = -EFAULT;
		goto err_copy;
	}
2017-04-04 13:31:44 +03:00
	uobj_put_obj_read(pd);
	uobj_alloc_commit(uobj);
2013-02-06 20:19:13 +04:00
	return in_len;
err_copy:
2015-12-23 21:12:48 +03:00
	uverbs_dealloc_mw(mw);
2013-02-06 20:19:13 +04:00
err_put:
2017-04-04 13:31:44 +03:00
	uobj_put_obj_read(pd);
2013-02-06 20:19:13 +04:00
err_free:
2017-04-04 13:31:44 +03:00
	uobj_alloc_abort(uobj);
2013-02-06 20:19:13 +04:00
	return ret;
}
ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
			     struct ib_device *ib_dev,
			     const char __user *buf, int in_len,
			     int out_len)
2013-02-06 20:19:13 +04:00
{
	struct ib_uverbs_dealloc_mw cmd;
	struct ib_uobject *uobj;
	int ret = -EINVAL;
	if (copy_from_user(&cmd, buf, sizeof(cmd)))
		return -EFAULT;
2018-03-19 16:02:33 +03:00
	uobj = uobj_get_write(UVERBS_OBJECT_MW, cmd.mw_handle,
2017-04-04 13:31:44 +03:00
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);
2013-02-06 20:19:13 +04:00
2017-04-04 13:31:44 +03:00
	ret = uobj_remove_commit(uobj);
	return ret ?: in_len;
2013-02-06 20:19:13 +04:00
}
2005-09-27 00:53:25 +04:00
ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
				      struct ib_device *ib_dev,
2005-09-27 00:53:25 +04:00
				      const char __user *buf, int in_len,
				      int out_len)
{
	struct ib_uverbs_create_comp_channel cmd;
	struct ib_uverbs_create_comp_channel_resp resp;
2017-04-04 13:31:47 +03:00
	struct ib_uobject *uobj;
	struct ib_uverbs_completion_event_file *ev_file;
2005-09-27 00:53:25 +04:00
	if (out_len < sizeof resp)
		return -ENOSPC;
	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;
2018-03-19 16:02:33 +03:00
	uobj = uobj_alloc(UVERBS_OBJECT_COMP_CHANNEL, file->ucontext);
2017-04-04 13:31:47 +03:00
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);
2010-01-18 09:38:00 +03:00
2017-04-04 13:31:47 +03:00
	resp.fd = uobj->id;
	ev_file = container_of(uobj, struct ib_uverbs_completion_event_file,
			       uobj_file.uobj);
2017-04-18 12:03:42 +03:00
	ib_uverbs_init_event_queue(&ev_file->ev_queue);
2005-09-27 00:53:25 +04:00
2017-09-07 00:34:26 +03:00
	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp)) {
2017-04-04 13:31:47 +03:00
		uobj_alloc_abort(uobj);
2005-09-27 00:53:25 +04:00
		return -EFAULT;
	}
2017-04-04 13:31:47 +03:00
	uobj_alloc_commit(uobj);
2005-09-27 00:53:25 +04:00
	return in_len;
}
2015-06-11 16:35:23 +03:00
static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
				       struct ib_device *ib_dev,
2015-06-11 16:35:23 +03:00
				       struct ib_udata *ucore,
				       struct ib_udata *uhw,
				       struct ib_uverbs_ex_create_cq *cmd,
				       size_t cmd_sz,
				       int (*cb)(struct ib_uverbs_file *file,
						 struct ib_ucq_object *obj,
						 struct ib_uverbs_ex_create_cq_resp *resp,
						 struct ib_udata *udata,
						 void *context),
				       void *context)
2005-07-08 04:57:13 +04:00
{
2006-06-18 07:44:49 +04:00
	struct ib_ucq_object *obj;
2017-04-04 13:31:47 +03:00
	struct ib_uverbs_completion_event_file *ev_file = NULL;
2005-07-08 04:57:13 +04:00
	struct ib_cq *cq;
	int ret;
2015-06-11 16:35:23 +03:00
	struct ib_uverbs_ex_create_cq_resp resp;
2015-06-11 16:35:20 +03:00
	struct ib_cq_init_attr attr = {};
2005-07-08 04:57:13 +04:00
2018-02-14 15:38:43 +03:00
	if (!ib_dev->create_cq)
		return ERR_PTR(-EOPNOTSUPP);
2015-06-11 16:35:23 +03:00
	if (cmd->comp_vector >= file->device->num_comp_vectors)
		return ERR_PTR(-EINVAL);
2005-07-08 04:57:13 +04:00
2018-03-19 16:02:33 +03:00
	obj = (struct ib_ucq_object *)uobj_alloc(UVERBS_OBJECT_CQ,
2017-04-04 13:31:44 +03:00
						 file->ucontext);
	if (IS_ERR(obj))
		return obj;
2006-06-18 07:44:49 +04:00
2015-06-11 16:35:23 +03:00
	if (cmd->comp_channel >= 0) {
2017-04-04 13:31:47 +03:00
		ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel,
						     file->ucontext);
		if (IS_ERR(ev_file)) {
			ret = PTR_ERR(ev_file);
2006-01-07 03:43:14 +03:00
			goto err;
		}
	}
2017-04-04 13:31:44 +03:00
	obj->uobject.user_handle = cmd->user_handle;
2006-06-18 07:44:49 +04:00
	obj->uverbs_file = file;
	obj->comp_events_reported = 0;
	obj->async_events_reported = 0;
	INIT_LIST_HEAD(&obj->comp_list);
	INIT_LIST_HEAD(&obj->async_list);
2005-07-08 04:57:13 +04:00
2015-06-11 16:35:23 +03:00
	attr.cqe = cmd->cqe;
	attr.comp_vector = cmd->comp_vector;
	if (cmd_sz > offsetof(typeof(*cmd), flags) + sizeof(cmd->flags))
		attr.flags = cmd->flags;
2017-04-04 13:31:44 +03:00
	cq = ib_dev->create_cq(ib_dev, &attr, file->ucontext, uhw);
2005-07-08 04:57:13 +04:00
	if (IS_ERR(cq)) {
		ret = PTR_ERR(cq);
2006-06-18 07:44:49 +04:00
		goto err_file;
2005-07-08 04:57:13 +04:00
	}
2015-08-13 18:32:04 +03:00
	cq->device = ib_dev;
2006-06-18 07:44:49 +04:00
	cq->uobject = &obj->uobject;
2005-07-08 04:57:13 +04:00
	cq->comp_handler = ib_uverbs_comp_handler;
	cq->event_handler = ib_uverbs_cq_event_handler;
2017-08-01 08:28:35 +03:00
	cq->cq_context = ev_file ? &ev_file->ev_queue : NULL;
2005-07-08 04:57:13 +04:00
	atomic_set(&cq->usecnt, 0);
2006-06-18 07:44:49 +04:00
	obj->uobject.object = cq;
2005-07-08 04:57:13 +04:00
	memset(&resp, 0, sizeof resp);
2015-06-11 16:35:23 +03:00
	resp.base.cq_handle = obj->uobject.id;
	resp.base.cqe = cq->cqe;
2005-07-08 04:57:13 +04:00
2015-06-11 16:35:23 +03:00
	resp.response_length = offsetof(typeof(resp), response_length) +
		sizeof(resp.response_length);
2018-02-14 13:35:37 +03:00
	cq->res.type = RDMA_RESTRACK_CQ;
	rdma_restrack_add(&cq->res);
2015-06-11 16:35:23 +03:00
	ret = cb(file, obj, &resp, ucore, context);
	if (ret)
		goto err_cb;
2005-07-08 04:57:13 +04:00
2017-04-04 13:31:44 +03:00
	uobj_alloc_commit(&obj->uobject);
2015-06-11 16:35:23 +03:00
	return obj;
2005-09-28 02:07:25 +04:00
2015-06-11 16:35:23 +03:00
err_cb:
2005-07-08 04:57:13 +04:00
	ib_destroy_cq(cq);
2006-06-18 07:44:49 +04:00
err_file:
2006-01-07 03:43:14 +03:00
	if (ev_file)
2006-06-18 07:44:49 +04:00
		ib_uverbs_release_ucq(file, ev_file, obj);
err:
2017-04-04 13:31:44 +03:00
	uobj_alloc_abort(&obj->uobject);
2015-06-11 16:35:23 +03:00
	return ERR_PTR(ret);
}
static int ib_uverbs_create_cq_cb(struct ib_uverbs_file *file,
				  struct ib_ucq_object *obj,
				  struct ib_uverbs_ex_create_cq_resp *resp,
				  struct ib_udata *ucore, void *context)
{
	if (ib_copy_to_udata(ucore, &resp->base, sizeof(resp->base)))
		return -EFAULT;
	return 0;
}
ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
			    struct ib_device *ib_dev,
2015-06-11 16:35:23 +03:00
			    const char __user *buf, int in_len,
			    int out_len)
{
	struct ib_uverbs_create_cq cmd;
	struct ib_uverbs_ex_create_cq cmd_ex;
	struct ib_uverbs_create_cq_resp resp;
	struct ib_udata ucore;
	struct ib_udata uhw;
	struct ib_ucq_object *obj;
	if (out_len < sizeof(resp))
		return -ENOSPC;
	if (copy_from_user(&cmd, buf, sizeof(cmd)))
		return -EFAULT;
2017-09-07 00:34:26 +03:00
	ib_uverbs_init_udata(&ucore, buf, u64_to_user_ptr(cmd.response),
			     sizeof(cmd), sizeof(resp));
2015-06-11 16:35:23 +03:00
2017-09-07 00:34:26 +03:00
	ib_uverbs_init_udata(&uhw, buf + sizeof(cmd),
			     u64_to_user_ptr(cmd.response) + sizeof(resp),
2017-06-27 17:04:42 +03:00
			     in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
			     out_len - sizeof(resp));
2015-06-11 16:35:23 +03:00
	memset(&cmd_ex, 0, sizeof(cmd_ex));
	cmd_ex.user_handle = cmd.user_handle;
	cmd_ex.cqe = cmd.cqe;
	cmd_ex.comp_vector = cmd.comp_vector;
	cmd_ex.comp_channel = cmd.comp_channel;
2015-08-13 18:32:04 +03:00
	obj = create_cq(file, ib_dev, &ucore, &uhw, &cmd_ex,
2015-06-11 16:35:23 +03:00
			offsetof(typeof(cmd_ex), comp_channel) +
			sizeof(cmd.comp_channel), ib_uverbs_create_cq_cb,
			NULL);
	if (IS_ERR(obj))
		return PTR_ERR(obj);
	return in_len;
}
static int ib_uverbs_ex_create_cq_cb(struct ib_uverbs_file *file,
				     struct ib_ucq_object *obj,
				     struct ib_uverbs_ex_create_cq_resp *resp,
				     struct ib_udata *ucore, void *context)
{
	if (ib_copy_to_udata(ucore, resp, resp->response_length))
		return -EFAULT;
	return 0;
}

int ib_uverbs_ex_create_cq(struct ib_uverbs_file *file,
			   struct ib_device *ib_dev,
			   struct ib_udata *ucore,
			   struct ib_udata *uhw)
{
	struct ib_uverbs_ex_create_cq_resp resp;
	struct ib_uverbs_ex_create_cq cmd;
	struct ib_ucq_object *obj;
	int err;

	if (ucore->inlen < sizeof(cmd))
		return -EINVAL;

	err = ib_copy_from_udata(&cmd, ucore, sizeof(cmd));
	if (err)
		return err;

	if (cmd.comp_mask)
		return -EINVAL;

	if (cmd.reserved)
		return -EINVAL;

	if (ucore->outlen < (offsetof(typeof(resp), response_length) +
			     sizeof(resp.response_length)))
		return -ENOSPC;

	obj = create_cq(file, ib_dev, ucore, uhw, &cmd,
			min(ucore->inlen, sizeof(cmd)),
			ib_uverbs_ex_create_cq_cb, NULL);

	return PTR_ERR_OR_ZERO(obj);
}
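
/*
 * Illustrative sketch, not part of this file: the admission checks that
 * ib_uverbs_ex_create_cq() applies before calling create_cq(). The command
 * must be at least as large as the kernel's struct, comp_mask and reserved
 * must be zero, and the response buffer must reach at least the end of
 * response_length. The struct layouts and the EX_CQ_MIN_RESP macro are
 * hypothetical, chosen so the offsetof() arithmetic is easy to follow;
 * the error codes match the handler above.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the real ABI structs may differ. */
struct ex_create_cq_cmd {
	uint64_t user_handle;
	uint32_t cqe;
	uint32_t comp_vector;
	int32_t comp_channel;
	uint32_t comp_mask;
	uint32_t flags;
	uint32_t reserved;
};

struct ex_create_cq_resp {
	struct {
		uint32_t cq_handle;
		uint32_t cqe;
	} base;
	uint32_t comp_mask;
	uint32_t response_length;
};

/* Smallest response buffer that still covers response_length. */
#define EX_CQ_MIN_RESP \
	(offsetof(struct ex_create_cq_resp, response_length) + \
	 sizeof(((struct ex_create_cq_resp *)0)->response_length))

static int validate_ex_create_cq(const struct ex_create_cq_cmd *cmd,
				 size_t inlen, size_t outlen, size_t *cmd_sz)
{
	assert(cmd != NULL && cmd_sz != NULL);

	if (inlen < sizeof(*cmd))
		return -EINVAL;
	if (cmd->comp_mask || cmd->reserved)
		return -EINVAL;	/* unknown extensions are rejected */
	if (outlen < EX_CQ_MIN_RESP)
		return -ENOSPC;

	/* min(inlen, sizeof(*cmd)), as passed on to create_cq() */
	*cmd_sz = inlen < sizeof(*cmd) ? inlen : sizeof(*cmd);
	return 0;
}
```

/*
 * Since the first check already guarantees inlen >= sizeof(*cmd), the
 * min() (like min(ucore->inlen, sizeof(cmd)) in the handler) always
 * evaluates to sizeof(*cmd).
 */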

ssize_t ib_uverbs_resize_cq(struct ib_uverbs_file *file,
			    struct ib_device *ib_dev,
			    const char __user *buf, int in_len,
			    int out_len)
{
	struct ib_uverbs_resize_cq cmd;
	struct ib_uverbs_resize_cq_resp resp = {};
	struct ib_udata udata;
	struct ib_cq *cq;
	int ret = -EINVAL;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
			     u64_to_user_ptr(cmd.response) + sizeof(resp),
			     in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
			     out_len - sizeof(resp));

	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, file->ucontext);
	if (!cq)
		return -EINVAL;

	ret = cq->device->resize_cq(cq, cmd.cqe, &udata);
	if (ret)
		goto out;

	resp.cqe = cq->cqe;

	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp.cqe))
		ret = -EFAULT;

out:
	uobj_put_obj_read(cq);

	return ret ? ret : in_len;
}

static int copy_wc_to_user(struct ib_device *ib_dev, void __user *dest,
			   struct ib_wc *wc)
{
	struct ib_uverbs_wc tmp;

	tmp.wr_id = wc->wr_id;
	tmp.status = wc->status;
	tmp.opcode = wc->opcode;
	tmp.vendor_err = wc->vendor_err;
	tmp.byte_len = wc->byte_len;
	tmp.ex.imm_data = wc->ex.imm_data;
	tmp.qp_num = wc->qp->qp_num;
	tmp.src_qp = wc->src_qp;
	tmp.wc_flags = wc->wc_flags;
	tmp.pkey_index = wc->pkey_index;
	if (rdma_cap_opa_ah(ib_dev, wc->port_num))
		tmp.slid = OPA_TO_IB_UCAST_LID(wc->slid);
	else
		tmp.slid = ib_lid_cpu16(wc->slid);
	tmp.sl = wc->sl;
	tmp.dlid_path_bits = wc->dlid_path_bits;
	tmp.port_num = wc->port_num;
	tmp.reserved = 0;

	if (copy_to_user(dest, &tmp, sizeof tmp))
		return -EFAULT;

	return 0;
}

ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
			  struct ib_device *ib_dev,
			  const char __user *buf, int in_len,
			  int out_len)
{
	struct ib_uverbs_poll_cq cmd;
	struct ib_uverbs_poll_cq_resp resp;
	u8 __user *header_ptr;
	u8 __user *data_ptr;
	struct ib_cq *cq;
	struct ib_wc wc;
	int ret;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, file->ucontext);
	if (!cq)
		return -EINVAL;

	/* we copy a struct ib_uverbs_poll_cq_resp to user space */
	header_ptr = u64_to_user_ptr(cmd.response);
	data_ptr = header_ptr + sizeof resp;

	memset(&resp, 0, sizeof resp);
	while (resp.count < cmd.ne) {
		ret = ib_poll_cq(cq, 1, &wc);
		if (ret < 0)
			goto out_put;
		if (!ret)
			break;

		ret = copy_wc_to_user(ib_dev, data_ptr, &wc);
		if (ret)
			goto out_put;

		data_ptr += sizeof(struct ib_uverbs_wc);
		++resp.count;
	}

	if (copy_to_user(header_ptr, &resp, sizeof resp)) {
		ret = -EFAULT;
		goto out_put;
	}

	ret = in_len;

out_put:
	uobj_put_obj_read(cq);
	return ret;
}
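
/*
 * Illustrative sketch, not part of this file: the response layout that
 * ib_uverbs_poll_cq() builds in user memory. Completions are dequeued one
 * at a time and written back-to-back directly after a fixed header, and
 * the header carrying the final count is written last. fake_cq,
 * fake_poll_one(), the simplified entry struct and the out_len bounds
 * check are all hypothetical; the handler above does not compare out_len
 * against cmd.ne.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified response header and completion entry. */
struct poll_resp_hdr {
	uint32_t count;
	uint32_t reserved;
};

struct wc_entry {
	uint64_t wr_id;
	uint32_t status;
	uint32_t byte_len;
};

/* Hypothetical in-memory CQ standing in for the device queue. */
struct fake_cq {
	const struct wc_entry *entries;
	size_t n;
	size_t head;
};

/* Like ib_poll_cq(cq, 1, &wc): 1 if an entry was dequeued, 0 if empty. */
static int fake_poll_one(struct fake_cq *cq, struct wc_entry *wc)
{
	if (cq->head == cq->n)
		return 0;
	*wc = cq->entries[cq->head++];
	return 1;
}

/* Returns the number of entries written, or -1 if out is too small. */
static int poll_into_buffer(struct fake_cq *cq, uint32_t ne,
			    unsigned char *out, size_t out_len)
{
	struct poll_resp_hdr hdr;
	unsigned char *data;
	struct wc_entry wc;

	assert(cq != NULL && out != NULL);
	if (out_len < sizeof(hdr) + (size_t)ne * sizeof(wc))
		return -1;	/* sketch-only bounds check */

	data = out + sizeof(hdr);	/* entries start right after the header */
	memset(&hdr, 0, sizeof(hdr));
	while (hdr.count < ne) {
		if (!fake_poll_one(cq, &wc))
			break;	/* CQ drained before ne entries */
		memcpy(data, &wc, sizeof(wc));
		data += sizeof(wc);
		++hdr.count;
	}

	memcpy(out, &hdr, sizeof(hdr));	/* header goes out last */
	return (int)hdr.count;
}
```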

ssize_t ib_uverbs_req_notify_cq(struct ib_uverbs_file *file,
				struct ib_device *ib_dev,
				const char __user *buf, int in_len,
				int out_len)
{
	struct ib_uverbs_req_notify_cq cmd;
	struct ib_cq *cq;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, file->ucontext);
	if (!cq)
		return -EINVAL;

	ib_req_notify_cq(cq, cmd.solicited_only ?
			 IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);

	uobj_put_obj_read(cq);

	return in_len;
}

ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
			     struct ib_device *ib_dev,
			     const char __user *buf, int in_len,
			     int out_len)
{
	struct ib_uverbs_destroy_cq cmd;
	struct ib_uverbs_destroy_cq_resp resp;
	struct ib_uobject *uobj;
	struct ib_cq *cq;
	struct ib_ucq_object *obj;
	int ret = -EINVAL;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	uobj = uobj_get_write(UVERBS_OBJECT_CQ, cmd.cq_handle,
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);

	/*
	 * Make sure we don't free the memory in remove_commit as we still
	 * need the uobject memory to create the response.
	 */
	uverbs_uobject_get(uobj);
	cq = uobj->object;
	obj = container_of(cq->uobject, struct ib_ucq_object, uobject);

	memset(&resp, 0, sizeof(resp));

	ret = uobj_remove_commit(uobj);
	if (ret) {
		uverbs_uobject_put(uobj);
		return ret;
	}

	resp.comp_events_reported = obj->comp_events_reported;
	resp.async_events_reported = obj->async_events_reported;

	uverbs_uobject_put(uobj);
	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp))
		return -EFAULT;

	return in_len;
}
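
/*
 * Illustrative sketch, not part of this file: why ib_uverbs_destroy_cq()
 * takes an extra reference with uverbs_uobject_get() before
 * uobj_remove_commit(). A successful remove drops the reference the handle
 * held, so without the extra one the uobject (and the event counters kept
 * in the containing ib_ucq_object) could be freed before the response is
 * built. The refcounted struct ucq, the `fail` switch and the `frees`
 * counter are hypothetical simplifications of the real uobject machinery.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a uobject embedded in an ib_ucq_object. */
struct ucq {
	int refcnt;
	int live;		/* still reachable through its handle */
	unsigned comp_events;	/* like comp_events_reported */
	unsigned async_events;	/* like async_events_reported */
};

static int frees;	/* counts frees so the lifetime can be checked */

static void ucq_get(struct ucq *u)
{
	u->refcnt++;
}

static void ucq_put(struct ucq *u)
{
	assert(u->refcnt > 0);
	if (--u->refcnt == 0) {
		free(u);
		frees++;
	}
}

/* Stands in for uobj_remove_commit(); `fail` simulates e.g. -EBUSY. */
static int remove_commit(struct ucq *u, int fail)
{
	if (fail)
		return -EBUSY;
	u->live = 0;
	ucq_put(u);	/* drop the handle's reference */
	return 0;
}

static int destroy_and_report(struct ucq *u, int fail,
			      unsigned *comp, unsigned *async)
{
	int ret;

	ucq_get(u);	/* keep the memory alive across remove_commit() */
	ret = remove_commit(u, fail);
	if (ret) {
		ucq_put(u);	/* undo our extra reference; object survives */
		return ret;
	}

	/* Safe: our extra reference is still held. */
	*comp = u->comp_events;
	*async = u->async_events;

	ucq_put(u);	/* last reference: freed here */
	return 0;
}
```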

static int create_qp(struct ib_uverbs_file *file,
		     struct ib_udata *ucore,
		     struct ib_udata *uhw,
		     struct ib_uverbs_ex_create_qp *cmd,
		     size_t cmd_sz,
		     int (*cb)(struct ib_uverbs_file *file,
			       struct ib_uverbs_ex_create_qp_resp *resp,
			       struct ib_udata *udata),
		     void *context)
{
	struct ib_uqp_object *obj;
	struct ib_device *device;
	struct ib_pd *pd = NULL;
	struct ib_xrcd *xrcd = NULL;
	struct ib_uobject *xrcd_uobj = ERR_PTR(-ENOENT);
	struct ib_cq *scq = NULL, *rcq = NULL;
	struct ib_srq *srq = NULL;
	struct ib_qp *qp;
	char *buf;
	struct ib_qp_init_attr attr = {};
	struct ib_uverbs_ex_create_qp_resp resp;
	int ret;
	struct ib_rwq_ind_table *ind_tbl = NULL;
	bool has_sq = true;

	if (cmd->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
		return -EPERM;

	obj = (struct ib_uqp_object *)uobj_alloc(UVERBS_OBJECT_QP,
						 file->ucontext);
	if (IS_ERR(obj))
		return PTR_ERR(obj);
	obj->uxrcd = NULL;
	obj->uevent.uobject.user_handle = cmd->user_handle;
	mutex_init(&obj->mcast_lock);

	if (cmd_sz >= offsetof(typeof(*cmd), rwq_ind_tbl_handle) +
		      sizeof(cmd->rwq_ind_tbl_handle) &&
		      (cmd->comp_mask & IB_UVERBS_CREATE_QP_MASK_IND_TABLE)) {
		ind_tbl = uobj_get_obj_read(rwq_ind_table, UVERBS_OBJECT_RWQ_IND_TBL,
					    cmd->rwq_ind_tbl_handle,
					    file->ucontext);
		if (!ind_tbl) {
			ret = -EINVAL;
			goto err_put;
		}

		attr.rwq_ind_tbl = ind_tbl;
	}

	if (cmd_sz > sizeof(*cmd) &&
	    !ib_is_udata_cleared(ucore, sizeof(*cmd),
				 cmd_sz - sizeof(*cmd))) {
		ret = -EOPNOTSUPP;
		goto err_put;
	}

	if (ind_tbl && (cmd->max_recv_wr || cmd->max_recv_sge || cmd->is_srq)) {
		ret = -EINVAL;
		goto err_put;
	}

	if (ind_tbl && !cmd->max_send_wr)
		has_sq = false;

	if (cmd->qp_type == IB_QPT_XRC_TGT) {
		xrcd_uobj = uobj_get_read(UVERBS_OBJECT_XRCD, cmd->pd_handle,
					  file->ucontext);

		if (IS_ERR(xrcd_uobj)) {
			ret = -EINVAL;
			goto err_put;
		}

		xrcd = (struct ib_xrcd *)xrcd_uobj->object;
		if (!xrcd) {
			ret = -EINVAL;
			goto err_put;
		}
		device = xrcd->device;
	} else {
		if (cmd->qp_type == IB_QPT_XRC_INI) {
			cmd->max_recv_wr = 0;
			cmd->max_recv_sge = 0;
		} else {
			if (cmd->is_srq) {
				srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd->srq_handle,
							file->ucontext);
				if (!srq || srq->srq_type == IB_SRQT_XRC) {
					ret = -EINVAL;
					goto err_put;
				}
			}

			if (!ind_tbl) {
				if (cmd->recv_cq_handle != cmd->send_cq_handle) {
					rcq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd->recv_cq_handle,
								file->ucontext);
					if (!rcq) {
						ret = -EINVAL;
						goto err_put;
					}
}
2011-05-26 19:17:04 +04:00
}
}
IB/uverbs: Lock SRQ / CQ / PD objects in a consistent order
2012-04-30 23:51:50 +04:00
2016-05-23 15:20:55 +03:00
if (has_sq)
2018-03-19 16:02:33 +03:00
scq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd->send_cq_handle,
2017-04-04 13:31:44 +03:00
file->ucontext);
2016-05-23 15:20:55 +03:00
if (!ind_tbl)
rcq = rcq ?: scq;
2018-03-19 16:02:33 +03:00
pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle, file->ucontext);
2016-05-23 15:20:55 +03:00
if (!pd || (!scq && has_sq)) {
IB/uverbs: Lock SRQ / CQ / PD objects in a consistent order
2012-04-30 23:51:50 +04:00
ret = -EINVAL;
goto err_put;
}
2011-05-27 11:00:12 +04:00
device = pd->device;
2011-05-26 19:17:04 +04:00
}
2005-07-08 04:57:13 +04:00
attr.event_handler = ib_uverbs_qp_event_handler;
attr.qp_context = file;
attr.send_cq = scq;
attr.recv_cq = rcq;
2005-08-18 23:24:13 +04:00
attr.srq = srq;
2011-05-27 11:00:12 +04:00
attr.xrcd = xrcd;
2015-10-21 17:00:42 +03:00
attr.sq_sig_type = cmd->sq_sig_all ? IB_SIGNAL_ALL_WR :
IB_SIGNAL_REQ_WR;
attr.qp_type = cmd->qp_type;
2008-04-17 08:09:27 +04:00
attr.create_flags = 0;
2005-07-08 04:57:13 +04:00
2015-10-21 17:00:42 +03:00
attr.cap.max_send_wr = cmd->max_send_wr;
attr.cap.max_recv_wr = cmd->max_recv_wr;
attr.cap.max_send_sge = cmd->max_send_sge;
attr.cap.max_recv_sge = cmd->max_recv_sge;
attr.cap.max_inline_data = cmd->max_inline_data;
2005-07-08 04:57:13 +04:00
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex. This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.
Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.
This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention. However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.
Surprisingly, these changes even shrink the object code:
add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-18 07:44:49 +04:00
obj->uevent.events_reported = 0;
INIT_LIST_HEAD(&obj->uevent.event_list);
INIT_LIST_HEAD(&obj->mcast_list);
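The "Don't serialize with ib_uverbs_idr_mutex" commit above replaces one global mutex with per-object reference counts and a lock that guards the idr only long enough to take a reference. A minimal user-space sketch of that lookup, assuming a fixed-size array in place of the idr and a pthread mutex in place of the kernel spinlock; `hyp_table`, `hyp_obj`, `hyp_lookup_get()` and `hyp_put()` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define HYP_TABLE_SIZE 16

/* Hypothetical refcounted object stored in a handle table. */
struct hyp_obj {
	atomic_int refcnt;
	int id;
};

/* Handle table: the mutex protects the slots, not the objects. */
struct hyp_table {
	pthread_mutex_t lock;
	struct hyp_obj *slot[HYP_TABLE_SIZE];
};

/*
 * Look up a handle and take a reference while the table lock is held.
 * The lock is dropped before returning, so the caller can use the object
 * for as long as it needs without blocking other lookups.
 */
struct hyp_obj *hyp_lookup_get(struct hyp_table *t, int handle)
{
	struct hyp_obj *obj;

	if (handle < 0 || handle >= HYP_TABLE_SIZE)
		return NULL;

	pthread_mutex_lock(&t->lock);
	obj = t->slot[handle];
	if (obj)
		atomic_fetch_add(&obj->refcnt, 1);
	pthread_mutex_unlock(&t->lock);
	return obj;
}

/* Drop a reference taken by hyp_lookup_get(); returns the new count. */
int hyp_put(struct hyp_obj *obj)
{
	return atomic_fetch_sub(&obj->refcnt, 1) - 1;
}
```

The lock hold time is a single array access, which is the point the commit message makes about contention on the shared lock.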
2005-07-08 04:57:13 +04:00
2015-10-21 17:00:42 +03:00
if (cmd_sz >= offsetof(typeof(*cmd), create_flags) +
sizeof(cmd->create_flags))
attr.create_flags = cmd->create_flags;
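The `cmd_sz` test above reads `create_flags` only when the command userspace passed is long enough to contain the whole field, so older, shorter commands keep working. A sketch of the same check on a hypothetical struct; `hyp_cmd`, `HYP_HAS_FIELD` and `hyp_get_create_flags()` are illustrative, not kernel API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extensible command: later ABI revisions append fields. */
struct hyp_cmd {
	uint64_t user_handle;
	uint32_t max_send_wr;
	uint32_t create_flags;	/* appended in a later revision */
};

/* True when a command of cmd_sz bytes contains all of the named field. */
#define HYP_HAS_FIELD(type, field, cmd_sz) \
	((cmd_sz) >= offsetof(type, field) + sizeof(((type *)0)->field))

/* Flags userspace supplied, or 0 when its command predates the field. */
uint32_t hyp_get_create_flags(const struct hyp_cmd *cmd, size_t cmd_sz)
{
	if (HYP_HAS_FIELD(struct hyp_cmd, create_flags, cmd_sz))
		return cmd->create_flags;
	return 0;
}
```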
2015-12-20 13:16:10 +03:00
if (attr.create_flags & ~(IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
IB_QP_CREATE_CROSS_CHANNEL |
IB_QP_CREATE_MANAGED_SEND |
2016-04-17 17:19:36 +03:00
IB_QP_CREATE_MANAGED_RECV |
2017-01-18 16:40:00 +03:00
IB_QP_CREATE_SCATTER_FCS |
2017-06-08 16:15:07 +03:00
IB_QP_CREATE_CVLAN_STRIPPING |
2017-10-29 14:59:44 +03:00
IB_QP_CREATE_SOURCE_QPN |
IB_QP_CREATE_PCI_WRITE_END_PADDING)) {
2015-10-21 17:00:42 +03:00
ret = -EINVAL;
goto err_put;
}
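The create-flags check rejects any bit outside the set this code knows how to handle instead of silently ignoring it. The pattern in isolation, with made-up flag values (the real `IB_QP_CREATE_*` constants differ); `hyp_check_create_flags()` is a hypothetical helper.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits; not the real IB_QP_CREATE_* values. */
enum {
	HYP_FLAG_A = 1u << 0,
	HYP_FLAG_B = 1u << 1,
	HYP_FLAG_C = 1u << 5,
};

/* Every flag this hypothetical kernel knows how to honour. */
#define HYP_SUPPORTED_FLAGS (HYP_FLAG_A | HYP_FLAG_B | HYP_FLAG_C)

/*
 * Refuse any request with a bit outside the supported mask, so an older
 * kernel never quietly drops a flag a newer library depends on.
 */
int hyp_check_create_flags(uint32_t flags)
{
	if (flags & ~(uint32_t)HYP_SUPPORTED_FLAGS)
		return -EINVAL;
	return 0;
}
```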
2017-06-08 16:15:07 +03:00
if (attr.create_flags & IB_QP_CREATE_SOURCE_QPN) {
if (!capable(CAP_NET_RAW)) {
ret = -EPERM;
goto err_put;
}
attr.source_qpn = cmd->source_qpn;
}
2015-10-21 17:00:42 +03:00
buf = (void *)cmd + sizeof(*cmd);
if (cmd_sz > sizeof(*cmd))
if (!(buf[0] == 0 && !memcmp(buf, buf + 1,
cmd_sz - sizeof(*cmd) - 1))) {
ret = -EINVAL;
goto err_put;
}
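The trailing-bytes check uses a compact idiom to confirm that everything past the known part of the command is zero: if the first byte is zero and each byte equals the one after it (`memcmp(buf, buf + 1, n - 1) == 0`), every byte must be zero. A standalone sketch; `hyp_all_zero()` is a hypothetical name, and like the original it assumes at least one byte, which the `cmd_sz > sizeof(*cmd)` guard provides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * True if all len bytes of buf are zero. Comparing buf with buf + 1 checks
 * that every byte equals its successor; combined with buf[0] == 0, that
 * forces all of them to zero. memcmp() permits overlapping inputs.
 * Requires len >= 1.
 */
bool hyp_all_zero(const unsigned char *buf, size_t len)
{
	return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}
```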
if (cmd->qp_type == IB_QPT_XRC_TGT)
2011-05-27 11:00:12 +04:00
qp = ib_create_qp(pd, &attr);
else
2018-02-15 05:43:36 +03:00
qp = _ib_create_qp(device, pd, &attr, uhw,
&obj->uevent.uobject);
2011-05-27 11:00:12 +04:00
2005-07-08 04:57:13 +04:00
if (IS_ERR(qp)) {
ret = PTR_ERR(qp);
2017-04-04 13:31:44 +03:00
goto err_put;
2005-07-08 04:57:13 +04:00
}
2015-10-21 17:00:42 +03:00
if (cmd->qp_type != IB_QPT_XRC_TGT) {
IB/core: Enforce PKey security on QPs
Add new LSM hooks to allocate and free security contexts and check for
permission to access a PKey.
Allocate and free a security context when creating and destroying a QP.
This context is used for controlling access to PKeys.
When a request is made to modify a QP that changes the port, PKey index,
or alternate path, check that the QP has permission for the PKey in the
PKey table index on the subnet prefix of the port. If the QP is shared
make sure all handles to the QP also have access.
Store which port and PKey index a QP is using. After the reset to init
transition the user can modify the port, PKey index and alternate path
independently. So port and PKey settings changes can be a merge of the
previous settings and the new ones.
In order to maintain access control if there are PKey table or subnet
prefix changes, keep a list of all QPs that are using each PKey index on
each port. If a change occurs, all QPs using that device and port must
have access enforced for the new cache settings.
These changes add a transaction to the QP modify process. Association
with the old port and PKey index must be maintained if the modify fails,
and must be removed if it succeeds. Association with the new port and
PKey index must be established prior to the modify and removed if the
modify fails.
1. When a QP is modified to a particular Port, PKey index or alternate
path insert that QP into the appropriate lists.
2. Check permission to access the new settings.
3. If step 2 grants access attempt to modify the QP.
4a. If steps 2 and 3 succeed remove any prior associations.
4b. If either fails, remove the new setting associations.
If a PKey table or subnet prefix changes walk the list of QPs and
check that they have permission. If not send the QP to the error state
and raise a fatal error event. If it's a shared QP make sure all the
QPs that share the real_qp have permission as well. If the QP that
owns a security structure is denied access the security structure is
marked as such and the QP is added to an error_list. Once moving
the QP to error is complete, the security structure mark is cleared.
Maintaining the lists correctly turns QP destroy into a transaction.
The hardware driver for the device frees the ib_qp structure, so while
the destroy is in progress the ib_qp pointer in the ib_qp_security
struct is undefined. When the destroy process begins the ib_qp_security
structure is marked as destroying. This prevents any action from being
taken on the QP pointer. After the QP is destroyed successfully it
could still be listed on an error_list; wait for it to be processed by that
flow before cleaning up the structure.
If the destroy fails, the QP's port and PKey settings are reinserted into
the appropriate lists, the destroying flag is cleared, and access control
is enforced, in case there were any cache changes during the destroy
flow.
To keep the security changes isolated a new file is used to hold security
related functionality.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[PM: merge fixup in ib_verbs.h and uverbs_cmd.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-19 15:48:52 +03:00
ret = ib_create_qp_security(qp, device);
if (ret)
goto err_cb;
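The PKey security commit message describes QP modify as a small transaction: associate the QP with the new port/PKey index, check permission, attempt the modify, then drop whichever association lost. A minimal sketch of that control flow, assuming a single pending association instead of the real per-port lists; the structs, the always-succeeding driver stub and the "indexes 0-3 allowed" policy are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port/PKey-index association for a QP. */
struct hyp_assoc {
	int port;
	int pkey_index;
	bool listed;	/* QP is on this setting's list */
};

/* Hypothetical QP: the committed setting plus one pending setting. */
struct hyp_qp {
	struct hyp_assoc cur;
	struct hyp_assoc next;
};

/* Hypothetical policy standing in for the LSM check: indexes 0-3 only. */
static bool hyp_pkey_allowed(int port, int pkey_index)
{
	(void)port;
	return pkey_index >= 0 && pkey_index < 4;
}

/* Hypothetical driver modify that always succeeds. */
static int hyp_driver_modify(struct hyp_qp *qp)
{
	(void)qp;
	return 0;
}

/* Steps 1-4 of the modify transaction from the commit message. */
int hyp_modify_qp_pkey(struct hyp_qp *qp, int port, int pkey_index)
{
	int ret;

	/* 1. List the QP under the new port/PKey index before checking. */
	qp->next = (struct hyp_assoc){ port, pkey_index, true };

	/* 2. Check permission; 3. only if granted, attempt the modify. */
	if (hyp_pkey_allowed(port, pkey_index))
		ret = hyp_driver_modify(qp);
	else
		ret = -EACCES;

	/* 4a. Success: the new association replaces the prior one. */
	if (ret == 0)
		qp->cur = qp->next;

	/* Clear the pending slot. On failure (4b) this removes the new
	 * association, and qp->cur still holds the old one. */
	qp->next.listed = false;
	return ret;
}
```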
2011-08-09 02:31:51 +04:00
qp->real_qp = qp;
2011-05-27 11:00:12 +04:00
qp->pd = pd;
qp->send_cq = attr.send_cq;
qp->recv_cq = attr.recv_cq;
qp->srq = attr.srq;
2016-05-23 15:20:55 +03:00
qp->rwq_ind_tbl = ind_tbl;
2011-05-27 11:00:12 +04:00
qp->event_handler = attr.event_handler;
qp->qp_context = attr.qp_context;
qp->qp_type = attr.qp_type;
2012-01-20 22:43:54 +04:00
atomic_set(&qp->usecnt, 0);
2011-05-27 11:00:12 +04:00
atomic_inc(&pd->usecnt);
2017-08-23 08:35:40 +03:00
qp->port = 0;
2016-05-23 15:20:55 +03:00
if (attr.send_cq)
atomic_inc(&attr.send_cq->usecnt);
2011-05-27 11:00:12 +04:00
if (attr.recv_cq)
atomic_inc(&attr.recv_cq->usecnt);
if (attr.srq)
atomic_inc(&attr.srq->usecnt);
2016-05-23 15:20:55 +03:00
if (ind_tbl)
atomic_inc(&ind_tbl->usecnt);
2018-02-21 11:25:01 +03:00
} else {
/* It is done in _ib_create_qp for other QP types */
qp->uobject = &obj->uevent.uobject;
2011-05-27 11:00:12 +04:00
}
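Each object the new QP depends on (PD, CQs, SRQ, indirection table) gets its `usecnt` raised so it cannot be torn down underneath the QP. A sketch of that bookkeeping with C11 atomics; `hyp_res` and its helpers are hypothetical, and refusing destruction with `-EBUSY` while the count is non-zero is this sketch's assumption about how the count is consumed.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical resource (PD, CQ, SRQ, ...) with a count of users. */
struct hyp_res {
	atomic_int usecnt;
};

/* A QP takes one use on each resource it was created against. */
void hyp_res_get(struct hyp_res *r)
{
	if (r)
		atomic_fetch_add(&r->usecnt, 1);
}

/* Destroying the QP gives the use back. */
void hyp_res_put(struct hyp_res *r)
{
	if (r)
		atomic_fetch_sub(&r->usecnt, 1);
}

/* In this sketch, a resource that still has users cannot be destroyed. */
int hyp_res_destroy(struct hyp_res *r)
{
	if (atomic_load(&r->usecnt) > 0)
		return -EBUSY;
	return 0;
}
```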
2005-07-08 04:57:13 +04:00
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
obj->uevent.uobject.object = qp;
2005-07-08 04:57:13 +04:00
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
memset(&resp, 0, sizeof resp);
2015-10-21 17:00:42 +03:00
resp.base.qpn = qp->qp_num;
resp.base.qp_handle = obj->uevent.uobject.id;
resp.base.max_recv_sge = attr.cap.max_recv_sge;
resp.base.max_send_sge = attr.cap.max_send_sge;
resp.base.max_recv_wr = attr.cap.max_recv_wr;
resp.base.max_send_wr = attr.cap.max_send_wr;
resp.base.max_inline_data = attr.cap.max_inline_data;
2005-07-08 04:57:13 +04:00
2015-10-21 17:00:42 +03:00
resp.response_length = offsetof(typeof(resp), response_length) +
sizeof(resp.response_length);
ret = cb(file, &resp, ucore);
if (ret)
goto err_cb;
2005-07-08 04:57:13 +04:00
2013-08-01 19:49:54 +04:00
if (xrcd) {
obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object,
uobject);
atomic_inc(&obj->uxrcd->refcnt);
2017-04-04 13:31:44 +03:00
uobj_put_read(xrcd_uobj);
2013-08-01 19:49:54 +04:00
}
2011-05-27 11:00:12 +04:00
if (pd)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(pd);
2011-05-27 11:00:12 +04:00
if (scq)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(scq);
2011-05-26 19:17:04 +04:00
if (rcq && rcq != scq)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(rcq);
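`rcq ?: scq` is the GNU C conditional with an omitted middle operand, equivalent to `rcq ? rcq : scq` with `rcq` evaluated once. When `rcq` is still NULL at that point it ends up aliasing the send CQ, but only one reference exists for that shared object, which is why the release path tests `rcq != scq`. A sketch with a hypothetical plain-counter CQ; `hyp_cq`, `hyp_cq_put()` and `hyp_release_cqs()` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical CQ with a plain count of read references held on it. */
struct hyp_cq {
	int refs;
};

/* Hypothetical stand-in for uobj_put_obj_read(). */
static void hyp_cq_put(struct hyp_cq *cq)
{
	cq->refs--;
}

/*
 * Drop the CQ references held by a create call. rcq may alias scq
 * (rcq = rcq ?: scq), in which case only one reference was taken.
 */
void hyp_release_cqs(struct hyp_cq *scq, struct hyp_cq *rcq)
{
	if (scq)
		hyp_cq_put(scq);
	if (rcq && rcq != scq)
		hyp_cq_put(rcq);
}
```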
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
if (srq)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(srq);
2016-05-23 15:20:55 +03:00
if (ind_tbl)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(ind_tbl);
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
2017-04-04 13:31:44 +03:00
uobj_alloc_commit(&obj->uevent.uobject);
2005-07-08 04:57:13 +04:00
2015-10-21 17:00:42 +03:00
return 0;
err_cb:
2005-07-08 04:57:13 +04:00
ib_destroy_qp(qp);
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
err_put:
2017-04-04 13:31:44 +03:00
if (!IS_ERR(xrcd_uobj))
uobj_put_read(xrcd_uobj);
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
if (pd)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(pd);
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
if (scq)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(scq);
2006-07-24 02:16:04 +04:00
if (rcq && rcq != scq)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(rcq);
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
if (srq)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(srq);
2016-05-23 15:20:55 +03:00
if (ind_tbl)
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(ind_tbl);
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
2006-06-18 07:44:49 +04:00
2017-04-04 13:31:44 +03:00
uobj_alloc_abort(&obj->uevent.uobject);
2005-07-08 04:57:13 +04:00
return ret;
}
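`create_qp()` unwinds with the usual kernel goto ladder: a failure after the QP exists jumps to `err_cb`, which destroys the QP and falls through to `err_put`, which drops the object references and aborts the uobject. A counting sketch of that ladder; `hyp_state`, `hyp_create()` and its `fail_step` test knob are hypothetical.

```c
#include <errno.h>

/* Hypothetical counters recording what the create path holds. */
struct hyp_state {
	int refs;	/* read references on PD/CQ/SRQ objects */
	int qps;	/* QPs currently created */
	int uobjs;	/* uobjects allocated and still pending */
};

/*
 * fail_step selects a failure point for testing:
 * 0 = none, 1 = before the QP exists, 2 = after the QP exists.
 */
int hyp_create(struct hyp_state *s, int fail_step)
{
	int ret;

	s->uobjs++;		/* like uobj_alloc() */
	s->refs++;		/* like uobj_get_obj_read() on a dependency */

	if (fail_step == 1) {
		ret = -EINVAL;
		goto err_put;
	}

	s->qps++;		/* QP created */

	if (fail_step == 2) {
		ret = -EFAULT;	/* e.g. copying the response failed */
		goto err_cb;
	}

	s->refs--;		/* the success path also drops read references */
	s->uobjs--;		/* like uobj_alloc_commit(): no longer pending */
	return 0;

err_cb:
	s->qps--;		/* like ib_destroy_qp(), then fall through */
err_put:
	s->refs--;		/* like uobj_put_obj_read() */
	s->uobjs--;		/* like uobj_alloc_abort() */
	return ret;
}
```

Each label undoes only what was acquired before the jump, so a later failure simply enters the ladder one rung higher.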
2015-10-21 17:00:42 +03:00
static int ib_uverbs_create_qp_cb(struct ib_uverbs_file *file,
struct ib_uverbs_ex_create_qp_resp *resp,
struct ib_udata *ucore)
{
if (ib_copy_to_udata(ucore, &resp->base, sizeof(resp->base)))
return -EFAULT;
return 0;
}
ssize_t ib_uverbs_create_qp ( struct ib_uverbs_file * file ,
struct ib_device * ib_dev ,
const char __user * buf , int in_len ,
int out_len )
{
struct ib_uverbs_create_qp cmd ;
struct ib_uverbs_ex_create_qp cmd_ex ;
struct ib_udata ucore ;
struct ib_udata uhw ;
ssize_t resp_size = sizeof ( struct ib_uverbs_create_qp_resp ) ;
int err ;
if ( out_len < resp_size )
return - ENOSPC ;
if ( copy_from_user ( & cmd , buf , sizeof ( cmd ) ) )
return - EFAULT ;
2017-09-07 00:34:26 +03:00
ib_uverbs_init_udata ( & ucore , buf , u64_to_user_ptr ( cmd . response ) ,
sizeof ( cmd ) , resp_size ) ;
ib_uverbs_init_udata ( & uhw , buf + sizeof ( cmd ) ,
u64_to_user_ptr ( cmd . response ) + resp_size ,
2016-02-14 19:35:52 +03:00
in_len - sizeof ( cmd ) - sizeof ( struct ib_uverbs_cmd_hdr ) ,
out_len - resp_size ) ;
2015-10-21 17:00:42 +03:00
memset ( & cmd_ex , 0 , sizeof ( cmd_ex ) ) ;
cmd_ex . user_handle = cmd . user_handle ;
cmd_ex . pd_handle = cmd . pd_handle ;
cmd_ex . send_cq_handle = cmd . send_cq_handle ;
cmd_ex . recv_cq_handle = cmd . recv_cq_handle ;
cmd_ex . srq_handle = cmd . srq_handle ;
cmd_ex . max_send_wr = cmd . max_send_wr ;
cmd_ex . max_recv_wr = cmd . max_recv_wr ;
cmd_ex . max_send_sge = cmd . max_send_sge ;
cmd_ex . max_recv_sge = cmd . max_recv_sge ;
cmd_ex . max_inline_data = cmd . max_inline_data ;
cmd_ex . sq_sig_all = cmd . sq_sig_all ;
cmd_ex . qp_type = cmd . qp_type ;
cmd_ex . is_srq = cmd . is_srq ;
err = create_qp ( file , & ucore , & uhw , & cmd_ex ,
offsetof ( typeof ( cmd_ex ) , is_srq ) +
sizeof ( cmd . is_srq ) , ib_uverbs_create_qp_cb ,
NULL ) ;
if ( err )
return err ;
return in_len ;
}
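ib_uverbs_create_qp() copies the legacy command into a zeroed extended command. It then passes `offsetof(typeof(cmd_ex), is_srq) + sizeof(cmd.is_srq)` as the command length, which is the size of the prefix the legacy ABI actually fills in. The sketch below shows that prefix-length idea in userspace. It assumes a hypothetical, simplified `struct ex_cmd`, not the real uverbs ABI layout. The helpers `legacy_prefix_len`, `cmd_has_field` and `legacy_to_ex` are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extended command: legacy fields first, newer fields after. */
struct ex_cmd {
	uint64_t user_handle;
	uint32_t max_send_wr;
	uint8_t  is_srq;        /* last field known to the legacy ABI */
	uint32_t comp_mask;     /* fields from here on exist only in the extended ABI */
	uint32_t create_flags;
};

/* Number of bytes of struct ex_cmd that a legacy command populates. */
static size_t legacy_prefix_len(void)
{
	return offsetof(struct ex_cmd, is_srq) +
	       sizeof(((struct ex_cmd *)0)->is_srq);
}

/* Nonzero if a command of cmd_len bytes fully contains the given field. */
static int cmd_has_field(size_t cmd_len, size_t field_off, size_t field_size)
{
	return cmd_len >= field_off + field_size;
}

/* Build an extended command from legacy values; extended-only fields stay 0. */
static void legacy_to_ex(struct ex_cmd *ex, uint64_t handle,
			 uint32_t max_send_wr, uint8_t is_srq)
{
	memset(ex, 0, sizeof(*ex));
	ex->user_handle = handle;
	ex->max_send_wr = max_send_wr;
	ex->is_srq = is_srq;
}
```

Zeroing first and then passing only the prefix length lets the shared create path treat "field not present" and "field is zero" the same way for every extended-only field.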
static int ib_uverbs_ex_create_qp_cb ( struct ib_uverbs_file * file ,
struct ib_uverbs_ex_create_qp_resp * resp ,
struct ib_udata * ucore )
{
if ( ib_copy_to_udata ( ucore , resp , resp - > response_length ) )
return - EFAULT ;
return 0 ;
}
int ib_uverbs_ex_create_qp ( struct ib_uverbs_file * file ,
struct ib_device * ib_dev ,
struct ib_udata * ucore ,
struct ib_udata * uhw )
{
struct ib_uverbs_ex_create_qp_resp resp ;
struct ib_uverbs_ex_create_qp cmd = { 0 } ;
int err ;
if ( ucore - > inlen < ( offsetof ( typeof ( cmd ) , comp_mask ) +
sizeof ( cmd . comp_mask ) ) )
return - EINVAL ;
err = ib_copy_from_udata ( & cmd , ucore , min ( sizeof ( cmd ) , ucore - > inlen ) ) ;
if ( err )
return err ;
2016-05-23 15:20:55 +03:00
if ( cmd . comp_mask & ~ IB_UVERBS_CREATE_QP_SUP_COMP_MASK )
2015-10-21 17:00:42 +03:00
return - EINVAL ;
if ( cmd . reserved )
return - EINVAL ;
if ( ucore - > outlen < ( offsetof ( typeof ( resp ) , response_length ) +
sizeof ( resp . response_length ) ) )
return - ENOSPC ;
err = create_qp ( file , ucore , uhw , & cmd ,
min ( ucore - > inlen , sizeof ( cmd ) ) ,
ib_uverbs_ex_create_qp_cb , NULL ) ;
if ( err )
return err ;
return 0 ;
}
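ib_uverbs_ex_create_qp() validates the request in a fixed order:

1. The input must reach at least through `comp_mask`.
2. The copy is capped at the size this kernel knows.
3. Unknown `comp_mask` bits are rejected.
4. A nonzero reserved field is rejected.

The sketch below mirrors that order in userspace. It assumes a hypothetical three-field request and a hypothetical supported-mask value. Neither comes from the uverbs headers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extended request; only the validation order mirrors the code above. */
struct ex_req {
	uint32_t comp_mask;
	uint32_t reserved;
	uint32_t create_flags;
};

#define SUPPORTED_COMP_MASK 0x1u  /* hypothetical: only bit 0 is understood */

/* Copy and validate inlen bytes from buf into *req; returns 0 or -EINVAL. */
static int parse_ex_req(struct ex_req *req, const void *buf, size_t inlen)
{
	size_t min_len = offsetof(struct ex_req, comp_mask) +
			 sizeof(req->comp_mask);

	if (inlen < min_len)
		return -EINVAL;               /* too short to carry comp_mask */
	memset(req, 0, sizeof(*req));         /* fields not supplied read as 0 */
	memcpy(req, buf, inlen < sizeof(*req) ? inlen : sizeof(*req));
	if (req->comp_mask & ~SUPPORTED_COMP_MASK)
		return -EINVAL;               /* caller asked for something unknown */
	if (req->reserved)
		return -EINVAL;               /* reserved must stay zero */
	return 0;
}
```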
2011-08-12 00:57:43 +04:00
ssize_t ib_uverbs_open_qp ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2011-08-12 00:57:43 +04:00
const char __user * buf , int in_len , int out_len )
{
struct ib_uverbs_open_qp cmd ;
struct ib_uverbs_create_qp_resp resp ;
struct ib_udata udata ;
struct ib_uqp_object * obj ;
struct ib_xrcd * xrcd ;
struct ib_uobject * uninitialized_var ( xrcd_uobj ) ;
struct ib_qp * qp ;
struct ib_qp_open_attr attr ;
int ret ;
if ( out_len < sizeof resp )
return - ENOSPC ;
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
2017-09-07 00:34:26 +03:00
ib_uverbs_init_udata ( & udata , buf + sizeof ( cmd ) ,
u64_to_user_ptr ( cmd . response ) + sizeof ( resp ) ,
2017-06-27 17:04:42 +03:00
in_len - sizeof ( cmd ) - sizeof ( struct ib_uverbs_cmd_hdr ) ,
out_len - sizeof ( resp ) ) ;
2011-08-12 00:57:43 +04:00
2018-03-19 16:02:33 +03:00
obj = ( struct ib_uqp_object * ) uobj_alloc ( UVERBS_OBJECT_QP ,
2017-04-04 13:31:44 +03:00
file - > ucontext ) ;
if ( IS_ERR ( obj ) )
return PTR_ERR ( obj ) ;
2011-08-12 00:57:43 +04:00
2018-03-19 16:02:33 +03:00
xrcd_uobj = uobj_get_read ( UVERBS_OBJECT_XRCD , cmd . pd_handle ,
2017-04-04 13:31:44 +03:00
file - > ucontext ) ;
if ( IS_ERR ( xrcd_uobj ) ) {
ret = - EINVAL ;
goto err_put ;
}
2011-08-12 00:57:43 +04:00
2017-04-04 13:31:44 +03:00
xrcd = ( struct ib_xrcd * ) xrcd_uobj - > object ;
2011-08-12 00:57:43 +04:00
if ( ! xrcd ) {
ret = - EINVAL ;
2017-04-04 13:31:44 +03:00
goto err_xrcd ;
2011-08-12 00:57:43 +04:00
}
attr . event_handler = ib_uverbs_qp_event_handler ;
attr . qp_context = file ;
attr . qp_num = cmd . qpn ;
attr . qp_type = cmd . qp_type ;
obj - > uevent . events_reported = 0 ;
INIT_LIST_HEAD ( & obj - > uevent . event_list ) ;
INIT_LIST_HEAD ( & obj - > mcast_list ) ;
qp = ib_open_qp ( xrcd , & attr ) ;
if ( IS_ERR ( qp ) ) {
ret = PTR_ERR ( qp ) ;
2017-04-04 13:31:44 +03:00
goto err_xrcd ;
2011-08-12 00:57:43 +04:00
}
obj - > uevent . uobject . object = qp ;
2017-04-04 13:31:44 +03:00
obj - > uevent . uobject . user_handle = cmd . user_handle ;
2011-08-12 00:57:43 +04:00
memset ( & resp , 0 , sizeof resp ) ;
resp . qpn = qp - > qp_num ;
resp . qp_handle = obj - > uevent . uobject . id ;
2017-09-07 00:34:26 +03:00
if ( copy_to_user ( u64_to_user_ptr ( cmd . response ) , & resp , sizeof resp ) ) {
2011-08-12 00:57:43 +04:00
ret = - EFAULT ;
2017-04-04 13:31:44 +03:00
goto err_destroy ;
2011-08-12 00:57:43 +04:00
}
2013-08-01 19:49:54 +04:00
obj - > uxrcd = container_of ( xrcd_uobj , struct ib_uxrcd_object , uobject ) ;
atomic_inc ( & obj - > uxrcd - > refcnt ) ;
2017-04-04 13:31:44 +03:00
qp - > uobject = & obj - > uevent . uobject ;
uobj_put_read ( xrcd_uobj ) ;
2011-08-12 00:57:43 +04:00
2017-04-04 13:31:44 +03:00
uobj_alloc_commit ( & obj - > uevent . uobject ) ;
2011-08-12 00:57:43 +04:00
return in_len ;
err_destroy :
ib_destroy_qp ( qp ) ;
2017-04-04 13:31:44 +03:00
err_xrcd :
uobj_put_read ( xrcd_uobj ) ;
2011-08-12 00:57:43 +04:00
err_put :
2017-04-04 13:31:44 +03:00
uobj_alloc_abort ( & obj - > uevent . uobject ) ;
2011-08-12 00:57:43 +04:00
return ret ;
}
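ib_uverbs_open_qp() unwinds failures through a chain of labels (`err_destroy`, `err_xrcd`, `err_put`). Each label undoes one step, and a failure jumps to the label that releases exactly what has been acquired so far, in reverse order. The sketch below shows that idiom in isolation. It uses a hypothetical `struct res_log` and hypothetical `acquire`/`release` helpers in place of the uobject, XRCD and QP.

```c
#include <errno.h>

/* Hypothetical resource tracker standing in for the uobject, XRCD and QP. */
struct res_log { int allocated, released; };

static int acquire(struct res_log *log, int fail)
{
	if (fail)
		return -1;
	log->allocated++;
	return 0;
}

static void release(struct res_log *log)
{
	log->released++;
}

/* Acquire three resources; fail_at selects which step fails (0 = none). */
static int open_three(struct res_log *log, int fail_at)
{
	int ret = -EINVAL;

	if (acquire(log, fail_at == 1))
		return ret;              /* nothing acquired yet, nothing to undo */
	if (acquire(log, fail_at == 2))
		goto err_first;
	if (acquire(log, fail_at == 3))
		goto err_second;
	return 0;                        /* success: caller owns all three */

err_second:
	release(log);                    /* undo step 2 */
err_first:
	release(log);                    /* undo step 1 */
	return ret;
}
```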
2017-08-17 15:50:33 +03:00
static void copy_ah_attr_to_uverbs ( struct ib_uverbs_qp_dest * uverb_attr ,
struct rdma_ah_attr * rdma_attr )
{
const struct ib_global_route * grh ;
uverb_attr - > dlid = rdma_ah_get_dlid ( rdma_attr ) ;
uverb_attr - > sl = rdma_ah_get_sl ( rdma_attr ) ;
uverb_attr - > src_path_bits = rdma_ah_get_path_bits ( rdma_attr ) ;
uverb_attr - > static_rate = rdma_ah_get_static_rate ( rdma_attr ) ;
uverb_attr - > is_global = ! ! ( rdma_ah_get_ah_flags ( rdma_attr ) &
IB_AH_GRH ) ;
if ( uverb_attr - > is_global ) {
grh = rdma_ah_read_grh ( rdma_attr ) ;
memcpy ( uverb_attr - > dgid , grh - > dgid . raw , 16 ) ;
uverb_attr - > flow_label = grh - > flow_label ;
uverb_attr - > sgid_index = grh - > sgid_index ;
uverb_attr - > hop_limit = grh - > hop_limit ;
uverb_attr - > traffic_class = grh - > traffic_class ;
}
uverb_attr - > port_num = rdma_ah_get_port_num ( rdma_attr ) ;
}
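copy_ah_attr_to_uverbs() normalizes the GRH flag with `!!`, so `is_global` is exactly 0 or 1 whatever bit position the flag occupies. It copies the GRH fields only when that flag is set. The sketch below reproduces that logic with hypothetical, simplified stand-ins for `struct rdma_ah_attr` and `struct ib_uverbs_qp_dest`, and with a hypothetical flag value chosen above bit 0 so the effect of `!!` is visible.

```c
#include <stdint.h>
#include <string.h>

#define AH_FLAG_GRH 0x04u  /* hypothetical flag bit, deliberately not bit 0 */

/* Hypothetical, simplified address-handle and wire-format structs. */
struct ah   { uint32_t flags; uint16_t dlid; uint8_t dgid[16]; uint8_t hop_limit; };
struct dest { uint8_t is_global; uint16_t dlid; uint8_t dgid[16]; uint8_t hop_limit; };

static void ah_to_dest(struct dest *d, const struct ah *a)
{
	memset(d, 0, sizeof(*d));
	d->dlid = a->dlid;
	/* !! folds any nonzero masked value (here 4) to exactly 1 */
	d->is_global = !!(a->flags & AH_FLAG_GRH);
	if (d->is_global) {
		/* GRH fields are only meaningful when the GRH flag is set */
		memcpy(d->dgid, a->dgid, sizeof(d->dgid));
		d->hop_limit = a->hop_limit;
	}
}
```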
2006-02-14 03:31:25 +03:00
ssize_t ib_uverbs_query_qp ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2006-02-14 03:31:25 +03:00
const char __user * buf , int in_len ,
int out_len )
{
struct ib_uverbs_query_qp cmd ;
struct ib_uverbs_query_qp_resp resp ;
struct ib_qp * qp ;
struct ib_qp_attr * attr ;
struct ib_qp_init_attr * init_attr ;
int ret ;
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
attr = kmalloc ( sizeof * attr , GFP_KERNEL ) ;
init_attr = kmalloc ( sizeof * init_attr , GFP_KERNEL ) ;
if ( ! attr | | ! init_attr ) {
ret = - ENOMEM ;
goto out ;
}
2018-03-19 16:02:33 +03:00
qp = uobj_get_obj_read ( qp , UVERBS_OBJECT_QP , cmd . qp_handle , file - > ucontext ) ;
2006-06-18 07:44:49 +04:00
if ( ! qp ) {
2006-02-14 03:31:25 +03:00
ret = - EINVAL ;
2006-06-18 07:44:49 +04:00
goto out ;
}
ret = ib_query_qp ( qp , attr , cmd . attr_mask , init_attr ) ;
2006-02-14 03:31:25 +03:00
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( qp ) ;
2006-02-14 03:31:25 +03:00
if ( ret )
goto out ;
memset ( & resp , 0 , sizeof resp ) ;
resp . qp_state = attr - > qp_state ;
resp . cur_qp_state = attr - > cur_qp_state ;
resp . path_mtu = attr - > path_mtu ;
resp . path_mig_state = attr - > path_mig_state ;
resp . qkey = attr - > qkey ;
resp . rq_psn = attr - > rq_psn ;
resp . sq_psn = attr - > sq_psn ;
resp . dest_qp_num = attr - > dest_qp_num ;
resp . qp_access_flags = attr - > qp_access_flags ;
resp . pkey_index = attr - > pkey_index ;
resp . alt_pkey_index = attr - > alt_pkey_index ;
2006-10-25 14:54:20 +04:00
resp . sq_draining = attr - > sq_draining ;
2006-02-14 03:31:25 +03:00
resp . max_rd_atomic = attr - > max_rd_atomic ;
resp . max_dest_rd_atomic = attr - > max_dest_rd_atomic ;
resp . min_rnr_timer = attr - > min_rnr_timer ;
resp . port_num = attr - > port_num ;
resp . timeout = attr - > timeout ;
resp . retry_cnt = attr - > retry_cnt ;
resp . rnr_retry = attr - > rnr_retry ;
resp . alt_port_num = attr - > alt_port_num ;
resp . alt_timeout = attr - > alt_timeout ;
2017-08-17 15:50:33 +03:00
copy_ah_attr_to_uverbs ( & resp . dest , & attr - > ah_attr ) ;
copy_ah_attr_to_uverbs ( & resp . alt_dest , & attr - > alt_ah_attr ) ;
2006-02-14 03:31:25 +03:00
resp . max_send_wr = init_attr - > cap . max_send_wr ;
resp . max_recv_wr = init_attr - > cap . max_recv_wr ;
resp . max_send_sge = init_attr - > cap . max_send_sge ;
resp . max_recv_sge = init_attr - > cap . max_recv_sge ;
resp . max_inline_data = init_attr - > cap . max_inline_data ;
2006-03-02 22:25:27 +03:00
resp . sq_sig_all = init_attr - > sq_sig_type = = IB_SIGNAL_ALL_WR ;
2006-02-14 03:31:25 +03:00
2017-09-07 00:34:26 +03:00
if ( copy_to_user ( u64_to_user_ptr ( cmd . response ) , & resp , sizeof resp ) )
2006-02-14 03:31:25 +03:00
ret = - EFAULT ;
out :
kfree ( attr ) ;
kfree ( init_attr ) ;
return ret ? ret : in_len ;
}
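ib_uverbs_query_qp() allocates two scratch buffers, tests both with one condition, and frees both on a single `out:` path. This works because `kfree(NULL)` is a no-op, just as `free(NULL)` is in userspace. The handler then returns `ret ? ret : in_len`. The sketch below shows the same shape with `malloc`/`free`. `query_like` is a hypothetical name, and the failure injection exists only for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns -ENOMEM on allocation failure, else in_len (bytes consumed). */
static long query_like(int in_len, int fail_second_alloc)
{
	long ret = 0;
	int *attr = malloc(sizeof(*attr));
	int *init_attr = fail_second_alloc ? NULL : malloc(sizeof(*init_attr));

	if (!attr || !init_attr) {
		ret = -ENOMEM;
		goto out;               /* one exit path handles partial success */
	}
	*attr = 1;                      /* placeholder for the real query */
	*init_attr = 2;
out:
	free(attr);                     /* free(NULL) is a no-op */
	free(init_attr);
	return ret ? ret : in_len;
}
```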
2011-05-26 19:17:04 +04:00
/* Remove ignored fields set in the attribute mask */
static int modify_qp_mask ( enum ib_qp_type qp_type , int mask )
{
switch ( qp_type ) {
case IB_QPT_XRC_INI :
return mask & ~ ( IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER ) ;
2011-05-27 11:00:12 +04:00
case IB_QPT_XRC_TGT :
return mask & ~ ( IB_QP_MAX_QP_RD_ATOMIC | IB_QP_RETRY_CNT |
IB_QP_RNR_RETRY ) ;
2011-05-26 19:17:04 +04:00
default :
return mask ;
}
}
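modify_qp_mask() silently clears attribute bits that a given QP type ignores. The attributes it clears for XRC initiators (`MAX_DEST_RD_ATOMIC`, `MIN_RNR_TIMER`) are responder-side. Those it clears for XRC targets (`MAX_QP_RD_ATOMIC`, `RETRY_CNT`, `RNR_RETRY`) are requester-side. The sketch below reproduces the switch. It uses hypothetical bit values; the real `IB_QP_*` constants are defined in the kernel headers.

```c
/* Hypothetical attribute-mask bits, not the real IB_QP_* values. */
enum {
	QP_STATE              = 1 << 0,
	QP_MAX_QP_RD_ATOMIC   = 1 << 1,
	QP_MAX_DEST_RD_ATOMIC = 1 << 2,
	QP_MIN_RNR_TIMER      = 1 << 3,
	QP_RETRY_CNT          = 1 << 4,
	QP_RNR_RETRY          = 1 << 5,
};

enum qp_type { QPT_RC, QPT_XRC_INI, QPT_XRC_TGT };

static int strip_ignored(enum qp_type type, int mask)
{
	switch (type) {
	case QPT_XRC_INI:   /* drop responder-side attributes */
		return mask & ~(QP_MAX_DEST_RD_ATOMIC | QP_MIN_RNR_TIMER);
	case QPT_XRC_TGT:   /* drop requester-side attributes */
		return mask & ~(QP_MAX_QP_RD_ATOMIC | QP_RETRY_CNT | QP_RNR_RETRY);
	default:
		return mask;
	}
}
```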
2017-08-17 15:50:33 +03:00
static void copy_ah_attr_from_uverbs ( struct ib_device * dev ,
struct rdma_ah_attr * rdma_attr ,
struct ib_uverbs_qp_dest * uverb_attr )
{
rdma_attr - > type = rdma_ah_find_type ( dev , uverb_attr - > port_num ) ;
if ( uverb_attr - > is_global ) {
rdma_ah_set_grh ( rdma_attr , NULL ,
uverb_attr - > flow_label ,
uverb_attr - > sgid_index ,
uverb_attr - > hop_limit ,
uverb_attr - > traffic_class ) ;
rdma_ah_set_dgid_raw ( rdma_attr , uverb_attr - > dgid ) ;
} else {
rdma_ah_set_ah_flags ( rdma_attr , 0 ) ;
}
rdma_ah_set_dlid ( rdma_attr , uverb_attr - > dlid ) ;
rdma_ah_set_sl ( rdma_attr , uverb_attr - > sl ) ;
rdma_ah_set_path_bits ( rdma_attr , uverb_attr - > src_path_bits ) ;
rdma_ah_set_static_rate ( rdma_attr , uverb_attr - > static_rate ) ;
rdma_ah_set_port_num ( rdma_attr , uverb_attr - > port_num ) ;
rdma_ah_set_make_grd ( rdma_attr , false ) ;
}
2016-12-01 14:43:15 +03:00
static int modify_qp ( struct ib_uverbs_file * file ,
struct ib_uverbs_ex_modify_qp * cmd , struct ib_udata * udata )
2005-07-08 04:57:13 +04:00
{
2016-12-01 14:43:15 +03:00
struct ib_qp_attr * attr ;
struct ib_qp * qp ;
int ret ;
2006-08-12 01:58:09 +04:00
2005-07-08 04:57:13 +04:00
attr = kmalloc ( sizeof * attr , GFP_KERNEL ) ;
if ( ! attr )
return - ENOMEM ;
2018-03-19 16:02:33 +03:00
qp = uobj_get_obj_read ( qp , UVERBS_OBJECT_QP , cmd - > base . qp_handle , file - > ucontext ) ;
2006-06-18 07:44:49 +04:00
if ( ! qp ) {
2005-07-08 04:57:13 +04:00
ret = - EINVAL ;
goto out ;
}
2017-07-14 17:41:30 +03:00
if ( ( cmd - > base . attr_mask & IB_QP_PORT ) & &
! rdma_is_port_valid ( qp - > device , cmd - > base . port_num ) ) {
2017-06-27 15:09:13 +03:00
ret = - EINVAL ;
goto release_qp ;
}
2018-02-14 13:35:40 +03:00
if ( ( cmd - > base . attr_mask & IB_QP_AV ) & &
! rdma_is_port_valid ( qp - > device , cmd - > base . dest . port_num ) ) {
ret = - EINVAL ;
goto release_qp ;
}
2017-12-05 23:30:01 +03:00
if ( ( cmd - > base . attr_mask & IB_QP_ALT_PATH ) & &
2018-02-14 13:35:40 +03:00
( ! rdma_is_port_valid ( qp - > device , cmd - > base . alt_port_num ) | |
! rdma_is_port_valid ( qp - > device , cmd - > base . alt_dest . port_num ) ) ) {
2017-12-05 23:30:01 +03:00
ret = - EINVAL ;
goto release_qp ;
}
2018-03-11 14:51:33 +03:00
if ( ( cmd - > base . attr_mask & IB_QP_CUR_STATE & &
cmd - > base . cur_qp_state > IB_QPS_ERR ) | |
cmd - > base . qp_state > IB_QPS_ERR ) {
ret = - EINVAL ;
goto release_qp ;
}
2016-12-01 14:43:15 +03:00
attr - > qp_state = cmd - > base . qp_state ;
attr - > cur_qp_state = cmd - > base . cur_qp_state ;
attr - > path_mtu = cmd - > base . path_mtu ;
attr - > path_mig_state = cmd - > base . path_mig_state ;
attr - > qkey = cmd - > base . qkey ;
attr - > rq_psn = cmd - > base . rq_psn ;
attr - > sq_psn = cmd - > base . sq_psn ;
attr - > dest_qp_num = cmd - > base . dest_qp_num ;
attr - > qp_access_flags = cmd - > base . qp_access_flags ;
attr - > pkey_index = cmd - > base . pkey_index ;
attr - > alt_pkey_index = cmd - > base . alt_pkey_index ;
attr - > en_sqd_async_notify = cmd - > base . en_sqd_async_notify ;
attr - > max_rd_atomic = cmd - > base . max_rd_atomic ;
attr - > max_dest_rd_atomic = cmd - > base . max_dest_rd_atomic ;
attr - > min_rnr_timer = cmd - > base . min_rnr_timer ;
attr - > port_num = cmd - > base . port_num ;
attr - > timeout = cmd - > base . timeout ;
attr - > retry_cnt = cmd - > base . retry_cnt ;
attr - > rnr_retry = cmd - > base . rnr_retry ;
attr - > alt_port_num = cmd - > base . alt_port_num ;
attr - > alt_timeout = cmd - > base . alt_timeout ;
attr - > rate_limit = cmd - > rate_limit ;
2017-08-23 08:35:40 +03:00
if ( cmd - > base . attr_mask & IB_QP_AV )
2017-08-17 15:50:33 +03:00
copy_ah_attr_from_uverbs ( qp - > device , & attr - > ah_attr ,
& cmd - > base . dest ) ;
2016-12-01 14:43:15 +03:00
2017-08-23 08:35:40 +03:00
if ( cmd - > base . attr_mask & IB_QP_ALT_PATH )
2017-08-17 15:50:33 +03:00
copy_ah_attr_from_uverbs ( qp - > device , & attr - > alt_ah_attr ,
& cmd - > base . alt_dest ) ;
2005-07-08 04:57:13 +04:00
2017-05-23 11:26:09 +03:00
ret = ib_modify_qp_with_udata ( qp , attr ,
modify_qp_mask ( qp - > qp_type ,
cmd - > base . attr_mask ) ,
udata ) ;
2006-06-18 07:44:49 +04:00
2015-02-05 14:53:52 +03:00
release_qp :
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( qp ) ;
2005-07-08 04:57:13 +04:00
out :
kfree ( attr ) ;
return ret ;
}
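Before copying anything into `attr`, modify_qp() rejects out-of-range input:

- The target state is always range-checked.
- The current state and the port number are checked only when their attribute-mask bit says they are being set.

The sketch below models this in userspace. The state enum follows the order of `enum ib_qp_state` (RESET through ERR). The mask bits and the `struct dev` port range, loosely modeled on rdma_is_port_valid(), are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Same order as enum ib_qp_state: RESET, INIT, RTR, RTS, SQD, SQE, ERR. */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_SQD, QPS_SQE, QPS_ERR };

#define ATTR_CUR_STATE (1u << 1)  /* hypothetical bit values */
#define ATTR_PORT      (1u << 4)

/* Hypothetical device whose ports are numbered first_port..last_port. */
struct dev { uint8_t first_port, last_port; };

static int port_valid(const struct dev *d, uint8_t port)
{
	return port >= d->first_port && port <= d->last_port;
}

static int validate(const struct dev *d, uint32_t mask, uint32_t qp_state,
		    uint32_t cur_state, uint8_t port)
{
	if ((mask & ATTR_PORT) && !port_valid(d, port))
		return -EINVAL;
	/* cur_state matters only if it is being set; qp_state always does */
	if (((mask & ATTR_CUR_STATE) && cur_state > QPS_ERR) || qp_state > QPS_ERR)
		return -EINVAL;
	return 0;
}
```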
2016-12-01 14:43:15 +03:00
ssize_t ib_uverbs_modify_qp ( struct ib_uverbs_file * file ,
struct ib_device * ib_dev ,
const char __user * buf , int in_len ,
int out_len )
{
struct ib_uverbs_ex_modify_qp cmd = { } ;
struct ib_udata udata ;
int ret ;
if ( copy_from_user ( & cmd . base , buf , sizeof ( cmd . base ) ) )
return - EFAULT ;
if ( cmd . base . attr_mask &
~ ( ( IB_USER_LEGACY_LAST_QP_ATTR_MASK < < 1 ) - 1 ) )
return - EOPNOTSUPP ;
2017-09-07 00:34:26 +03:00
ib_uverbs_init_udata ( & udata , buf + sizeof ( cmd . base ) , NULL ,
2017-06-27 17:04:42 +03:00
in_len - sizeof ( cmd . base ) - sizeof ( struct ib_uverbs_cmd_hdr ) ,
out_len ) ;
2016-12-01 14:43:15 +03:00
ret = modify_qp ( file , & cmd , & udata ) ;
if ( ret )
return ret ;
return in_len ;
}
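Both modify handlers reject masks with bits above the last supported attribute using `~((LAST << 1) - 1)`. `(LAST << 1) - 1` sets every bit up to and including `LAST`, so its complement selects the unsupported bits. The sketch below isolates that test. `mask_supported` is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Nonzero if mask uses no bit above last_bit (a single-bit value).
 * If last_bit were bit 31, the shift would wrap to 0 and every mask
 * would pass, so last_bit is assumed to be a lower bit.
 */
static int mask_supported(uint32_t mask, uint32_t last_bit)
{
	return (mask & ~((last_bit << 1) - 1)) == 0;
}
```

The last case in the tests shows why the top bit cannot serve as an ordinary attribute bit under this scheme. That fits the comment in ib_uverbs_ex_modify_qp() that reserves it for extending `attr_mask`.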
int ib_uverbs_ex_modify_qp ( struct ib_uverbs_file * file ,
struct ib_device * ib_dev ,
struct ib_udata * ucore ,
struct ib_udata * uhw )
{
struct ib_uverbs_ex_modify_qp cmd = { } ;
int ret ;
/*
* Last bit is reserved for extending the attr_mask by
* using another field .
*/
BUILD_BUG_ON ( IB_USER_LAST_QP_ATTR_MASK = = ( 1 < < 31 ) ) ;
if ( ucore - > inlen < sizeof ( cmd . base ) )
return - EINVAL ;
ret = ib_copy_from_udata ( & cmd , ucore , min ( sizeof ( cmd ) , ucore - > inlen ) ) ;
if ( ret )
return ret ;
if ( cmd . base . attr_mask &
~ ( ( IB_USER_LAST_QP_ATTR_MASK < < 1 ) - 1 ) )
return - EOPNOTSUPP ;
if ( ucore - > inlen > sizeof ( cmd ) ) {
2017-12-24 14:54:57 +03:00
if ( ! ib_is_udata_cleared ( ucore , sizeof ( cmd ) ,
ucore - > inlen - sizeof ( cmd ) ) )
2016-12-01 14:43:15 +03:00
return - EOPNOTSUPP ;
}
ret = modify_qp ( file , & cmd , uhw ) ;
return ret ;
}
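ib_uverbs_ex_modify_qp() accepts a command longer than the struct it knows only if ib_is_udata_cleared() confirms that the unknown trailing bytes are zero. A newer userspace can therefore pass an extended struct, as long as it does not ask for anything this kernel cannot honor. The sketch below is a plain-buffer userspace analogue. `trailing_bytes_clear` is a hypothetical name.

```c
#include <stddef.h>

/* Nonzero if every byte in buf[known_len .. inlen) is zero. */
static int trailing_bytes_clear(const unsigned char *buf, size_t inlen,
				size_t known_len)
{
	for (size_t i = known_len; i < inlen; i++)
		if (buf[i] != 0)
			return 0;   /* caller set a field this side does not know */
	return 1;                   /* also true when inlen <= known_len */
}
```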
2005-07-08 04:57:13 +04:00
ssize_t ib_uverbs_destroy_qp ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2005-07-08 04:57:13 +04:00
const char __user * buf , int in_len ,
int out_len )
{
2005-09-10 02:55:08 +04:00
struct ib_uverbs_destroy_qp cmd ;
struct ib_uverbs_destroy_qp_resp resp ;
2006-06-18 07:44:49 +04:00
struct ib_uobject * uobj ;
struct ib_uqp_object * obj ;
2005-09-10 02:55:08 +04:00
int ret = - EINVAL ;
2005-07-08 04:57:13 +04:00
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
2005-09-10 02:55:08 +04:00
memset ( & resp , 0 , sizeof resp ) ;
2018-03-19 16:02:33 +03:00
uobj = uobj_get_write ( UVERBS_OBJECT_QP , cmd . qp_handle ,
2017-04-04 13:31:44 +03:00
file - > ucontext ) ;
if ( IS_ERR ( uobj ) )
return PTR_ERR ( uobj ) ;
2006-06-18 07:44:49 +04:00
obj = container_of ( uobj , struct ib_uqp_object , uevent . uobject ) ;
2017-04-04 13:31:44 +03:00
/*
* Make sure we don't free the memory in remove_commit, as we still
* need the uobject memory to create the response .
*/
uverbs_uobject_get ( uobj ) ;
2005-11-30 03:57:01 +03:00
2017-04-04 13:31:44 +03:00
ret = uobj_remove_commit ( uobj ) ;
if ( ret ) {
uverbs_uobject_put ( uobj ) ;
2006-06-18 07:44:49 +04:00
return ret ;
2017-04-04 13:31:44 +03:00
}
2005-09-10 02:55:08 +04:00
2006-06-18 07:44:49 +04:00
resp . events_reported = obj - > uevent . events_reported ;
2017-04-04 13:31:44 +03:00
uverbs_uobject_put ( uobj ) ;
2005-07-08 04:57:13 +04:00
2017-09-07 00:34:26 +03:00
if ( copy_to_user ( u64_to_user_ptr ( cmd . response ) , & resp , sizeof resp ) )
2006-06-18 07:44:49 +04:00
return - EFAULT ;
2005-07-08 04:57:13 +04:00
2006-06-18 07:44:49 +04:00
return in_len ;
2005-07-08 04:57:13 +04:00
}
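ib_uverbs_destroy_qp() takes an extra reference with uverbs_uobject_get() before uobj_remove_commit(). This keeps the uobject memory alive long enough to read `events_reported` for the response; the final uverbs_uobject_put() is what frees it. The sketch below is a simplified userspace model of that ordering. The refcounted `struct obj`, the `frees` counter, and the single put that stands in for the removal are all hypothetical simplifications of what remove_commit does.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct ib_uobject. */
struct obj {
	int refcnt;
	int events_reported;
};

static int frees;  /* counts releases so the ordering can be observed */

static void obj_get(struct obj *o) { o->refcnt++; }

static void obj_put(struct obj *o)
{
	if (--o->refcnt == 0) {
		free(o);
		frees++;
	}
}

static int destroy_and_report(struct obj *o)
{
	int events;

	obj_get(o);                    /* pin the memory past the removal */
	obj_put(o);                    /* stands in for remove_commit's put */
	events = o->events_reported;   /* still valid: we hold a reference */
	obj_put(o);                    /* last reference: memory freed here */
	return events;
}
```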
2015-10-08 11:16:33 +03:00
static void * alloc_wr ( size_t wr_size , __u32 num_sge )
{
2017-03-24 22:55:17 +03:00
if ( num_sge > = ( U32_MAX - ALIGN ( wr_size , sizeof ( struct ib_sge ) ) ) /
sizeof ( struct ib_sge ) )
return NULL ;
2015-10-08 11:16:33 +03:00
return kmalloc ( ALIGN ( wr_size , sizeof ( struct ib_sge ) ) +
num_sge * sizeof ( struct ib_sge ) , GFP_KERNEL ) ;
2017-03-24 22:55:17 +03:00
}
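alloc_wr() makes one allocation holding a work-request header, padded to `struct ib_sge` alignment, followed by `num_sge` scatter/gather entries. Its guard rejects any `num_sge` that would push the total size to U32_MAX or beyond. The sketch below is a userspace analogue with `malloc`. The `ALIGN_UP` macro and the 16-byte `struct sge` are stand-ins assumed here for the kernel's ALIGN() and `struct ib_sge`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Round x up to a multiple of a (a power of two), like the kernel's ALIGN(). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical stand-in for struct ib_sge: addr, length, lkey (16 bytes). */
struct sge { uint64_t addr; uint32_t length; uint32_t lkey; };

static void *alloc_wr_sketch(size_t wr_size, uint32_t num_sge)
{
	size_t hdr = ALIGN_UP(wr_size, sizeof(struct sge));

	/* Same bound as alloc_wr(): keep hdr + num_sge * sge below U32_MAX. */
	if (num_sge >= (UINT32_MAX - hdr) / sizeof(struct sge))
		return NULL;
	return malloc(hdr + num_sge * sizeof(struct sge));
}
```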
2015-10-08 11:16:33 +03:00
2005-10-15 02:26:04 +04:00
ssize_t ib_uverbs_post_send ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2006-02-14 03:30:49 +03:00
const char __user * buf , int in_len ,
int out_len )
2005-10-15 02:26:04 +04:00
{
struct ib_uverbs_post_send cmd ;
struct ib_uverbs_post_send_resp resp ;
struct ib_uverbs_send_wr * user_wr ;
struct ib_send_wr * wr = NULL , * last , * next , * bad_wr ;
struct ib_qp * qp ;
int i , sg_ind ;
2006-06-18 07:44:49 +04:00
int is_ud ;
2005-10-15 02:26:04 +04:00
ssize_t ret = - EINVAL ;
2015-12-01 18:13:51 +03:00
size_t next_size ;
2005-10-15 02:26:04 +04:00
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
if ( in_len < sizeof cmd + cmd . wqe_size * cmd . wr_count +
cmd . sge_count * sizeof ( struct ib_uverbs_sge ) )
return - EINVAL ;
if ( cmd . wqe_size < sizeof ( struct ib_uverbs_send_wr ) )
return - EINVAL ;
user_wr = kmalloc ( cmd . wqe_size , GFP_KERNEL ) ;
if ( ! user_wr )
return - ENOMEM ;
2018-03-19 16:02:33 +03:00
qp = uobj_get_obj_read ( qp , UVERBS_OBJECT_QP , cmd . qp_handle , file - > ucontext ) ;
2006-06-18 07:44:49 +04:00
if ( ! qp )
2005-10-15 02:26:04 +04:00
goto out ;
2006-06-18 07:44:49 +04:00
is_ud = qp - > qp_type = = IB_QPT_UD ;
2005-10-15 02:26:04 +04:00
sg_ind = 0 ;
last = NULL ;
for ( i = 0 ; i < cmd . wr_count ; + + i ) {
if ( copy_from_user ( user_wr ,
buf + sizeof cmd + i * cmd . wqe_size ,
cmd . wqe_size ) ) {
ret = - EFAULT ;
2006-06-18 07:44:49 +04:00
goto out_put ;
2005-10-15 02:26:04 +04:00
}
if ( user_wr - > num_sge + sg_ind > cmd . sge_count ) {
ret = - EINVAL ;
2006-06-18 07:44:49 +04:00
goto out_put ;
2005-10-15 02:26:04 +04:00
}
2015-10-08 11:16:33 +03:00
if ( is_ud ) {
struct ib_ud_wr * ud ;
if ( user_wr - > opcode ! = IB_WR_SEND & &
user_wr - > opcode ! = IB_WR_SEND_WITH_IMM ) {
ret = - EINVAL ;
goto out_put ;
}
2015-12-01 18:13:51 +03:00
next_size = sizeof ( * ud ) ;
ud = alloc_wr ( next_size , user_wr - > num_sge ) ;
2015-10-08 11:16:33 +03:00
if ( ! ud ) {
ret = - ENOMEM ;
goto out_put ;
}
2018-03-19 16:02:33 +03:00
ud - > ah = uobj_get_obj_read ( ah , UVERBS_OBJECT_AH , user_wr - > wr . ud . ah ,
2017-04-04 13:31:44 +03:00
file - > ucontext ) ;
2015-10-08 11:16:33 +03:00
if ( ! ud - > ah ) {
kfree ( ud ) ;
ret = - EINVAL ;
goto out_put ;
}
ud - > remote_qpn = user_wr - > wr . ud . remote_qpn ;
ud - > remote_qkey = user_wr - > wr . ud . remote_qkey ;
next = & ud - > wr ;
} else if ( user_wr - > opcode = = IB_WR_RDMA_WRITE_WITH_IMM | |
user_wr - > opcode = = IB_WR_RDMA_WRITE | |
user_wr - > opcode = = IB_WR_RDMA_READ ) {
struct ib_rdma_wr * rdma ;
2015-12-01 18:13:51 +03:00
next_size = sizeof ( * rdma ) ;
rdma = alloc_wr ( next_size , user_wr - > num_sge ) ;
2015-10-08 11:16:33 +03:00
if ( ! rdma ) {
ret = - ENOMEM ;
goto out_put ;
}
rdma - > remote_addr = user_wr - > wr . rdma . remote_addr ;
rdma - > rkey = user_wr - > wr . rdma . rkey ;
next = & rdma - > wr ;
} else if ( user_wr - > opcode = = IB_WR_ATOMIC_CMP_AND_SWP | |
user_wr - > opcode = = IB_WR_ATOMIC_FETCH_AND_ADD ) {
struct ib_atomic_wr * atomic ;
2015-12-01 18:13:51 +03:00
next_size = sizeof ( * atomic ) ;
atomic = alloc_wr ( next_size , user_wr - > num_sge ) ;
2015-10-08 11:16:33 +03:00
if ( ! atomic ) {
ret = - ENOMEM ;
goto out_put ;
}
atomic - > remote_addr = user_wr - > wr . atomic . remote_addr ;
atomic - > compare_add = user_wr - > wr . atomic . compare_add ;
atomic - > swap = user_wr - > wr . atomic . swap ;
atomic - > rkey = user_wr - > wr . atomic . rkey ;
next = & atomic - > wr ;
} else if ( user_wr - > opcode = = IB_WR_SEND | |
user_wr - > opcode = = IB_WR_SEND_WITH_IMM | |
user_wr - > opcode = = IB_WR_SEND_WITH_INV ) {
2015-12-01 18:13:51 +03:00
next_size = sizeof ( * next ) ;
next = alloc_wr ( next_size , user_wr - > num_sge ) ;
2015-10-08 11:16:33 +03:00
if ( ! next ) {
ret = - ENOMEM ;
goto out_put ;
}
} else {
ret = - EINVAL ;
2006-06-18 07:44:49 +04:00
goto out_put ;
2005-10-15 02:26:04 +04:00
}
2015-10-08 11:16:33 +03:00
if ( user_wr - > opcode = = IB_WR_SEND_WITH_IMM | |
user_wr - > opcode = = IB_WR_RDMA_WRITE_WITH_IMM ) {
next - > ex . imm_data =
( __be32 __force ) user_wr - > ex . imm_data ;
} else if ( user_wr - > opcode = = IB_WR_SEND_WITH_INV ) {
next - > ex . invalidate_rkey = user_wr - > ex . invalidate_rkey ;
}
2005-10-15 02:26:04 +04:00
if ( ! last )
wr = next ;
else
last - > next = next ;
last = next ;
next - > next = NULL ;
next - > wr_id = user_wr - > wr_id ;
next - > num_sge = user_wr - > num_sge ;
next - > opcode = user_wr - > opcode ;
next - > send_flags = user_wr - > send_flags ;
if ( next - > num_sge ) {
next - > sg_list = ( void * ) next +
2015-12-01 18:13:51 +03:00
ALIGN ( next_size , sizeof ( struct ib_sge ) ) ;
2005-10-15 02:26:04 +04:00
if ( copy_from_user ( next - > sg_list ,
buf + sizeof cmd +
cmd . wr_count * cmd . wqe_size +
sg_ind * sizeof ( struct ib_sge ) ,
next - > num_sge * sizeof ( struct ib_sge ) ) ) {
ret = - EFAULT ;
2006-06-18 07:44:49 +04:00
goto out_put ;
2005-10-15 02:26:04 +04:00
}
sg_ind + = next - > num_sge ;
} else
next - > sg_list = NULL ;
}
resp . bad_wr = 0 ;
2011-08-09 02:31:51 +04:00
ret = qp - > device - > post_send ( qp - > real_qp , wr , & bad_wr ) ;
2005-10-15 02:26:04 +04:00
if ( ret )
for ( next = wr ; next ; next = next - > next ) {
+ + resp . bad_wr ;
if ( next = = bad_wr )
break ;
}
2017-09-07 00:34:26 +03:00
if ( copy_to_user ( u64_to_user_ptr ( cmd . response ) , & resp , sizeof resp ) )
2005-10-15 02:26:04 +04:00
ret = - EFAULT ;
2006-06-18 07:44:49 +04:00
out_put :
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( qp ) ;
2005-10-15 02:26:04 +04:00
while ( wr ) {
2015-10-08 11:16:33 +03:00
if ( is_ud & & ud_wr ( wr ) - > ah )
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( ud_wr ( wr ) - > ah ) ;
2005-10-15 02:26:04 +04:00
next = wr - > next ;
kfree ( wr ) ;
wr = next ;
}
2006-06-22 18:47:27 +04:00
out :
2005-10-15 02:26:04 +04:00
kfree ( user_wr ) ;
return ret ? ret : in_len ;
}
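On failure, ib_uverbs_post_send() tells userspace which work request the driver rejected by walking the WR list until it reaches bad_wr, so resp.bad_wr ends up as the 1-based position of the failing WR (it stays 0 on success). A minimal standalone sketch of that counting loop follows; struct wr_node and bad_wr_position() are hypothetical stand-ins for struct ib_send_wr and the inline loop, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's singly linked work-request list. */
struct wr_node {
	struct wr_node *next;
};

/*
 * Mirrors the resp.bad_wr loop: count every node up to and including
 * bad. Returns the 1-based position of bad, or the list length if bad
 * is not on the list at all.
 */
static unsigned int bad_wr_position(const struct wr_node *head,
				    const struct wr_node *bad)
{
	unsigned int count = 0;

	for (const struct wr_node *n = head; n; n = n->next) {
		++count;
		if (n == bad)
			break;
	}
	return count;
}
```

With a three-element list a -> b -> c, a failure reported at b yields 2; if the driver hands back a pointer that is not on the list, the count simply equals the list length.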
static struct ib_recv_wr * ib_uverbs_unmarshall_recv ( const char __user * buf ,
int in_len ,
u32 wr_count ,
u32 sge_count ,
u32 wqe_size )
{
struct ib_uverbs_recv_wr * user_wr ;
struct ib_recv_wr * wr = NULL , * last , * next ;
int sg_ind ;
int i ;
int ret ;
if ( in_len < wqe_size * wr_count +
sge_count * sizeof ( struct ib_uverbs_sge ) )
return ERR_PTR ( - EINVAL ) ;
if ( wqe_size < sizeof ( struct ib_uverbs_recv_wr ) )
return ERR_PTR ( - EINVAL ) ;
user_wr = kmalloc ( wqe_size , GFP_KERNEL ) ;
if ( ! user_wr )
return ERR_PTR ( - ENOMEM ) ;
sg_ind = 0 ;
last = NULL ;
for ( i = 0 ; i < wr_count ; + + i ) {
if ( copy_from_user ( user_wr , buf + i * wqe_size ,
wqe_size ) ) {
ret = - EFAULT ;
goto err ;
}
if ( user_wr - > num_sge + sg_ind > sge_count ) {
ret = - EINVAL ;
goto err ;
}
2017-03-24 22:55:17 +03:00
if ( user_wr - > num_sge > =
( U32_MAX - ALIGN ( sizeof * next , sizeof ( struct ib_sge ) ) ) /
sizeof ( struct ib_sge ) ) {
ret = - EINVAL ;
goto err ;
}
2005-10-15 02:26:04 +04:00
next = kmalloc ( ALIGN ( sizeof * next , sizeof ( struct ib_sge ) ) +
user_wr - > num_sge * sizeof ( struct ib_sge ) ,
GFP_KERNEL ) ;
if ( ! next ) {
ret = - ENOMEM ;
goto err ;
}
if ( ! last )
wr = next ;
else
last - > next = next ;
last = next ;
next - > next = NULL ;
next - > wr_id = user_wr - > wr_id ;
next - > num_sge = user_wr - > num_sge ;
if ( next - > num_sge ) {
next - > sg_list = ( void * ) next +
ALIGN ( sizeof * next , sizeof ( struct ib_sge ) ) ;
if ( copy_from_user ( next - > sg_list ,
buf + wr_count * wqe_size +
sg_ind * sizeof ( struct ib_sge ) ,
next - > num_sge * sizeof ( struct ib_sge ) ) ) {
ret = - EFAULT ;
goto err ;
}
sg_ind + = next - > num_sge ;
} else
next - > sg_list = NULL ;
}
kfree ( user_wr ) ;
return wr ;
err :
kfree ( user_wr ) ;
while ( wr ) {
next = wr - > next ;
kfree ( wr ) ;
wr = next ;
}
return ERR_PTR ( ret ) ;
}
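ib_uverbs_unmarshall_recv() allocates each receive WR together with its scatter/gather array in a single kmalloc(): the header size is rounded up with ALIGN() to a multiple of sizeof(struct ib_sge), sg_list points just past that rounded header, and the num_sge check rejects counts whose total would not fit below U32_MAX. The sketch below reproduces that arithmetic in plain C; align_up() and wr_alloc_size() are hypothetical helpers, and the 40- and 16-byte sizes used with them are illustrative assumptions, not the real structure sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a nonzero power of two, as with ALIGN(). */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + (a - 1)) & ~(a - 1);
}

/*
 * Size of one allocation holding a WR header of hdr_size bytes followed
 * by num_sge entries of sge_size bytes. Uses the same (slightly
 * conservative, because of >=) overflow test as the num_sge check in
 * ib_uverbs_unmarshall_recv(); returns false if the request is rejected.
 */
static bool wr_alloc_size(uint32_t hdr_size, uint32_t sge_size,
			  uint32_t num_sge, uint32_t *size)
{
	uint32_t hdr = align_up(hdr_size, sge_size); /* also the sg_list offset */

	if (num_sge >= (UINT32_MAX - hdr) / sge_size)
		return false;
	*size = hdr + num_sge * sge_size;
	return true;
}
```

For example, a 40-byte header with two 16-byte entries needs align_up(40, 16) + 2 * 16 = 80 bytes, and the SGE array starts at offset 48.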
ssize_t ib_uverbs_post_recv ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2006-02-14 03:30:49 +03:00
const char __user * buf , int in_len ,
int out_len )
2005-10-15 02:26:04 +04:00
{
struct ib_uverbs_post_recv cmd ;
struct ib_uverbs_post_recv_resp resp ;
struct ib_recv_wr * wr , * next , * bad_wr ;
struct ib_qp * qp ;
ssize_t ret = - EINVAL ;
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
wr = ib_uverbs_unmarshall_recv ( buf + sizeof cmd ,
in_len - sizeof cmd , cmd . wr_count ,
cmd . sge_count , cmd . wqe_size ) ;
if ( IS_ERR ( wr ) )
return PTR_ERR ( wr ) ;
2018-03-19 16:02:33 +03:00
qp = uobj_get_obj_read ( qp , UVERBS_OBJECT_QP , cmd . qp_handle , file - > ucontext ) ;
2006-06-18 07:44:49 +04:00
if ( ! qp )
2005-10-15 02:26:04 +04:00
goto out ;
resp . bad_wr = 0 ;
2011-08-09 02:31:51 +04:00
ret = qp - > device - > post_recv ( qp - > real_qp , wr , & bad_wr ) ;
2006-06-18 07:44:49 +04:00
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( qp ) ;
if ( ret ) {
2005-10-15 02:26:04 +04:00
for ( next = wr ; next ; next = next - > next ) {
+ + resp . bad_wr ;
if ( next = = bad_wr )
break ;
}
2017-04-04 13:31:44 +03:00
}
2005-10-15 02:26:04 +04:00
2017-09-07 00:34:26 +03:00
if ( copy_to_user ( u64_to_user_ptr ( cmd . response ) , & resp , sizeof resp ) )
2005-10-15 02:26:04 +04:00
ret = - EFAULT ;
out :
while ( wr ) {
next = wr - > next ;
kfree ( wr ) ;
wr = next ;
}
return ret ? ret : in_len ;
}
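The post paths read one flat user buffer: the command struct, then wr_count WQEs of wqe_size bytes each, then sge_count SGEs, with sg_ind tracking how many SGEs earlier WQEs have already consumed. The following sketch computes those offsets and the corresponding in_len check; struct cmd_layout and its helpers are hypothetical, and the sketch widens every term to 64 bits so the products cannot wrap for 32-bit inputs. For ib_uverbs_unmarshall_recv(), whose buf and in_len already exclude the command struct, hdr_size would be 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a post_send/post_recv command buffer:
 * [header][wr_count WQEs of wqe_size][sge_count SGEs of sge_size] */
struct cmd_layout {
	uint64_t hdr_size;
	uint64_t wr_count;
	uint64_t wqe_size;
	uint64_t sge_count;
	uint64_t sge_size;
};

/* Same condition as the in_len test: the buffer must hold every WQE and SGE. */
static bool layout_fits(const struct cmd_layout *l, uint64_t in_len)
{
	return in_len >= l->hdr_size + l->wqe_size * l->wr_count +
			 l->sge_count * l->sge_size;
}

/* Byte offset of WQE i (0-based) from the start of the buffer. */
static uint64_t wqe_offset(const struct cmd_layout *l, uint64_t i)
{
	assert(i < l->wr_count);
	return l->hdr_size + i * l->wqe_size;
}

/* Byte offset of SGE j, where j is the running sg_ind across all WQEs. */
static uint64_t sge_offset(const struct cmd_layout *l, uint64_t j)
{
	assert(j < l->sge_count);
	return l->hdr_size + l->wr_count * l->wqe_size + j * l->sge_size;
}
```

With a 24-byte header, two 64-byte WQEs and three 16-byte SGEs, the buffer must be at least 200 bytes, the second WQE starts at 88 and the SGE array at 152.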
ssize_t ib_uverbs_post_srq_recv ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2006-02-14 03:30:49 +03:00
const char __user * buf , int in_len ,
int out_len )
2005-10-15 02:26:04 +04:00
{
struct ib_uverbs_post_srq_recv cmd ;
struct ib_uverbs_post_srq_recv_resp resp ;
struct ib_recv_wr * wr , * next , * bad_wr ;
struct ib_srq * srq ;
ssize_t ret = - EINVAL ;
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
wr = ib_uverbs_unmarshall_recv ( buf + sizeof cmd ,
in_len - sizeof cmd , cmd . wr_count ,
cmd . sge_count , cmd . wqe_size ) ;
if ( IS_ERR ( wr ) )
return PTR_ERR ( wr ) ;
2018-03-19 16:02:33 +03:00
srq = uobj_get_obj_read ( srq , UVERBS_OBJECT_SRQ , cmd . srq_handle , file - > ucontext ) ;
2006-06-18 07:44:49 +04:00
if ( ! srq )
2005-10-15 02:26:04 +04:00
goto out ;
resp . bad_wr = 0 ;
ret = srq - > device - > post_srq_recv ( srq , wr , & bad_wr ) ;
2006-06-18 07:44:49 +04:00
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( srq ) ;
2006-06-18 07:44:49 +04:00
2005-10-15 02:26:04 +04:00
if ( ret )
for ( next = wr ; next ; next = next - > next ) {
+ + resp . bad_wr ;
if ( next = = bad_wr )
break ;
}
2017-09-07 00:34:26 +03:00
if ( copy_to_user ( u64_to_user_ptr ( cmd . response ) , & resp , sizeof resp ) )
2005-10-15 02:26:04 +04:00
ret = - EFAULT ;
out :
while ( wr ) {
next = wr - > next ;
kfree ( wr ) ;
wr = next ;
}
return ret ? ret : in_len ;
}
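The commit message earlier in this section describes the locking pattern these handlers rely on: a lookup holds a single lock only long enough to find the object and take a reference (uobj_get_obj_read()), and the reference is dropped afterwards without the lock (uobj_put_obj_read()). The sketch below is a userspace analogy of that idea, not the kernel's uobject or idr code: the fixed-size array standing in for an idr, the atomic_flag spinlock, and the rule that removal fails while references are outstanding are all assumptions for illustration. The kernel's actual two-step deletion is documented in the comments at the top of uverbs_cmd.c and is not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_HANDLES 16

/* Hypothetical table entry: the table owns one reference, each lookup adds one. */
struct uobj_sketch {
	atomic_int refcnt;
	int payload;
};

static struct uobj_sketch *handles[NUM_HANDLES];
static atomic_flag handles_lock = ATOMIC_FLAG_INIT;

static void lock_handles(void)
{
	while (atomic_flag_test_and_set_explicit(&handles_lock,
						 memory_order_acquire))
		; /* spin */
}

static void unlock_handles(void)
{
	atomic_flag_clear_explicit(&handles_lock, memory_order_release);
}

/* Lookup: hold the lock only for the table access and the refcount bump. */
static struct uobj_sketch *sketch_get(int handle)
{
	struct uobj_sketch *obj;

	if (handle < 0 || handle >= NUM_HANDLES)
		return NULL;
	lock_handles();
	obj = handles[handle];
	if (obj)
		atomic_fetch_add(&obj->refcnt, 1);
	unlock_handles();
	return obj;
}

/* Drop a reference taken by sketch_get(); no table lock is needed. */
static void sketch_put(struct uobj_sketch *obj)
{
	atomic_fetch_sub(&obj->refcnt, 1);
}

/*
 * Remove the handle only if the table holds the sole reference. While a
 * lookup still holds one, removal fails and the handle stays valid. The
 * caller keeps ownership of the object's memory.
 */
static bool sketch_try_remove(int handle)
{
	bool removed = false;

	if (handle < 0 || handle >= NUM_HANDLES)
		return false;
	lock_handles();
	if (handles[handle] && atomic_load(&handles[handle]->refcnt) == 1) {
		handles[handle] = NULL;
		removed = true;
	}
	unlock_handles();
	return removed;
}
```

Because lookups take their reference under the lock, a refcount of 1 observed under the same lock means no other holder exists; a concurrent sketch_put() can only lower the count, so at worst a removal attempt fails spuriously and can be retried.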
ssize_t ib_uverbs_create_ah ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2005-10-15 02:26:04 +04:00
const char __user * buf , int in_len ,
int out_len )
{
struct ib_uverbs_create_ah cmd ;
struct ib_uverbs_create_ah_resp resp ;
struct ib_uobject * uobj ;
struct ib_pd * pd ;
struct ib_ah * ah ;
2017-04-29 21:41:18 +03:00
struct rdma_ah_attr attr ;
2005-10-15 02:26:04 +04:00
int ret ;
2016-11-23 09:23:24 +03:00
struct ib_udata udata ;
2005-10-15 02:26:04 +04:00
if ( out_len < sizeof resp )
return - ENOSPC ;
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
2017-06-27 15:09:13 +03:00
if ( ! rdma_is_port_valid ( ib_dev , cmd . attr . port_num ) )
return - EINVAL ;
2017-09-07 00:34:26 +03:00
ib_uverbs_init_udata ( & udata , buf + sizeof ( cmd ) ,
u64_to_user_ptr ( cmd . response ) + sizeof ( resp ) ,
2017-06-27 17:04:42 +03:00
in_len - sizeof ( cmd ) - sizeof ( struct ib_uverbs_cmd_hdr ) ,
out_len - sizeof ( resp ) ) ;
2016-11-23 09:23:24 +03:00
2018-03-19 16:02:33 +03:00
uobj = uobj_alloc ( UVERBS_OBJECT_AH , file - > ucontext ) ;
2017-04-04 13:31:44 +03:00
if ( IS_ERR ( uobj ) )
return PTR_ERR ( uobj ) ;
2005-10-15 02:26:04 +04:00
2018-03-19 16:02:33 +03:00
pd = uobj_get_obj_read ( pd , UVERBS_OBJECT_PD , cmd . pd_handle , file - > ucontext ) ;
2006-06-18 07:44:49 +04:00
if ( ! pd ) {
2005-10-15 02:26:04 +04:00
ret = - EINVAL ;
2006-06-18 07:44:49 +04:00
goto err ;
2005-10-15 02:26:04 +04:00
}
2017-04-29 21:41:29 +03:00
attr . type = rdma_ah_find_type ( ib_dev , cmd . attr . port_num ) ;
2017-08-04 23:54:16 +03:00
rdma_ah_set_make_grd ( & attr , false ) ;
2017-04-29 21:41:28 +03:00
rdma_ah_set_dlid ( & attr , cmd . attr . dlid ) ;
rdma_ah_set_sl ( & attr , cmd . attr . sl ) ;
rdma_ah_set_path_bits ( & attr , cmd . attr . src_path_bits ) ;
rdma_ah_set_static_rate ( & attr , cmd . attr . static_rate ) ;
rdma_ah_set_port_num ( & attr , cmd . attr . port_num ) ;
2017-04-29 21:41:16 +03:00
if ( cmd . attr . is_global ) {
2017-04-29 21:41:28 +03:00
rdma_ah_set_grh ( & attr , NULL , cmd . attr . grh . flow_label ,
cmd . attr . grh . sgid_index ,
cmd . attr . grh . hop_limit ,
cmd . attr . grh . traffic_class ) ;
rdma_ah_set_dgid_raw ( & attr , cmd . attr . grh . dgid ) ;
2017-04-29 21:41:16 +03:00
} else {
2017-04-29 21:41:28 +03:00
rdma_ah_set_ah_flags ( & attr , 0 ) ;
2017-04-29 21:41:16 +03:00
}
2016-11-23 09:23:24 +03:00
2017-10-16 08:45:12 +03:00
ah = rdma_create_user_ah ( pd , & attr , & udata ) ;
2005-10-15 02:26:04 +04:00
if ( IS_ERR ( ah ) ) {
ret = PTR_ERR ( ah ) ;
2017-04-04 13:31:44 +03:00
goto err_put ;
2005-10-15 02:26:04 +04:00
}
2006-06-18 07:44:49 +04:00
ah - > uobject = uobj ;
2017-04-04 13:31:44 +03:00
uobj - > user_handle = cmd . user_handle ;
2006-06-18 07:44:49 +04:00
uobj - > object = ah ;
2005-10-15 02:26:04 +04:00
resp . ah_handle = uobj - > id ;
2017-09-07 00:34:26 +03:00
if ( copy_to_user ( u64_to_user_ptr ( cmd . response ) , & resp , sizeof resp ) ) {
2005-10-15 02:26:04 +04:00
ret = - EFAULT ;
2006-06-18 07:44:49 +04:00
goto err_copy ;
2005-10-15 02:26:04 +04:00
}
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( pd ) ;
uobj_alloc_commit ( uobj ) ;
2005-10-15 02:26:04 +04:00
return in_len ;
2006-06-18 07:44:49 +04:00
err_copy :
2017-04-29 21:41:22 +03:00
rdma_destroy_ah ( ah ) ;
2005-10-15 02:26:04 +04:00
2017-04-04 13:31:44 +03:00
err_put :
uobj_put_obj_read ( pd ) ;
2006-07-17 19:20:51 +04:00
2006-06-18 07:44:49 +04:00
err :
2017-04-04 13:31:44 +03:00
uobj_alloc_abort ( uobj ) ;
2005-10-15 02:26:04 +04:00
return ret ;
}
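The ib_uverbs_idr_mutex commit message above describes replacing one global mutex with per-object reference counts plus a lock that is held only long enough to look up an object and take a reference on it. A minimal user-space sketch of that lookup pattern follows. It assumes a fixed-size array as a stand-in for the idr and a C11 `atomic_flag` spinlock as a stand-in for the kernel spinlock; every name (`struct uobj`, `uobj_lookup_get`, `uobj_remove`, and so on) is hypothetical, and the sketch does not model the two-step deletion that uverbs_cmd.c uses for removals that may fail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define TABLE_SIZE 16

/* Hypothetical stand-in for a userspace verbs object. */
struct uobj {
	atomic_int refcnt;	/* one ref owned by the table + one per lookup */
	int id;
};

static atomic_flag table_lock = ATOMIC_FLAG_INIT;	/* spinlock stand-in */
static struct uobj *table[TABLE_SIZE];			/* idr stand-in */

static void table_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void table_lock_release(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Drop one reference; the last put frees the object. */
static void uobj_put_ref(struct uobj *obj)
{
	int old = atomic_fetch_sub(&obj->refcnt, 1);

	assert(old > 0);	/* an unbalanced put would underflow */
	if (old == 1)
		free(obj);
}

/* Publish a new object; the table owns the initial reference. */
static struct uobj *uobj_create(int id)
{
	struct uobj *obj;

	if (id < 0 || id >= TABLE_SIZE)
		return NULL;
	obj = malloc(sizeof(*obj));
	if (!obj)
		return NULL;
	atomic_init(&obj->refcnt, 1);
	obj->id = id;

	table_lock_acquire();
	if (table[id]) {	/* handle already in use */
		table_lock_release();
		free(obj);
		return NULL;
	}
	table[id] = obj;
	table_lock_release();
	return obj;
}

/* The lock covers only the table access and the reference grab. */
static struct uobj *uobj_lookup_get(int id)
{
	struct uobj *obj;

	if (id < 0 || id >= TABLE_SIZE)
		return NULL;
	table_lock_acquire();
	obj = table[id];
	if (obj)
		atomic_fetch_add(&obj->refcnt, 1);
	table_lock_release();
	return obj;	/* caller must call uobj_put_ref() when done */
}

/* Unpublish the handle, then drop the table's reference. */
static void uobj_remove(int id)
{
	struct uobj *obj;

	if (id < 0 || id >= TABLE_SIZE)
		return;
	table_lock_acquire();
	obj = table[id];
	table[id] = NULL;
	table_lock_release();
	if (obj)
		uobj_put_ref(obj);
}
```

Because the lock is held for a single array access, a thread that has already taken a reference keeps using the object without holding any lock, and a concurrent removal only frees it once that reference is dropped.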
ssize_t ib_uverbs_destroy_ah ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2005-10-15 02:26:04 +04:00
const char __user * buf , int in_len , int out_len )
{
struct ib_uverbs_destroy_ah cmd ;
struct ib_uobject * uobj ;
2006-06-18 07:44:49 +04:00
int ret ;
2005-10-15 02:26:04 +04:00
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
2018-03-19 16:02:33 +03:00
uobj = uobj_get_write ( UVERBS_OBJECT_AH , cmd . ah_handle ,
2017-04-04 13:31:44 +03:00
file - > ucontext ) ;
if ( IS_ERR ( uobj ) )
return PTR_ERR ( uobj ) ;
2005-10-15 02:26:04 +04:00
2017-04-04 13:31:44 +03:00
ret = uobj_remove_commit ( uobj ) ;
return ret ? : in_len ;
2005-10-15 02:26:04 +04:00
}
2005-07-08 04:57:13 +04:00
ssize_t ib_uverbs_attach_mcast ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2005-07-08 04:57:13 +04:00
const char __user * buf , int in_len ,
int out_len )
{
struct ib_uverbs_attach_mcast cmd ;
struct ib_qp * qp ;
2006-06-18 07:44:49 +04:00
struct ib_uqp_object * obj ;
2005-11-30 03:57:01 +03:00
struct ib_uverbs_mcast_entry * mcast ;
2006-06-18 07:44:49 +04:00
int ret ;
2005-07-08 04:57:13 +04:00
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
2018-03-19 16:02:33 +03:00
qp = uobj_get_obj_read ( qp , UVERBS_OBJECT_QP , cmd . qp_handle , file - > ucontext ) ;
2006-06-18 07:44:49 +04:00
if ( ! qp )
return - EINVAL ;
2005-11-30 03:57:01 +03:00
2006-06-18 07:44:49 +04:00
obj = container_of ( qp - > uobject , struct ib_uqp_object , uevent . uobject ) ;
2005-11-30 03:57:01 +03:00
2017-04-04 13:31:45 +03:00
mutex_lock ( & obj - > mcast_lock ) ;
2006-06-18 07:44:49 +04:00
list_for_each_entry ( mcast , & obj - > mcast_list , list )
2005-11-30 03:57:01 +03:00
if ( cmd . mlid = = mcast - > lid & &
! memcmp ( cmd . gid , mcast - > gid . raw , sizeof mcast - > gid . raw ) ) {
ret = 0 ;
2006-06-18 07:44:49 +04:00
goto out_put ;
2005-11-30 03:57:01 +03:00
}
mcast = kmalloc ( sizeof * mcast , GFP_KERNEL ) ;
if ( ! mcast ) {
ret = - ENOMEM ;
2006-06-18 07:44:49 +04:00
goto out_put ;
2005-11-30 03:57:01 +03:00
}
mcast - > lid = cmd . mlid ;
memcpy ( mcast - > gid . raw , cmd . gid , sizeof mcast - > gid . raw ) ;
2005-07-08 04:57:13 +04:00
2005-11-30 03:57:01 +03:00
ret = ib_attach_mcast ( qp , & mcast - > gid , cmd . mlid ) ;
2006-06-18 07:44:49 +04:00
if ( ! ret )
list_add_tail ( & mcast - > list , & obj - > mcast_list ) ;
else
2005-11-30 03:57:01 +03:00
kfree ( mcast ) ;
2006-06-18 07:44:49 +04:00
out_put :
2017-04-04 13:31:45 +03:00
mutex_unlock ( & obj - > mcast_lock ) ;
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( qp ) ;
2005-07-08 04:57:13 +04:00
return ret ? ret : in_len ;
}
ssize_t ib_uverbs_detach_mcast ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2005-07-08 04:57:13 +04:00
const char __user * buf , int in_len ,
int out_len )
{
struct ib_uverbs_detach_mcast cmd ;
2006-06-18 07:44:49 +04:00
struct ib_uqp_object * obj ;
2005-07-08 04:57:13 +04:00
struct ib_qp * qp ;
2005-11-30 03:57:01 +03:00
struct ib_uverbs_mcast_entry * mcast ;
2005-07-08 04:57:13 +04:00
int ret = - EINVAL ;
2017-04-09 20:15:32 +03:00
bool found = false ;
2005-07-08 04:57:13 +04:00
if ( copy_from_user ( & cmd , buf , sizeof cmd ) )
return - EFAULT ;
2018-03-19 16:02:33 +03:00
qp = uobj_get_obj_read ( qp , UVERBS_OBJECT_QP , cmd . qp_handle , file - > ucontext ) ;
2006-06-18 07:44:49 +04:00
if ( ! qp )
return - EINVAL ;
2005-07-08 04:57:13 +04:00
2017-04-04 13:31:44 +03:00
obj = container_of ( qp - > uobject , struct ib_uqp_object , uevent . uobject ) ;
2017-04-04 13:31:45 +03:00
mutex_lock ( & obj - > mcast_lock ) ;
2017-04-04 13:31:44 +03:00
2006-06-18 07:44:49 +04:00
list_for_each_entry ( mcast , & obj - > mcast_list , list )
2005-11-30 03:57:01 +03:00
if ( cmd . mlid = = mcast - > lid & &
! memcmp ( cmd . gid , mcast - > gid . raw , sizeof mcast - > gid . raw ) ) {
list_del ( & mcast - > list ) ;
kfree ( mcast ) ;
2017-04-09 20:15:32 +03:00
found = true ;
2005-11-30 03:57:01 +03:00
break ;
}
2017-04-09 20:15:32 +03:00
if ( ! found ) {
ret = - EINVAL ;
goto out_put ;
}
ret = ib_detach_mcast ( qp , ( union ib_gid * ) cmd . gid , cmd . mlid ) ;
2006-06-18 07:44:49 +04:00
out_put :
2017-04-04 13:31:45 +03:00
mutex_unlock ( & obj - > mcast_lock ) ;
2017-04-04 13:31:44 +03:00
uobj_put_obj_read ( qp ) ;
2005-07-08 04:57:13 +04:00
return ret ? ret : in_len ;
}
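ib_uverbs_attach_mcast and ib_uverbs_detach_mcast keep a per-QP list of (mlid, gid) pairs under obj->mcast_lock. Attach reports success without touching the device when the pair is already listed, and records a new entry only if ib_attach_mcast succeeds; detach returns -EINVAL when the pair was never attached, and otherwise unlinks and frees the entry before calling ib_detach_mcast. The sketch below reproduces only that bookkeeping, single-threaded with the mutex omitted. All names are hypothetical, the `hw_attach`/`hw_detach` callbacks stand in for ib_attach_mcast/ib_detach_mcast, and new entries are prepended rather than appended with list_add_tail.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of struct ib_uverbs_mcast_entry. */
struct mcast_entry {
	struct mcast_entry *next;
	uint16_t lid;
	uint8_t gid[16];
};

struct qp_mcast {
	struct mcast_entry *head;	/* per-QP attachment list */
	int (*hw_attach)(const uint8_t *gid, uint16_t lid);
	int (*hw_detach)(const uint8_t *gid, uint16_t lid);
};

/* Return the link pointing at a matching entry, or NULL. */
static struct mcast_entry **find_entry(struct qp_mcast *q,
				       const uint8_t *gid, uint16_t lid)
{
	struct mcast_entry **pp;

	for (pp = &q->head; *pp; pp = &(*pp)->next)
		if ((*pp)->lid == lid && !memcmp((*pp)->gid, gid, 16))
			return pp;
	return NULL;
}

static int mcast_attach(struct qp_mcast *q, const uint8_t *gid, uint16_t lid)
{
	struct mcast_entry *e;
	int ret;

	if (find_entry(q, gid, lid))
		return 0;		/* already attached: nothing to do */
	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->lid = lid;
	memcpy(e->gid, gid, sizeof(e->gid));
	ret = q->hw_attach(gid, lid);
	if (ret) {
		free(e);		/* only successful attaches are recorded */
		return ret;
	}
	e->next = q->head;
	q->head = e;
	return 0;
}

static int mcast_detach(struct qp_mcast *q, const uint8_t *gid, uint16_t lid)
{
	struct mcast_entry **pp = find_entry(q, gid, lid);
	struct mcast_entry *e;

	if (!pp)
		return -EINVAL;		/* never attached through this QP */
	e = *pp;
	*pp = e->next;			/* unlink, then free */
	free(e);
	return q->hw_detach(gid, lid);
}

/* Demo stubs standing in for the device's attach/detach verbs. */
static int hw_ok(const uint8_t *gid, uint16_t lid)
{
	(void)gid;
	(void)lid;
	return 0;
}

static int hw_fail(const uint8_t *gid, uint16_t lid)
{
	(void)gid;
	(void)lid;
	return -EIO;
}
```

Checking the list first makes a repeated attach of the same group idempotent from user space's point of view, and the list doubles as the record of what must be detached when the QP is destroyed.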
2005-08-18 23:24:13 +04:00
2018-03-28 09:27:46 +03:00
struct ib_uflow_resources {
size_t max ;
size_t num ;
struct ib_flow_action * collection [ 0 ] ;
} ;
static struct ib_uflow_resources * flow_resources_alloc ( size_t num_specs )
{
struct ib_uflow_resources * resources ;
resources =
kmalloc ( sizeof ( * resources ) +
num_specs * sizeof ( * resources - > collection ) , GFP_KERNEL ) ;
if ( ! resources )
return NULL ;
resources - > num = 0 ;
resources - > max = num_specs ;
return resources ;
}
void ib_uverbs_flow_resources_free ( struct ib_uflow_resources * uflow_res )
{
unsigned int i ;
for ( i = 0 ; i < uflow_res - > num ; i + + )
atomic_dec ( & uflow_res - > collection [ i ] - > usecnt ) ;
kfree ( uflow_res ) ;
}
static void flow_resources_add ( struct ib_uflow_resources * uflow_res ,
struct ib_flow_action * action )
{
WARN_ON ( uflow_res - > num > = uflow_res - > max ) ;
atomic_inc ( & action - > usecnt ) ;
uflow_res - > collection [ uflow_res - > num + + ] = action ;
}
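flow_resources_alloc sizes one allocation for the header plus num_specs action pointers, flow_resources_add takes a usecnt reference on each action it records, and ib_uverbs_flow_resources_free drops exactly the references counted in num. A user-space sketch of the same pattern follows, assuming a C99 flexible array member in place of the zero-length `collection[0]` array and plain integers in place of atomic_t; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ib_flow_action. */
struct flow_action {
	int usecnt;
};

struct flow_resources {
	size_t max;
	size_t num;
	struct flow_action *collection[];	/* C99 flexible array member */
};

static struct flow_resources *resources_alloc(size_t num_specs)
{
	struct flow_resources *res;

	/* One allocation: header plus num_specs pointer slots. */
	res = malloc(sizeof(*res) + num_specs * sizeof(res->collection[0]));
	if (!res)
		return NULL;
	res->num = 0;
	res->max = num_specs;
	return res;
}

static void resources_add(struct flow_resources *res,
			  struct flow_action *action)
{
	assert(res->num < res->max);	/* the kernel only WARN_ONs here */
	action->usecnt++;		/* pin the action while a flow uses it */
	res->collection[res->num++] = action;
}

static void resources_free(struct flow_resources *res)
{
	size_t i;

	for (i = 0; i < res->num; i++)
		res->collection[i]->usecnt--;	/* release only what was taken */
	free(res);
}
```

Tracking num separately from max means the free path stays correct even if spec parsing stops partway, since only the actions actually added have their counts dropped.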
static int kern_spec_to_ib_spec_action ( struct ib_ucontext * ucontext ,
struct ib_uverbs_flow_spec * kern_spec ,
union ib_flow_spec * ib_spec ,
struct ib_uflow_resources * uflow_res )
2017-01-18 15:59:49 +03:00
{
ib_spec - > type = kern_spec - > type ;
switch ( ib_spec - > type ) {
case IB_FLOW_SPEC_ACTION_TAG :
if ( kern_spec - > flow_tag . size ! =
sizeof ( struct ib_uverbs_flow_spec_action_tag ) )
return - EINVAL ;
ib_spec - > flow_tag . size = sizeof ( struct ib_flow_spec_action_tag ) ;
ib_spec - > flow_tag . tag_id = kern_spec - > flow_tag . tag_id ;
break ;
2017-04-03 13:13:51 +03:00
case IB_FLOW_SPEC_ACTION_DROP :
if ( kern_spec - > drop . size ! =
sizeof ( struct ib_uverbs_flow_spec_action_drop ) )
return - EINVAL ;
ib_spec - > drop . size = sizeof ( struct ib_flow_spec_action_drop ) ;
break ;
2018-03-28 09:27:46 +03:00
case IB_FLOW_SPEC_ACTION_HANDLE :
if ( kern_spec - > action . size ! =
sizeof ( struct ib_uverbs_flow_spec_action_handle ) )
return - EOPNOTSUPP ;
ib_spec - > action . act = uobj_get_obj_read ( flow_action ,
UVERBS_OBJECT_FLOW_ACTION ,
kern_spec - > action . handle ,
ucontext ) ;
if ( ! ib_spec - > action . act )
return - EINVAL ;
ib_spec - > action . size =
sizeof ( struct ib_flow_spec_action_handle ) ;
flow_resources_add ( uflow_res , ib_spec - > action . act ) ;
uobj_put_obj_read ( ib_spec - > action . act ) ;
break ;
2017-01-18 15:59:49 +03:00
default :
return - EINVAL ;
}
return 0 ;
}
2018-03-28 09:27:44 +03:00
static size_t kern_spec_filter_sz ( const struct ib_uverbs_flow_spec_hdr * spec )
2016-08-30 16:58:32 +03:00
{
/* Returns user space filter size, includes padding */
return ( spec - > size - sizeof ( struct ib_uverbs_flow_spec_hdr ) ) / 2 ;
}
2018-03-28 09:27:44 +03:00
static ssize_t spec_filter_size ( const void * kern_spec_filter , u16 kern_filter_size ,
2016-08-30 16:58:32 +03:00
u16 ib_real_filter_sz )
{
/*
* User space filter structures must be 64 bit aligned , otherwise this
* may pass , but we won ' t handle additional new attributes .
*/
if ( kern_filter_size > ib_real_filter_sz ) {
if ( memchr_inv ( kern_spec_filter +
ib_real_filter_sz , 0 ,
kern_filter_size - ib_real_filter_sz ) )
return - EINVAL ;
return ib_real_filter_sz ;
}
return kern_filter_size ;
}
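spec_filter_size lets user space pass a filter larger than the one this kernel understands, provided every byte beyond the known size (ib_real_filter_sz) is zero; set bytes in unknown trailing fields yield -EINVAL. memchr_inv is a kernel-only helper, so the user-space sketch below substitutes a hypothetical `all_zero` loop, and `filter_size` is likewise a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical user-space replacement for !memchr_inv(p, 0, len). */
static int all_zero(const unsigned char *p, size_t len)
{
	while (len--)
		if (*p++)
			return 0;
	return 1;
}

/*
 * Return how many filter bytes to copy, or -EINVAL if user space set
 * bytes in fields this side does not know about.
 */
static ssize_t filter_size(const void *user_filter, size_t user_sz,
			   size_t known_sz)
{
	if (user_sz > known_sz) {
		if (!all_zero((const unsigned char *)user_filter + known_sz,
			      user_sz - known_sz))
			return -EINVAL;
		return (ssize_t)known_sz;	/* newer user struct, zero tail */
	}
	return (ssize_t)user_sz;		/* older or equal-sized struct */
}
```

The callers in this file also treat a result of 0 as an error (`actual_filter_sz <= 0`), so an empty filter is rejected as well.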
2018-03-28 09:27:44 +03:00
int ib_uverbs_kern_spec_to_ib_spec_filter ( enum ib_flow_spec_type type ,
const void * kern_spec_mask ,
const void * kern_spec_val ,
size_t kern_filter_sz ,
union ib_flow_spec * ib_spec )
2013-08-14 14:58:30 +04:00
{
2016-08-30 16:58:32 +03:00
ssize_t actual_filter_sz ;
ssize_t ib_filter_sz ;
/* User flow spec size must be aligned to 4 bytes */
if ( kern_filter_sz ! = ALIGN ( kern_filter_sz , 4 ) )
return - EINVAL ;
2018-03-28 09:27:44 +03:00
ib_spec - > type = type ;
2016-11-14 20:04:51 +03:00
if ( ib_spec - > type = = ( IB_FLOW_SPEC_INNER | IB_FLOW_SPEC_VXLAN_TUNNEL ) )
return - EINVAL ;
2016-08-30 16:58:32 +03:00
2016-11-14 20:04:51 +03:00
switch ( ib_spec - > type & ~ IB_FLOW_SPEC_INNER ) {
2013-08-14 14:58:30 +04:00
case IB_FLOW_SPEC_ETH :
2016-08-30 16:58:32 +03:00
ib_filter_sz = offsetof ( struct ib_flow_eth_filter , real_sz ) ;
actual_filter_sz = spec_filter_size ( kern_spec_mask ,
kern_filter_sz ,
ib_filter_sz ) ;
if ( actual_filter_sz < = 0 )
2013-08-14 14:58:30 +04:00
return - EINVAL ;
2016-08-30 16:58:32 +03:00
ib_spec - > size = sizeof ( struct ib_flow_spec_eth ) ;
memcpy ( & ib_spec - > eth . val , kern_spec_val , actual_filter_sz ) ;
memcpy ( & ib_spec - > eth . mask , kern_spec_mask , actual_filter_sz ) ;
2013-08-14 14:58:30 +04:00
break ;
case IB_FLOW_SPEC_IPV4 :
2016-08-30 16:58:32 +03:00
ib_filter_sz = offsetof ( struct ib_flow_ipv4_filter , real_sz ) ;
actual_filter_sz = spec_filter_size ( kern_spec_mask ,
kern_filter_sz ,
ib_filter_sz ) ;
if ( actual_filter_sz < = 0 )
2013-08-14 14:58:30 +04:00
return - EINVAL ;
2016-08-30 16:58:32 +03:00
ib_spec - > size = sizeof ( struct ib_flow_spec_ipv4 ) ;
memcpy ( & ib_spec - > ipv4 . val , kern_spec_val , actual_filter_sz ) ;
memcpy ( & ib_spec - > ipv4 . mask , kern_spec_mask , actual_filter_sz ) ;
2013-08-14 14:58:30 +04:00
break ;
2016-06-17 15:14:50 +03:00
case IB_FLOW_SPEC_IPV6 :
2016-08-30 16:58:32 +03:00
ib_filter_sz = offsetof ( struct ib_flow_ipv6_filter , real_sz ) ;
actual_filter_sz = spec_filter_size ( kern_spec_mask ,
kern_filter_sz ,
ib_filter_sz ) ;
if ( actual_filter_sz < = 0 )
2016-06-17 15:14:50 +03:00
return - EINVAL ;
2016-08-30 16:58:32 +03:00
ib_spec - > size = sizeof ( struct ib_flow_spec_ipv6 ) ;
memcpy ( & ib_spec - > ipv6 . val , kern_spec_val , actual_filter_sz ) ;
memcpy ( & ib_spec - > ipv6 . mask , kern_spec_mask , actual_filter_sz ) ;
2016-08-30 16:58:34 +03:00
if ( ( ntohl ( ib_spec - > ipv6 . mask . flow_label ) ) > = BIT ( 20 ) | |
( ntohl ( ib_spec - > ipv6 . val . flow_label ) ) > = BIT ( 20 ) )
return - EINVAL ;
2016-06-17 15:14:50 +03:00
break ;
2013-08-14 14:58:30 +04:00
case IB_FLOW_SPEC_TCP :
case IB_FLOW_SPEC_UDP :
2016-08-30 16:58:32 +03:00
ib_filter_sz = offsetof ( struct ib_flow_tcp_udp_filter , real_sz ) ;
actual_filter_sz = spec_filter_size ( kern_spec_mask ,
kern_filter_sz ,
ib_filter_sz ) ;
if ( actual_filter_sz < = 0 )
2013-08-14 14:58:30 +04:00
return - EINVAL ;
2016-08-30 16:58:32 +03:00
ib_spec - > size = sizeof ( struct ib_flow_spec_tcp_udp ) ;
memcpy ( & ib_spec - > tcp_udp . val , kern_spec_val , actual_filter_sz ) ;
memcpy ( & ib_spec - > tcp_udp . mask , kern_spec_mask , actual_filter_sz ) ;
2013-08-14 14:58:30 +04:00
break ;
2016-11-14 20:04:47 +03:00
case IB_FLOW_SPEC_VXLAN_TUNNEL :
ib_filter_sz = offsetof ( struct ib_flow_tunnel_filter , real_sz ) ;
actual_filter_sz = spec_filter_size ( kern_spec_mask ,
kern_filter_sz ,
ib_filter_sz ) ;
if ( actual_filter_sz < = 0 )
return - EINVAL ;
ib_spec - > tunnel . size = sizeof ( struct ib_flow_spec_tunnel ) ;
memcpy ( & ib_spec - > tunnel . val , kern_spec_val , actual_filter_sz ) ;
memcpy ( & ib_spec - > tunnel . mask , kern_spec_mask , actual_filter_sz ) ;
if ( ( ntohl ( ib_spec - > tunnel . mask . tunnel_id ) ) > = BIT ( 24 ) | |
( ntohl ( ib_spec - > tunnel . val . tunnel_id ) ) > = BIT ( 24 ) )
return - EINVAL ;
break ;
2018-03-28 09:27:49 +03:00
case IB_FLOW_SPEC_ESP :
ib_filter_sz = offsetof ( struct ib_flow_esp_filter , real_sz ) ;
actual_filter_sz = spec_filter_size ( kern_spec_mask ,
kern_filter_sz ,
ib_filter_sz ) ;
if ( actual_filter_sz < = 0 )
return - EINVAL ;
ib_spec - > esp . size = sizeof ( struct ib_flow_spec_esp ) ;
memcpy ( & ib_spec - > esp . val , kern_spec_val , actual_filter_sz ) ;
memcpy ( & ib_spec - > esp . mask , kern_spec_mask , actual_filter_sz ) ;
break ;
2013-08-14 14:58:30 +04:00
default :
return - EINVAL ;
}
return 0 ;
}
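Beyond the per-type size handling, ib_uverbs_kern_spec_to_ib_spec_filter rejects filter lengths that are not a multiple of 4 and checks that big-endian fields carrying narrower values fit their width: the IPv6 flow label must be below BIT(20) and the VXLAN tunnel_id below BIT(24), for both the value and the mask. A sketch of those two checks in user space, with hypothetical helper names and `ntohl` from `<arpa/inet.h>`:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same test as kern_filter_sz != ALIGN(kern_filter_sz, 4), inverted. */
static bool filter_len_ok(size_t len)
{
	return (len & 3) == 0;
}

/*
 * A big-endian field holding a narrower value (20-bit flow label,
 * 24-bit VNI) is valid only if both value and mask fit in `bits` bits
 * once converted to host order.
 */
static bool be32_field_fits(uint32_t val_be, uint32_t mask_be,
			    unsigned int bits)
{
	uint32_t limit;

	assert(bits < 32);		/* keep the shift well defined */
	limit = UINT32_C(1) << bits;	/* BIT(bits) in kernel terms */
	return ntohl(val_be) < limit && ntohl(mask_be) < limit;
}
```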
2018-03-28 09:27:44 +03:00
static int kern_spec_to_ib_spec_filter ( struct ib_uverbs_flow_spec * kern_spec ,
union ib_flow_spec * ib_spec )
{
ssize_t kern_filter_sz ;
void * kern_spec_mask ;
void * kern_spec_val ;
if ( kern_spec - > reserved )
return - EINVAL ;
kern_filter_sz = kern_spec_filter_sz ( & kern_spec - > hdr ) ;
kern_spec_val = ( void * ) kern_spec +
sizeof ( struct ib_uverbs_flow_spec_hdr ) ;
kern_spec_mask = kern_spec_val + kern_filter_sz ;
return ib_uverbs_kern_spec_to_ib_spec_filter ( kern_spec - > type ,
kern_spec_mask ,
kern_spec_val ,
kern_filter_sz , ib_spec ) ;
}
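kern_spec_filter_sz and kern_spec_to_ib_spec_filter treat a user flow spec as a struct ib_uverbs_flow_spec_hdr followed by a value filter and then a mask filter of equal size, so each filter is (size - sizeof(hdr)) / 2 bytes. The sketch below mirrors that pointer arithmetic using a hypothetical `spec_hdr` struct assumed to have the same type/size/reserved layout; the minimum-size guard is an addition of the sketch, not a check made in this function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the uverbs flow spec header (8 bytes). */
struct spec_hdr {
	uint32_t type;
	uint16_t size;		/* header + value + mask, in bytes */
	uint16_t reserved;	/* must be zero */
};

struct spec_view {
	size_t filter_sz;	/* bytes in each of value and mask */
	const unsigned char *val;
	const unsigned char *mask;
};

/* Split a spec into value and mask halves; returns -1 on a bad header. */
static int spec_split(const void *spec, struct spec_view *out)
{
	const struct spec_hdr *hdr = spec;

	if (hdr->reserved || hdr->size < sizeof(*hdr))
		return -1;
	out->filter_sz = (hdr->size - sizeof(*hdr)) / 2;
	out->val = (const unsigned char *)spec + sizeof(*hdr);
	out->mask = out->val + out->filter_sz;
	return 0;
}
```

For a 16-byte spec, each filter is (16 - 8) / 2 = 4 bytes, with the value at offset 8 and the mask at offset 12.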
2018-03-28 09:27:46 +03:00
static int kern_spec_to_ib_spec ( struct ib_ucontext * ucontext ,
struct ib_uverbs_flow_spec * kern_spec ,
union ib_flow_spec * ib_spec ,
struct ib_uflow_resources * uflow_res )
2017-01-18 15:59:49 +03:00
{
if ( kern_spec - > reserved )
return - EINVAL ;
if ( kern_spec - > type > = IB_FLOW_SPEC_ACTION_TAG )
2018-03-28 09:27:46 +03:00
return kern_spec_to_ib_spec_action ( ucontext , kern_spec , ib_spec ,
uflow_res ) ;
2017-01-18 15:59:49 +03:00
else
return kern_spec_to_ib_spec_filter ( kern_spec , ib_spec ) ;
}

int ib_uverbs_ex_create_wq(struct ib_uverbs_file *file,
			   struct ib_device *ib_dev,
			   struct ib_udata *ucore,
			   struct ib_udata *uhw)
{
	struct ib_uverbs_ex_create_wq cmd = {};
	struct ib_uverbs_ex_create_wq_resp resp = {};
	struct ib_uwq_object *obj;
	int err = 0;
	struct ib_cq *cq;
	struct ib_pd *pd;
	struct ib_wq *wq;
	struct ib_wq_init_attr wq_init_attr = {};
	size_t required_cmd_sz;
	size_t required_resp_len;

	required_cmd_sz = offsetof(typeof(cmd), max_sge) + sizeof(cmd.max_sge);
	required_resp_len = offsetof(typeof(resp), wqn) + sizeof(resp.wqn);

	if (ucore->inlen < required_cmd_sz)
		return -EINVAL;

	if (ucore->outlen < required_resp_len)
		return -ENOSPC;

	if (ucore->inlen > sizeof(cmd) &&
	    !ib_is_udata_cleared(ucore, sizeof(cmd),
				 ucore->inlen - sizeof(cmd)))
		return -EOPNOTSUPP;

	err = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
	if (err)
		return err;

	if (cmd.comp_mask)
		return -EOPNOTSUPP;

	obj = (struct ib_uwq_object *)uobj_alloc(UVERBS_OBJECT_WQ,
						 file->ucontext);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, file->ucontext);
	if (!pd) {
		err = -EINVAL;
		goto err_uobj;
	}

	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, file->ucontext);
	if (!cq) {
		err = -EINVAL;
		goto err_put_pd;
	}

	wq_init_attr.cq = cq;
	wq_init_attr.max_sge = cmd.max_sge;
	wq_init_attr.max_wr = cmd.max_wr;
	wq_init_attr.wq_context = file;
	wq_init_attr.wq_type = cmd.wq_type;
	wq_init_attr.event_handler = ib_uverbs_wq_event_handler;
	if (ucore->inlen >= (offsetof(typeof(cmd), create_flags) +
			     sizeof(cmd.create_flags)))
		wq_init_attr.create_flags = cmd.create_flags;
	obj->uevent.events_reported = 0;
	INIT_LIST_HEAD(&obj->uevent.event_list);

	if (!pd->device->create_wq) {
		err = -EOPNOTSUPP;
		goto err_put_cq;
	}
	wq = pd->device->create_wq(pd, &wq_init_attr, uhw);
	if (IS_ERR(wq)) {
		err = PTR_ERR(wq);
		goto err_put_cq;
	}

	wq->uobject = &obj->uevent.uobject;
	obj->uevent.uobject.object = wq;
	wq->wq_type = wq_init_attr.wq_type;
	wq->cq = cq;
	wq->pd = pd;
	wq->device = pd->device;
	wq->wq_context = wq_init_attr.wq_context;
	atomic_set(&wq->usecnt, 0);
	atomic_inc(&pd->usecnt);
	atomic_inc(&cq->usecnt);
	wq->uobject = &obj->uevent.uobject;
	obj->uevent.uobject.object = wq;

	memset(&resp, 0, sizeof(resp));
	resp.wq_handle = obj->uevent.uobject.id;
	resp.max_sge = wq_init_attr.max_sge;
	resp.max_wr = wq_init_attr.max_wr;
	resp.wqn = wq->wq_num;
	resp.response_length = required_resp_len;
	err = ib_copy_to_udata(ucore,
			       &resp, resp.response_length);
	if (err)
		goto err_copy;

	uobj_put_obj_read(pd);
	uobj_put_obj_read(cq);
	uobj_alloc_commit(&obj->uevent.uobject);
	return 0;

err_copy:
	ib_destroy_wq(wq);
err_put_cq:
	uobj_put_obj_read(cq);
err_put_pd:
	uobj_put_obj_read(pd);
err_uobj:
	uobj_alloc_abort(&obj->uevent.uobject);

	return err;
}
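ib_uverbs_ex_create_wq() above follows the extensible-command rules used throughout this file. The input must reach at least the end of the oldest mandatory field (`max_sge`). Any bytes beyond what the kernel's struct knows about must be zero. Fields that a shorter, older command did not supply stay zero, because `cmd` is zero-initialized before the partial copy. The user-space sketch below replays those checks on a plain buffer. `struct demo_cmd`, `DEMO_END_OF` and `demo_parse_cmd` are hypothetical, and the byte loop stands in for ib_is_udata_cleared() and ib_copy_from_udata().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command: the first layout ended at max_sge, and
 * create_flags was appended later as an optional field. */
struct demo_cmd {
	uint32_t comp_mask;
	uint32_t max_wr;
	uint32_t max_sge;
	uint32_t create_flags;
};

/* Offset of the first byte after a field, as in offsetof() + sizeof(). */
#define DEMO_END_OF(type, field) \
	(offsetof(type, field) + sizeof(((type *)0)->field))

/* Returns 0 on success or a negative errno, mirroring the checks above. */
static int demo_parse_cmd(const void *in, size_t inlen, struct demo_cmd *cmd)
{
	const unsigned char *bytes = in;
	size_t i;

	if (inlen < DEMO_END_OF(struct demo_cmd, max_sge))
		return -EINVAL;			/* older than any known layout */
	for (i = sizeof(*cmd); i < inlen; i++)	/* unknown trailing bytes */
		if (bytes[i])
			return -EOPNOTSUPP;
	memset(cmd, 0, sizeof(*cmd));		/* absent fields read as zero */
	memcpy(cmd, in, inlen < sizeof(*cmd) ? inlen : sizeof(*cmd));
	if (cmd->comp_mask)
		return -EOPNOTSUPP;
	return 0;
}
```

The result is that an old 12-byte command is accepted with `create_flags` left at zero. A newer, longer command is also accepted, provided every extra byte is zero.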

int ib_uverbs_ex_destroy_wq(struct ib_uverbs_file *file,
			    struct ib_device *ib_dev,
			    struct ib_udata *ucore,
			    struct ib_udata *uhw)
{
	struct ib_uverbs_ex_destroy_wq cmd = {};
	struct ib_uverbs_ex_destroy_wq_resp resp = {};
	struct ib_uobject *uobj;
	struct ib_uwq_object *obj;
	size_t required_cmd_sz;
	size_t required_resp_len;
	int ret;

	required_cmd_sz = offsetof(typeof(cmd), wq_handle) + sizeof(cmd.wq_handle);
	required_resp_len = offsetof(typeof(resp), reserved) + sizeof(resp.reserved);

	if (ucore->inlen < required_cmd_sz)
		return -EINVAL;

	if (ucore->outlen < required_resp_len)
		return -ENOSPC;

	if (ucore->inlen > sizeof(cmd) &&
	    !ib_is_udata_cleared(ucore, sizeof(cmd),
				 ucore->inlen - sizeof(cmd)))
		return -EOPNOTSUPP;

	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
	if (ret)
		return ret;

	if (cmd.comp_mask)
		return -EOPNOTSUPP;

	resp.response_length = required_resp_len;
	uobj = uobj_get_write(UVERBS_OBJECT_WQ, cmd.wq_handle,
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);

	obj = container_of(uobj, struct ib_uwq_object, uevent.uobject);
	/*
	 * Make sure we don't free the memory in remove_commit, as we still
	 * need the uobject memory to build the response.
	 */
	uverbs_uobject_get(uobj);

	ret = uobj_remove_commit(uobj);
	resp.events_reported = obj->uevent.events_reported;
	uverbs_uobject_put(uobj);
	if (ret)
		return ret;

	return ib_copy_to_udata(ucore, &resp, resp.response_length);
}
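ib_uverbs_ex_destroy_wq() takes an extra reference before uobj_remove_commit(). That way `obj->uevent.events_reported` can still be read after the object has been destroyed, and the memory is only released by the final uverbs_uobject_put(). The sketch below shows the same pin, destroy, read, unpin pattern. `struct demo_obj` and the `demo_*` helpers are hypothetical. The real uverbs helpers are presumably backed by an atomic refcount; this single-threaded sketch uses a plain `int`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a uobject. */
struct demo_obj {
	int refcnt;
	int events_reported;
};

static int demo_freed;	/* counts frees so the pattern can be observed */

static void demo_get(struct demo_obj *o)
{
	o->refcnt++;
}

static void demo_put(struct demo_obj *o)
{
	if (--o->refcnt == 0) {
		free(o);
		demo_freed++;
	}
}

/* "Destroy": drops the handle's own reference, which may free the object. */
static void demo_remove_commit(struct demo_obj *o)
{
	demo_put(o);
}

/* Destroy the object but still read one of its fields afterwards. */
static int demo_destroy_and_report(struct demo_obj *o)
{
	int events;

	demo_get(o);			/* pin: remove_commit must not free o */
	demo_remove_commit(o);
	events = o->events_reported;	/* still valid thanks to the pin */
	demo_put(o);			/* last reference: o is freed here */
	return events;
}
```

Without the demo_get() call, the read after demo_remove_commit() would be a use-after-free.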

int ib_uverbs_ex_modify_wq(struct ib_uverbs_file *file,
			   struct ib_device *ib_dev,
			   struct ib_udata *ucore,
			   struct ib_udata *uhw)
{
	struct ib_uverbs_ex_modify_wq cmd = {};
	struct ib_wq *wq;
	struct ib_wq_attr wq_attr = {};
	size_t required_cmd_sz;
	int ret;

	required_cmd_sz = offsetof(typeof(cmd), curr_wq_state) + sizeof(cmd.curr_wq_state);
	if (ucore->inlen < required_cmd_sz)
		return -EINVAL;

	if (ucore->inlen > sizeof(cmd) &&
	    !ib_is_udata_cleared(ucore, sizeof(cmd),
				 ucore->inlen - sizeof(cmd)))
		return -EOPNOTSUPP;

	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
	if (ret)
		return ret;

	if (!cmd.attr_mask)
		return -EINVAL;

	if (cmd.attr_mask > (IB_WQ_STATE | IB_WQ_CUR_STATE | IB_WQ_FLAGS))
		return -EINVAL;

	wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ, cmd.wq_handle, file->ucontext);
	if (!wq)
		return -EINVAL;

	wq_attr.curr_wq_state = cmd.curr_wq_state;
	wq_attr.wq_state = cmd.wq_state;
	if (cmd.attr_mask & IB_WQ_FLAGS) {
		wq_attr.flags = cmd.flags;
		wq_attr.flags_mask = cmd.flags_mask;
	}
	if (!wq->device->modify_wq) {
		ret = -EOPNOTSUPP;
		goto out;
	}
	ret = wq->device->modify_wq(wq, &wq_attr, cmd.attr_mask, uhw);
out:
	uobj_put_obj_read(wq);
	return ret;
}
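ib_uverbs_ex_modify_wq() rejects unsupported attributes by comparing the mask against the OR of the supported bits. A single `>` works as an "any unknown bit set" test only while the supported bits form a contiguous run starting at bit 0. The sketch below assumes that `IB_WQ_STATE`, `IB_WQ_CUR_STATE` and `IB_WQ_FLAGS` are bits 0, 1 and 2; those values are not shown in this section. It compares that form with the more general bitwise test. The `DEMO_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values (contiguous low bits), for illustration only. */
enum {
	DEMO_WQ_STATE     = 1 << 0,
	DEMO_WQ_CUR_STATE = 1 << 1,
	DEMO_WQ_FLAGS     = 1 << 2,
};

#define DEMO_WQ_SUPPORTED (DEMO_WQ_STATE | DEMO_WQ_CUR_STATE | DEMO_WQ_FLAGS)

/* Comparison form, as used above: valid only for contiguous low bits. */
static int demo_mask_ok_cmp(uint32_t mask)
{
	return mask != 0 && mask <= DEMO_WQ_SUPPORTED;
}

/* General form: reject any bit outside the supported set. */
static int demo_mask_ok_bits(uint32_t mask)
{
	return mask != 0 && (mask & ~(uint32_t)DEMO_WQ_SUPPORTED) == 0;
}
```

If a future attribute bit were added out of sequence, for example bit 4 with bit 3 left unused, the comparison form would wrongly accept bit 3. The bitwise form would keep rejecting it.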

int ib_uverbs_ex_create_rwq_ind_table(struct ib_uverbs_file *file,
				      struct ib_device *ib_dev,
				      struct ib_udata *ucore,
				      struct ib_udata *uhw)
{
	struct ib_uverbs_ex_create_rwq_ind_table cmd = {};
	struct ib_uverbs_ex_create_rwq_ind_table_resp resp = {};
	struct ib_uobject *uobj;
	int err = 0;
	struct ib_rwq_ind_table_init_attr init_attr = {};
	struct ib_rwq_ind_table *rwq_ind_tbl;
	struct ib_wq **wqs = NULL;
	u32 *wqs_handles = NULL;
	struct ib_wq *wq = NULL;
	int i, j, num_read_wqs;
	u32 num_wq_handles;
	u32 expected_in_size;
	size_t required_cmd_sz_header;
	size_t required_resp_len;

	required_cmd_sz_header = offsetof(typeof(cmd), log_ind_tbl_size) + sizeof(cmd.log_ind_tbl_size);
	required_resp_len = offsetof(typeof(resp), ind_tbl_num) + sizeof(resp.ind_tbl_num);

	if (ucore->inlen < required_cmd_sz_header)
		return -EINVAL;

	if (ucore->outlen < required_resp_len)
		return -ENOSPC;

	err = ib_copy_from_udata(&cmd, ucore, required_cmd_sz_header);
	if (err)
		return err;

	ucore->inbuf += required_cmd_sz_header;
	ucore->inlen -= required_cmd_sz_header;

	if (cmd.comp_mask)
		return -EOPNOTSUPP;

	if (cmd.log_ind_tbl_size > IB_USER_VERBS_MAX_LOG_IND_TBL_SIZE)
		return -EINVAL;

	num_wq_handles = 1 << cmd.log_ind_tbl_size;
	expected_in_size = num_wq_handles * sizeof(__u32);
	if (num_wq_handles == 1)
		/* input size for wq handles is u64 aligned */
		expected_in_size += sizeof(__u32);

	if (ucore->inlen < expected_in_size)
		return -EINVAL;

	if (ucore->inlen > expected_in_size &&
	    !ib_is_udata_cleared(ucore, expected_in_size,
				 ucore->inlen - expected_in_size))
		return -EOPNOTSUPP;

	wqs_handles = kcalloc(num_wq_handles, sizeof(*wqs_handles),
			      GFP_KERNEL);
	if (!wqs_handles)
		return -ENOMEM;

	err = ib_copy_from_udata(wqs_handles, ucore,
				 num_wq_handles * sizeof(__u32));
	if (err)
		goto err_free;

	wqs = kcalloc(num_wq_handles, sizeof(*wqs), GFP_KERNEL);
	if (!wqs) {
		err = -ENOMEM;
		goto err_free;
	}

	for (num_read_wqs = 0; num_read_wqs < num_wq_handles;
			num_read_wqs++) {
		wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ, wqs_handles[num_read_wqs],
				       file->ucontext);
		if (!wq) {
			err = -EINVAL;
			goto put_wqs;
		}

		wqs[num_read_wqs] = wq;
	}

	uobj = uobj_alloc(UVERBS_OBJECT_RWQ_IND_TBL, file->ucontext);
	if (IS_ERR(uobj)) {
		err = PTR_ERR(uobj);
		goto put_wqs;
	}

	init_attr.log_ind_tbl_size = cmd.log_ind_tbl_size;
	init_attr.ind_tbl = wqs;

	if (!ib_dev->create_rwq_ind_table) {
		err = -EOPNOTSUPP;
		goto err_uobj;
	}
	rwq_ind_tbl = ib_dev->create_rwq_ind_table(ib_dev, &init_attr, uhw);
	if (IS_ERR(rwq_ind_tbl)) {
		err = PTR_ERR(rwq_ind_tbl);
		goto err_uobj;
	}

	rwq_ind_tbl->ind_tbl = wqs;
	rwq_ind_tbl->log_ind_tbl_size = init_attr.log_ind_tbl_size;
	rwq_ind_tbl->uobject = uobj;
	uobj->object = rwq_ind_tbl;
	rwq_ind_tbl->device = ib_dev;
	atomic_set(&rwq_ind_tbl->usecnt, 0);

	for (i = 0; i < num_wq_handles; i++)
		atomic_inc(&wqs[i]->usecnt);

	resp.ind_tbl_handle = uobj->id;
	resp.ind_tbl_num = rwq_ind_tbl->ind_tbl_num;
	resp.response_length = required_resp_len;

	err = ib_copy_to_udata(ucore,
			       &resp, resp.response_length);
	if (err)
		goto err_copy;

	kfree(wqs_handles);

	for (j = 0; j < num_read_wqs; j++)
		uobj_put_obj_read(wqs[j]);

	uobj_alloc_commit(uobj);
	return 0;

err_copy:
	ib_destroy_rwq_ind_table(rwq_ind_tbl);
err_uobj:
	uobj_alloc_abort(uobj);
put_wqs:
	for (j = 0; j < num_read_wqs; j++)
		uobj_put_obj_read(wqs[j]);
err_free:
	kfree(wqs_handles);
	kfree(wqs);
	return err;
}
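In ib_uverbs_ex_create_rwq_ind_table(), the handle array that follows the command header holds `1 << log_ind_tbl_size` 32-bit WQ handles. A table of size one is padded with an extra 4 bytes so the input stays u64 aligned, as the inline comment says. The helper below is a hypothetical sketch of that size rule. Like the kernel code, it expects the caller to have already bounded `log_size`; the kernel uses `IB_USER_VERBS_MAX_LOG_IND_TBL_SIZE` for this.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of WQ handles expected after the command header for a table
 * of 2^log_size entries; a lone 32-bit handle is padded to 8 bytes. */
static uint32_t demo_expected_in_size(uint32_t log_size)
{
	uint32_t num_handles;
	uint32_t size;

	assert(log_size < 32);		/* caller must bound the shift */
	num_handles = 1u << log_size;
	size = num_handles * sizeof(uint32_t);
	if (num_handles == 1)
		size += sizeof(uint32_t);	/* keep the input u64 aligned */
	return size;
}
```

For `log_size` values 0 through 3 this gives 8, 8, 16 and 32 bytes. Every result is a multiple of 8.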

int ib_uverbs_ex_destroy_rwq_ind_table(struct ib_uverbs_file *file,
				       struct ib_device *ib_dev,
				       struct ib_udata *ucore,
				       struct ib_udata *uhw)
{
	struct ib_uverbs_ex_destroy_rwq_ind_table cmd = {};
	struct ib_uobject *uobj;
	int ret;
	size_t required_cmd_sz;

	required_cmd_sz = offsetof(typeof(cmd), ind_tbl_handle) + sizeof(cmd.ind_tbl_handle);

	if (ucore->inlen < required_cmd_sz)
		return -EINVAL;

	if (ucore->inlen > sizeof(cmd) &&
	    !ib_is_udata_cleared(ucore, sizeof(cmd),
				 ucore->inlen - sizeof(cmd)))
		return -EOPNOTSUPP;

	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
	if (ret)
		return ret;

	if (cmd.comp_mask)
		return -EOPNOTSUPP;

	uobj = uobj_get_write(UVERBS_OBJECT_RWQ_IND_TBL, cmd.ind_tbl_handle,
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);

	return uobj_remove_commit(uobj);
}
IB/core: extended command: an improved infrastructure for uverbs commands
Commit 400dbc96583f ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.
According to the commit 400dbc96583f, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issue commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.
But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].
So the following patch is an attempt to a revised extensible command
infrastructure.
This improved extensible command infrastructure distinguish between
core (eg. legacy)'s command/response buffers from provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.
Having those buffers identified separately make it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response parts: This should make
the extended functions more reliable.
Additionally, instead of relying on command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on
unused bits in command field: on the 32 bits provided by command
field, only 6 bits are really needed to encode the identifier of
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands).
So this patch makes use of some high order bits in command field to
store flags, leaving enough room for more command identifiers than one
will ever need (eg. 256).
The new flags are used to specify if the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make usage of flags itself extensible.
Using high order bits of the commands field ensure that newer
libibverbs on older kernel will properly fail when trying to call
extended commands. On the other hand, older libibverbs on newer kernel
will never be able to issue calls to extended commands.
The extended command header includes the optional response pointer so
that output buffer length and output buffer pointer are located
together in the command, allowing proper parameters checking. This
should make implementing functions easier and safer.
Additionally the extended header ensure 64bits alignment, while making
all sizes multiple of 8 bytes, extending the maximum buffer size:
legacy extended
Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes)
Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes)
For the purpose of doing proper buffer size accounting, the headers
size are no more taken in account in "in_words".
One of the odds of the current extensible infrastructure, reading
twice the "legacy" command header, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command: memory is read once and
information are not duplicated: it's making clear that's an extended
command scheme and not a different command scheme.
The proposed scheme will format input (command) and output (response)
buffers this way:
- command:
legacy header +
extended header +
command data (core + hw):
+----------------------------------------+
| flags | 00 00 | command |
| in_words | out_words |
+----------------------------------------+
| response |
| response |
| provider_in_words | provider_out_words |
| padding |
+----------------------------------------+
| |
. <uverbs input> .
. (in_words * 8) .
| |
+----------------------------------------+
| |
. <provider input> .
. (provider_in_words * 8) .
| |
+----------------------------------------+
- response, if present:
+----------------------------------------+
| |
. <uverbs output space> .
. (out_words * 8) .
| |
+----------------------------------------+
| |
. <provider output space> .
. (provider_out_words * 8) .
| |
+----------------------------------------+
The overall design is to ensure that the extensible infrastructure is
itself extensible while begin more reliable with more input and bound
checking.
Note:
The unused field in the extended header would be perfect candidate to
hold the command "comp_mask" (eg. bit field used to handle
compatibility). This was suggested by Roland Dreier in a previous
review[2]. But "comp_mask" field is likely to be present in the uverb
input and/or provider input, likewise for the response, as noted by
Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
header.
[1]:
http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
[2]:
http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
[3]:
http://marc.info/?i=525C1149.6000701@mellanox.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
[ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
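The commit message above places the flags in the high byte of the 32-bit command word and the command identifier in the low byte, with the two middle bytes zero. It also says extended word counts are in units of 8 bytes. The sketch below packs and unpacks such a word and reproduces the buffer-size table. The `DEMO_*` names are hypothetical, and the extended-flag value `0x80` is an assumption. It also assumes legacy word counts are in units of 4 bytes, which matches the 256 KB legacy figure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the layout "flags | 00 00 | command". */
#define DEMO_CMD_COMMAND_MASK  0x000000ffu
#define DEMO_CMD_FLAGS_SHIFT   24
#define DEMO_CMD_FLAG_EXTENDED 0x80u	/* assumed flag value */

static uint32_t demo_pack_cmd(uint8_t flags, uint8_t command)
{
	return ((uint32_t)flags << DEMO_CMD_FLAGS_SHIFT) | command;
}

static uint8_t demo_cmd_id(uint32_t word)
{
	return word & DEMO_CMD_COMMAND_MASK;
}

static uint8_t demo_cmd_flags(uint32_t word)
{
	return word >> DEMO_CMD_FLAGS_SHIFT;
}

/* Largest buffer a 16-bit word count can describe for a given word size. */
static uint32_t demo_max_bytes(uint32_t word_size)
{
	return UINT16_MAX * word_size;
}
```

With a 16-bit count, 4-byte words give 262140 bytes (about 256 KB). 8-byte words give 524280 bytes (about 512 KB) for each of the core and provider parts, or about 1024 KB combined. An older kernel sees a nonzero high byte as an unknown command number, which is why newer libibverbs fails cleanly there.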

int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
			     struct ib_device *ib_dev,
			     struct ib_udata *ucore,
			     struct ib_udata *uhw)
{
	struct ib_uverbs_create_flow cmd;
	struct ib_uverbs_create_flow_resp resp;
	struct ib_uobject *uobj;
	struct ib_uflow_object *uflow;
	struct ib_flow *flow_id;
	struct ib_uverbs_flow_attr *kern_flow_attr;
	struct ib_flow_attr *flow_attr;
	struct ib_qp *qp;
	struct ib_uflow_resources *uflow_res;
	int err = 0;
	void *kern_spec;
	void *ib_spec;
	int i;

	if (ucore->inlen < sizeof(cmd))
		return -EINVAL;

	if (ucore->outlen < sizeof(resp))
		return -ENOSPC;
IB/core: extended command: an improved infrastructure for uverbs commands
Commit 400dbc96583f ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.
According to the commit 400dbc96583f, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issue commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.
But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].
So the following patch is an attempt to a revised extensible command
infrastructure.
This improved extensible command infrastructure distinguish between
core (eg. legacy)'s command/response buffers from provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.
Having those buffers identified separately make it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response parts: This should make
the extended functions more reliable.
Additionally, instead of relying on command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on
unused bits in command field: on the 32 bits provided by command
field, only 6 bits are really needed to encode the identifier of
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands).
So this patch makes use of some high order bits in command field to
store flags, leaving enough room for more command identifiers than one
will ever need (eg. 256).
The new flags are used to specify if the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make usage of flags itself extensible.
Using high order bits of the commands field ensure that newer
libibverbs on older kernel will properly fail when trying to call
extended commands. On the other hand, older libibverbs on newer kernel
will never be able to issue calls to extended commands.
The extended command header includes the optional response pointer so
that the output buffer length and output buffer pointer are located
together in the command, allowing proper parameter checking. This
should make implementing functions easier and safer.
Additionally, the extended header ensures 64-bit alignment while making
all sizes multiples of 8 bytes, which extends the maximum buffer sizes:
                          legacy      extended
Maximum command buffer:   256KBytes   1024KBytes (512KBytes + 512KBytes)
Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
For the purpose of proper buffer size accounting, the header sizes are
no longer included in "in_words".
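The arithmetic behind the table, as a small sketch assuming 16-bit word counters: 4-byte words for the legacy counter, and 8-byte words for each of the core and provider counters in the extended scheme. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 16-bit word counter into a byte count for a given word size. */
static uint32_t words_to_bytes(uint16_t words, uint32_t word_size)
{
	assert(word_size == 4 || word_size == 8);
	return (uint32_t)words * word_size;
}

/* Legacy: one 16-bit counter of 4-byte words, headers included. */
static uint32_t legacy_max_bytes(void)
{
	return words_to_bytes(UINT16_MAX, 4);
}

/* Extended: separate 16-bit core and provider counters of 8-byte words,
 * headers excluded. */
static uint32_t extended_max_bytes(void)
{
	return words_to_bytes(UINT16_MAX, 8) + words_to_bytes(UINT16_MAX, 8);
}
```

The exact maxima come out as 262140 and 1048560 bytes, so the table's 256KBytes and 1024KBytes are rounded figures for a four-fold increase.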
One of the oddities of the current extensible infrastructure, reading
the "legacy" command header twice, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command, so memory is read once and no
information is duplicated. This makes clear that it is an extended
command scheme and not a different command scheme.
The proposed scheme will format input (command) and output (response)
buffers this way:
- command:
  legacy header +
  extended header +
  command data (core + hw):
+----------------------------------------+
| flags |    00        00    |  command  |
|      in_words      |     out_words     |
+----------------------------------------+
|                response                |
|                response                |
| provider_in_words | provider_out_words |
|                padding                 |
+----------------------------------------+
|                                        |
.             <uverbs input>             .
.             (in_words * 8)             .
|                                        |
+----------------------------------------+
|                                        |
.            <provider input>            .
.        (provider_in_words * 8)         .
|                                        |
+----------------------------------------+
- response, if present:
+----------------------------------------+
|                                        |
.          <uverbs output space>         .
.            (out_words * 8)             .
|                                        |
+----------------------------------------+
|                                        |
.        <provider output space>         .
.        (provider_out_words * 8)        .
|                                        |
+----------------------------------------+
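The two headers in the diagram can be sketched as C structs. The field names follow the diagram, but the struct names and the `check_total_len()` helper are hypothetical. The point is that both headers are multiples of 8 bytes and that the announced word counts must account for exactly the bytes written:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the legacy header: the first two rows of the diagram. */
struct cmd_hdr_sketch {
	uint32_t command;            /* flags | 00 00 | command */
	uint16_t in_words;
	uint16_t out_words;
};

/* Sketch of the extended header: the next four rows of the diagram. */
struct ex_cmd_hdr_sketch {
	uint64_t response;           /* userspace address of the response */
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t reserved;           /* "padding" row, expected to be zero */
};

/* Both headers keep every following part 8-byte aligned. */
_Static_assert(sizeof(struct cmd_hdr_sketch) == 8, "legacy header is 8 bytes");
_Static_assert(sizeof(struct ex_cmd_hdr_sketch) == 16, "extended header is 16 bytes");

/*
 * Check that the total length written by userspace equals both headers
 * plus the announced core and provider input sizes (8-byte words).
 */
static int check_total_len(size_t count, const struct cmd_hdr_sketch *hdr,
			   const struct ex_cmd_hdr_sketch *ex)
{
	size_t expected;

	assert(hdr != NULL && ex != NULL);
	if (ex->reserved != 0)
		return -1;

	expected = sizeof(*hdr) + sizeof(*ex) +
		   ((size_t)hdr->in_words + ex->provider_in_words) * 8;
	return count == expected ? 0 : -1;
}
```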
The overall design goal is to ensure that the extensible infrastructure
is itself extensible, while being more reliable thanks to more input
and bounds checking.
Note:
The unused field in the extended header would be a perfect candidate to
hold the command "comp_mask" (e.g. a bit field used to handle
compatibility). This was suggested by Roland Dreier in a previous
review[2]. But a "comp_mask" field is likely to be present in the
uverbs input and/or provider input, and likewise in the response, as
noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in
the header.
[1] http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
[2] http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
[3] http://marc.info/?i=525C1149.6000701@mellanox.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
[ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-07 02:21:49 +04:00
	err = ib_copy_from_udata(&cmd, ucore, sizeof(cmd));
	if (err)
		return err;
	ucore->inbuf += sizeof(cmd);
	ucore->inlen -= sizeof(cmd);
2013-08-14 14:58:30 +04:00
2013-09-01 19:39:52 +04:00
	if (cmd.comp_mask)
		return -EINVAL;
2016-05-13 18:52:26 +03:00
	if (!capable(CAP_NET_RAW))
2013-08-14 14:58:30 +04:00
		return -EPERM;
2016-02-18 19:31:05 +03:00
	if (cmd.flow_attr.flags >= IB_FLOW_ATTR_FLAGS_RESERVED)
		return -EINVAL;
	if ((cmd.flow_attr.flags & IB_FLOW_ATTR_FLAGS_DONT_TRAP) &&
	    ((cmd.flow_attr.type == IB_FLOW_ATTR_ALL_DEFAULT) ||
	     (cmd.flow_attr.type == IB_FLOW_ATTR_MC_DEFAULT)))
		return -EINVAL;
2013-11-07 02:21:44 +04:00
	if (cmd.flow_attr.num_of_specs > IB_FLOW_SPEC_SUPPORT_LAYERS)
2013-09-01 19:39:52 +04:00
		return -EINVAL;
2013-11-07 02:21:49 +04:00
	if (cmd.flow_attr.size > ucore->inlen ||
2013-11-07 02:21:44 +04:00
	    cmd.flow_attr.size >
2013-11-07 02:21:46 +04:00
	    (cmd.flow_attr.num_of_specs * sizeof(struct ib_uverbs_flow_spec)))
2013-09-01 19:39:52 +04:00
		return -EINVAL;
2013-12-12 02:01:49 +04:00
	if (cmd.flow_attr.reserved[0] ||
	    cmd.flow_attr.reserved[1])
		return -EINVAL;
2013-08-14 14:58:30 +04:00
	if (cmd.flow_attr.num_of_specs) {
2013-11-07 02:21:44 +04:00
		kern_flow_attr = kmalloc(sizeof(*kern_flow_attr) + cmd.flow_attr.size,
					 GFP_KERNEL);
2013-08-14 14:58:30 +04:00
		if (!kern_flow_attr)
			return -ENOMEM;
		memcpy(kern_flow_attr, &cmd.flow_attr, sizeof(*kern_flow_attr));
2013-11-07 02:21:49 +04:00
		err = ib_copy_from_udata(kern_flow_attr + 1, ucore,
					 cmd.flow_attr.size);
		if (err)
2013-08-14 14:58:30 +04:00
			goto err_free_attr;
	} else {
		kern_flow_attr = &cmd.flow_attr;
	}
2018-03-19 16:02:33 +03:00
	uobj = uobj_alloc(UVERBS_OBJECT_FLOW, file->ucontext);
2017-04-04 13:31:44 +03:00
	if (IS_ERR(uobj)) {
		err = PTR_ERR(uobj);
2013-08-14 14:58:30 +04:00
		goto err_free_attr;
	}
2018-03-19 16:02:33 +03:00
	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, file->ucontext);
2013-08-14 14:58:30 +04:00
	if (!qp) {
		err = -EINVAL;
		goto err_uobj;
	}
2016-08-30 16:58:32 +03:00
	flow_attr = kzalloc(sizeof(*flow_attr) + cmd.flow_attr.num_of_specs *
			    sizeof(union ib_flow_spec), GFP_KERNEL);
2013-08-14 14:58:30 +04:00
	if (!flow_attr) {
		err = -ENOMEM;
		goto err_put;
	}
2018-03-28 09:27:46 +03:00
	uflow_res = flow_resources_alloc(cmd.flow_attr.num_of_specs);
	if (!uflow_res) {
		err = -ENOMEM;
		goto err_free_flow_attr;
	}
2013-08-14 14:58:30 +04:00
	flow_attr->type = kern_flow_attr->type;
	flow_attr->priority = kern_flow_attr->priority;
	flow_attr->num_of_specs = kern_flow_attr->num_of_specs;
	flow_attr->port = kern_flow_attr->port;
	flow_attr->flags = kern_flow_attr->flags;
	flow_attr->size = sizeof(*flow_attr);
	kern_spec = kern_flow_attr + 1;
	ib_spec = flow_attr + 1;
2013-11-07 02:21:44 +04:00
	for (i = 0; i < flow_attr->num_of_specs &&
2013-11-07 02:21:46 +04:00
	     cmd.flow_attr.size > offsetof(struct ib_uverbs_flow_spec, reserved) &&
2013-11-07 02:21:44 +04:00
	     cmd.flow_attr.size >=
2013-11-07 02:21:46 +04:00
	     ((struct ib_uverbs_flow_spec *)kern_spec)->size; i++) {
2018-03-28 09:27:46 +03:00
		err = kern_spec_to_ib_spec(file->ucontext, kern_spec, ib_spec,
					   uflow_res);
2013-08-14 14:58:30 +04:00
		if (err)
			goto err_free;
		flow_attr->size +=
			((union ib_flow_spec *)ib_spec)->size;
2013-11-07 02:21:46 +04:00
		cmd.flow_attr.size -= ((struct ib_uverbs_flow_spec *)kern_spec)->size;
		kern_spec += ((struct ib_uverbs_flow_spec *)kern_spec)->size;
2013-08-14 14:58:30 +04:00
		ib_spec += ((union ib_flow_spec *)ib_spec)->size;
	}
2013-11-07 02:21:44 +04:00
	if (cmd.flow_attr.size || (i != flow_attr->num_of_specs)) {
		pr_warn("create flow failed, flow %d: %d bytes left from uverb cmd\n",
			i, cmd.flow_attr.size);
2013-12-12 02:01:50 +04:00
		err = -EINVAL;
2013-08-14 14:58:30 +04:00
		goto err_free;
	}
	flow_id = ib_create_flow(qp, flow_attr, IB_FLOW_DOMAIN_USER);
	if (IS_ERR(flow_id)) {
		err = PTR_ERR(flow_id);
2017-04-04 13:31:44 +03:00
		goto err_free;
2013-08-14 14:58:30 +04:00
	}
	flow_id->uobject = uobj;
	uobj->object = flow_id;
2018-03-28 09:27:46 +03:00
	uflow = container_of(uobj, typeof(*uflow), uobject);
	uflow->resources = uflow_res;
2013-08-14 14:58:30 +04:00
	memset(&resp, 0, sizeof(resp));
	resp.flow_handle = uobj->id;
2013-11-07 02:21:49 +04:00
	err = ib_copy_to_udata(ucore,
			       &resp, sizeof(resp));
	if (err)
2013-08-14 14:58:30 +04:00
		goto err_copy;
2017-04-04 13:31:44 +03:00
	uobj_put_obj_read(qp);
	uobj_alloc_commit(uobj);
2013-08-14 14:58:30 +04:00
	kfree(flow_attr);
	if (cmd.flow_attr.num_of_specs)
		kfree(kern_flow_attr);
2013-11-07 02:21:49 +04:00
	return 0;
2013-08-14 14:58:30 +04:00
err_copy:
	ib_destroy_flow(flow_id);
err_free:
2018-03-28 09:27:46 +03:00
	ib_uverbs_flow_resources_free(uflow_res);
err_free_flow_attr:
2013-08-14 14:58:30 +04:00
	kfree(flow_attr);
err_put:
2017-04-04 13:31:44 +03:00
	uobj_put_obj_read(qp);
2013-08-14 14:58:30 +04:00
err_uobj:
2017-04-04 13:31:44 +03:00
	uobj_alloc_abort(uobj);
2013-08-14 14:58:30 +04:00
err_free_attr:
	if (cmd.flow_attr.num_of_specs)
		kfree(kern_flow_attr);
	return err;
}
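The loop in the handler above walks a packed list of variable-length flow specs, each carrying its own size, while checking that no spec runs past the remaining command bytes and that nothing is left over at the end. A stand-alone sketch of that bounded walk, assuming a hypothetical 8-byte spec header (`struct spec_hdr_sketch`) rather than the real struct ib_uverbs_flow_spec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical spec header: every spec starts with its own total size. */
struct spec_hdr_sketch {
	uint32_t type;
	uint16_t size;       /* size of this spec in bytes, header included */
	uint16_t reserved;
};

/*
 * Walk num_specs variable-length specs packed in buf[0..len).
 * Returns the number of specs walked, or -1 if the buffer is malformed:
 * truncated header, a spec overrunning the buffer, a spec smaller than
 * its own header, or bytes left over after the last spec.
 */
static int walk_specs(const uint8_t *buf, size_t len, unsigned int num_specs)
{
	unsigned int i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < num_specs; i++) {
		struct spec_hdr_sketch hdr;

		/* A full header must be present before its size is read. */
		if (len < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf, sizeof(hdr));

		/* The spec must fit in what is left and must make progress. */
		if (hdr.size < sizeof(hdr) || hdr.size > len)
			return -1;

		/* ... a real handler would convert the spec here ... */
		buf += hdr.size;
		len -= hdr.size;
	}

	/* Like the post-loop check above: no bytes may be left over. */
	return len == 0 ? (int)i : -1;
}
```

The check that a spec is at least as large as its own header is an extra guard added in this sketch so that the walk is guaranteed to advance on every iteration.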
IB/core: extended command: an improved infrastructure for uverbs commands
Commit 400dbc96583f ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.
According to the commit 400dbc96583f, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issue commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.
But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].
So the following patch is an attempt to a revised extensible command
infrastructure.
This improved extensible command infrastructure distinguish between
core (eg. legacy)'s command/response buffers from provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.
Having those buffers identified separately make it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response parts: This should make
the extended functions more reliable.
Additionally, instead of relying on the command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure relies on
unused bits in the command field: of the 32 bits provided by the command
field, only 6 bits are really needed to encode the identifiers of the
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands.)
So this patch uses some high-order bits of the command field to
store flags, leaving room for more command identifiers than one
will ever need (i.e. 256).
The new flags specify whether the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make the use of flags itself extensible.
Using high-order bits of the command field ensures that a newer
libibverbs on an older kernel will properly fail when trying to call
extended commands. On the other hand, an older libibverbs on a newer
kernel will never be able to issue calls to extended commands.
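This is a minimal sketch of how a kernel could split the 32-bit command word. It assumes the layout drawn further down in this message: flags in bits 24..31, bits 8..23 zero, and the command identifier in bits 0..7. The `SK_*` macro names, the value chosen for the EXTENDED flag and `decode_command()` are assumptions for illustration, not the uAPI definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative masks for the layout "flags | 00 00 | command".
 * Names and the EXTENDED bit value are assumptions. */
#define SK_CMD_COMMAND_MASK  0x000000ffu
#define SK_CMD_RESERVED_MASK 0x00ffff00u
#define SK_CMD_FLAGS_SHIFT   24
#define SK_CMD_FLAG_EXTENDED 0x80u

/* Returns 0 and fills *cmd and *extended on success; returns -1 if a
 * reserved bit is set or an unknown flag is present. */
static int decode_command(uint32_t word, uint32_t *cmd, int *extended)
{
	uint32_t flags = word >> SK_CMD_FLAGS_SHIFT;

	assert(cmd != NULL && extended != NULL);

	if (word & SK_CMD_RESERVED_MASK)
		return -1;
	/* Reject unknown flags so new flags can be defined later. */
	if (flags & ~SK_CMD_FLAG_EXTENDED)
		return -1;

	*cmd = word & SK_CMD_COMMAND_MASK;
	*extended = (flags & SK_CMD_FLAG_EXTENDED) != 0;
	return 0;
}
```

Rejecting unknown flag bits, rather than ignoring them, is what keeps the flag space itself extensible. A kernel that predates the flags presumably sees a word with high bits set as an out-of-range command number, which fits the claim that a newer libibverbs fails cleanly on an older kernel.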
The extended command header includes the optional response pointer so
that the output buffer length and the output buffer pointer are located
together in the command, allowing proper parameter checking. This
should make implementing functions easier and safer.
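Keeping the response pointer next to the output sizes allows a consistency check like the one below. This is a minimal sketch. The function `check_response()` is hypothetical, and the rule it enforces (a response pointer is present exactly when some output space is requested) is an assumption about what "proper parameters checking" means here. The field names follow the layout diagram.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical check: a response pointer must be supplied exactly
 * when the core or provider part asks for output space. */
static int check_response(uint64_t response, uint16_t out_words,
			  uint16_t provider_out_words)
{
	int wants_output = out_words != 0 || provider_out_words != 0;

	if (response && !wants_output)
		return -EINVAL; /* pointer given, but no space to write */
	if (!response && wants_output)
		return -EINVAL; /* space requested, but nowhere to put it */
	return 0;
}
```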
Additionally, the extended header ensures 64-bit alignment while making
all sizes multiples of 8 bytes, which extends the maximum buffer sizes:
Buffer                   legacy      extended
Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
Maximum response buffer: 256KBytes   1024KBytes (512KBytes + 512KBytes)
For the purpose of doing proper buffer size accounting, the header
sizes are no longer taken into account in "in_words".
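The figures in this table can be checked with a little arithmetic. This sketch assumes that legacy sizes are 16-bit counts of 4-byte words and that extended sizes are 16-bit counts of 8-byte words, with separate core and provider counts. Under those assumptions the "256KBytes" and "512KBytes" figures are rounded: each falls one word short of the power of two. The macro and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed word sizes: legacy counts 4-byte words, extended 8-byte. */
#define LEGACY_WORD_BYTES   4u
#define EXTENDED_WORD_BYTES 8u

static_assert(EXTENDED_WORD_BYTES == 2u * LEGACY_WORD_BYTES,
	      "extended words are twice as wide as legacy words");

/* One 16-bit word count. */
static uint32_t legacy_max_bytes(void)
{
	return (uint32_t)UINT16_MAX * LEGACY_WORD_BYTES;
}

/* Core part plus provider part, each bounded by a 16-bit count. */
static uint32_t extended_max_bytes(void)
{
	return 2u * (uint32_t)UINT16_MAX * EXTENDED_WORD_BYTES;
}
```

With these assumptions `legacy_max_bytes()` is 262140 bytes (4 bytes under 256 KiB) and `extended_max_bytes()` is 1048560 bytes (16 bytes under 1 MiB). That gives exactly the fourfold increase shown in the table.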
One of the oddities of the current extensible infrastructure, reading
the "legacy" command header twice, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command, so memory is read once and
information is not duplicated. This makes it clear that it is an
extended command scheme and not a different command scheme.
The proposed scheme will format input (command) and output (response)
buffers this way:
- command:
legacy header +
extended header +
command data (core + hw):
+----------------------------------------+
| flags     |    00      00    | command |
|      in_words      |     out_words     |
+----------------------------------------+
|                response                |
|                response                |
| provider_in_words | provider_out_words |
|                padding                 |
+----------------------------------------+
|                                        |
.             <uverbs input>             .
.             (in_words * 8)             .
|                                        |
+----------------------------------------+
|                                        |
.            <provider input>            .
.        (provider_in_words * 8)         .
|                                        |
+----------------------------------------+
- response, if present:
+----------------------------------------+
|                                        |
.         <uverbs output space>          .
.            (out_words * 8)             .
|                                        |
+----------------------------------------+
|                                        |
.        <provider output space>         .
.        (provider_out_words * 8)        .
|                                        |
+----------------------------------------+
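The two headers in the command diagram can be mirrored as C structs to check that they really occupy 8 and 16 bytes, and to compute the total lengths now that the headers are excluded from "in_words". This is a minimal sketch. `legacy_hdr_sketch`, `ex_hdr_sketch` and the two length helpers are hypothetical names, not the uAPI structures, and the field order is read off the diagram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the legacy header (first diagram block). */
struct legacy_hdr_sketch {
	uint32_t command;            /* flags | 00 00 | command */
	uint16_t in_words;
	uint16_t out_words;
};

/* Hypothetical mirror of the extended header (second block). */
struct ex_hdr_sketch {
	uint64_t response;           /* user pointer to response buffer */
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t reserved;           /* the "padding" row */
};

static_assert(sizeof(struct legacy_hdr_sketch) == 8, "8-byte legacy header");
static_assert(sizeof(struct ex_hdr_sketch) == 16, "16-byte extended header");

/* Headers are not counted in in_words, so add them explicitly. */
static size_t ex_command_len(const struct legacy_hdr_sketch *hdr,
			     const struct ex_hdr_sketch *ex)
{
	return sizeof(*hdr) + sizeof(*ex) +
	       (size_t)hdr->in_words * 8 +
	       (size_t)ex->provider_in_words * 8;
}

/* The response holds only the core and provider output spaces. */
static size_t ex_response_len(const struct legacy_hdr_sketch *hdr,
			      const struct ex_hdr_sketch *ex)
{
	return (size_t)hdr->out_words * 8 +
	       (size_t)ex->provider_out_words * 8;
}
```

For example, in_words = 3 and provider_in_words = 1 give a 56-byte command: 8 + 16 bytes of headers plus 24 + 8 bytes of payload.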
The overall design is to ensure that the extensible infrastructure is
itself extensible while being more reliable, with more input and bounds
checking.
Note:
The unused field in the extended header would be a perfect candidate to
hold the command "comp_mask" (i.e. a bit field used to handle
compatibility). This was suggested by Roland Dreier in a previous
review[2]. But a "comp_mask" field is likely to be present in the uverbs
input and/or provider input, and likewise for the response, as noted by
Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
header.
[1]:
http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
[2]:
http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
[3]:
http://marc.info/?i=525C1149.6000701@mellanox.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
[ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-07 02:21:49 +04:00
int ib_uverbs_ex_destroy_flow ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2013-11-07 02:21:49 +04:00
struct ib_udata * ucore ,
struct ib_udata * uhw )
{
2013-08-14 14:58:30 +04:00
struct ib_uverbs_destroy_flow cmd ;
struct ib_uobject * uobj ;
int ret ;
2013-12-12 02:01:52 +04:00
if ( ucore -> inlen < sizeof ( cmd ) )
return - EINVAL ;
2013-11-07 02:21:49 +04:00
ret = ib_copy_from_udata ( & cmd , ucore , sizeof ( cmd ) ) ;
if ( ret )
return ret ;
2013-08-14 14:58:30 +04:00
2013-12-12 02:01:48 +04:00
if ( cmd . comp_mask )
return - EINVAL ;
2018-03-19 16:02:33 +03:00
uobj = uobj_get_write ( UVERBS_OBJECT_FLOW , cmd . flow_handle ,
2017-04-04 13:31:44 +03:00
file -> ucontext ) ;
if ( IS_ERR ( uobj ) )
return PTR_ERR ( uobj ) ;
2013-08-14 14:58:30 +04:00
2017-04-04 13:31:44 +03:00
ret = uobj_remove_commit ( uobj ) ;
2013-11-07 02:21:49 +04:00
return ret ;
2013-08-14 14:58:30 +04:00
}
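ib_uverbs_ex_destroy_flow() validates its input in a fixed order: it checks the size, copies the command, and only then tests comp_mask. The following standalone C model reproduces just that order so it can be exercised outside the kernel. `struct destroy_flow_cmd_sketch` and `parse_destroy_flow()` are hypothetical. memcpy() stands in for ib_copy_from_udata(), and the field layout (comp_mask followed by flow_handle) is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the command; the real uAPI struct may differ. */
struct destroy_flow_cmd_sketch {
	uint32_t comp_mask;
	uint32_t flow_handle;
};

/* Same order of checks as the handler: size, copy, comp_mask. */
static int parse_destroy_flow(const void *in, size_t inlen,
			      uint32_t *handle)
{
	struct destroy_flow_cmd_sketch cmd;

	assert(in != NULL && handle != NULL);

	if (inlen < sizeof(cmd))
		return -EINVAL;        /* too short to hold the command */
	memcpy(&cmd, in, sizeof(cmd)); /* stands in for ib_copy_from_udata() */
	if (cmd.comp_mask)
		return -EINVAL;        /* no optional fields are defined */
	*handle = cmd.flow_handle;
	return 0;
}
```

As in the handler, a longer input passes the `inlen < sizeof(cmd)` test, and this sketch ignores the trailing bytes.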
2011-12-07 01:13:10 +04:00
static int __uverbs_create_xsrq ( struct ib_uverbs_file * file ,
2015-08-13 18:32:04 +03:00
struct ib_device * ib_dev ,
2011-12-07 01:13:10 +04:00
struct ib_uverbs_create_xsrq * cmd ,
struct ib_udata * udata )
2005-08-18 23:24:13 +04:00
{
struct ib_uverbs_create_srq_resp resp ;
2011-05-26 04:08:38 +04:00
struct ib_usrq_object * obj ;
2005-08-18 23:24:13 +04:00
struct ib_pd * pd ;
struct ib_srq * srq ;
2011-05-26 04:08:38 +04:00
struct ib_uobject * uninitialized_var ( xrcd_uobj ) ;
2005-08-18 23:24:13 +04:00
struct ib_srq_init_attr attr ;
int ret ;
2018-03-19 16:02:33 +03:00
obj = ( struct ib_usrq_object * ) uobj_alloc ( UVERBS_OBJECT_SRQ ,
2017-04-04 13:31:44 +03:00
file -> ucontext ) ;
if ( IS_ERR ( obj ) )
return PTR_ERR ( obj ) ;
2005-08-18 23:24:13 +04:00
2017-08-17 15:52:07 +03:00
if ( cmd -> srq_type == IB_SRQT_TM )
attr . ext . tag_matching . max_num_tags = cmd -> max_num_tags ;
2011-05-26 04:08:38 +04:00
if ( cmd -> srq_type == IB_SRQT_XRC ) {
2018-03-19 16:02:33 +03:00
xrcd_uobj = uobj_get_read ( UVERBS_OBJECT_XRCD , cmd -> xrcd_handle ,
2017-04-04 13:31:44 +03:00
file -> ucontext ) ;
if ( IS_ERR ( xrcd_uobj ) ) {
2011-05-26 04:08:38 +04:00
ret = - EINVAL ;
IB/uverbs: Lock SRQ / CQ / PD objects in a consistent order
Since XRC support was added, the uverbs code has locked SRQ, CQ and PD
objects needed during QP and SRQ creation in different orders
depending on the code path. This leads to the (at least
theoretical) possibility of deadlock, and triggers the lockdep splat
below.
Fix this by making sure we always lock the SRQ first, then CQs and
finally the PD.
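The fix amounts to a global lock order: SRQ, then CQ, then PD. This is a simplified C model of that rule using POSIX mutexes. It is not the uverbs code, which takes read locks on per-object rwsems (down_read in the trace below). The `ranked_lock` type, the rank values and both helpers are hypothetical. The sketch also assumes at most one lock per rank.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Ranks encode the required order: SRQ first, then CQ, then PD. */
enum lock_rank { RANK_SRQ = 0, RANK_CQ = 1, RANK_PD = 2 };

struct ranked_lock {
	pthread_mutex_t mutex;
	enum lock_rank rank;
};

/* Acquire locks[0..n-1], which the caller must pass in strictly
 * increasing rank order. If every path follows the same order, no
 * cycle like SRQ -> PD -> CQ -> SRQ can form. */
static void lock_in_order(struct ranked_lock **locks, int n)
{
	for (int i = 0; i < n; i++) {
		assert(i == 0 || locks[i - 1]->rank < locks[i]->rank);
		pthread_mutex_lock(&locks[i]->mutex);
	}
}

/* Release in the reverse order of acquisition. */
static void unlock_reverse(struct ranked_lock **locks, int n)
{
	for (int i = n - 1; i >= 0; i--)
		pthread_mutex_unlock(&locks[i]->mutex);
}
```

The assertion turns an ordering mistake into an immediate failure on the offending path, instead of a rare deadlock between two threads. lockdep catches the same class of bug in the kernel, as the report below shows.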
======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc5+ #34 Not tainted
-------------------------------------------------------
ibv_srq_pingpon/2484 is trying to acquire lock:
(SRQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
but task is already holding lock:
(CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (CQ-uobj){+++++.}:
[<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
[<ffffffff81384f28>] down_read+0x34/0x43
[<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
[<ffffffffa00b16c3>] ib_uverbs_create_qp+0x180/0x684 [ib_uverbs]
[<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
[<ffffffff810fe47f>] vfs_write+0xa7/0xee
[<ffffffff810fe65f>] sys_write+0x45/0x69
[<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
-> #1 (PD-uobj){++++++}:
[<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
[<ffffffff81384f28>] down_read+0x34/0x43
[<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
[<ffffffffa00af8ad>] __uverbs_create_xsrq+0x96/0x386 [ib_uverbs]
[<ffffffffa00b31b9>] ib_uverbs_detach_mcast+0x1cd/0x1e6 [ib_uverbs]
[<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
[<ffffffff810fe47f>] vfs_write+0xa7/0xee
[<ffffffff810fe65f>] sys_write+0x45/0x69
[<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
-> #0 (SRQ-uobj){+++++.}:
[<ffffffff81070898>] __lock_acquire+0xa29/0xd06
[<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
[<ffffffff81384f28>] down_read+0x34/0x43
[<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
[<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs]
[<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
[<ffffffff810fe47f>] vfs_write+0xa7/0xee
[<ffffffff810fe65f>] sys_write+0x45/0x69
[<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Chain exists of:
SRQ-uobj --> PD-uobj --> CQ-uobj
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(CQ-uobj);
                               lock(PD-uobj);
                               lock(CQ-uobj);
  lock(SRQ-uobj);
*** DEADLOCK ***
3 locks held by ibv_srq_pingpon/2484:
#0: (QP-uobj){+.+...}, at: [<ffffffffa00b162c>] ib_uverbs_create_qp+0xe9/0x684 [ib_uverbs]
#1: (PD-uobj){++++++}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
#2: (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
stack backtrace:
Pid: 2484, comm: ibv_srq_pingpon Not tainted 3.4.0-rc5+ #34
Call Trace:
[<ffffffff8137eff0>] print_circular_bug+0x1f8/0x209
[<ffffffff81070898>] __lock_acquire+0xa29/0xd06
[<ffffffffa00af37c>] ? __idr_get_uobj+0x20/0x5e [ib_uverbs]
[<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
[<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffff81070eee>] ? lock_release+0x166/0x189
[<ffffffff81384f28>] down_read+0x34/0x43
[<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
[<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs]
[<ffffffff81070fec>] ? lock_acquire+0xdb/0xfe
[<ffffffff81070c09>] ? lock_release_non_nested+0x94/0x213
[<ffffffff810d470f>] ? might_fault+0x40/0x90
[<ffffffff810d470f>] ? might_fault+0x40/0x90
[<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
[<ffffffff810fe47f>] vfs_write+0xa7/0xee
[<ffffffff810ff736>] ? fget_light+0x3b/0x99
[<ffffffff810fe65f>] sys_write+0x45/0x69
[<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-04-30 23:51:50 +04:00
goto err ;
2011-05-26 04:08:38 +04:00
}
2017-04-04 13:31:44 +03:00
attr . ext . xrc . xrcd = ( struct ib_xrcd * ) xrcd_uobj -> object ;
if ( ! attr . ext . xrc . xrcd ) {
ret = - EINVAL ;
goto err_put_xrcd ;
}
2011-05-26 04:08:38 +04:00
obj -> uxrcd = container_of ( xrcd_uobj , struct ib_uxrcd_object , uobject ) ;
atomic_inc ( & obj -> uxrcd -> refcnt ) ;
2017-08-17 15:52:04 +03:00
}
2012-04-30 23:51:50 +04:00
2017-08-17 15:52:04 +03:00
if ( ib_srq_has_cq ( cmd -> srq_type ) ) {
2018-03-19 16:02:33 +03:00
attr . ext . cq = uobj_get_obj_read ( cq , UVERBS_OBJECT_CQ , cmd -> cq_handle ,
2017-08-17 15:52:04 +03:00
file -> ucontext ) ;
if ( ! attr . ext . cq ) {
2012-04-30 23:51:50 +04:00
ret = -EINVAL;
goto err_put_xrcd;
}
}
2018-03-19 16:02:33 +03:00
pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle, file->ucontext);
2012-04-30 23:51:50 +04:00
if (!pd) {
ret = -EINVAL;
goto err_put_cq;
2011-05-26 04:08:38 +04:00
}
2005-08-18 23:24:13 +04:00
attr.event_handler = ib_uverbs_srq_event_handler;
attr.srq_context = file;
2011-05-26 04:08:38 +04:00
attr.srq_type = cmd->srq_type;
attr.attr.max_wr = cmd->max_wr;
attr.attr.max_sge = cmd->max_sge;
attr.attr.srq_limit = cmd->srq_limit;
2005-08-18 23:24:13 +04:00
2011-05-26 04:08:38 +04:00
obj->uevent.events_reported = 0;
INIT_LIST_HEAD(&obj->uevent.event_list);
2005-08-18 23:24:13 +04:00
2011-05-26 04:08:38 +04:00
srq = pd->device->create_srq(pd, &attr, udata);
2005-08-18 23:24:13 +04:00
if (IS_ERR(srq)) {
ret = PTR_ERR(srq);
2006-07-17 19:20:51 +04:00
goto err_put;
2005-08-18 23:24:13 +04:00
}
2011-05-26 04:08:38 +04:00
srq->device = pd->device;
srq->pd = pd;
srq->srq_type = cmd->srq_type;
srq->uobject = &obj->uevent.uobject;
2005-08-18 23:24:13 +04:00
srq->event_handler = attr.event_handler;
srq->srq_context = attr.srq_context;
2011-05-26 04:08:38 +04:00
2017-08-17 15:52:04 +03:00
if (ib_srq_has_cq(cmd->srq_type)) {
srq->ext.cq = attr.ext.cq;
atomic_inc(&attr.ext.cq->usecnt);
}
2011-05-26 04:08:38 +04:00
if (cmd->srq_type == IB_SRQT_XRC) {
srq->ext.xrc.xrcd = attr.ext.xrc.xrcd;
atomic_inc(&attr.ext.xrc.xrcd->usecnt);
}
2005-08-18 23:24:13 +04:00
atomic_inc(&pd->usecnt);
atomic_set(&srq->usecnt, 0);
2011-05-26 04:08:38 +04:00
obj->uevent.uobject.object = srq;
2017-04-04 13:31:44 +03:00
obj->uevent.uobject.user_handle = cmd->user_handle;
2005-08-18 23:24:13 +04:00
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex. This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.
Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.
This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention. However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.
Surprisingly, these changes even shrink the object code:
add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-18 07:44:49 +04:00
memset(&resp, 0, sizeof resp);
2011-05-26 04:08:38 +04:00
resp.srq_handle = obj->uevent.uobject.id;
2006-02-23 23:36:18 +03:00
resp.max_wr = attr.attr.max_wr;
resp.max_sge = attr.attr.max_sge;
2011-05-26 04:08:38 +04:00
if (cmd->srq_type == IB_SRQT_XRC)
resp.srqn = srq->ext.xrc.srq_num;
2005-08-18 23:24:13 +04:00
2018-03-27 23:18:47 +03:00
if (copy_to_user(u64_to_user_ptr(cmd->response),
2005-08-18 23:24:13 +04:00
&resp, sizeof resp)) {
ret = -EFAULT;
2006-06-18 07:44:49 +04:00
goto err_copy;
2005-08-18 23:24:13 +04:00
}
2017-08-17 15:52:04 +03:00
if (cmd->srq_type == IB_SRQT_XRC)
2017-04-04 13:31:44 +03:00
uobj_put_read(xrcd_uobj);
2017-08-17 15:52:04 +03:00
if (ib_srq_has_cq(cmd->srq_type))
uobj_put_obj_read(attr.ext.cq);
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(pd);
uobj_alloc_commit(&obj->uevent.uobject);
2005-08-18 23:24:13 +04:00
2011-05-26 04:08:38 +04:00
return 0;
2005-08-18 23:24:13 +04:00
2006-06-18 07:44:49 +04:00
err_copy:
2005-08-18 23:24:13 +04:00
ib_destroy_srq(srq);
2006-07-17 19:20:51 +04:00
err_put:
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(pd);
2011-05-26 04:08:38 +04:00
err_put_cq:
2017-08-17 15:52:04 +03:00
if (ib_srq_has_cq(cmd->srq_type))
uobj_put_obj_read(attr.ext.cq);
2011-05-26 04:08:38 +04:00
2012-04-30 23:51:50 +04:00
err_put_xrcd:
if (cmd->srq_type == IB_SRQT_XRC) {
atomic_dec(&obj->uxrcd->refcnt);
2017-04-04 13:31:44 +03:00
uobj_put_read(xrcd_uobj);
2012-04-30 23:51:50 +04:00
}
2006-07-17 19:20:51 +04:00
2006-06-18 07:44:49 +04:00
err:
2017-04-04 13:31:44 +03:00
uobj_alloc_abort(&obj->uevent.uobject);
2005-08-18 23:24:13 +04:00
return ret;
}
2011-05-26 04:08:38 +04:00
ssize_t ib_uverbs_create_srq(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
struct ib_device *ib_dev,
2011-05-26 04:08:38 +04:00
const char __user *buf, int in_len,
int out_len)
{
struct ib_uverbs_create_srq cmd;
struct ib_uverbs_create_xsrq xcmd;
struct ib_uverbs_create_srq_resp resp;
struct ib_udata udata;
int ret;
if (out_len < sizeof resp)
return -ENOSPC;
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
2017-08-17 15:52:07 +03:00
memset(&xcmd, 0, sizeof(xcmd));
2011-05-26 04:08:38 +04:00
xcmd.response = cmd.response;
xcmd.user_handle = cmd.user_handle;
xcmd.srq_type = IB_SRQT_BASIC;
xcmd.pd_handle = cmd.pd_handle;
xcmd.max_wr = cmd.max_wr;
xcmd.max_sge = cmd.max_sge;
xcmd.srq_limit = cmd.srq_limit;
2017-09-07 00:34:26 +03:00
ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
u64_to_user_ptr(cmd.response) + sizeof(resp),
2017-06-27 17:04:42 +03:00
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
2011-05-26 04:08:38 +04:00
2015-08-13 18:32:04 +03:00
ret = __uverbs_create_xsrq(file, ib_dev, &xcmd, &udata);
2011-05-26 04:08:38 +04:00
if (ret)
return ret;
return in_len;
}
ssize_t ib_uverbs_create_xsrq(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
struct ib_device *ib_dev,
2011-05-26 04:08:38 +04:00
const char __user *buf, int in_len, int out_len)
{
struct ib_uverbs_create_xsrq cmd;
struct ib_uverbs_create_srq_resp resp;
struct ib_udata udata;
int ret;
if (out_len < sizeof resp)
return -ENOSPC;
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
2017-09-07 00:34:26 +03:00
ib_uverbs_init_udata(&udata, buf + sizeof(cmd),
u64_to_user_ptr(cmd.response) + sizeof(resp),
2017-06-27 17:04:42 +03:00
in_len - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr),
out_len - sizeof(resp));
2011-05-26 04:08:38 +04:00
2015-08-13 18:32:04 +03:00
ret = __uverbs_create_xsrq(file, ib_dev, &cmd, &udata);
2011-05-26 04:08:38 +04:00
if (ret)
return ret;
return in_len;
}
2005-08-18 23:24:13 +04:00
ssize_t ib_uverbs_modify_srq(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
struct ib_device *ib_dev,
2005-08-18 23:24:13 +04:00
const char __user *buf, int in_len,
int out_len)
{
struct ib_uverbs_modify_srq cmd;
2006-08-12 01:58:09 +04:00
struct ib_udata udata;
2005-08-18 23:24:13 +04:00
struct ib_srq *srq;
struct ib_srq_attr attr;
int ret;
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
2017-09-07 00:34:26 +03:00
ib_uverbs_init_udata(&udata, buf + sizeof cmd, NULL, in_len - sizeof cmd,
2006-08-12 01:58:09 +04:00
out_len);
2018-03-19 16:02:33 +03:00
srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, file->ucontext);
2006-06-18 07:44:49 +04:00
if (!srq)
return -EINVAL;
2005-08-18 23:24:13 +04:00
attr.max_wr = cmd.max_wr;
attr.srq_limit = cmd.srq_limit;
2006-08-12 01:58:09 +04:00
ret = srq->device->modify_srq(srq, &attr, cmd.attr_mask, &udata);
2005-08-18 23:24:13 +04:00
2017-04-04 13:31:44 +03:00
uobj_put_obj_read(srq);
2005-08-18 23:24:13 +04:00
return ret ? ret : in_len;
}
2006-02-14 03:31:57 +03:00
ssize_t ib_uverbs_query_srq(struct ib_uverbs_file *file,
2015-08-13 18:32:04 +03:00
struct ib_device *ib_dev,
2006-02-14 03:31:57 +03:00
const char __user *buf,
int in_len, int out_len)
{
struct ib_uverbs_query_srq cmd;
struct ib_uverbs_query_srq_resp resp;
struct ib_srq_attr attr;
struct ib_srq *srq;
int ret;
if (out_len < sizeof resp)
return -ENOSPC;
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
2018-03-19 16:02:33 +03:00
srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, file->ucontext);
2006-06-18 07:44:49 +04:00
if (!srq)
return -EINVAL;
2006-02-14 03:31:57 +03:00
2006-06-18 07:44:49 +04:00

	ret = ib_query_srq(srq, &attr);

	uobj_put_obj_read(srq);

	if (ret)
		return ret;

	memset(&resp, 0, sizeof resp);

	resp.max_wr = attr.max_wr;
	resp.max_sge = attr.max_sge;
	resp.srq_limit = attr.srq_limit;

	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof resp))
		return -EFAULT;

	return in_len;
}

ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
			      struct ib_device *ib_dev,
			      const char __user *buf, int in_len,
			      int out_len)
{
	struct ib_uverbs_destroy_srq cmd;
	struct ib_uverbs_destroy_srq_resp resp;
	struct ib_uobject *uobj;
	struct ib_uevent_object *obj;
	int ret = -EINVAL;

	if (copy_from_user(&cmd, buf, sizeof cmd))
		return -EFAULT;

	uobj = uobj_get_write(UVERBS_OBJECT_SRQ, cmd.srq_handle,
			      file->ucontext);
	if (IS_ERR(uobj))
		return PTR_ERR(uobj);

	obj = container_of(uobj, struct ib_uevent_object, uobject);
	/*
	 * Make sure we don't free the memory in remove_commit, as we still
	 * need the uobject memory to create the response.
	 */
	uverbs_uobject_get(uobj);

	memset(&resp, 0, sizeof(resp));

	ret = uobj_remove_commit(uobj);
	if (ret) {
		uverbs_uobject_put(uobj);
		return ret;
	}
	resp.events_reported = obj->events_reported;
	uverbs_uobject_put(uobj);
	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof(resp)))
		return -EFAULT;

	return in_len;
}
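
/*
 * Illustrative sketch only, not part of the uverbs API: why
 * ib_uverbs_destroy_srq() takes an extra reference before
 * uobj_remove_commit(). All types and helpers below are hypothetical
 * stand-ins. Assume the handle table owns one reference; removal drops
 * it, so without the caller's extra reference the object could be freed
 * before events_reported is read for the response. The real code uses
 * atomic reference counting and locking, both omitted here for brevity.
 */
struct example_uobj {
	int refcount;			/* plain counter in this sketch */
	unsigned int events_reported;
	int *freed_flag;		/* set to 1 when the object is released */
};

static void example_get(struct example_uobj *o)
{
	o->refcount++;
}

static void example_put(struct example_uobj *o)
{
	if (--o->refcount == 0)
		*o->freed_flag = 1;	/* stand-in for freeing the memory */
}

/* Drops the reference held by the (hypothetical) handle table. */
static void example_remove_commit(struct example_uobj *o)
{
	example_put(o);
}

static unsigned int example_destroy(struct example_uobj *o)
{
	unsigned int reported;

	example_get(o);			/* keep the memory alive past removal */
	example_remove_commit(o);	/* no longer reachable by handle */
	reported = o->events_reported;	/* still safe: we hold a reference */
	example_put(o);			/* last put releases the object */
	return reported;
}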

int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
			      struct ib_device *ib_dev,
			      struct ib_udata *ucore,
			      struct ib_udata *uhw)
{
	struct ib_uverbs_ex_query_device_resp resp = { {0} };
	struct ib_uverbs_ex_query_device cmd;
	struct ib_device_attr attr = {0};
	int err;

	if (!ib_dev->query_device)
		return -EOPNOTSUPP;

	if (ucore->inlen < sizeof(cmd))
		return -EINVAL;

	err = ib_copy_from_udata(&cmd, ucore, sizeof(cmd));
	if (err)
		return err;

	if (cmd.comp_mask)
		return -EINVAL;

	if (cmd.reserved)
		return -EINVAL;

	resp.response_length = offsetof(typeof(resp), odp_caps);

	if (ucore->outlen < resp.response_length)
		return -ENOSPC;

	err = ib_dev->query_device(ib_dev, &attr, uhw);
	if (err)
		return err;

	copy_query_dev_fields(file, ib_dev, &resp.base, &attr);

	if (ucore->outlen < resp.response_length + sizeof(resp.odp_caps))
		goto end;

#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
	resp.odp_caps.general_caps = attr.odp_caps.general_caps;
	resp.odp_caps.per_transport_caps.rc_odp_caps =
		attr.odp_caps.per_transport_caps.rc_odp_caps;
	resp.odp_caps.per_transport_caps.uc_odp_caps =
		attr.odp_caps.per_transport_caps.uc_odp_caps;
	resp.odp_caps.per_transport_caps.ud_odp_caps =
		attr.odp_caps.per_transport_caps.ud_odp_caps;
#endif
	resp.response_length += sizeof(resp.odp_caps);

	if (ucore->outlen < resp.response_length + sizeof(resp.timestamp_mask))
		goto end;

	resp.timestamp_mask = attr.timestamp_mask;
	resp.response_length += sizeof(resp.timestamp_mask);

	if (ucore->outlen < resp.response_length + sizeof(resp.hca_core_clock))
		goto end;

	resp.hca_core_clock = attr.hca_core_clock;
	resp.response_length += sizeof(resp.hca_core_clock);

	if (ucore->outlen < resp.response_length + sizeof(resp.device_cap_flags_ex))
		goto end;

	resp.device_cap_flags_ex = attr.device_cap_flags;
	resp.response_length += sizeof(resp.device_cap_flags_ex);

	if (ucore->outlen < resp.response_length + sizeof(resp.rss_caps))
		goto end;

	resp.rss_caps.supported_qpts = attr.rss_caps.supported_qpts;
	resp.rss_caps.max_rwq_indirection_tables =
		attr.rss_caps.max_rwq_indirection_tables;
	resp.rss_caps.max_rwq_indirection_table_size =
		attr.rss_caps.max_rwq_indirection_table_size;

	resp.response_length += sizeof(resp.rss_caps);

	if (ucore->outlen < resp.response_length + sizeof(resp.max_wq_type_rq))
		goto end;

	resp.max_wq_type_rq = attr.max_wq_type_rq;
	resp.response_length += sizeof(resp.max_wq_type_rq);

	if (ucore->outlen < resp.response_length + sizeof(resp.raw_packet_caps))
		goto end;

	resp.raw_packet_caps = attr.raw_packet_caps;
	resp.response_length += sizeof(resp.raw_packet_caps);

	if (ucore->outlen < resp.response_length + sizeof(resp.tm_caps))
		goto end;

	resp.tm_caps.max_rndv_hdr_size = attr.tm_caps.max_rndv_hdr_size;
	resp.tm_caps.max_num_tags = attr.tm_caps.max_num_tags;
	resp.tm_caps.max_ops = attr.tm_caps.max_ops;
	resp.tm_caps.max_sge = attr.tm_caps.max_sge;
	resp.tm_caps.flags = attr.tm_caps.flags;
	resp.response_length += sizeof(resp.tm_caps);

	if (ucore->outlen < resp.response_length + sizeof(resp.cq_moderation_caps))
		goto end;

	resp.cq_moderation_caps.max_cq_moderation_count =
		attr.cq_caps.max_cq_moderation_count;
	resp.cq_moderation_caps.max_cq_moderation_period =
		attr.cq_caps.max_cq_moderation_period;
	resp.response_length += sizeof(resp.cq_moderation_caps);

	if (ucore->outlen < resp.response_length + sizeof(resp.max_dm_size))
		goto end;

	resp.max_dm_size = attr.max_dm_size;
	resp.response_length += sizeof(resp.max_dm_size);
end:
	err = ib_copy_to_udata(ucore, &resp, resp.response_length);
	return err;
}
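
/*
 * Illustrative sketch only: the extensible-response pattern that
 * ib_uverbs_ex_query_device() follows. The base fields are mandatory;
 * each optional trailing field is filled in and counted in
 * response_length only if the caller's buffer (outlen) can hold it, so
 * older userspace built against a shorter struct receives a shorter but
 * still valid reply. The struct, its fields and the -1 error value are
 * hypothetical; the real function returns -ENOSPC in that case.
 */
struct example_resp {
	uint32_t base;			/* always present */
	uint32_t response_length;	/* bytes actually filled in */
	uint64_t opt_a;			/* added in a later ABI revision */
	uint64_t opt_b;			/* added after opt_a */
};

static int example_fill_resp(struct example_resp *resp, size_t outlen)
{
	memset(resp, 0, sizeof(*resp));
	resp->response_length = offsetof(struct example_resp, opt_a);
	if (outlen < resp->response_length)
		return -1;		/* buffer too small for the base part */
	resp->base = 1;

	if (outlen < resp->response_length + sizeof(resp->opt_a))
		goto end;
	resp->opt_a = 2;
	resp->response_length += sizeof(resp->opt_a);

	if (outlen < resp->response_length + sizeof(resp->opt_b))
		goto end;
	resp->opt_b = 3;
	resp->response_length += sizeof(resp->opt_b);
end:
	return 0;	/* caller copies response_length bytes to userspace */
}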

int ib_uverbs_ex_modify_cq(struct ib_uverbs_file *file,
			   struct ib_device *ib_dev,
			   struct ib_udata *ucore,
			   struct ib_udata *uhw)
{
	struct ib_uverbs_ex_modify_cq cmd = {};
	struct ib_cq *cq;
	size_t required_cmd_sz;
	int ret;

	required_cmd_sz = offsetof(typeof(cmd), reserved) +
				sizeof(cmd.reserved);
	if (ucore->inlen < required_cmd_sz)
		return -EINVAL;

	/* sanity checks */
	if (ucore->inlen > sizeof(cmd) &&
	    !ib_is_udata_cleared(ucore, sizeof(cmd),
				 ucore->inlen - sizeof(cmd)))
		return -EOPNOTSUPP;

	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
	if (ret)
		return ret;

	if (!cmd.attr_mask || cmd.reserved)
		return -EINVAL;

	if (cmd.attr_mask > IB_CQ_MODERATE)
		return -EOPNOTSUPP;

	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, file->ucontext);
	if (!cq)
		return -EINVAL;

	ret = rdma_set_cq_moderation(cq, cmd.attr.cq_count, cmd.attr.cq_period);

	uobj_put_obj_read(cq);

	return ret;
}
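
/*
 * Illustrative sketch only: the forward-compatible input check performed
 * by ib_uverbs_ex_modify_cq(). A command shorter than the required prefix
 * is rejected with -EINVAL; a command longer than the struct this kernel
 * knows is accepted only if every extra trailing byte is zero, a rough
 * stand-in for ib_is_udata_cleared(). Non-zero trailing bytes suggest
 * userspace is using fields this kernel does not understand, hence
 * -EOPNOTSUPP. The function name and its flat-buffer interface are
 * hypothetical.
 */
static int example_check_cmd(const unsigned char *in, size_t inlen,
			     size_t required, size_t known)
{
	size_t i;

	if (inlen < required)
		return -EINVAL;
	/* Only bytes beyond the known struct size are inspected. */
	for (i = known; i < inlen; i++)
		if (in[i] != 0)
			return -EOPNOTSUPP;
	return 0;	/* caller copies min(known, inlen) bytes */
}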