linux/net/core
David Howells cdfbabfb2f net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:23:27 -08:00
..
datagram.c udp: properly cope with csum errors 2017-02-07 11:19:00 -05:00
dev_addr_lists.c
dev_ioctl.c
dev.c net: net_enable_timestamp() can be called from irq contexts 2017-03-01 20:55:57 -08:00
devlink.c devlink: allow to fillup eswitch attrs even if mode_get op does not exist 2017-02-10 14:43:00 -05:00
drop_monitor.c drop_monitor: consider inserted data in genlmsg_end 2017-01-03 11:09:44 -05:00
dst_cache.c net: dst_cache_per_cpu_dst_set() can be static 2016-03-18 17:45:08 -04:00
dst.c net: pending_confirm is not used anymore 2017-02-07 13:07:47 -05:00
ethtool.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
fib_rules.c net: core: add missing check for uid_range in rule_exists. 2016-11-09 13:28:10 -05:00
filter.c bpf: Fix bpf_xdp_event_output 2017-02-23 13:53:42 -05:00
flow_dissector.c flow dissector: check if arp_eth is null rather than arp 2017-01-16 13:48:48 -05:00
flow.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:25:04 -08:00
gen_estimator.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
gen_stats.c net_sched: gen_estimator: complete rewrite of rate estimators 2016-12-05 15:21:59 -05:00
gro_cells.c gro_cells: move to net/core/gro_cells.c 2017-02-08 14:38:18 -05:00
hwbm.c net: hwbm: Fix unbalanced spinlock in error case 2016-05-25 12:35:09 -07:00
link_watch.c dev: introduce dev_get_iflink() 2015-04-02 14:04:59 -04:00
lwt_bpf.c lwtunnel: remove device arg to lwtunnel_build_state 2017-01-30 15:14:22 -05:00
lwtunnel.c lwtunnel: remove device arg to lwtunnel_build_state 2017-01-30 15:14:22 -05:00
Makefile gro_cells: move to net/core/gro_cells.c 2017-02-08 14:38:18 -05:00
neighbour.c net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification 2017-02-15 12:38:43 -05:00
net_namespace.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
net-procfs.c net: remove NETDEV_TX_LOCKED support 2016-04-26 15:53:05 -04:00
net-sysfs.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
net-sysfs.h
net-traces.c net: IPv6 fib lookup tracepoint 2015-11-22 11:54:10 -05:00
netclassid_cgroup.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
netevent.c netevent: remove automatic variable in register_netevent_notifier() 2015-05-31 00:03:21 -07:00
netpoll.c netpoll: more efficient locking 2016-11-16 18:32:02 -05:00
netprio_cgroup.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
pktgen.c net-tc: convert tc_verd to integer bitfields 2017-01-08 20:58:52 -05:00
ptp_classifier.c ptp: Change ptp_class to a proper bitmask 2015-11-03 11:08:22 -05:00
request_sock.c ipv4: Namespaceify tcp_max_syn_backlog knob 2016-12-29 11:38:31 -05:00
rtnetlink.c rtnl: simplify error return path in rtnl_create_link() 2017-02-21 12:17:43 -05:00
scm.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h> 2017-03-02 08:42:29 +01:00
secure_seq.c secure_seq: fix sparse errors 2017-01-12 15:57:10 -05:00
skbuff.c net: fix socket refcounting in skb_complete_tx_timestamp() 2017-03-07 14:06:15 -08:00
sock_diag.c sock_diag: align nlattr properly when needed 2016-04-26 12:00:48 -04:00
sock_reuseport.c soreuseport: do not export reuseport_add_sock() 2016-10-18 14:18:23 -04:00
sock.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
stream.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
sysctl_net_core.c bpf: make jited programs visible in traces 2017-02-17 13:40:05 -05:00
timestamping.c net: skb_defer_rx_timestamp should check for phydev before setting up classify 2015-07-09 14:17:15 -07:00
tso.c net: tso: add support for IPv6 2015-10-26 22:24:22 -07:00
utils.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00