linux/net
David Howells 3b5bac2bde RxRPC: Fix a potential deadlock between the call resend_timer and state_lock
RxRPC can potentially deadlock as rxrpc_resend_time_expired() wants to get
call->state_lock so that it can alter the state of an RxRPC call.  However, its
caller (call_timer_fn()) has an apparent lock on the timer struct.

The problem is that rxrpc_resend_time_expired() isn't permitted to lock
call->state_lock as this could cause a deadlock against rxrpc_send_abort() as
that takes state_lock and then attempts to delete the resend timer by calling
del_timer_sync().

The deadlock can occur because del_timer_sync() will sit there forever waiting
for rxrpc_resend_time_expired() to return, but the latter may then wait for
call->state_lock, which rxrpc_send_abort() holds around del_timer_sync()...

This leads to a warning appearing in the kernel log that looks something like
the attached.

It should be sufficient to simply dispense with the locks.  It doesn't matter
if we set the resend timer expired event bit and queue the event processor
whilst we're changing state to one where the resend timer is irrelevant as the
event can just be ignored by the processor thereafter.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.35-rc3-cachefs+ #115
-------------------------------------------------------
swapper/0 is trying to acquire lock:
 (&call->state_lock){++--..}, at: [<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]

but task is already holding lock:
 (&call->resend_timer){+.-...}, at: [<ffffffff8103b675>] run_timer_softirq+0x182/0x2a5

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&call->resend_timer){+.-...}:
       [<ffffffff810560bc>] __lock_acquire+0x889/0x8fa
       [<ffffffff81056184>] lock_acquire+0x57/0x6d
       [<ffffffff8103bb9c>] del_timer_sync+0x3c/0x86
       [<ffffffffa002bb7a>] rxrpc_send_abort+0x50/0x97 [af_rxrpc]
       [<ffffffffa002bdd9>] rxrpc_kernel_abort_call+0xa1/0xdd [af_rxrpc]
       [<ffffffffa0061588>] afs_deliver_to_call+0x129/0x368 [kafs]
       [<ffffffffa006181b>] afs_process_async_call+0x54/0xff [kafs]
       [<ffffffff8104261d>] worker_thread+0x1ef/0x2e2
       [<ffffffff81045f47>] kthread+0x7a/0x82
       [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10

-> #0 (&call->state_lock){++--..}:
       [<ffffffff81055237>] validate_chain+0x727/0xd23
       [<ffffffff810560bc>] __lock_acquire+0x889/0x8fa
       [<ffffffff81056184>] lock_acquire+0x57/0x6d
       [<ffffffff813e6b69>] _raw_read_lock_bh+0x34/0x43
       [<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
       [<ffffffff8103b6e6>] run_timer_softirq+0x1f3/0x2a5
       [<ffffffff81036828>] __do_softirq+0xa2/0x13e
       [<ffffffff81002dcc>] call_softirq+0x1c/0x28
       [<ffffffff810049f0>] do_softirq+0x38/0x80
       [<ffffffff810361a2>] irq_exit+0x45/0x47
       [<ffffffff81018fb3>] smp_apic_timer_interrupt+0x88/0x96
       [<ffffffff81002893>] apic_timer_interrupt+0x13/0x20
       [<ffffffff810011ac>] cpu_idle+0x4d/0x83
       [<ffffffff813e06f3>] start_secondary+0x1bd/0x1c1

other info that might help us debug this:

1 lock held by swapper/0:
 #0:  (&call->resend_timer){+.-...}, at: [<ffffffff8103b675>] run_timer_softirq+0x182/0x2a5

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.35-rc3-cachefs+ #115
Call Trace:
 <IRQ>  [<ffffffff81054414>] print_circular_bug+0xae/0xbd
 [<ffffffff81055237>] validate_chain+0x727/0xd23
 [<ffffffff810560bc>] __lock_acquire+0x889/0x8fa
 [<ffffffff810539a7>] ? mark_lock+0x42f/0x51f
 [<ffffffff81056184>] lock_acquire+0x57/0x6d
 [<ffffffffa00200d4>] ? rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
 [<ffffffff813e6b69>] _raw_read_lock_bh+0x34/0x43
 [<ffffffffa00200d4>] ? rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
 [<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc]
 [<ffffffff8103b6e6>] run_timer_softirq+0x1f3/0x2a5
 [<ffffffff8103b675>] ? run_timer_softirq+0x182/0x2a5
 [<ffffffffa002007e>] ? rxrpc_resend_time_expired+0x0/0x96 [af_rxrpc]
 [<ffffffff810367ef>] ? __do_softirq+0x69/0x13e
 [<ffffffff81036828>] __do_softirq+0xa2/0x13e
 [<ffffffff81002dcc>] call_softirq+0x1c/0x28
 [<ffffffff810049f0>] do_softirq+0x38/0x80
 [<ffffffff810361a2>] irq_exit+0x45/0x47
 [<ffffffff81018fb3>] smp_apic_timer_interrupt+0x88/0x96
 [<ffffffff81002893>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81049de1>] ? __atomic_notifier_call_chain+0x0/0x86
 [<ffffffff8100955b>] ? mwait_idle+0x6e/0x78
 [<ffffffff81009552>] ? mwait_idle+0x65/0x78
 [<ffffffff810011ac>] cpu_idle+0x4d/0x83
 [<ffffffff813e06f3>] start_secondary+0x1bd/0x1c1

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-04 21:53:16 -07:00
..
9p fs/9p: destroy fid on failed remove 2010-08-02 14:28:36 -05:00
802 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
8021q vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet) 2010-07-18 15:38:44 -07:00
appletalk Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
atm atm/br2684: register notifier event for carrier signal changes. 2010-07-09 00:09:21 -07:00
ax25 net: sk_sleep() helper 2010-04-20 16:37:13 -07:00
bluetooth Bluetooth: Add __init and __exit marks to RFCOMM 2010-07-27 12:37:27 -07:00
bridge Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-08-02 22:22:46 -07:00
caif caif: precedence bug 2010-07-22 14:14:47 -07:00
can can-raw: Fix skb_orphan_try handling 2010-08-03 00:31:48 -07:00
core Revert "net: remove zap_completion_queue" 2010-08-03 00:24:04 -07:00
dcb
dccp net: dccp: fix sign bug 2010-07-18 15:07:14 -07:00
decnet net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
dsa Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-07-20 18:25:24 -07:00
econet econet: fix locking 2010-06-11 18:37:08 -07:00
ethernet Net: ethernet: pe2.c: fix EXPORT_SYMBOL macro code style issue 2010-07-14 18:27:09 -07:00
ieee802154 ieee802154: Fix possible NULL pointer dereference in wpan_phy_alloc 2010-05-23 23:11:07 -07:00
ipv4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-08-02 22:22:46 -07:00
ipv6 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
ipx
irda net/irda: Remove unnecessary casts of private_data 2010-07-12 21:13:37 -07:00
iucv net: use __packed annotation 2010-06-03 03:21:52 -07:00
key pfkey: add severity to printk 2010-05-17 23:23:13 -07:00
l2tp net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
lapb
llc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-05-12 00:05:35 -07:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-07-29 14:47:07 -04:00
netfilter Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
netlabel net: Remove unnecessary returns from void function()s 2010-05-17 23:23:14 -07:00
netlink genetlink: use genl_register_family_with_ops() 2010-07-26 21:00:10 -07:00
netrom net: sk_sleep() helper 2010-04-20 16:37:13 -07:00
packet packet_mmap: expose hw packet timestamps to network packet capture utilities 2010-06-02 05:53:56 -07:00
phonet Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-07-20 18:25:24 -07:00
rds net/rds: Add missing mutex_unlock 2010-05-29 00:18:48 -07:00
rfkill Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
rose net/rose: Use GFP_ATOMIC 2010-08-01 00:32:12 -07:00
rxrpc RxRPC: Fix a potential deadlock between the call resend_timer and state_lock 2010-08-04 21:53:16 -07:00
sched sch_sfq: add sanity check for the packet length 2010-08-04 21:53:16 -07:00
sctp Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
sunrpc mm: add context argument to shrinker callback to remaining shrinkers 2010-07-21 15:33:01 +10:00
tipc tipc: Reduce footprint by un-inlining tipc_msg_* routines 2010-05-12 23:02:29 -07:00
unix drop_monitor: convert some kfree_skb call sites to consume_skb 2010-07-20 13:28:05 -07:00
wanrouter net: autoconvert trivial BKL users to private mutex 2010-07-12 20:21:47 -07:00
wimax Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2010-05-20 21:04:44 -07:00
wireless cfg80211: Update of regulatory request initiator handling 2010-07-28 16:24:01 -04:00
x25 X25: Remove bkl in sockopts 2010-05-17 17:39:28 -07:00
xfrm Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-07-20 18:25:24 -07:00
compat.c From abbffa2aa9bd6f8df16d0d0a102af677510d8b9a Mon Sep 17 00:00:00 2001 2010-06-03 20:03:40 -07:00
Kconfig wireless: Make COMPAT_NETLINK_MESSAGES depend upon WEXT_CORE 2010-07-26 13:13:49 -07:00
Makefile net/Makefile: conditionally descend to wireless and ieee802154 2010-06-29 15:32:43 -07:00
nonet.c
socket.c net: support time stamping in phy devices. 2010-07-18 19:15:26 -07:00
sysctl_net.c net: Remove unnecessary returns from void function()s 2010-05-17 23:23:14 -07:00
TUNABLE