linux/net
Chuck Lever 671c450b6f xprtrdma: Fix oops in Receive handler after device removal
Since v5.4, a device removal occasionally triggered this oops:

Dec  2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
Dec  2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
Dec  2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
Dec  2 17:13:53 manet kernel: PGD 0 P4D 0
Dec  2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
Dec  2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G        W         5.4.0-00050-g53717e43af61 #883
Dec  2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Dec  2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Dec  2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
Dec  2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
Dec  2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
Dec  2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
Dec  2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
Dec  2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
Dec  2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
Dec  2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
Dec  2 17:13:53 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
Dec  2 17:13:53 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
Dec  2 17:13:53 manet kernel: Call Trace:
Dec  2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
Dec  2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
Dec  2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec  2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec  2 17:13:53 manet kernel: kthread+0xf4/0xf9
Dec  2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
Dec  2 17:13:53 manet kernel: ret_from_fork+0x24/0x30

The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
is still pointing to the old ib_device, which has been freed. The
only way that is possible is if this rpcrdma_rep was not destroyed
by rpcrdma_ia_remove.

Debugging showed that was indeed the case: this rpcrdma_rep was
still in use by a completing RPC at the time of the device removal,
and thus wasn't on the rep free list. So, it was not found by
rpcrdma_reps_destroy().

The fix is to introduce a list of all rpcrdma_reps so that they all
can be found when a device is removed. That list is used to perform
only regbuf DMA unmapping, replacing that call to
rpcrdma_reps_destroy().

Meanwhile, to prevent corruption of this list, I've moved the
destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
protecting the rb_all_reps list.

Fixes: b0b227f071 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-14 13:30:24 -05:00
..
6lowpan
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
802 treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
8021q vlan: vlan_changelink() should propagate errors 2020-01-07 13:35:14 -08:00
appletalk appletalk: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-22 16:27:24 -08:00
ax25 net: use helpers to change sk_ack_backlog 2019-11-06 16:14:48 -08:00
batman-adv treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
bluetooth compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
bpf treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
bpfilter
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2019-12-26 13:11:40 -08:00
caif Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-02 13:54:56 -07:00
can can: j1939: j1939_sk_bind(): take priv after lock is held 2019-12-08 11:52:02 +01:00
ceph libceph, rbd, ceph: convert to use the new mount API 2019-11-27 22:28:37 +01:00
core Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-22 09:54:33 -08:00
dcb
dccp treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
decnet net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
dns_resolver
dsa net: dsa: ksz: use common define for tag len 2019-12-20 21:06:49 -08:00
ethernet net: add annotations on hh->hh_len lockless accesses 2019-11-07 20:07:30 -08:00
hsr hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() 2019-12-30 20:36:27 -08:00
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-02 13:54:56 -07:00
ife net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2020-01-08 15:22:41 -08:00
ipv6 sit: do not confirm neighbor when do pmtu update 2019-12-24 22:28:55 -08:00
iucv treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
kcm kcm: disable preemption in kcm_parse_func_strparser() 2019-09-27 10:27:14 +02:00
key
l2tp net: ipv6: add net argument to ip6_dst_lookup_flow 2019-12-04 12:27:12 -08:00
l3mdev
lapb
llc llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) 2019-12-20 21:19:36 -08:00
mac80211 mac80211: Turn AQL into an NL80211_EXT_FEATURE 2019-12-13 10:34:04 +01:00
mac802154
mpls net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-04 12:27:13 -08:00
ncsi net/ncsi: Disable global multicast filter 2019-09-19 18:04:40 -07:00
netfilter netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is present 2020-01-08 23:31:46 +01:00
netlabel netlabel: remove redundant assignment to pointer iter 2019-09-01 11:45:02 -07:00
netlink treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
netrom net: core: add generic lockdep keys 2019-10-24 14:53:48 -07:00
nfc net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive() 2019-12-18 11:57:33 -08:00
nsh
openvswitch treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
packet af_packet: set defaule value for tmo 2019-12-09 14:30:19 -08:00
phonet net: use skb_queue_empty_lockless() in poll() handlers 2019-10-28 13:33:41 -07:00
psample net: psample: fix skb_over_panic 2019-11-26 14:40:13 -08:00
qrtr net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue 2020-01-05 14:46:05 -08:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-16 21:51:42 -08:00
rfkill rfkill: Fix incorrect check to avoid NULL pointer dereference 2019-12-16 10:15:49 +01:00
rose net: use helpers to change sk_ack_backlog 2019-11-06 16:14:48 -08:00
rxrpc RxRPC fixes 2019-12-24 16:12:47 -08:00
sched net: sch_prio: When ungrafting, replace with FIFO 2020-01-08 12:45:53 -08:00
sctp sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY 2020-01-06 13:28:37 -08:00
smc net/smc: unregister ib devices in reboot_event 2019-12-20 21:31:19 -08:00
strparser
sunrpc xprtrdma: Fix oops in Receive handler after device removal 2020-01-14 13:30:24 -05:00
switchdev
tipc tipc: fix wrong connect() return code 2020-01-08 15:57:35 -08:00
tls net/tls: Fix return values to avoid ENOTSUPP 2019-12-06 20:15:39 -08:00
unix treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
vmw_vsock vsock/virtio: add WARN_ON check on virtio_transport_get_ops() 2019-12-16 16:07:12 -08:00
wimax
wireless cfg80211: fix double-free after changing network namespace 2019-12-13 10:08:09 +01:00
x25 net/x25: add new state X25_STATE_5 2019-12-09 10:28:43 -08:00
xdp xsk: Add rcu_read_lock around the XSK wakeup 2019-12-19 16:20:48 +01:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2019-11-25 20:02:57 -08:00
compat.c y2038: socket: use __kernel_old_timespec instead of timespec 2019-11-15 14:38:29 +01:00
Kconfig net: Fix Kconfig indentation, continued 2019-11-21 12:00:21 -08:00
Makefile
socket.c io_uring-5.5-20191212 2019-12-13 14:24:54 -08:00
sysctl_net.c