linux/net
John Fastabend 144748eb0c bpf, sockmap: Fix incorrect fwd_alloc accounting
Incorrect fwd_alloc accounting can result in a warning when the socket
is torn down:

 [18455.319240] WARNING: CPU: 0 PID: 24075 at net/core/stream.c:208 sk_stream_kill_queues+0x21f/0x230
 [...]
 [18455.319543] Call Trace:
 [18455.319556]  inet_csk_destroy_sock+0xba/0x1f0
 [18455.319577]  tcp_rcv_state_process+0x1b4e/0x2380
 [18455.319593]  ? lock_downgrade+0x3a0/0x3a0
 [18455.319617]  ? tcp_finish_connect+0x1e0/0x1e0
 [18455.319631]  ? sk_reset_timer+0x15/0x70
 [18455.319646]  ? tcp_schedule_loss_probe+0x1b2/0x240
 [18455.319663]  ? lock_release+0xb2/0x3f0
 [18455.319676]  ? __release_sock+0x8a/0x1b0
 [18455.319690]  ? lock_downgrade+0x3a0/0x3a0
 [18455.319704]  ? lock_release+0x3f0/0x3f0
 [18455.319717]  ? __tcp_close+0x2c6/0x790
 [18455.319736]  ? tcp_v4_do_rcv+0x168/0x370
 [18455.319750]  tcp_v4_do_rcv+0x168/0x370
 [18455.319767]  __release_sock+0xbc/0x1b0
 [18455.319785]  __tcp_close+0x2ee/0x790
 [18455.319805]  tcp_close+0x20/0x80

This currently happens because in the redirect case we call
skb_set_owner_r() with the original sock. This increments the fwd_alloc
memory accounting on the original sock. Then on redirect we may push the
skb onto the queue of the psock we are redirecting to. When the skb is
flushed from the queue we give the memory back to the original sock. The
problem is that if the original sock is destroyed/closed with skbs on
another psock's queue, the original sock has no way to reclaim the
memory before being destroyed. The above warning is then thrown.

  sockA                          sockB

  sk_psock_strp_read()
   sk_psock_verdict_apply()
     -- SK_REDIRECT --
     sk_psock_skb_redirect()
                                skb_queue_tail(psock_other->ingress_skb..)

  sk_close()
   sock_map_unref()
     sk_psock_put()
       sk_psock_drop()
         sk_psock_zap_ingress()

At this point we have torn down our own psock, but still have the
outstanding skb in psock_other. Note that SK_PASS doesn't have this
problem because the sk_psock_drop() logic releases the skb, which is
still associated with our psock.

To resolve this, let's only account for skbs on the ingress queue that
are still associated with the current socket. In the redirect case we
will check memory limits per 6fa9201a89, but will omit fwd_alloc
accounting until the skb is actually enqueued. When the skb is sent via
skb_send_sock_locked() or received with sk_psock_skb_ingress(), memory
will be claimed on psock_other.
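
The accounting problem and the fix described above can be illustrated
with a small user-space model. This is only a sketch under stated
assumptions: `toy_sock`, `toy_skb`, the single `charged` counter and all
helper names are hypothetical stand-ins invented for illustration, not
kernel types or kernel APIs, and the 768-byte size is arbitrary. The
model only tracks which socket is charged for an skb and what is left
over when a socket is destroyed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket: one counter models its memory accounting. */
struct toy_sock {
	int charged;      /* bytes currently charged to this socket */
	bool destroyed;
};

/* Hypothetical stand-in for an skb: remembers which socket was charged. */
struct toy_skb {
	int truesize;
	struct toy_sock *owner;   /* socket charged for this skb, or NULL */
};

/* Charge the skb's memory to a socket (models taking ownership). */
static void toy_charge(struct toy_skb *skb, struct toy_sock *sk)
{
	skb->owner = sk;
	sk->charged += skb->truesize;
}

/* Give the memory back to whichever socket was charged. */
static void toy_uncharge(struct toy_skb *skb)
{
	if (skb->owner) {
		assert(skb->owner->charged >= skb->truesize);
		skb->owner->charged -= skb->truesize;
		skb->owner = NULL;
	}
}

/* Destroy a socket; a nonzero return models the teardown warning. */
static int toy_destroy(struct toy_sock *sk)
{
	sk->destroyed = true;
	return sk->charged;
}

/* Old behaviour: sockA is charged, then the skb waits on sockB's queue. */
int redirect_charge_early(void)
{
	struct toy_sock a = {0}, b = {0};
	struct toy_skb skb = { .truesize = 768, .owner = NULL };
	struct toy_skb *b_queue;

	toy_charge(&skb, &a);   /* charged to the original socket */
	b_queue = &skb;         /* redirected: now pending on b's queue */
	(void)b_queue;
	(void)b;

	/* a is closed while the skb is still pending on b */
	return toy_destroy(&a); /* leftover bytes: the warning case */
}

/* Fixed behaviour: nothing is charged until the target consumes the skb. */
int redirect_charge_late(void)
{
	struct toy_sock a = {0}, b = {0};
	struct toy_skb skb = { .truesize = 768, .owner = NULL };
	int leftover;

	/* redirected to b without charging a */
	leftover = toy_destroy(&a);   /* a can be torn down cleanly */

	toy_charge(&skb, &b);         /* b claims the memory when it takes the skb */
	toy_uncharge(&skb);           /* and releases it once the skb is consumed */
	return leftover + toy_destroy(&b);
}
```

In this model, `redirect_charge_early()` leaves the skb's bytes charged
to the destroyed socket, while `redirect_charge_late()` leaves nothing
outstanding on either socket.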

Fixes: 6fa9201a89 ("bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/161731444013.68884.4021114312848535993.stgit@john-XPS-13-9370
2021-04-07 01:29:06 +02:00
6lowpan
9p net: 9p: advance iov on empty read 2021-03-03 16:57:59 -08:00
802
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-01-14 18:34:50 -08:00
appletalk appletalk: Fix skb allocation size in loopback case 2021-02-12 16:40:28 -08:00
atm net: atm: pppoatm: use new API for wakeup tasklet 2021-01-29 18:24:05 -08:00
ax25
batman-adv batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field 2021-04-05 15:06:03 -07:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kern 2021-02-11 14:59:01 -08:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-01-20 12:16:11 -08:00
bpfilter net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
bridge net: bridge: don't notify switchdev for local FDB addresses 2021-03-23 14:39:41 -07:00
caif net: caif: Use netif_rx_any_context(). 2021-02-15 13:21:48 -08:00
can can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZE 2021-03-29 09:51:43 +02:00
ceph libceph: remove osdtimeout option entirely 2021-02-16 12:09:52 +01:00
core bpf, sockmap: Fix incorrect fwd_alloc accounting 2021-04-07 01:29:06 +02:00
dcb net: dcb: use obj-$(CONFIG_DCB) form in net/Makefile 2021-01-27 17:03:52 -08:00
dccp ipv6: weaken the v4mapped source check 2021-03-18 11:19:23 -07:00
decnet net: decnet: fix netdev refcount leaking on error path 2021-01-27 17:33:46 -08:00
dns_resolver net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
dsa net: dsa: Fix type was not set for devlink port 2021-03-29 13:49:04 -07:00
ethernet
ethtool ethtool: fix the check logic of at least one channel for RX/TX 2021-02-28 11:49:07 -08:00
hsr net: hsr: add support for EntryForgetTime 2021-02-25 09:41:51 -08:00
ieee802154
ife net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
ipv4 net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...); 2021-04-01 15:50:50 -07:00
ipv6 net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind() 2021-04-05 12:56:52 -07:00
iucv net/af_iucv: build SG skbs for TRANS_HIPER sockets 2021-01-28 20:36:22 -08:00
kcm net: group skb_shinfo zerocopy related bits together. 2021-01-07 16:08:37 -08:00
key af_key: relax availability checks for skb size calculation 2021-01-04 10:05:50 +01:00
l2tp net: l2tp: reduce log level of messages in receive path, add counter instead 2021-03-03 16:55:02 -08:00
l3mdev net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile 2021-01-27 17:03:52 -08:00
lapb net: lapb: Copy the skb before sending a packet 2021-02-02 08:40:48 -08:00
llc net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
mac80211 mac80211: choose first enabled channel for monitor 2021-03-16 21:20:47 +01:00
mac802154
mpls net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 2021-03-09 16:12:20 -08:00
mptcp mptcp: revert "mptcp: provide subflow aware release function" 2021-04-01 16:02:50 -07:00
ncsi net/ncsi: Avoid channel_monitor hrtimer deadlock 2021-03-30 13:16:23 -07:00
netfilter netfilter: nftables: skip hook overlap logic if flowtable is stale 2021-03-18 01:08:54 +01:00
netlabel cipso,calipso: resolve a number of problems with the DOI refcounts 2021-03-04 15:26:57 -08:00
netlink mptcp: avoid lock_fast usage in accept path 2021-02-12 16:31:46 -08:00
netrom
nfc nfc: Avoid endless loops caused by repeated llcp_sock_connect() 2021-03-25 17:02:01 -07:00
nsh
openvswitch openvswitch: fix send of uninitialized stack memory in ct limit reply 2021-04-05 12:54:42 -07:00
packet net/packet: Improve the comment about LL header visibility criteria 2021-02-06 14:59:28 -08:00
phonet
psample net: psample: Fix netlink skb length with tunnel info 2021-02-25 09:49:46 -08:00
qrtr net: qrtr: Fix memory leak on qrtr_tx_wait failure 2021-03-30 13:48:29 -07:00
rds net/rds: Fix a use after free in rds_message_map_pages 2021-03-31 14:26:56 -07:00
rfkill rfkill: add a reason to the HW rfkill state 2020-12-11 12:47:17 +01:00
rose
rxrpc rxrpc: Fix dependency on IPv6 in udp tunnel config 2021-02-12 16:42:05 -08:00
sched net: cls_api: Fix uninitialised struct field bo->unlocked_driver_cb 2021-04-02 14:14:22 -07:00
sctp net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind() 2021-04-05 12:56:52 -07:00
smc net/smc: use memcpy instead of snprintf to avoid out of bounds read 2021-01-12 20:22:01 -08:00
strparser
sunrpc Miscellaneous NFSD fixes for v5.12-rc. 2021-03-16 10:22:50 -07:00
switchdev net: bridge: propagate extack through switchdev_port_attr_set 2021-02-14 17:38:11 -08:00
tipc tipc: increment the tmp aead refcnt before attaching it 2021-04-06 16:25:34 -07:00
tls net/tls: Select SOCK_RX_QUEUE_MAPPING from TLS_DEVICE 2021-02-11 19:08:06 -08:00
unix af_unix: handle idmapped mounts 2021-01-24 14:27:18 +01:00
vmw_vsock selinux: vsock: Set SID for socket returned by accept() 2021-03-19 13:46:55 -07:00
wireless wireless/nl80211: fix wdev_id may be used uninitialized 2021-03-16 21:20:47 +01:00
x25 net: x25: Remove unimplemented X.25-over-LLC code stubs 2020-12-12 17:15:33 -08:00
xdp xsk: Fold xp_assign_dev and __xp_assign_dev 2021-01-25 23:56:33 +01:00
xfrm xfrm/compat: Cleanup WARN()s that can be user-triggered 2021-03-30 07:29:09 +02:00
compat.c
devres.c
Kconfig net/sock: Add kernel config SOCK_RX_QUEUE_MAPPING 2021-02-11 19:08:06 -08:00
Makefile net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile 2021-01-27 17:03:52 -08:00
socket.c io_uring-worker.v3-2021-02-25 2021-02-27 08:29:02 -08:00
sysctl_net.c