linux/net/core
Daniel Borkmann 109980b894 bpf: don't select potentially stale ri->map from buggy xdp progs
We can potentially run into a couple of issues with the XDP
bpf_redirect_map() helper. The ri->map in the per CPU storage
can become stale in several ways, mostly due to misuse, where
we can then trigger a use after free on the map:

i) prog A is calling bpf_redirect_map(), returning XDP_REDIRECT
and running on a driver not supporting XDP_REDIRECT yet. The
ri->map on that CPU becomes stale when the XDP program is unloaded
on the driver, and a prog B loaded on a different driver which
supports XDP_REDIRECT return code. prog B would have to omit
calling to bpf_redirect_map() and just return XDP_REDIRECT, which
would then access the freed map in xdp_do_redirect() since not
cleared for that CPU.

ii) prog A is calling bpf_redirect_map(), returning a code other
than XDP_REDIRECT. prog A is then detached, which triggers release
of the map. prog B is attached which, similarly as in i), would
just return XDP_REDIRECT without having called bpf_redirect_map()
and thus be accessing the freed map in xdp_do_redirect() since
not cleared for that CPU.

iii) prog A is attached to generic XDP, calling the bpf_redirect_map()
helper and returning XDP_REDIRECT. xdp_do_generic_redirect() is
currently not handling ri->map (will be fixed by Jesper), so it's
not being reset. Later loading a e.g. native prog B which would,
say, call bpf_xdp_redirect() and then returns XDP_REDIRECT would
find in xdp_do_redirect() that a map was set and uses that causing
use after free on map access.

Fix thus needs to avoid accessing stale ri->map pointers, naive
way would be to call a BPF function from drivers that just resets
it to NULL for all XDP return codes but XDP_REDIRECT and including
XDP_REDIRECT for drivers not supporting it yet (and let ri->map
being handled in xdp_do_generic_redirect()). There is a less
intrusive way w/o letting drivers call a reset for each BPF run.

The verifier knows we're calling into bpf_xdp_redirect_map()
helper, so it can do a small insn rewrite transparent to the prog
itself in the sense that it fills R4 with a pointer to the own
bpf_prog. We have that pointer at verification time anyway and
R4 is allowed to be used as per calling convention we scratch
R0 to R5 anyway, so they become inaccessible and program cannot
read them prior to a write. Then, the helper would store the prog
pointer in the current CPUs struct redirect_info. Later in
xdp_do_*_redirect() we check whether the redirect_info's prog
pointer is the same as passed xdp_prog pointer, and if that's
the case then all good, since the prog holds a ref on the map
anyway, so it is always valid at that point in time and must
have a reference count of at least 1. If in the unlikely case
they are not equal, it means we got a stale pointer, so we clear
and bail out right there. Also do reset map and the owning prog
in bpf_xdp_redirect(), so that bpf_xdp_redirect_map() and
bpf_xdp_redirect() won't get mixed up, only the last call should
take precedence. A tc bpf_redirect() doesn't use map anywhere
yet, so no need to clear it there since never accessed in that
layer.

Note that in case the prog is released, and thus the map as
well we're still under RCU read critical section at that time
and have preemption disabled as well. Once we commit with the
__dev_map_insert_ctx() from xdp_do_redirect_map() and set the
map to ri->map_to_flush, we still wait for a xdp_do_flush_map()
to finish in devmap dismantle time once flush_needed bit is set,
so that is fine.

Fixes: 97f91a7cf0 ("bpf: add bpf_redirect_map helper routine")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08 20:58:09 -07:00
..
datagram.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-01 17:42:05 -07:00
dev_addr_lists.c
dev_ioctl.c net: check dev->addr_len for dev_set_mac_address() 2017-07-29 11:25:05 -07:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-01 17:42:05 -07:00
devlink.c devlink: Add IPv6 header for dpipe 2017-08-31 14:42:19 -07:00
drop_monitor.c drop_monitor: use setup_timer 2017-03-12 23:47:16 -07:00
dst_cache.c net: dst_cache_per_cpu_dst_set() can be static 2016-03-18 17:45:08 -04:00
dst.c net: check type when freeing metadata dst 2017-08-21 10:57:38 -07:00
ethtool.c net: ethtool: add support for forward error correction modes 2017-07-29 23:23:44 -07:00
fib_notifier.c net: Add module reference to FIB notifiers 2017-09-01 20:33:42 -07:00
fib_rules.c rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
filter.c bpf: don't select potentially stale ri->map from buggy xdp progs 2017-09-08 20:58:09 -07:00
flow_dissector.c flow_dissector: Add limit for number of headers to dissect 2017-09-05 11:40:08 -07:00
gen_estimator.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
gen_stats.c net_sched: gen_estimator: complete rewrite of rate estimators 2016-12-05 15:21:59 -05:00
gro_cells.c net: Generic XDP 2017-04-25 13:33:49 -04:00
hwbm.c net: hwbm: Fix unbalanced spinlock in error case 2016-05-25 12:35:09 -07:00
link_watch.c
lwt_bpf.c net: add extack arg to lwtunnel build state 2017-05-30 11:55:32 -04:00
lwtunnel.c ipv6: sr: define core operations for seg6local lightweight tunnel 2017-08-07 14:16:22 -07:00
Makefile net: core: Make the FIB notification chain generic 2017-08-03 15:35:59 -07:00
neighbour.c rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
net_namespace.c net: call newid/getid without rtnl mutex held 2017-08-09 16:57:38 -07:00
net-procfs.c net-procfs: Use vsnprintf extension %phN 2017-06-04 19:52:58 -04:00
net-sysfs.c net: style cleanups 2017-08-18 22:38:47 -07:00
net-sysfs.h
net-traces.c bridge: add tracepoint in br_fdb_update 2017-08-31 11:42:41 -07:00
netclassid_cgroup.c cgroup, net_cls: iterate the fds of only the tasks which are being migrated 2017-03-22 10:32:46 -07:00
netevent.c
netpoll.c netpoll: Fix device name check in netpoll_setup() 2017-07-26 17:01:43 -07:00
netprio_cgroup.c net: break include loop netdevice.h, dsa.h, devlink.h 2017-03-28 22:46:04 -07:00
pktgen.c net: convert sk_buff.users from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
ptp_classifier.c ptp: Change ptp_class to a proper bitmask 2015-11-03 11:08:22 -05:00
request_sock.c ipv4: Namespaceify tcp_max_syn_backlog knob 2016-12-29 11:38:31 -05:00
rtnetlink.c rtnelink: Move link dump consistency check out of the loop 2017-08-13 19:43:57 -07:00
scm.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h> 2017-03-02 08:42:29 +01:00
secure_seq.c tcp: Namespaceify sysctl_tcp_timestamps 2017-06-08 10:53:29 -04:00
skbuff.c udp: drop head states only when all skb references are gone 2017-09-07 20:02:39 -07:00
sock_diag.c netlink: extended ACK reporting 2017-04-13 13:58:20 -04:00
sock_reuseport.c soreuseport: use "unsigned int" in __reuseport_alloc() 2017-04-03 19:06:38 -07:00
sock.c neigh: increase queue_len_bytes to match wmem_default 2017-08-29 16:10:50 -07:00
stream.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
sysctl_net_core.c net: move somaxconn init from sysctl code 2017-05-25 13:12:17 -04:00
timestamping.c net: skb_defer_rx_timestamp should check for phydev before setting up classify 2015-07-09 14:17:15 -07:00
tso.c net: tso: add support for IPv6 2015-10-26 22:24:22 -07:00
utils.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-05-02 16:40:27 -07:00