linux/tools
Kuniyuki Iwashima d5e4ddaeb6 bpf: Support socket migration by eBPF.
This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT
to check if the attached eBPF program is capable of migrating sockets. When
the eBPF program is attached, we run it for socket migration if the
expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or
net.ipv4.tcp_migrate_req is enabled.

Currently, the expected_attach_type is not enforced for the
BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the
earlier idea in the commit aac3fc320d ("bpf: Post-hooks for sys_bind") to
fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type().

Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to
select a new listener based on the child socket. migrating_sk varies
depending on if it is migrating a request in the accept queue or during
3WHS.

  - accept_queue : sock (ESTABLISHED/SYN_RECV)
  - 3WHS         : request_sock (NEW_SYN_RECV)

In the eBPF program, we can select a new listener by
BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning
SK_DROP. This feature is useful when listeners have different settings at
the socket API level or when we want to free resources as soon as possible.

  - SK_PASS with selected_sk, select it as a new listener
  - SK_PASS with selected_sk NULL, fallbacks to the random selection
  - SK_DROP, cancel the migration.

There is a noteworthy point. We select a listening socket in three places,
but we do not have struct skb at closing a listener or retransmitting a
SYN+ACK. On the other hand, some helper functions do not expect skb is NULL
(e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer()
in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb
temporarily before running the eBPF program.

Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp
2021-06-15 18:01:06 +02:00
..
accounting
arch - turn the stack canary into a normal __percpu variable on 32-bit which 2021-04-27 17:45:09 -07:00
bootconfig
bpf tools/bpftool: Fix error return code in do_batch() 2021-06-11 15:31:09 -07:00
build perf tools changes for v5.13: 1st batch 2021-05-01 12:22:38 -07:00
cgroup tools/cgroup/slabinfo.py: updated to work on current kernel 2021-04-23 14:42:40 -07:00
debugging
edid
firewire
firmware
gpio
hv
iio
include bpf: Support socket migration by eBPF. 2021-06-15 18:01:06 +02:00
io_uring
kvm/kvm_stat
laptop
leds
lib libbpf: Set NLM_F_EXCL when creating qdisc 2021-06-15 14:00:30 +02:00
memory-model
objtool Objtool updates in this cycle were: 2021-04-28 12:53:24 -07:00
pci
pcmcia
perf perf tools changes for v5.13: 1st batch 2021-05-01 12:22:38 -07:00
power Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux 2021-05-05 12:24:29 -07:00
scripts tools: disable -Wno-type-limits 2021-05-06 19:24:11 -07:00
spi
testing selftests, bpf: Make docs tests fail more reliably 2021-06-08 22:04:35 +02:00
thermal/tmon tools: do not include scripts/Kbuild.include 2021-04-25 05:26:13 +09:00
time
tracing
usb treewide: remove editor modelines and cruft 2021-05-07 00:26:34 -07:00
virtio
vm
wmi
Makefile