039f0e054a
228 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Jakub Kicinski
|
b3fce974d4 |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says: ==================== bpf-next 2022-07-22 We've added 73 non-merge commits during the last 12 day(s) which contain a total of 88 files changed, 3458 insertions(+), 860 deletions(-). The main changes are: 1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai. 2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel syscalls through kprobe mechanism, from Andrii Nakryiko. 3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function, from Song Liu & Jiri Olsa. 4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi. 5) Add a ksym BPF iterator to allow for more flexible and efficient interactions with kernel symbols, from Alan Maguire. 6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter. 7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov. 8) libbpf support for writing custom perf event readers, from Jon Doron. 9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar. 10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski. 11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin. 12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev. 13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui. 14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong. 15) Make non-prealloced BPF map allocations low priority to play better with memcg limits, from Yafang Shao. 16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao. 17) Various smaller cleanups and improvements all over the place. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits) bpf: Simplify bpf_prog_pack_[size|mask] bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch) bpf, x64: Allow to use caller address from stack ftrace: Allow IPMODIFY and DIRECT ops on the same function ftrace: Add modify_ftrace_direct_multi_nolock bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF selftests/bpf: Fix test_verifier failed test in unprivileged mode selftests/bpf: Add negative tests for new nf_conntrack kfuncs selftests/bpf: Add tests for new nf_conntrack kfuncs selftests/bpf: Add verifier tests for trusted kfunc args net: netfilter: Add kfuncs to set and change CT status net: netfilter: Add kfuncs to set and change CT timeout net: netfilter: Add kfuncs to allocate and insert CT net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup bpf: Add documentation for kfuncs bpf: Add support for forcing kfunc args to be trusted bpf: Switch to new kfunc flags infrastructure tools/resolve_btfids: Add support for 8-byte BTF sets bpf: Introduce 8-byte BTF set ... ==================== Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jiri Olsa
|
316cba62df |
bpf, x64: Allow to use caller address from stack
Currently we call the original function by using the absolute address given at the JIT generation. That's not usable when having trampoline attached to multiple functions, or the target address changes dynamically (in case of live patch). In such cases we need to take the return address from the stack. Adding support to retrieve the original function address from the stack by adding new BPF_TRAMP_F_ORIG_STACK flag for arch_prepare_bpf_trampoline function. Basically we take the return address of the 'fentry' call: function + 0: call fentry # stores 'function + 5' address on stack function + 5: ... The 'function + 5' address will be used as the address for the original function to call. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220720002126.803253-4-song@kernel.org |
||
Jakub Kicinski
|
816cd16883 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h |
||
Song Liu
|
1d5f82d9dd |
bpf, x86: fix freeing of not-finalized bpf_prog_pack
syzbot reported a few issues with bpf_prog_pack [1], [2]. This only happens
with multiple subprogs. In jit_subprogs(), we first call bpf_int_jit_compile()
on each sub program. And then, we call it on each sub program again. jit_data
is not freed in the first call of bpf_int_jit_compile(). Similarly we don't
call bpf_jit_binary_pack_finalize() in the first call of bpf_int_jit_compile().
If bpf_int_jit_compile() failed for one sub program, we will call
bpf_jit_binary_pack_finalize() for this sub program. However, we don't have a
chance to call it for other sub programs. Then we will hit "goto out_free" in
jit_subprogs(), and call bpf_jit_free on some subprograms that haven't got
bpf_jit_binary_pack_finalize() yet.
At this point, bpf_jit_binary_pack_free() is called and the whole 2MB page is
freed erroneously.
Fix this with a custom bpf_jit_free() for x86_64, which calls
bpf_jit_binary_pack_finalize() if necessary. Also, with custom
bpf_jit_free(), bpf_prog_aux->use_bpf_prog_pack is not needed any more,
remove it.
Fixes:
|
||
Xu Kuohai
|
535a57a7ff |
bpf: Remove is_valid_bpf_tramp_flags()
Before generating bpf trampoline, x86 calls is_valid_bpf_tramp_flags() to check the input flags. This check is architecture independent. So, to be consistent with x86, arm64 should also do this check before generating bpf trampoline. However, the BPF_TRAMP_F_XXX flags are not used by user code and the flags argument is almost constant at compile time, so this run time check is a bit redundant. Remove is_valid_bpf_tramp_flags() and add some comments to the usage of BPF_TRAMP_F_XXX flags, as suggested by Alexei. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220711150823.2128542-2-xukuohai@huawei.com |
||
Jakub Kicinski
|
0076cad301 |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-07-09 We've added 94 non-merge commits during the last 19 day(s) which contain a total of 125 files changed, 5141 insertions(+), 6701 deletions(-). The main changes are: 1) Add new way for performing BTF type queries to BPF, from Daniel Müller. 2) Add inlining of calls to bpf_loop() helper when its function callback is statically known, from Eduard Zingerman. 3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz. 4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM hooks, from Stanislav Fomichev. 5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko. 6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky. 7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP selftests, from Magnus Karlsson & Maciej Fijalkowski. 8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet. 9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki. 10) Sockmap optimizations around throughput of UDP transmissions which have been improved by 61%, from Cong Wang. 11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa. 12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend. 13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang. 14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma macro, from James Hilliard. 15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan. 16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits) selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n bpf: Correctly propagate errors up from bpf_core_composites_match libbpf: Disable SEC pragma macro on GCC bpf: Check attach_func_proto more carefully in check_return_code selftests/bpf: Add test involving restrict type qualifier bpftool: Add support for KIND_RESTRICT to gen min_core_btf command MAINTAINERS: Add entry for AF_XDP selftests files selftests, xsk: Rename AF_XDP testing app bpf, docs: Remove deprecated xsk libbpf APIs description selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage libbpf, riscv: Use a0 for RC register libbpf: Remove unnecessary usdt_rel_ip assignments selftests/bpf: Fix few more compiler warnings selftests/bpf: Fix bogus uninitialized variable warning bpftool: Remove zlib feature test from Makefile libbpf: Cleanup the legacy uprobe_event on failed add/attach_event() libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy() libbpf: Cleanup the legacy kprobe_event on failed add/attach_event() selftests/bpf: Add type match test against kernel's task_struct selftests/bpf: Add nested type to type based tests ... ==================== Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Stanislav Fomichev
|
69fd337a97 |
bpf: per-cgroup lsm flavor
Allow attaching to lsm hooks in the cgroup context. Attaching to per-cgroup LSM works exactly like attaching to other per-cgroup hooks. New BPF_LSM_CGROUP is added to trigger new mode; the actual lsm hook we attach to is signaled via existing attach_btf_id. For the hooks that have 'struct socket' or 'struct sock' as its first argument, we use the cgroup associated with that socket. For the rest, we use 'current' cgroup (this is all on default hierarchy == v2 only). Note that for some hooks that work on 'struct sock' we still take the cgroup from 'current' because some of them work on the socket that hasn't been properly initialized yet. Behind the scenes, we allocate a shim program that is attached to the trampoline and runs cgroup effective BPF programs array. This shim has some rudimentary ref counting and can be shared between several programs attaching to the same lsm hook from different cgroups. Note that this patch bloats cgroup size because we add 211 cgroup_bpf_attach_type(s) for simplicity sake. This will be addressed in the subsequent patch. Also note that we only add non-sleepable flavor for now. To enable sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu, shim programs have to be freed via trace rcu, cgroup_bpf.effective should be also trace-rcu-managed + maybe some other changes that I'm not aware of. Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-4-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Peter Zijlstra
|
d77cfe594a |
x86/bpf: Use alternative RET encoding
Use the return thunk in eBPF generated code, if needed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
Peter Zijlstra
|
369ae6ffc4 |
x86/retpoline: Cleanup some #ifdefery
On it's own not much of a cleanup but it prepares for more/similar code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
Tony Ambardar
|
95acd8817e |
bpf, x64: Add predicate for bpf2bpf with tailcalls support in JIT
The BPF core/verifier is hard-coded to permit mixing bpf2bpf and tail calls for only x86-64. Change the logic to instead rely on a new weak function 'bool bpf_jit_supports_subprog_tailcalls(void)', which a capable JIT backend can override. Update the x86-64 eBPF JIT to reflect this. Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com> [jakub: drop MIPS bits and tweak patch subject] Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220617105735.733938-2-jakub@cloudflare.com |
||
Jakub Sitnicki
|
ff672c67ee |
bpf, x86: Fix tail call count offset calculation on bpf2bpf call
On x86-64 the tail call count is passed from one BPF function to another
through %rax. Additionally, on function entry, the tail call count value
is stored on stack right after the BPF program stack, due to register
shortage.
The stored count is later loaded from stack either when performing a tail
call - to check if we have not reached the tail call limit - or before
calling another BPF function call in order to pass it via %rax.
In the latter case, we miscalculate the offset at which the tail call count
was stored on function entry. The JIT does not take into account that the
allocated BPF program stack is always a multiple of 8 on x86, while the
actual stack depth does not have to be.
This leads to a load from an offset that belongs to the BPF stack, as shown
in the example below:
SEC("tc")
int entry(struct __sk_buff *skb)
{
/* Have data on stack which size is not a multiple of 8 */
volatile char arr[1] = {};
return subprog_tail(skb);
}
int entry(struct __sk_buff * skb):
0: (b4) w2 = 0
1: (73) *(u8 *)(r10 -1) = r2
2: (85) call pc+1#bpf_prog_ce2f79bb5f3e06dd_F
3: (95) exit
int entry(struct __sk_buff * skb):
0xffffffffa0201788: nop DWORD PTR [rax+rax*1+0x0]
0xffffffffa020178d: xor eax,eax
0xffffffffa020178f: push rbp
0xffffffffa0201790: mov rbp,rsp
0xffffffffa0201793: sub rsp,0x8
0xffffffffa020179a: push rax
0xffffffffa020179b: xor esi,esi
0xffffffffa020179d: mov BYTE PTR [rbp-0x1],sil
0xffffffffa02017a1: mov rax,QWORD PTR [rbp-0x9] !!! tail call count
0xffffffffa02017a8: call 0xffffffffa02017d8 !!! is at rbp-0x10
0xffffffffa02017ad: leave
0xffffffffa02017ae: ret
Fix it by rounding up the BPF stack depth to a multiple of 8, when
calculating the tail call count offset on stack.
Fixes:
|
||
Song Liu
|
fe736565ef |
bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
Introduce bpf_arch_text_invalidate and use it to fill unused part of the bpf_prog_pack with illegal instructions when a BPF program is freed. Fixes: |
||
Kui-Feng Lee
|
2fcc82411e |
bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm.
Pass a cookie along with BPF_LINK_CREATE requests. Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie. The cookie of a bpf_tracing_link is available by calling bpf_get_attach_cookie when running the BPF program of the attached link. The value of a cookie will be set at bpf_tramp_run_ctx by the trampoline of the link. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220510205923.3206889-4-kuifeng@fb.com |
||
Kui-Feng Lee
|
e384c7b7b4 |
bpf, x86: Create bpf_tramp_run_ctx on the caller thread's stack
BPF trampolines will create a bpf_tramp_run_ctx, a bpf_run_ctx, on stacks and set/reset the current bpf_run_ctx before/after calling a bpf_prog. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220510205923.3206889-3-kuifeng@fb.com |
||
Kui-Feng Lee
|
f7e0beaf39 |
bpf, x86: Generate trampolines from bpf_tramp_links
Replace struct bpf_tramp_progs with struct bpf_tramp_links to collect struct bpf_tramp_link(s) for a trampoline. struct bpf_tramp_link extends bpf_link to act as a linked list node. arch_prepare_bpf_trampoline() accepts a struct bpf_tramp_links to collects all bpf_tramp_link(s) that a trampoline should call. Change BPF trampoline and bpf_struct_ops to pass bpf_tramp_links instead of bpf_tramp_progs. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220510205923.3206889-2-kuifeng@fb.com |
||
Peter Zijlstra
|
be8a096521 |
x86,bpf: Avoid IBT objtool warning
Clang can inline emit_indirect_jump() and then folds constants, which results in: | vmlinux.o: warning: objtool: emit_bpf_dispatcher()+0x6a4: relocation to !ENDBR: .text.__x86.indirect_thunk+0x40 | vmlinux.o: warning: objtool: emit_bpf_dispatcher()+0x67d: relocation to !ENDBR: .text.__x86.indirect_thunk+0x40 | vmlinux.o: warning: objtool: emit_bpf_tail_call_indirect()+0x386: relocation to !ENDBR: .text.__x86.indirect_thunk+0x20 | vmlinux.o: warning: objtool: emit_bpf_tail_call_indirect()+0x35d: relocation to !ENDBR: .text.__x86.indirect_thunk+0x20 Suppress the optimization such that it must emit a code reference to the __x86_indirect_thunk_array[] base. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lkml.kernel.org/r/20220405075531.GB30877@worktop.programming.kicks-ass.net |
||
Linus Torvalds
|
7001052160 |
Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP. Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is limited to 2 instructions (and typically fewer) on branch targets not starting with ENDBR. CET-IBT also limits speculation of the next sequential instruction after the indirect CALL/JMP [1]. CET-IBT is fundamentally incompatible with retpolines, but provides, as described above, speculation limits itself. [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD 1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht 9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz beB/7u2KIUbKEkSN =jZfK -----END PGP SIGNATURE----- Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra: "Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP. Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is limited to 2 instructions (and typically fewer) on branch targets not starting with ENDBR. CET-IBT also limits speculation of the next sequential instruction after the indirect CALL/JMP [1]. CET-IBT is fundamentally incompatible with retpolines, but provides, as described above, speculation limits itself" [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html * tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) kvm/emulate: Fix SETcc emulation for ENDBR x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0 x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0 kbuild: Fixup the IBT kbuild changes x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy x86: Remove toolchain check for X32 ABI capability x86/alternative: Use .ibt_endbr_seal to seal indirect calls objtool: Find unused ENDBR instructions objtool: Validate IBT assumptions objtool: Add IBT/ENDBR decoding objtool: Read the NOENDBR annotation x86: Annotate idtentry_df() x86,objtool: Move the ASM_REACHABLE annotation to objtool.h x86: Annotate call_on_stack() objtool: Rework ASM_REACHABLE x86: Mark __invalid_creds() __noreturn exit: Mark do_group_exit() __noreturn x86: Mark stop_this_cpu() __noreturn objtool: Ignore extra-symbol code objtool: Rename --duplicate to --lto ... |
||
Jakub Kicinski
|
0db8640df5 |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-03-21 v2 We've added 137 non-merge commits during the last 17 day(s) which contain a total of 143 files changed, 7123 insertions(+), 1092 deletions(-). The main changes are: 1) Custom SEC() handling in libbpf, from Andrii. 2) subskeleton support, from Delyan. 3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao. 4) Fix net.core.bpf_jit_harden race, from Hou. 5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub. 6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami. The arch specific bits will come later. 7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri. 8) Enable non-atomic allocations in local storage, from Joanne. 9) Various var_off ptr_to_btf_id fixed, from Kumar. 10) bpf_ima_file_hash helper, from Roberto. 11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits) selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" bpftool: Fix a bug in subskeleton code generation bpf: Fix bpf_prog_pack when PMU_SIZE is not defined bpf: Fix bpf_prog_pack for multi-node setup bpf: Fix warning for cast from restricted gfp_t in verifier bpf, arm: Fix various typos in comments libbpf: Close fd in bpf_object__reuse_map bpftool: Fix print error when show bpf map bpf: Fix kprobe_multi return probe backtrace Revert "bpf: Add support to inline bpf_get_func_ip helper on x86" bpf: Simplify check in btf_parse_hdr() selftests/bpf/test_lirc_mode2.sh: Exit with proper code bpf: Check for NULL return from bpf_get_btf_vmlinux selftests/bpf: Test skipping stacktrace bpf: Adjust BPF stack helper functions to accommodate skip > 0 bpf: Select proper size for bpf_prog_pack ... ==================== Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Hou Tao
|
73e14451f3 |
bpf, x86: Fall back to interpreter mode when extra pass fails
Extra pass for subprog jit may fail (e.g. due to bpf_jit_harden race), but bpf_func is not cleared for the subprog and jit_subprogs will succeed. The running of the bpf program may lead to oops because the memory for the jited subprog image has already been freed. So fall back to interpreter mode by clearing bpf_func/jited/jited_len when extra pass fails. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220309123321.2400262-2-houtao1@huawei.com |
||
Peter Zijlstra
|
5891271055 |
x86/ibt,bpf: Add ENDBR instructions to prologue and trampoline
With IBT enabled builds we need ENDBR instructions at indirect jump target sites, since we start execution of the JIT'ed code through an indirect jump, the very first instruction needs to be ENDBR. Similarly, since eBPF tail-calls use indirect branches, their landing site needs to be an ENDBR too. The trampolines need similar adjustment. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Fixed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.464998838@infradead.org |
||
Jakub Kicinski
|
1e8a3f0d2a |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/dsa/dsa2.c commit |
||
Song Liu
|
676b2daaba |
bpf, x86: Set header->size properly before freeing it
On do_jit failure path, the header is freed by bpf_jit_binary_pack_free. While bpf_jit_binary_pack_free doesn't require proper ro_header->size, bpf_prog_pack_free still uses it. Set header->size in bpf_int_jit_compile before calling bpf_jit_binary_pack_free. Fixes: |
||
Peter Zijlstra (Intel)
|
d45476d983 |
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
The RETPOLINE_AMD name is unfortunate since it isn't necessarily AMD only, in fact Hygon also uses it. Furthermore it will likely be sufficient for some Intel processors. Therefore rename the thing to RETPOLINE_LFENCE to better describe what it is. Add the spectre_v2=retpoline,lfence option as an alias to spectre_v2=retpoline,amd to preserve existing setups. However, the output of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed. [ bp: Fix typos, massage. ] Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> |
||
Song Liu
|
f95f768f0a |
bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
Instead of BUG_ON(), fail gracefully and return orig_prog.
Fixes:
|
||
Song Liu
|
1022a5498f |
bpf, x86_64: Use bpf_jit_binary_pack_alloc
Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes the program to the rw buffer. When the jit is done, the program is copied to the final location with bpf_jit_binary_pack_finalize. Note that we need to do bpf_tail_call_direct_fixup after finalize. Therefore, the text_live = false logic in __bpf_arch_text_poke is no longer needed. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-10-song@kernel.org |
||
Song Liu
|
ebc1415d9b |
bpf: Introduce bpf_arch_text_copy
This will be used to copy JITed text to RO protected module memory. On x86, bpf_arch_text_copy is implemented with text_poke_copy. bpf_arch_text_copy returns pointer to dst on success, and ERR_PTR(errno) on errors. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204185742.271030-7-song@kernel.org |
||
Hou Tao
|
b6ec79518e |
bpf, x86: Remove unnecessary handling of BPF_SUB atomic op
According to the LLVM commit (https://reviews.llvm.org/D72184), sync_fetch_and_sub() is implemented as a negation followed by sync_fetch_and_add(), so there will be no BPF_SUB op, thus just remove it. BPF_SUB is also rejected by the verifier anyway. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Brendan Jackman <jackmanb@google.com> Link: https://lore.kernel.org/bpf/20220127083240.1425481-1-houtao1@huawei.com |
||
Linus Torvalds
|
64ad946152 |
- Get rid of all the .fixup sections because this generates
misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and LIVEPATCH as the backtrace misses the function which is being fixed up. - Add Straight Light Speculation mitigation support which uses a new compiler switch -mharden-sls= which sticks an INT3 after a RET or an indirect branch in order to block speculation after them. Reportedly, CPUs do speculate behind such insns. - The usual set of cleanups and improvements -----BEGIN PGP SIGNATURE----- iQIyBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHfKA0ACgkQEsHwGGHe VUqLJg/2I2X2xXr5filJVaK+sQgmvDzk67DKnbxRBW2xcPF+B5sSW5yhe3G5UPW7 SJVdhQ3gHcTiliGGlBf/VE7KXbqxFN0vO4/VFHZm78r43g7OrXTxz6WXXQRJ1n67 U3YwRH3b6cqXZNFMs+X4bJt6qsGJM1kdTTZ2as4aERnaFr5AOAfQvfKbyhxLe/XA 3SakfYISVKCBQ2RkTfpMpwmqlsatGFhTC5IrvuDQ83dDsM7O+Dx1J6Gu3fwjKmie iVzPOjCh+xTpZQp/SIZmt7MzoduZvpSym4YVyHvEnMiexQT4AmyaRthWqrhnEXY/ qOvj8/XIqxmix8EaooGqRIK0Y2ZegxkPckNFzaeC3lsWohwMIGIhNXwHNEeuhNyH yvNGAW9Cq6NeDRgz5MRUXcimYw4P4oQKYLObS1WqFZhNMqm4sNtoEAYpai/lPYfs zUDckgXF2AoPOsSqy3hFAVaGovAgzfDaJVzkt0Lk4kzzjX2WQiNLhmiior460w+K 0l2Iej58IajSp3MkWmFH368Jo8YfUVmkjbbpsmjsBppA08e1xamJB7RmswI/Ezj6 s5re6UioCD+UYdjWx41kgbvYdvIkkZ2RLrktoZd/hqHrOLWEIiwEbyFO2nRFJIAh YjvPkB1p7iNuAeYcP1x9Ft9GNYVIsUlJ+hK86wtFCqy+abV+zQ== =R52z -----END PGP SIGNATURE----- Merge tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Get rid of all the .fixup sections because this generates misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and LIVEPATCH as the backtrace misses the function which is being fixed up. - Add Straight Line Speculation mitigation support which uses a new compiler switch -mharden-sls= which sticks an INT3 after a RET or an indirect branch in order to block speculation after them. Reportedly, CPUs do speculate behind such insns. - The usual set of cleanups and improvements * tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) x86/entry_32: Fix segment exceptions objtool: Remove .fixup handling x86: Remove .fixup section x86/word-at-a-time: Remove .fixup usage x86/usercopy: Remove .fixup usage x86/usercopy_32: Simplify __copy_user_intel_nocache() x86/sgx: Remove .fixup usage x86/checksum_32: Remove .fixup usage x86/vmx: Remove .fixup usage x86/kvm: Remove .fixup usage x86/segment: Remove .fixup usage x86/fpu: Remove .fixup usage x86/xen: Remove .fixup usage x86/uaccess: Remove .fixup usage x86/futex: Remove .fixup usage x86/msr: Remove .fixup usage x86/extable: Extend extable functionality x86/entry_32: Remove .fixup usage x86/entry_64: Remove .fixup usage x86/copy_mc_64: Remove .fixup usage ... |
||
David S. Miller
|
e63a023489 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Jakub Kicinski
|
7cd2802d74 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Alexei Starovoitov
|
588a25e924 |
bpf: Fix extable address check.
The verifier checks that PTR_TO_BTF_ID pointer is either valid or NULL,
but it cannot distinguish IS_ERR pointer from valid one.
When offset is added to IS_ERR pointer it may become small positive
value which is a user address that is not handled by extable logic
and has to be checked for at the runtime.
Tighten BPF_PROBE_MEM pointer check code to prevent this case.
Fixes:
|
||
Alexei Starovoitov
|
433956e912 |
bpf: Fix extable fixup offset.
The prog - start_of_ldx is the offset before the faulting ldx to the location
after it, so this will be used to adjust pt_regs->ip for jumping over it and
continuing, and with old temp it would have been fixed up to the wrong offset,
causing crash.
Fixes:
|
||
Jiri Olsa
|
f92c1e1836 |
bpf: Add get_func_[arg|ret|arg_cnt] helpers
Adding following helpers for tracing programs: Get n-th argument of the traced function: long bpf_get_func_arg(void *ctx, u32 n, u64 *value) Get return value of the traced function: long bpf_get_func_ret(void *ctx, u64 *value) Get arguments count of the traced function: long bpf_get_func_arg_cnt(void *ctx) The trampoline now stores number of arguments on ctx-8 address, so it's easy to verify argument index and find return value argument's position. Moving function ip address on the trampoline stack behind the number of functions arguments, so it's now stored on ctx-16 address if it's needed. All helpers above are inlined by verifier. Also bit unrelated small change - using newly added function bpf_prog_has_trampoline in check_get_func_ip. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211208193245.172141-5-jolsa@kernel.org |
||
Jiri Olsa
|
5edf6a1983 |
bpf, x64: Replace some stack_size usage with offset variables
As suggested by Andrii, adding variables for registers and ip address offsets, which makes the code more clear, rather than abusing single stack_size variable for everything. Also describing the stack layout in the comment. There is no function change. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211208193245.172141-4-jolsa@kernel.org |
||
Peter Zijlstra
|
4b5305decc |
x86/extable: Extend extable functionality
In order to remove further .fixup usage, extend the extable infrastructure to take additional information from the extable entry sites. Specifically add _ASM_EXTABLE_TYPE_REG() and EX_TYPE_IMM_REG that extend the existing _ASM_EXTABLE_TYPE() by taking an additional register argument and encoding that and an s16 immediate into the existing s32 type field. This limits the actual types to the first byte, 255 seem plenty. Also add a few flags into the type word, specifically CLEAR_AX and CLEAR_DX which clear the return and extended return register. Notes: - due to the % in our register names it's hard to make it more generally usable as arm64 did. - the s16 is far larger than used in these patches, future extentions can easily shrink this to get more bits. - without the bitfield fix this will not compile, because: 0xFF > -1 and we can't even extract the TYPE field. [nathanchance: Build fix for clang-lto builds: https://lkml.kernel.org/r/20211210234953.3420108-1-nathan@kernel.org ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20211110101325.303890153@infradead.org |
||
Christoph Hellwig
|
58ffa1b413 |
x86, bpf: Cleanup the top of file header in bpf_jit_comp.c
Don't bother mentioning the file name as it is implied, and remove the reference to internal BPF. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211119163215.971383-2-hch@lst.de |
||
Tiezhu Yang
|
ebf7f6f0a6 |
bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33
In the current code, the actual max tail call count is 33 which is greater than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent with the meaning of MAX_TAIL_CALL_CNT and thus confusing at first glance. We can see the historical evolution from commit |
||
Linus Torvalds
|
fc02cb2b37 |
Core:
- Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGAzX4ACgkQMUZtbf5S IrvW3g//Q0ZLrOuHK9pZ8sCXMMhDj8qL6ajm0otMddHWA/+1UglwVBKFhsajfxOf wJ/5LZis+XKLpLqKTU5chKVfn39HuDGe/D3l+egi01Gv5BW0+XzEhagfyR5tJX5z wsGG5CXO/we/laVSzRiFtwwVEKHKN20YC+tIQwYOYP5Wy3q4G7qDsFhT7GqgsGCS n74QUEAIB5Tz0ODWFqLtbsySzIurXrskibwt5T9bvAAlPw/lCU68mmG+NVJ7VddO lBbNkLMOo8yW9Ci20H09SrYd4jZTmMARo9tsFO1tAvAMk7qpn0Wd8pnOYTjFFoMD +qjiFSVMh7E0JGb8Y7NCvwaB99suAK5rfGP68Xwe62DfP7vYWEx4pZGxBP19F4ld 6Kn1ME33BX9rUF9tBecf0bdKfJUwB2Q2Xou/b9laG04bwiqsc9iG5FQq1C46lnLZ QdzNiS1My4dJMczkWt66HF3Kx30ibwHfvKMIHjf4PqkzEatkv6Y6SBZ57KXL+Lde 0BQSFhbf0tm2Gf55etzrczLElI3uqHSFWUNZZ2Bt6WmzO1e6tpV9nAtRWF4C/dFg QDpLJtOOOY65uq+qz09zoPfv2lem868SrCAuFrVn99bEpYjx/CGNFDeEI02l6jyr 84eUxd364UcbIk3fc+eTGdXHLQNVk30G0AHVBBxaWNIidwfqXeE= =srde -----END PGP SIGNATURE----- Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies" * tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits) Revert "net: avoid double accounting for pure zerocopy skbs" selftests: net: add arp_ndisc_evict_nocarrier net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter net: arp: introduce arp_evict_nocarrier sysctl parameter libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c net: avoid double accounting for pure zerocopy skbs tcp: rename sk_wmem_free_skb netdevsim: fix uninit value in nsim_drv_configure_vfs() selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose ... |
||
Linus Torvalds
|
8cb1ae19bf |
x86/fpu updates:
- Cleanup of extable fixup handling to be more robust, which in turn allows to make the FPU exception fixups more robust as well. - Change the return code for signal frame related failures from explicit error codes to a boolean fail/success as that's all what the calling code evaluates. - A large refactoring of the FPU code to prepare for adding AMX support: - Distangle the public header maze and remove especially the misnomed kitchen sink internal.h which is despite it's name included all over the place. - Add a proper abstraction for the register buffer storage (struct fpstate) which allows to dynamically size the buffer at runtime by flipping the pointer to the buffer container from the default container which is embedded in task_struct::tread::fpu to a dynamically allocated container with a larger register buffer. - Convert the code over to the new fpstate mechanism. - Consolidate the KVM FPU handling by moving the FPU related code into the FPU core which removes the number of exports and avoids adding even more export when AMX has to be supported in KVM. This also removes duplicated code which was of course unnecessary different and incomplete in the KVM copy. - Simplify the KVM FPU buffer handling by utilizing the new fpstate container and just switching the buffer pointer from the user space buffer to the KVM guest buffer when entering vcpu_run() and flipping it back when leaving the function. This cuts the memory requirements of a vCPU for FPU buffers in half and avoids pointless memory copy operations. This also solves the so far unresolved problem of adding AMX support because the current FPU buffer handling of KVM inflicted a circular dependency between adding AMX support to the core and to KVM. With the new scheme of switching fpstate AMX support can be added to the core code without affecting KVM. - Replace various variables with proper data structures so the extra information required for adding dynamically enabled FPU features (AMX) can be added in one place - Add AMX (Advanved Matrix eXtensions) support (finally): AMX is a large XSTATE component which is going to be available with Saphire Rapids XEON CPUs. The feature comes with an extra MSR (MSR_XFD) which allows to trap the (first) use of an AMX related instruction, which has two benefits: 1) It allows the kernel to control access to the feature 2) It allows the kernel to dynamically allocate the large register state buffer instead of burdening every task with the the extra 8K or larger state storage. It would have been great to gain this kind of control already with AVX512. The support comes with the following infrastructure components: 1) arch_prctl() to - read the supported features (equivalent to XGETBV(0)) - read the permitted features for a task - request permission for a dynamically enabled feature Permission is granted per process, inherited on fork() and cleared on exec(). The permission policy of the kernel is restricted to sigaltstack size validation, but the syscall obviously allows further restrictions via seccomp etc. 2) A stronger sigaltstack size validation for sys_sigaltstack(2) which takes granted permissions and the potentially resulting larger signal frame into account. This mechanism can also be used to enforce factual sigaltstack validation independent of dynamic features to help with finding potential victims of the 2K sigaltstack size constant which is broken since AVX512 support was added. 3) Exception handling for #NM traps to catch first use of a extended feature via a new cause MSR. If the exception was caused by the use of such a feature, the handler checks permission for that feature. If permission has not been granted, the handler sends a SIGILL like the #UD handler would do if the feature would have been disabled in XCR0. If permission has been granted, then a new fpstate which fits the larger buffer requirement is allocated. In the unlikely case that this allocation fails, the handler sends SIGSEGV to the task. That's not elegant, but unavoidable as the other discussed options of preallocation or full per task permissions come with their own set of horrors for kernel and/or userspace. So this is the lesser of the evils and SIGSEGV caused by unexpected memory allocation failures is not a fundamentally new concept either. When allocation succeeds, the fpstate properties are filled in to reflect the extended feature set and the resulting sizes, the fpu::fpstate pointer is updated accordingly and the trap is disarmed for this task permanently. 4) Enumeration and size calculations 5) Trap switching via MSR_XFD The XFD (eXtended Feature Disable) MSR is context switched with the same life time rules as the FPU register state itself. The mechanism is keyed off with a static key which is default disabled so !AMX equipped CPUs have zero overhead. On AMX enabled CPUs the overhead is limited by comparing the tasks XFD value with a per CPU shadow variable to avoid redundant MSR writes. In case of switching from a AMX using task to a non AMX using task or vice versa, the extra MSR write is obviously inevitable. All other places which need to be aware of the variable feature sets and resulting variable sizes are not affected at all because they retrieve the information (feature set, sizes) unconditonally from the fpstate properties. 6) Enable the new AMX states Note, this is relatively new code despite the fact that AMX support is in the works for more than a year now. The big refactoring of the FPU code, which allowed to do a proper integration has been started exactly 3 weeks ago. Refactoring of the existing FPU code and of the original AMX patches took a week and has been subject to extensive review and testing. The only fallout which has not been caught in review and testing right away was restricted to AMX enabled systems, which is completely irrelevant for anyone outside Intel and their early access program. There might be dragons lurking as usual, but so far the fine grained refactoring has held up and eventual yet undetected fallout is bisectable and should be easily addressable before the 5.16 release. Famous last words... Many thanks to Chang Bae and Dave Hansen for working hard on this and also to the various test teams at Intel who reserved extra capacity to follow the rapid development of this closely which provides the confidence level required to offer this rather large update for inclusion into 5.16-rc1. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/NkITHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYodDkEADH4+/nN/QoSUHIuuha5Zptj3g2b16a /3TxT9fhwPen/kzMGsUk70s3iWJMA+I5dCfkSZexJ2hfhcRe9cBzZIa1HCawKwf3 YCISTsO/M+LpeORuZ+TpfFLJKnxNr1SEOl+EYffGhq0AkCjifb9Cnr0JZuoMUzGU jpfJZ2bj28ri5lG812DtzSMBM9E3SAwgJv+GNjmZbxZKb9mAfhbAMdBUXHirX7Ej jmx6koQjYOKwYIW8w1BrdC270lUKQUyJTbQgdRkN9Mh/HnKyFixQ18JqGlgaV2cT EtYePUfTEdaHdAhUINLIlEug1MfOslHU+HyGsdywnoChNB4GHPQuePC5Tz60VeFN RbQ9aKcBUu8r95rjlnKtAtBijNMA4bjGwllVxNwJ/ZoA9RPv1SbDZ07RX3qTaLVY YhVQl8+shD33/W24jUTJv1kMMexpHXIlv0gyfMryzpwI7uzzmGHRPAokJdbYKctC dyMPfdE90rxTiMUdL/1IQGhnh3awjbyfArzUhHyQ++HyUyzCFh0slsO0CD18vUy8 FofhCugGBhjuKw3XwLNQ+KsWURz5qHctSzBc3qMOSyqFHbAJCVRANkhsFvWJo2qL 75+Z7OTRebtsyOUZIdq26r4roSxHrps3dupWTtN70HWx2NhQG1nLEw986QYiQu1T hcKvDmehQLrUvg== =x3WL -----END PGP SIGNATURE----- Merge tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Thomas Gleixner: - Cleanup of extable fixup handling to be more robust, which in turn allows to make the FPU exception fixups more robust as well. - Change the return code for signal frame related failures from explicit error codes to a boolean fail/success as that's all what the calling code evaluates. - A large refactoring of the FPU code to prepare for adding AMX support: - Distangle the public header maze and remove especially the misnomed kitchen sink internal.h which is despite it's name included all over the place. - Add a proper abstraction for the register buffer storage (struct fpstate) which allows to dynamically size the buffer at runtime by flipping the pointer to the buffer container from the default container which is embedded in task_struct::tread::fpu to a dynamically allocated container with a larger register buffer. - Convert the code over to the new fpstate mechanism. - Consolidate the KVM FPU handling by moving the FPU related code into the FPU core which removes the number of exports and avoids adding even more export when AMX has to be supported in KVM. This also removes duplicated code which was of course unnecessary different and incomplete in the KVM copy. - Simplify the KVM FPU buffer handling by utilizing the new fpstate container and just switching the buffer pointer from the user space buffer to the KVM guest buffer when entering vcpu_run() and flipping it back when leaving the function. This cuts the memory requirements of a vCPU for FPU buffers in half and avoids pointless memory copy operations. This also solves the so far unresolved problem of adding AMX support because the current FPU buffer handling of KVM inflicted a circular dependency between adding AMX support to the core and to KVM. With the new scheme of switching fpstate AMX support can be added to the core code without affecting KVM. - Replace various variables with proper data structures so the extra information required for adding dynamically enabled FPU features (AMX) can be added in one place - Add AMX (Advanced Matrix eXtensions) support (finally): AMX is a large XSTATE component which is going to be available with Saphire Rapids XEON CPUs. The feature comes with an extra MSR (MSR_XFD) which allows to trap the (first) use of an AMX related instruction, which has two benefits: 1) It allows the kernel to control access to the feature 2) It allows the kernel to dynamically allocate the large register state buffer instead of burdening every task with the the extra 8K or larger state storage. It would have been great to gain this kind of control already with AVX512. The support comes with the following infrastructure components: 1) arch_prctl() to - read the supported features (equivalent to XGETBV(0)) - read the permitted features for a task - request permission for a dynamically enabled feature Permission is granted per process, inherited on fork() and cleared on exec(). The permission policy of the kernel is restricted to sigaltstack size validation, but the syscall obviously allows further restrictions via seccomp etc. 2) A stronger sigaltstack size validation for sys_sigaltstack(2) which takes granted permissions and the potentially resulting larger signal frame into account. This mechanism can also be used to enforce factual sigaltstack validation independent of dynamic features to help with finding potential victims of the 2K sigaltstack size constant which is broken since AVX512 support was added. 3) Exception handling for #NM traps to catch first use of a extended feature via a new cause MSR. If the exception was caused by the use of such a feature, the handler checks permission for that feature. If permission has not been granted, the handler sends a SIGILL like the #UD handler would do if the feature would have been disabled in XCR0. If permission has been granted, then a new fpstate which fits the larger buffer requirement is allocated. In the unlikely case that this allocation fails, the handler sends SIGSEGV to the task. That's not elegant, but unavoidable as the other discussed options of preallocation or full per task permissions come with their own set of horrors for kernel and/or userspace. So this is the lesser of the evils and SIGSEGV caused by unexpected memory allocation failures is not a fundamentally new concept either. When allocation succeeds, the fpstate properties are filled in to reflect the extended feature set and the resulting sizes, the fpu::fpstate pointer is updated accordingly and the trap is disarmed for this task permanently. 4) Enumeration and size calculations 5) Trap switching via MSR_XFD The XFD (eXtended Feature Disable) MSR is context switched with the same life time rules as the FPU register state itself. The mechanism is keyed off with a static key which is default disabled so !AMX equipped CPUs have zero overhead. On AMX enabled CPUs the overhead is limited by comparing the tasks XFD value with a per CPU shadow variable to avoid redundant MSR writes. In case of switching from a AMX using task to a non AMX using task or vice versa, the extra MSR write is obviously inevitable. All other places which need to be aware of the variable feature sets and resulting variable sizes are not affected at all because they retrieve the information (feature set, sizes) unconditonally from the fpstate properties. 6) Enable the new AMX states Note, this is relatively new code despite the fact that AMX support is in the works for more than a year now. The big refactoring of the FPU code, which allowed to do a proper integration has been started exactly 3 weeks ago. Refactoring of the existing FPU code and of the original AMX patches took a week and has been subject to extensive review and testing. The only fallout which has not been caught in review and testing right away was restricted to AMX enabled systems, which is completely irrelevant for anyone outside Intel and their early access program. There might be dragons lurking as usual, but so far the fine grained refactoring has held up and eventual yet undetected fallout is bisectable and should be easily addressable before the 5.16 release. Famous last words... Many thanks to Chang Bae and Dave Hansen for working hard on this and also to the various test teams at Intel who reserved extra capacity to follow the rapid development of this closely which provides the confidence level required to offer this rather large update for inclusion into 5.16-rc1 * tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits) Documentation/x86: Add documentation for using dynamic XSTATE features x86/fpu: Include vmalloc.h for vzalloc() selftests/x86/amx: Add context switch test selftests/x86/amx: Add test cases for AMX state management x86/fpu/amx: Enable the AMX feature in 64-bit mode x86/fpu: Add XFD handling for dynamic states x86/fpu: Calculate the default sizes independently x86/fpu/amx: Define AMX state components and have it used for boot-time checks x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers x86/fpu/xstate: Add fpstate_realloc()/free() x86/fpu/xstate: Add XFD #NM handler x86/fpu: Update XFD state where required x86/fpu: Add sanity checks for XFD x86/fpu: Add XFD state to fpstate x86/msr-index: Add MSRs for XFD x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit x86/fpu: Reset permission and fpstate on exec() x86/fpu: Prepare fpu_clone() for dynamically enabled features x86/fpu/signal: Prepare for variable sigframe length x86/signal: Use fpu::__state_user_size for sigalt stack validation ... |
||
Peter Zijlstra
|
87c87ecd00 |
bpf,x86: Respect X86_FEATURE_RETPOLINE*
Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and unconditionally emits a thunk call, this is sub-optimal and doesn't match the regular, compiler generated, code. Update the i386 JIT to emit code equal to what the compiler emits for the regular kernel text (IOW. a plain THUNK call). Update the x86_64 JIT to emit code similar to the result of compiler and kernel rewrites as according to X86_FEATURE_RETPOLINE* flags. Inlining RETPOLINE_AMD (lfence; jmp *%reg) and !RETPOLINE (jmp *%reg), while doing a THUNK call for RETPOLINE. This removes the hard-coded retpoline thunks and shrinks the generated code. Leaving a single retpoline thunk definition in the kernel. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.614772675@infradead.org |
||
Peter Zijlstra
|
dceba0817c |
bpf,x86: Simplify computing label offsets
Take an idea from the 32bit JIT, which uses the multi-pass nature of the JIT to compute the instruction offsets on a prior pass in order to compute the relative jump offsets on a later pass. Application to the x86_64 JIT is slightly more involved because the offsets depend on program variables (such as callee_regs_used and stack_depth) and hence the computed offsets need to be kept in the context of the JIT. This removes, IMO quite fragile, code that hard-codes the offsets and tries to compute the length of variable parts of it. Convert both emit_bpf_tail_call_*() functions which have an out: label at the end. Additionally emit_bpt_tail_call_direct() also has a poke table entry, for which it computes the offset from the end (and thus already relies on the previous pass to have computed addrs[i]), also convert this to be a forward based offset. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.552304864@infradead.org |
||
Peter Zijlstra
|
6fda8a3886 |
x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h
Because it makes no sense to split the retpoline gunk over multiple headers. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.106290934@infradead.org |
||
Ingo Molnar
|
082f20b21d |
Merge branch 'x86/urgent' into x86/fpu, to resolve a conflict
Resolve the conflict between these commits: x86/fpu: |
||
Jie Meng
|
6364d7d75a |
bpf, x64: Factor out emission of REX byte in more cases
Introduce a single reg version of maybe_emit_mod() and factor out common code in more cases. Signed-off-by: Jie Meng <jmeng@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211006194135.608932-1-jmeng@fb.com |
||
Jie Meng
|
57a610f1c5 |
bpf, x64: Save bytes for DIV by reducing reg copies
Instead of unconditionally performing push/pop on %rax/%rdx in case of division/modulo, we can save a few bytes in case of destination register being either BPF r0 (%rax) or r3 (%rdx) since the result is written in there anyway. Also, we do not need to copy the source to %r11 unless the source is either %rax, %rdx or an immediate. For example, before the patch: 22: push %rax 23: push %rdx 24: mov %rsi,%r11 27: xor %edx,%edx 29: div %r11 2c: mov %rax,%r11 2f: pop %rdx 30: pop %rax 31: mov %r11,%rax After: 22: push %rdx 23: xor %edx,%edx 25: div %rsi 28: pop %rdx Signed-off-by: Jie Meng <jmeng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211002035626.2041910-1-jmeng@fb.com |
||
Jakub Kicinski
|
dd9a887b35 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/phy/bcm7xxx.c |
||
Johan Almbladh
|
ced185824c |
bpf, x86: Fix bpf mapping of atomic fetch implementation
Fix the case where the dst register maps to %rax as otherwise this produces an incorrect mapping with the implementation in |
||
Jie Meng
|
c035407743 |
bpf,x64 Emit IMUL instead of MUL for x86-64
IMUL allows for multiple operands and saving and storing rax/rdx is no longer needed. Signedness of the operands doesn't matter here because the we only keep the lower 32/64 bit of the product for 32/64 bit multiplications. Signed-off-by: Jie Meng <jmeng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210913211337.1564014-1-jmeng@fb.com |
||
Hou Tao
|
356ed64991 |
bpf: Handle return value of BPF_PROG_TYPE_STRUCT_OPS prog
Currently if a function ptr in struct_ops has a return value, its
caller will get a random return value from it, because the return
value of related BPF_PROG_TYPE_STRUCT_OPS prog is just dropped.
So adding a new flag BPF_TRAMP_F_RET_FENTRY_RET to tell bpf trampoline
to save and return the return value of struct_ops prog if ret_size of
the function ptr is greater than 0. Also restricting the flag to be
used alone.
Fixes:
|
||
Thomas Gleixner
|
46d28947d9 |
x86/extable: Rework the exception table mechanics
The exception table entries contain the instruction address, the fixup address and the handler address. All addresses are relative. Storing the handler address has a few downsides: 1) Most handlers need to be exported 2) Handlers can be defined everywhere and there is no overview about the handler types 3) MCE needs to check the handler type to decide whether an in kernel #MC can be recovered. The functionality of the handler itself is not in any way special, but for these checks there need to be separate functions which in the worst case have to be exported. Some of these 'recoverable' exception fixups are pretty obscure and just reuse some other handler to spare code. That obfuscates e.g. the #MC safe copy functions. Cleaning that up would require more handlers and exports Rework the exception fixup mechanics by storing a fixup type number instead of the handler address and invoke the proper handler for each fixup type. Also teach the extable sort to leave the type field alone. This makes most handlers static except for special cases like the MCE MSR fixup and the BPF fixup. This allows to add more types for cleaning up the obscure places without adding more handler code and exports. There is a marginal code size reduction for a production config and it removes _eight_ exported symbols. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lkml.kernel.org/r/20210908132525.211958725@linutronix.de |