Commit Graph

Daniel Borkmann
75a1a607bb uaccess: Add strict non-pagefault kernel-space read function
Add two new helpers, probe_kernel_read_strict() and
strncpy_from_unsafe_strict(), which by default alias to
__probe_kernel_read() and __strncpy_from_unsafe(), respectively, but can
be overridden by archs which have non-overlapping address ranges for
kernel space and user space in order to bail out with -EFAULT when
attempting to probe user memory, including non-canonical user access
addresses [0]:

  4-level page tables:
    user-space mem: 0x0000000000000000 - 0x00007fffffffffff
    non-canonical:  0x0000800000000000 - 0xffff7fffffffffff

  5-level page tables:
    user-space mem: 0x0000000000000000 - 0x00ffffffffffffff
    non-canonical:  0x0100000000000000 - 0xfeffffffffffffff

The idea is that these helpers are complementary to the probe_user_read()
and strncpy_from_unsafe_user() which probe user-only memory. Both added
helpers here do the same, but for kernel-only addresses.

Both sets of helpers are going to be used for BPF tracing. They also
explicitly avoid throwing the splat for non-canonical user addresses from
00c42373d3 ("x86-64: add warning for non-canonical user access address
dereferences").

For compat, the current probe_kernel_read() and strncpy_from_unsafe() are
left as-is.

  [0] Documentation/x86/x86_64/mm.txt
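
As a sketch of the kind of override an arch can provide (illustrative
only; is_canonical_kernel_addr() is a hypothetical helper, not an
existing kernel function):

  long probe_kernel_read_strict(void *dst, const void *src, size_t size)
  {
          /* Reject user-space and non-canonical addresses up front
           * instead of relying on the pagefault handler;
           * is_canonical_kernel_addr() is hypothetical here.
           */
          if ((unsigned long)src < TASK_SIZE_MAX ||
              !is_canonical_kernel_addr((unsigned long)src))
                  return -EFAULT;

          return __probe_kernel_read(dst, src, size);
  }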

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: x86@kernel.org
Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:12 -07:00
Daniel Borkmann
1d1585ca0f uaccess: Add non-pagefault user-space write function
Commit 3d7081822f ("uaccess: Add non-pagefault user-space read functions")
missed adding a probe write function; therefore, factor out a
probe_write_common() helper with most of the logic of probe_kernel_write()
except setting KERNEL_DS, and add a new probe_user_write() helper so it
can be used from the BPF side.

Again, on some archs the user address space and kernel address space can
co-exist and overlap; in such a case, setting KERNEL_DS would mean that
the given address is treated as being in kernel address space.
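
The factored-out shape is roughly the following (a sketch based on the
description above, not the verbatim patch):

  static long probe_write_common(void __user *dst, const void *src,
                                 size_t size)
  {
          long ret;

          pagefault_disable();
          ret = __copy_to_user_inatomic(dst, src, size);
          pagefault_enable();

          return ret ? -EFAULT : 0;
  }

  long probe_user_write(void __user *dst, const void *src, size_t size)
  {
          long ret = -EFAULT;
          mm_segment_t old_fs = get_fs();

          /* USER_DS instead of KERNEL_DS: dst must be a user address */
          set_fs(USER_DS);
          if (access_ok(dst, size))
                  ret = probe_write_common(dst, src, size);
          set_fs(old_fs);

          return ret;
  }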

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/9df2542e68141bfa3addde631441ee45503856a8.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:12 -07:00
Alexei Starovoitov
e1cb7d2d60 Merge branch 'map-pinning'
Toke Høiland-Jørgensen says:

====================
This series adds support to libbpf for reading 'pinning' settings from BTF-based
map definitions. It introduces a new open option which can set the pinning path;
if no path is set, /sys/fs/bpf is used as the default. Callers can customise the
pinning between open and load by setting the pin path per map, and still get the
automatic reuse feature.

The semantics of the pinning is similar to the iproute2 "PIN_GLOBAL" setting,
and the eventual goal is to move the iproute2 implementation to be based on
libbpf and the functions introduced in this series.

Changelog:

v6:
  - Fix leak of struct bpf_object in selftest
  - Make struct bpf_map arg const in bpf_map__is_pinned() and bpf_map__get_pin_path()

v5:
  - Don't pin maps with pinning set, but with a value of LIBBPF_PIN_NONE
  - Add a few more selftests:
    - Should not pin map with pinning set, but value LIBBPF_PIN_NONE
    - Should fail to load a map with an invalid pinning value
    - Should fail to re-use maps with parameter mismatch
  - Alphabetise libbpf.map
  - Whitespace and typo fixes

v4:
  - Don't check key_type_id and value_type_id when checking for map reuse
    compatibility.
  - Move building of map->pin_path into init_user_btf_map()
  - Get rid of 'pinning' attribute in struct bpf_map
  - Make sure we also create parent directory on auto-pin (new patch 3).
  - Abort the selftest on error instead of attempting to continue.
  - Support unpinning all pinned maps with bpf_object__unpin_maps(obj, NULL)
  - Support pinning at map->pin_path with bpf_object__pin_maps(obj, NULL)
  - Make re-pinning a map at the same path a noop
  - Rename the open option to pin_root_path
  - Add a bunch more self-tests for pin_maps(NULL) and unpin_maps(NULL)
  - Fix a couple of smaller nits

v3:
  - Drop bpf_object__pin_maps_opts() and just use an open option to customise
    the pin path; also don't touch bpf_object__{un,}pin_maps()
  - Integrate pinning and reuse into bpf_object__create_maps() instead of having
    multiple loops though the map structure
  - Make errors in map reuse and pinning fatal to the load procedure
  - Add selftest to exercise pinning feature
  - Rebase series to latest bpf-next

v2:
  - Drop patch that adds mounting of bpffs
  - Only support a single value of the pinning attribute
  - Add patch to fixup error handling in reuse_fd()
  - Implement the full automatic pinning and map reuse logic on load
====================

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-02 12:35:12 -07:00
Toke Høiland-Jørgensen
2f4a32cc83 selftests: Add tests for automatic map pinning
This adds a new BPF selftest to exercise the new automatic map pinning
code.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269298209.394725.15420085139296213182.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen
57a00f4164 libbpf: Add auto-pinning of maps when loading BPF objects
This adds support to libbpf for setting map pinning information as part of
the BTF map declaration, to get automatic map pinning (and reuse) on load.
The pinning type currently only supports a single PIN_BY_NAME mode, where
each map will be pinned by its name in a path that can be overridden, but
defaults to /sys/fs/bpf.

Since auto-pinning only does something if any maps actually have a
'pinning' BTF attribute set, we default the new option to enabled, on the
assumption that seamless pinning is what most callers want.

When a map has a pin_path set at load time, libbpf will check for a map
pinned at that location (if any), and if the attributes match, will reuse
that map instead of creating a new one. If no existing map is found, the
newly created map will instead be pinned at that location.

Programs wanting to customise the pinning can override the pinning paths
using bpf_map__set_pin_path() before calling bpf_object__load() (including
setting it to NULL to disable pinning of a particular map).
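
A minimal usage sketch tying these pieces together (map, object file and
path names are hypothetical):

  /* BPF side: declare the map with a pinning attribute */
  struct {
          __uint(type, BPF_MAP_TYPE_HASH);
          __uint(max_entries, 16);
          __type(key, __u32);
          __type(value, __u64);
          __uint(pinning, LIBBPF_PIN_BY_NAME);
  } my_map SEC(".maps");

  /* user-space side: override the default /sys/fs/bpf pin root */
  DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
          .pin_root_path = "/sys/fs/bpf/myapp");
  struct bpf_object *obj = bpf_object__open_file("prog.o", &opts);
  struct bpf_map *map = bpf_object__find_map_by_name(obj, "my_map");

  bpf_map__set_pin_path(map, NULL); /* opt this one map out of pinning */
  bpf_object__load(obj);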

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269298092.394725.3966306029218559681.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen
196f8487f5 libbpf: Move directory creation into _pin() functions
The existing pin_*() functions all try to create the parent directory
before pinning. Move this check into the per-object _pin() functions
instead. This ensures consistent behaviour when auto-pinning is
added (which doesn't go through the top-level pin_maps() function), at the
cost of a few more calls to mkdir().

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297985.394725.5882630952992598610.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen
4580b25fce libbpf: Store map pin path and status in struct bpf_map
Support storing and setting a pin path in struct bpf_map, which can be used
for automatic pinning. Also store the pin status so we can avoid attempts
to re-pin a map that has already been pinned (or reused from a previous
pinning).

The behaviour of bpf_object__{un,}pin_maps() is changed so that if it is
called with a NULL path argument (which was previously illegal), it will
(un)pin only those maps that have a pin_path set.
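
For example (sketch):

  /* pin every map that has a pin_path set; maps without one are
   * skipped instead of triggering an error
   */
  bpf_object__pin_maps(obj, NULL);

  /* ... and unpin the same set again */
  bpf_object__unpin_maps(obj, NULL);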

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297876.394725.14782206533681896279.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen
d1b4574a4b libbpf: Fix error handling in bpf_map__reuse_fd()
bpf_map__reuse_fd() was calling close() in the error path before returning
an error value based on errno. However, close() can change errno, which
can lead to misleading error messages. Instead, explicitly store errno
in the err variable before each goto.
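
The resulting pattern looks roughly like this (illustrative sketch, not
the verbatim diff):

  new_fd = dup3(fd, new_fd, O_CLOEXEC);
  if (new_fd < 0) {
          err = -errno;   /* capture errno before any cleanup call */
          goto err_close_new_fd;
  }
  ...
  err_close_new_fd:
          close(new_fd);  /* may clobber errno; err is already set */
  err_free_new_name:
          free(new_name);
          return err;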

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297769.394725.12634985106772698611.stgit@toke.dk
2019-11-02 12:35:06 -07:00
Daniel Borkmann
78db77fab1 Merge branch 'bpf-xskmap-perf-improvements'
Björn Töpel says:

====================
This set consists of three patches from Maciej and myself which optimize
the XSKMAP lookups. In the first patch, the sockets are moved to be
stored at the tail of the struct xsk_map. In the second patch, Maciej
implements map_gen_lookup() for the XSKMAP. The third patch, introduced
in this revision, moves various XSKMAP functions to permit the compiler
to do more aggressive inlining.

Based on the XDP program from tools/lib/bpf/xsk.c where
bpf_map_lookup_elem() is explicitly called, this work yields a 5%
improvement for xdpsock's rxdrop scenario. The last patch yields 2%
improvement.

Jonathan's Acked-by for patches 1 and 2 was carried over. Note that the
overflow checks are done in the bpf_map_area_alloc() and
bpf_map_charge_init() functions, which was fixed in commit
ff1c08e1f7 ("bpf: Change size to u64 for bpf_map_{area_alloc,
charge_init}()").

  [1] https://patchwork.ozlabs.org/patch/1186170/

v1->v2: * Change size/cost to size_t and use {struct, array}_size
          where appropriate. (Jakub)
v2->v3: * Proper commit message for patch 2.
v3->v4: * Change size_t to u64 to handle 32-bit overflows. (Jakub)
        * Introduced patch 3.
v4->v5: * Use BPF_SIZEOF size, instead of BPF_DW, for correct
          pointer-sized loads. (Daniel)
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-11-02 00:38:59 +01:00
Björn Töpel
d817991cc7 xsk: Restructure/inline XSKMAP lookup/redirect/flush
In this commit the XSKMAP entry lookup function used by the XDP
redirect code is moved from the xskmap.c file to the xdp_sock.h
header, so the lookup can be inlined from, e.g., the
bpf_xdp_redirect_map() function.
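
The lookup itself is small enough that inlining it pays off; roughly
(sketch):

  static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map,
                                                       u32 key)
  {
          struct xsk_map *m = container_of(map, struct xsk_map, map);

          if (key >= map->max_entries)
                  return NULL;

          return READ_ONCE(m->xsk_map[key]);
  }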

Further, __xsk_map_redirect() and __xsk_map_flush() are moved to xsk.c,
which lets the compiler inline the xsk_rcv() and xsk_flush() functions.

Finally, all the XDP socket functions were moved from linux/bpf.h to
net/xdp_sock.h, where most of the XDP socket functions live anyway.

This yields a ~2% performance boost for the xdpsock "rx_drop"
scenario.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Maciej Fijalkowski
e65650f291 bpf: Implement map_gen_lookup() callback for XSKMAP
Inline xsk_map_lookup_elem() by implementing the map_gen_lookup()
callback. This results in emitting the bpf instructions in place of the
bpf_map_lookup_elem() helper call, and in better performance of bpf
programs.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/20191101110346.15004-3-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Björn Töpel
64fe8c061d xsk: Store struct xdp_sock as a flexible array member of the XSKMAP
Prior to this commit, the XDP socket instances of the XSKMAP were stored
in a separately allocated array. Now, we store the sockets as a flexible
array member, in a similar fashion to the arraymap. Doing so means less
pointer chasing in the lookup.
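
In other words (sketch of the layout change, other fields elided):

  struct xsk_map {
          struct bpf_map map;
          /* ... */
          struct xdp_sock *xsk_map[]; /* was: struct xdp_sock **xsk_map */
  };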

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/20191101110346.15004-2-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Jakub Kicinski
75b0bfd2e1 Revert "selftests: bpf: Don't try to read files without read permission"
This reverts commit 5bc60de50d ("selftests: bpf: Don't try to read
files without read permission").

The quoted commit does not work at all, and was never tested.
The script requires root permissions (and tests for them),
and os.access() will always return true for root.

The correct fix is needed in the bpf tree, so let's just
revert and save ourselves the merge conflict.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Pirko <jiri@resnulli.us>
Link: https://lore.kernel.org/bpf/20191101005127.1355-1-jakub.kicinski@netronome.com
2019-11-01 13:13:21 +01:00
Daniel Borkmann
0608711460 Merge branch 'bpf-cleanup-btf-raw-tp'
Alexei Starovoitov says:

====================
v1->v2: addressed Andrii's feedback

When BTF-enabled raw_tp programs were introduced, the plan was to follow
up with BTF-enabled kprobe and kretprobe reusing the PROG_RAW_TRACEPOINT
and PROG_KPROBE types. But k[ret]probe expects pt_regs, while the
BTF-enabled program ctx will be the same as raw_tp's. A kretprobe is
indistinguishable from a kprobe, while a BTF-enabled kretprobe will have
access to the retval and a kprobe will not. Hence the PROG_KPROBE type
is not reusable, and reusing PROG_RAW_TRACEPOINT no longer fits well.
Therefore, introduce an 'umbrella' prog type BPF_PROG_TYPE_TRACING
that will cover different BTF-enabled tracing attach points.
The changes make the libbpf side cleaner as well.
check_attach_btf_id() is cleaner too.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-10-31 15:17:14 +01:00
Alexei Starovoitov
12a8654b2e libbpf: Add support for prog_tracing
Clean up the expected_attach_type == attach_btf_id hack in libbpf
and introduce BPF_PROG_TYPE_TRACING.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191030223212.953010-3-ast@kernel.org
2019-10-31 15:16:59 +01:00
Alexei Starovoitov
f1b9509c2f bpf: Replace prog_raw_tp+btf_id with prog_tracing
The bpf program type raw_tp together with 'expected_attach_type'
was the most appropriate api to indicate BTF-enabled raw_tp programs.
But during development it became apparent that 'expected_attach_type'
cannot be used and a new 'attach_btf_id' field had to be introduced,
which means that the information is duplicated in two fields, one of
which is ignored.
Clean it up by introducing a new program type where both the
'expected_attach_type' and 'attach_btf_id' fields have a specific
meaning.
In the future 'expected_attach_type' will be extended
with other attach points that have similar semantics to raw_tp.
This patch is replacing BTF-enabled BPF_PROG_TYPE_RAW_TRACEPOINT with
prog_type = BPF_PROG_TYPE_TRACING
expected_attach_type = BPF_TRACE_RAW_TP
attach_btf_id = btf_id of raw tracepoint inside the kernel
Future patches will add
expected_attach_type = BPF_TRACE_FENTRY or BPF_TRACE_FEXIT
where programs have the same input context and the same helpers,
but different attach points.
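
Schematically, loading such a program then amounts to (sketch):

  union bpf_attr attr = {};

  attr.prog_type            = BPF_PROG_TYPE_TRACING;
  attr.expected_attach_type = BPF_TRACE_RAW_TP;
  attr.attach_btf_id        = btf_id; /* id of the raw tp in vmlinux BTF */
  /* ... insns, license, etc., then bpf(BPF_PROG_LOAD, &attr, sizeof(attr)) */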

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191030223212.953010-2-ast@kernel.org
2019-10-31 15:16:59 +01:00
Alexei Starovoitov
af91acbc62 bpf: Fix bpf jit kallsym access
Jiri reported a crash when JIT is on, but net.core.bpf_jit_kallsyms is off.
bpf_prog_kallsyms_find() was skipping addr->bpf_prog resolution
logic in oops and stack traces. That's incorrect.
It should only skip addr->name resolution for 'cat /proc/kallsyms'.
That's what bpf_jit_kallsyms and bpf_jit_harden protect.

Fixes: 3dec541b2e ("bpf: Add support for BTF pointers to x86 JIT")
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191030233019.1187404-1-ast@kernel.org
2019-10-31 02:02:29 +01:00
Shmulik Ladkani
cf204a7183 bpf, testing: Introduce 'gso_linear_no_head_frag' skb_segment test
Following reports of skb_segment() hitting a BUG_ON when working on
GROed skbs which have their gso_size mangled (e.g. after a
bpf_skb_change_proto call), add a reproducer test that mimics the
input skbs that lead to the mentioned BUG_ON as in [1] and validates the
fix submitted in [2].

[1] https://lists.openwall.net/netdev/2019/08/26/110
[2] commit 3dcbdb134f ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list")

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191025134223.2761-3-shmulik.ladkani@gmail.com
2019-10-30 16:37:08 +01:00
Shmulik Ladkani
af21c717f4 bpf, testing: Refactor test_skb_segment() for testing skb_segment() on different skbs
Currently, test_skb_segment() builds a single test skb and runs
skb_segment() on it.

Extend test_skb_segment() so it processes an array of numerous
skb/feature pairs to test.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191025134223.2761-2-shmulik.ladkani@gmail.com
2019-10-30 16:37:01 +01:00
Ilya Leoshkevich
7e07e7aec5 bpf: Add s390 testing documentation
This commit adds a document that explains how to test BPF in an s390
QEMU guest.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191029172916.36528-1-iii@linux.ibm.com
2019-10-30 16:25:31 +01:00
Ilya Leoshkevich
9ffccb7606 selftests/bpf: Test narrow load from bpf_sysctl.write
There are tests for full and narrow loads from bpf_sysctl.file_pos, but
for bpf_sysctl.write only the full load is tested. Add the missing test.

Suggested-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/20191029143027.28681-1-iii@linux.ibm.com
2019-10-30 16:24:06 +01:00
Alexei Starovoitov
15ab09bdca bpf: Enforce 'return 0' in BTF-enabled raw_tp programs
The return value of raw_tp programs is ignored by __bpf_trace_run()
that calls them. The verifier also allows any value to be returned.
For BTF-enabled raw_tp, let's enforce 'return 0', so that the return value
can be used for something in the future.
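
In the verifier this boils down to something like the following in
check_return_code() (simplified sketch):

  case BPF_PROG_TYPE_RAW_TRACEPOINT:
          if (!env->prog->aux->attach_btf_id)
                  return 0;      /* classic raw_tp: any value allowed */
          range = tnum_const(0); /* BTF-enabled: must return 0 */
          break;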

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191029032426.1206762-1-ast@kernel.org
2019-10-30 16:22:55 +01:00
Andrii Nakryiko
a566e35f1e libbpf: Don't use kernel-side u32 type in xsk.c
u32 is a kernel-side typedef; a user-space library is supposed to use
__u32. This breaks GitHub's projection of libbpf. Do the u32 -> __u32 fix.

Fixes: 94ff9ebb49 ("libbpf: Fix compatibility for kernels without need_wakeup")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20191029055953.2461336-1-andriin@fb.com
2019-10-29 06:38:12 -07:00
Andrii Nakryiko
d3a3aa0c59 libbpf: Fix off-by-one error in ELF sanity check
libbpf's bpf_object__elf_collect() does a simple sanity check after
iterating over all ELF sections: it checks that the .strtab index is
correct. Unfortunately, due to section indices being 1-based, the check
breaks for cases when .strtab ends up being the very last section in the
ELF.
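
Sketch of the corrected check (since section indices are 1-based, a
.strtab that is the last section has an index equal to the number of
sections iterated, which the old '>=' comparison wrongly rejected):

  if (!obj->efile.strtabidx || obj->efile.strtabidx > idx) {
          pr_warn("Corrupted ELF file: index of strtab invalid\n");
          return -LIBBPF_ERRNO__FORMAT;
  }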

Fixes: 77ba9a5b48 ("tools lib bpf: Fetch map names from correct strtab")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191028233727.1286699-1-andriin@fb.com
2019-10-28 20:27:40 -07:00
Magnus Karlsson
94ff9ebb49 libbpf: Fix compatibility for kernels without need_wakeup
When the need_wakeup flag was added to AF_XDP, the format of the
XDP_MMAP_OFFSETS getsockopt was extended. Code was added to the
kernel to take care of compatibility issues arising from running
applications using either of the two formats. However, libbpf was
not extended to take care of the case when the application/libbpf
uses the new format but the kernel only supports the old
format. This patch adds support in libbpf for parsing the old
format, before the need_wakeup flag was added, and for emulating a
set of static need_wakeup flags that will always work for the
application.

v2 -> v3:
* Incorporated code improvements suggested by Jonathan Lemon

v1 -> v2:
* Rebased to bpf-next
* Rewrote the code as the previous version made you blind

Fixes: a4500432c2 ("libbpf: add support for need_wakeup flag in AF_XDP part")
Reported-by: Eloy Degen <degeneloy@gmail.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/1571995035-21889-1-git-send-email-magnus.karlsson@intel.com
2019-10-28 20:25:32 -07:00
Ilya Leoshkevich
e93d99180a selftests/bpf: Restore $(OUTPUT)/test_stub.o rule
`make O=/linux-build kselftest TARGETS=bpf` fails with

	make[3]: *** No rule to make target '/linux-build/bpf/test_stub.o', needed by '/linux-build/bpf/test_verifier'

The same command without the O= part works, presumably thanks to the
implicit rule.

Fix by restoring the explicit $(OUTPUT)/test_stub.o rule.

Fixes: 74b5a5968f ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191028102110.7545-1-iii@linux.ibm.com
2019-10-28 16:16:10 +01:00
Ilya Leoshkevich
313e7f6fb1 selftest/bpf: Use -m{little, big}-endian for clang
When cross-compiling tests from x86 to s390, the resulting BPF objects
fail to load due to an endianness mismatch.

Fix by using the BPF-GCC endianness check for clang as well.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191028102049.7489-1-iii@linux.ibm.com
2019-10-28 16:15:51 +01:00
David S. Miller
5b7fe93db0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-10-27

The following pull-request contains BPF updates for your *net-next* tree.

We've added 52 non-merge commits during the last 11 day(s) which contain
a total of 65 files changed, 2604 insertions(+), 1100 deletions(-).

The main changes are:

 1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF
    assembly code. The work here teaches BPF verifier to recognize
    kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints
    such that verifier allows direct use of bpf_skb_event_output() helper
    used in tc BPF et al (w/o probing memory access) that dumps skb data
    into perf ring buffer. Also add direct loads to probe memory in order
    to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov.

 2) Big batch of changes to improve libbpf and BPF kselftests. Besides
    others: generalization of libbpf's CO-RE relocation support to now
    also include field existence relocations, revamp the BPF kselftest
    Makefile to add test runner concept allowing to exercise various
    ways to build BPF programs, and teach bpf_object__open() and friends
    to automatically derive BPF program type/expected attach type from
    section names to ease their use, from Andrii Nakryiko.

 3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu.

 4) Allow to read BTF as raw data from bpftool. Most notable use case
    is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa.

 5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which
    manages to improve "rx_drop" performance by ~4%, from Björn Töpel.

 6) Fix to restore the flow dissector after the reattach BPF test and also
    fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki.

 7) Improve verifier's BTF ctx access for use outside of raw_tp, from
    Martin KaFai Lau.

 8) Improve documentation for AF_XDP with new sections and to reflect
    latest features, from Magnus Karlsson.

 9) Add back 'version' section parsing to libbpf for old kernels, from
    John Fastabend.

10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(),
    from KP Singh.

11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order
    to improve insn coverage for built BPF progs, from Yonghong Song.

12) Misc minor cleanups and fixes, from various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 22:57:27 -07:00
Roman Mashak
b951248518 tc-testing: list required kernel options for act_ct action
Updated config with required kernel options for the conntrack TC action,
so that tdc can run the tests.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 11:40:22 -07:00
David S. Miller
4b1f5ddaff Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
more specifically:

* Updates for ipset:

1) Coding style fix for ipset comment extension, from Jeremy Sowden.

2) De-inline many functions in ipset, from Jeremy Sowden.

3) Move ipset function definition from header to source file.

4) Move ip_set_put_flags() to source, export it as a symbol, remove
   inline.

5) Move range_to_mask() to the source file where this is used.

6) Move ip_set_get_ip_port() to the source file where this is used.

* IPVS selftests and netns improvements:

7) Two patches to speedup ipvs netns dismantle, from Haishuang Yan.

8) Three patches to add selftest script for ipvs, also from
   Haishuang Yan.

* Conntrack updates and new nf_hook_slow_list() function:

9) Document ct ecache extension, from Florian Westphal.

10) Skip ct extensions from ctnetlink dump, from Florian.

11) Free ct extension immediately, from Florian.

12) Skip access to ecache extension from nf_ct_deliver_cached_events()
    this is not correct as reported by Syzbot.

13) Add and use nf_hook_slow_list(), from Florian.

* Flowtable infrastructure updates:

14) Move priority to nf_flowtable definition.

15) Dynamic allocation of per-device hooks in flowtables.

16) Allow to include netdevice only once in flowtable definitions.

17) Rise maximum number of devices per flowtable.

* Netfilter hardware offload infrastructure updates:

18) Add nft_flow_block_chain() helper function.

19) Pass callback list to nft_setup_cb_call().

20) Add nft_flow_cls_offload_setup() helper function.

21) Remove rules for the unregistered device via netdevice event.

22) Support for multiple devices in a basechain definition at the
    ingress hook.

23) Add nft_chain_offload_cmd() helper function.

24) Add nft_flow_block_offload_init() helper function.

25) Rewind in case of failing to bind multiple devices to hook.

26) Typo in IPv6 tproxy module description, from Norman Rasmussen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 11:35:43 -07:00
David S. Miller
64fe8e9769 Merge branch 'net-aquantia-ptp-followup-fixes'
Igor Russkikh says:

====================
net: aquantia: ptp followup fixes

Here are fixes for two sparse warnings; the third patch is a fix for
the missing scaled_ppm_to_ppb. Eventually I reworked this
to exclude the ptp module from the build. Please consider it instead
of this patch: https://patchwork.ozlabs.org/patch/1184171/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 11:28:40 -07:00
Igor Russkikh
7873ee26b1 net: aquantia: disable ptp object build if no config
We disable the aq_ptp module build by using inline
stubs when CONFIG_PTP_1588_CLOCK is not declared.

This reduces module size and removes unnecessary code.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 11:28:40 -07:00
Igor Russkikh
5eeb6c3cf2 net: aquantia: fix warnings on endianness
Fixes to remove sparse warnings:
  sparse: cast to restricted __be64

Fixes: 04a1839950 ("net: aquantia: implement data PTP datapath")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 11:28:40 -07:00
Igor Russkikh
bb1eded18d net: aquantia: fix var initialization warning
Found by sparse: simply a useless local initialization with zero.

Fixes: 94ad94558b ("net: aquantia: add PTP rings infrastructure")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26 11:28:40 -07:00
Pablo Neira Ayuso
671312e1a0 netfilter: nf_tables_offload: unbind if multi-device binding fails
nft_flow_block_chain() needs to unbind in case of error when performing
the multi-device binding.

Fixes: d54725cd11 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Reported-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-26 12:36:44 +02:00
Pablo Neira Ayuso
75ceaf862d netfilter: nf_tables_offload: add nft_flow_block_offload_init()
This patch adds the nft_flow_block_offload_init() helper function to
initialize the flow_block_offload object.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-26 12:36:43 +02:00
Pablo Neira Ayuso
6df5490fbb netfilter: nf_tables_offload: add nft_chain_offload_cmd()
This patch adds the nft_chain_offload_cmd() helper function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-26 12:36:42 +02:00
Florian Westphal
ad88b7a6aa netfilter: ecache: don't look for ecache extension on dying/unconfirmed conntracks
syzbot reported the following splat:
BUG: KASAN: use-after-free in __nf_ct_ext_exist
include/net/netfilter/nf_conntrack_extend.h:53 [inline]
BUG: KASAN: use-after-free in nf_ct_deliver_cached_events+0x5c3/0x6d0
net/netfilter/nf_conntrack_ecache.c:205
nf_conntrack_confirm include/net/netfilter/nf_conntrack_core.h:65 [inline]
nf_confirm+0x3d8/0x4d0 net/netfilter/nf_conntrack_proto.c:154
[..]

While there is no reproducer yet, the syzbot report contains one
interesting bit of information:

Freed by task 27585:
[..]
 kfree+0x10a/0x2c0 mm/slab.c:3757
 nf_ct_ext_destroy+0x2ab/0x2e0 net/netfilter/nf_conntrack_extend.c:38
 nf_conntrack_free+0x8f/0xe0 net/netfilter/nf_conntrack_core.c:1418
 destroy_conntrack+0x1a2/0x270 net/netfilter/nf_conntrack_core.c:626
 nf_conntrack_put include/linux/netfilter/nf_conntrack_common.h:31 [inline]
 nf_ct_resolve_clash net/netfilter/nf_conntrack_core.c:915 [inline]
 ^^^^^^^^^^^^^^^^^^^
 __nf_conntrack_confirm+0x21ca/0x2830 net/netfilter/nf_conntrack_core.c:1038
 nf_conntrack_confirm include/net/netfilter/nf_conntrack_core.h:63 [inline]
 nf_confirm+0x3e7/0x4d0 net/netfilter/nf_conntrack_proto.c:154

This is what's happening:

1. a conntrack entry is about to be confirmed (added to hash table).
2. a clash with existing entry is detected.
3. nf_ct_resolve_clash() puts skb->nfct (the "losing" entry).
4. this entry now has a refcount of 0 and is freed to SLAB_TYPESAFE_BY_RCU
   kmem cache.

skb->nfct has been replaced by the one found in the hash.
The problem is that nf_conntrack_confirm() uses the old ct:

static inline int nf_conntrack_confirm(struct sk_buff *skb)
{
 struct nf_conn *ct = (struct nf_conn *)skb_nfct(skb);
 int ret = NF_ACCEPT;

  if (ct) {
    if (!nf_ct_is_confirmed(ct))
       ret = __nf_conntrack_confirm(skb);
    if (likely(ret == NF_ACCEPT))
	nf_ct_deliver_cached_events(ct); /* This ct has refcount 0! */
  }
  return ret;
}

As of "netfilter: conntrack: free extension area immediately", we can't
access conntrack extensions in this case.

To fix this, make sure we check for the dying bit before attempting
to get the ecache extension.
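
That is, roughly (sketch of the guard in nf_ct_deliver_cached_events();
exact placement may differ):

  /* bail out before touching the (possibly freed) extension area */
  if (!nf_ct_is_confirmed(ct) || nf_ct_is_dying(ct))
          return;

  e = nf_ct_ecache_find(ct);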

Reported-by: syzbot+c7aabc9fe93e7f3637ba@syzkaller.appspotmail.com
Fixes: 2ad9d7747c ("netfilter: conntrack: free extension area immediately")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-26 12:36:42 +02:00
David S. Miller
0629d2456a Merge branch 'ionic-updates'
Shannon Nelson says:

====================
ionic updates

These are a few of the driver updates we've been working on internally.
These clean up a few mismatched struct comments, add checking for dead
firmware, fix an initialization bug, and change the Rx buffer management.

These are based on net-next v5.4-rc3-709-g985fd98ab5cc.

v2: clear napi->skb in the error case in ionic_rx_frags()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:52:36 -07:00
Shannon Nelson
63ad1cd680 ionic: update driver version
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:52:36 -07:00
Shannon Nelson
08f2e4b2b2 ionic: implement support for rx sgl
Even out Rx performance across MTU sizes by changing from full
skb allocations to page-based frag allocations.  The device
supports a form of scatter-gather in the Rx path, so we can
set up a number of pages for each descriptor, all of which are
easier to alloc and pass around than the standard kzalloc'd
buffer.  An skb is wrapped around the pages while processing
the received packets, and pages are recycled as needed, or
left alone if they weren't used in the Rx.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:52:36 -07:00
Shannon Nelson
089406bc5a ionic: add a watchdog timer to monitor heartbeat
Add a watchdog to periodically monitor the NIC heartbeat.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:52:36 -07:00
Shannon Nelson
97ca486592 ionic: add heartbeat check
Most of our firmware has a heartbeat feature that the driver
can watch for to see if the FW is still alive and likely to
answer a dev_cmd or AdminQ request.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:52:36 -07:00
Shannon Nelson
ff7ebed945 ionic: reverse an interrupt coalesce calculation
Fix the initial interrupt coalesce usec-to-hw setting
to actually be usec-to-hw.

Fixes: 780eded34c ("ionic: report users coalesce request")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:52:36 -07:00
Shannon Nelson
5c28f213ef ionic: fix up struct name comments
Fix up struct names in the ionic_if.h comments

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:52:36 -07:00
Heiner Kallweit
e4b5c7a582 r8169: improve rtl8169_rx_fill
We have only one user of the error path, so we can inline it.
In addition the call to rtl8169_make_unusable_by_asic() can be removed
because rtl8169_alloc_rx_data() didn't call rtl8169_mark_to_asic() yet
for the respective index if returning NULL.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:28:59 -07:00
Heiner Kallweit
7cb83b21fd r8169: align fix_features callback with vendor driver
This patch aligns the fix_features callback with the vendor driver and
also disables IPv6 HW checksumming and TSO if jumbo packets are used
on RTL8101/RTL8168/RTL8125.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:28:30 -07:00
David S. Miller
8ca12bc36f Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2019-10-23

Here's the main bluetooth-next pull request for the 5.5 kernel:

 - Multiple fixes to hci_qca driver
 - Fix for HCI_USER_CHANNEL initialization
 - btwlink: drop superseded driver
 - Add support for Intel FW download error recovery
 - Various other smaller fixes & improvements

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 20:19:44 -07:00
Jason Baron
480274787d tcp: add TCP_INFO status for failed client TFO
The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
or not data-in-SYN was ack'd on both the client and server side. We'd like
to gather more information on the client-side in the failure case in order
to indicate the reason for the failure. This can be useful for not only
debugging TFO, but also for creating TFO socket policies. For example, if
a middle box removes the TFO option or drops a data-in-SYN, we can
detect this case and turn off TFO for these connections, saving the
extra retransmits.

The newly added tcpi_fastopen_client_fail status is 2 bits and has the
following 4 states:

1) TFO_STATUS_UNSPEC

Catch-all state which includes when TFO is disabled via black hole
detection, which is indicated via LINUX_MIB_TCPFASTOPENBLACKHOLE.

2) TFO_COOKIE_UNAVAILABLE

If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
is available in the cache.

3) TFO_DATA_NOT_ACKED

Data was sent with SYN, we received a SYN/ACK but it did not cover the data
portion. Cookie is not accepted by server because the cookie may be invalid
or the server may be overloaded.

4) TFO_SYN_RETRANSMITTED

Data was sent with SYN, we received a SYN/ACK which did not cover the data
after at least 1 additional SYN was sent (without data). It may be the case
that a middle-box is dropping data-in-SYN packets. Thus, it would be more
efficient to not use TFO on this connection to avoid extra retransmits
during connection establishment.
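
A client can read the new field via TCP_INFO in the usual way (sketch):

  struct tcp_info info;
  socklen_t len = sizeof(info);

  if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0)
          printf("TFO client fail status: %u\n",
                 info.tcpi_fastopen_client_fail); /* one of the 4 states */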

These new fields do not cover all the cases where TFO may fail, but other
failures, such as SYN/ACK + data being dropped, will result in the
connection not becoming established. And a connection blackhole after
session establishment shows up as a stalled connection.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 19:25:37 -07:00
David S. Miller
79f2056b8b Merge branch 'phy-dp83867-enable-robust-auto-mdix'
Grygorii Strashko says:

====================
net: phy: dp83867: enable robust auto-mdix

Patch 1 - improves link detection when the dp83867 PHY is configured in
manual mode by enabling the CFG3[9] Robust Auto-MDIX option.

Patch 2 - is a minor optimization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 19:24:47 -07:00