linux

iv/linux

Author	SHA1	Message	Date
Andrii Nakryiko	6e05abc9ab	selftests/bpf: Convert test_btf_dump into test_progs test Convert test_btf_dump into a part of test_progs, instead of a stand-alone test binary. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191008231009.2991130-3-andriin@fb.com	2019-10-09 15:38:36 -07:00
Andrii Nakryiko	b4099769f3	libbpf: Fix struct end padding in btf_dump Fix a case where explicit padding at the end of a struct is necessary due to non-standart alignment requirements of fields (which BTF doesn't capture explicitly). Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion") Reported-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191008231009.2991130-2-andriin@fb.com	2019-10-09 15:38:36 -07:00
Daniel Borkmann	f05c2001ec	Merge branch 'bpf-libbpf-helpers' Andrii Nakryiko says: ==================== This patch set makes bpf_helpers.h and bpf_endian.h a part of libbpf itself for consumption by user BPF programs, not just selftests. It also splits off tracing helpers into bpf_tracing.h, which also becomes part of libbpf. Some of the legacy stuff (BPF_ANNOTATE_KV_PAIR, load_{byte,half,word}, bpf_map_def with unsupported fields, etc, is extracted into selftests-only bpf_legacy.h. All the selftests and samples are switched to use libbpf's headers and selftests' ones are removed. As part of this patch set we also add BPF_CORE_READ variadic macros, that are simplifying BPF CO-RE reads, especially the ones that have to follow few pointers. E.g., what in non-BPF world (and when using BCC) would be: int x = s->a->b.c->d; /* s, a, and b.c are pointers / Today would have to be written using explicit bpf_probe_read() calls as: void t; int x; bpf_probe_read(&t, sizeof(t), s->a); bpf_probe_read(&t, sizeof(t), ((struct b )t)->b.c); bpf_probe_read(&x, sizeof(x), ((struct c )t)->d); This is super inconvenient and distracts from program logic a lot. Now, with added BPF_CORE_READ() macros, you can write the above as: int x = BPF_CORE_READ(s, a, b.c, d); Up to 9 levels of pointer chasing are supported, which should be enough for any practical purpose, hopefully, without adding too much boilerplate macro definitions (though there is admittedly some, given how variadic and recursive C macro have to be implemented). There is also BPF_CORE_READ_INTO() variant, which relies on caller to allocate space for result: int x; BPF_CORE_READ_INTO(&x, s, a, b.c, d); Result of last bpf_probe_read() call in the chain of calls is the result of BPF_CORE_READ_INTO(). If any intermediate bpf_probe_read() aall fails, then all the subsequent ones will fail too, so this is sufficient to know whether overall "operation" succeeded or not. No short-circuiting of bpf_probe_read()s is done, though. BPF_CORE_READ_STR_INTO() is added as well, which differs from BPF_CORE_READ_INTO() only in that last bpf_probe_read() call (to read final field after chasing pointers) is replaced with bpf_probe_read_str(). Result of bpf_probe_read_str() is returned as a result of BPF_CORE_READ_STR_INTO() macro itself, so that applications can track return code and/or length of read string. Patch set outline: - patch #1 undoes previously added GCC-specific bpf-helpers.h include; - patch #2 splits off legacy stuff we don't want to carry over; - patch #3 adjusts CO-RE reloc tests to avoid subsequent naming conflict with BPF_CORE_READ; - patch #4 splits off bpf_tracing.h; - patch #5 moves bpf_{helpers,endian,tracing}.h and bpf_helper_defs.h generation into libbpf and adjusts Makefiles to include libbpf for header search; - patch #6 adds variadic BPF_CORE_READ() macro family, as described above; - patch #7 adds tests to verify all possible levels of pointer nestedness for BPF_CORE_READ(), as well as correctness test for BPF_CORE_READ_STR_INTO(). v4->v5: - move BPF_CORE_READ() stuff into bpf_core_read.h header (Alexei); v3->v4: - rebase on latest bpf-next master; - bpf_helper_defs.h generation is moved into libbpf's Makefile; v2->v3: - small formatting fixes and macro () fixes (Song); v1->v2: - fix CO-RE reloc tests before bpf_helpers.h move (Song); - split off legacy stuff we don't want to carry over (Daniel, Toke); - split off bpf_tracing.h (Daniel); - fix samples/bpf build (assuming other fixes are applied); - switch remaining maps either to bpf_map_def_legacy or BTF-defined maps; ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-10-08 23:18:19 +02:00
Andrii Nakryiko	ee2eb063d3	selftests/bpf: Add BPF_CORE_READ and BPF_CORE_READ_STR_INTO macro tests Validate BPF_CORE_READ correctness and handling of up to 9 levels of nestedness using cyclic task->(group_leader->)*->tgid chains. Also add a test of maximum-dpeth BPF_CORE_READ_STR_INTO() macro. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191008175942.1769476-8-andriin@fb.com	2019-10-08 23:16:04 +02:00
Andrii Nakryiko	7db3822ab9	libbpf: Add BPF_CORE_READ/BPF_CORE_READ_INTO helpers Add few macros simplifying BCC-like multi-level probe reads, while also emitting CO-RE relocations for each read. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191008175942.1769476-7-andriin@fb.com	2019-10-08 23:16:03 +02:00
Andrii Nakryiko	e01a75c159	libbpf: Move bpf_{helpers, helper_defs, endian, tracing}.h into libbpf Move bpf_helpers.h, bpf_tracing.h, and bpf_endian.h into libbpf. Move bpf_helper_defs.h generation into libbpf's Makefile. Ensure all those headers are installed along the other libbpf headers. Also, adjust selftests and samples include path to include libbpf now. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191008175942.1769476-6-andriin@fb.com	2019-10-08 23:16:03 +02:00
Andrii Nakryiko	3ac4dbe3dd	selftests/bpf: Split off tracing-only helpers into bpf_tracing.h Split-off PT_REGS-related helpers into bpf_tracing.h header. Adjust selftests and samples to include it where necessary. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191008175942.1769476-5-andriin@fb.com	2019-10-08 23:16:03 +02:00
Andrii Nakryiko	694731e8ea	selftests/bpf: Adjust CO-RE reloc tests for new bpf_core_read() macro To allow adding a variadic BPF_CORE_READ macro with slightly different syntax and semantics, define CORE_READ in CO-RE reloc tests, which is a thin wrapper around low-level bpf_core_read() macro, which in turn is just a wrapper around bpf_probe_read(). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191008175942.1769476-4-andriin@fb.com	2019-10-08 23:16:03 +02:00
Andrii Nakryiko	36b5d47113	selftests/bpf: samples/bpf: Split off legacy stuff from bpf_helpers.h Split off few legacy things from bpf_helpers.h into separate bpf_legacy.h file: - load_{byte\|half\|word}; - remove extra inner_idx and numa_node fields from bpf_map_def and introduce bpf_map_def_legacy for use in samples; - move BPF_ANNOTATE_KV_PAIR into bpf_legacy.h. Adjust samples and selftests accordingly by either including bpf_legacy.h and using bpf_map_def_legacy, or switching to BTF-defined maps altogether. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191008175942.1769476-3-andriin@fb.com	2019-10-08 23:16:03 +02:00
Andrii Nakryiko	cf0e9718da	selftests/bpf: Undo GCC-specific bpf_helpers.h changes Having GCC provide its own bpf-helper.h is not the right approach and is going to be changed. Undo bpf_helpers.h change before moving bpf_helpers.h into libbpf. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191008175942.1769476-2-andriin@fb.com	2019-10-08 23:16:03 +02:00
Daniel T. Lee	8fdf5b780a	samples: bpf: Add max_pckt_size option at xdp_adjust_tail Currently, at xdp_adjust_tail_kern.c, MAX_PCKT_SIZE is limited to 600. To make this size flexible, static global variable 'max_pcktsz' is added. By updating new packet size from the user space, xdp_adjust_tail_kern.o will use this value as a new max packet size. This static global variable can be accesible from .data section with bpf_object__find_map* from user space, since it is considered as internal map (accessible with .bss/.data/.rodata suffix). If no '-P <MAX_PCKT_SIZE>' option is used, the size of maximum packet will be 600 as a default. For clarity, change the helper to fetch map from 'bpf_map__next' to 'bpf_object__find_map_fd_by_name'. Also, changed the way to test prog_fd, map_fd from '!= 0' to '< 0', since fd could be 0 when stdin is closed. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191007172117.3916-1-danieltimlee@gmail.com	2019-10-07 20:22:27 -07:00
Alexei Starovoitov	72ccd9200f	Merge branch 'enforce-global-flow-dissector' Stanislav Fomichev says: ==================== While having a per-net-ns flow dissector programs is convenient for testing, security-wise it's better to have only one vetted global flow dissector implementation. Let's have a convention that when BPF flow dissector is installed in the root namespace, child namespaces can't override it. The intended use-case is to attach global BPF flow dissector early from the init scripts/systemd. Attaching global dissector is prohibited if some non-root namespace already has flow dissector attached. Also, attaching to non-root namespace is prohibited when there is flow dissector attached to the root namespace. v3: * drop extra check and empty line (Andrii Nakryiko) v2: * EPERM -> EEXIST (Song Liu) * Make sure we don't have dissector attached to non-root namespaces when attaching the global one (Andrii Nakryiko) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-07 20:16:34 -07:00
Stanislav Fomichev	1d9626dc08	selftests/bpf: add test for BPF flow dissector in the root namespace Make sure non-root namespaces get an error if root flow dissector is attached. Cc: Petar Penkov <ppenkov@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-07 20:16:33 -07:00
Stanislav Fomichev	a11c397c43	bpf/flow_dissector: add mode to enforce global BPF flow dissector Always use init_net flow dissector BPF program if it's attached and fall back to the per-net namespace one. Also, deny installing new programs if there is already one attached to the root namespace. Users can still detach their BPF programs, but can't attach any new ones (-EEXIST). Cc: Petar Penkov <ppenkov@google.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-07 20:16:33 -07:00
Anton Ivanov	4564a8bb57	samples/bpf: Trivial - fix spelling mistake in usage Fix spelling mistake. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191007082636.14686-1-anton.ivanov@cambridgegreys.com	2019-10-07 20:12:55 -07:00
Andrii Nakryiko	32e3e58e4c	bpftool: Fix bpftool build by switching to bpf_object__open_file() As part of libbpf in 5e61f2707029 ("libbpf: stop enforcing kern_version, populate it for users") non-LIBBPF_API __bpf_object__open_xattr() API was removed from libbpf.h header. This broke bpftool, which relied on that function. This patch fixes the build by switching to newly added bpf_object__open_file() which provides the same capabilities, but is official and future-proof API. v1->v2: - fix prog_type shadowing (Stanislav). Fixes: 5e61f2707029 ("libbpf: stop enforcing kern_version, populate it for users") Reported-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20191007225604.2006146-1-andriin@fb.com	2019-10-07 18:44:28 -07:00
Andrii Nakryiko	dcb5f40054	selftests/bpf: Fix dependency ordering for attach_probe test Current Makefile dependency chain is not strict enough and allows test_attach_probe.o to be built before test_progs's prog_test/attach_probe.o is built, which leads to assembler complaining about missing included binary. This patch is a minimal fix to fix this issue by enforcing that test_attach_probe.o (BPF object file) is built before prog_tests/attach_probe.c is attempted to be compiled. Fixes: 928ca75e59d7 ("selftests/bpf: switch tests to new bpf_object__open_{file, mem}() APIs") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191007204149.1575990-1-andriin@fb.com	2019-10-07 13:47:04 -07:00
Alexei Starovoitov	05949f6305	Merge branch 'autogen-bpf-helpers' Andrii Nakryiko says: ==================== This patch set adds ability to auto-generate list of BPF helper definitions. It relies on existing scripts/bpf_helpers_doc.py and include/uapi/linux/bpf.h having a well-defined set of comments. bpf_helper_defs.h contains all BPF helper signatures which stay in sync with latest bpf.h UAPI. This auto-generated header is included from bpf_helpers.h, while all previously hand-written BPF helper definitions are simultaneously removed in patch #3. The end result is less manually maintained and redundant boilerplate code, while also more consistent and well-documented set of BPF helpers. Generated helper definitions are completely independent from a specific bpf.h on a target system, because it doesn't use BPF_FUNC_xxx enums. v3->v4: - instead of libbpf's Makefile, integrate with selftest/bpf's Makefile (Alexei); v2->v3: - delete bpf_helper_defs.h properly (Alexei); v1->v2: - add bpf_helper_defs.h to .gitignore and `make clean` (Alexei). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-06 22:29:37 -07:00
Andrii Nakryiko	24f25763d6	libbpf: auto-generate list of BPF helper definitions Get rid of list of BPF helpers in bpf_helpers.h (irony...) and auto-generate it into bpf_helpers_defs.h, which is now included from bpf_helpers.h. Suggested-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-06 22:29:36 -07:00
Andrii Nakryiko	7a387bed47	scripts/bpf: teach bpf_helpers_doc.py to dump BPF helper definitions Enhance scripts/bpf_helpers_doc.py to emit C header with BPF helper definitions (to be included from libbpf's bpf_helpers.h). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-06 22:29:36 -07:00
Andrii Nakryiko	5f0e541278	uapi/bpf: fix helper docs Various small fixes to BPF helper documentation comments, enabling automatic header generation with a list of BPF helpers. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-06 22:29:36 -07:00
Toke Høiland-Jørgensen	a9eb048d56	libbpf: Add cscope and tags targets to Makefile Using cscope and/or TAGS files for navigating the source code is useful. Add simple targets to the Makefile to generate the index files for both tools. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191004153444.1711278-1-toke@redhat.com	2019-10-05 18:24:56 -07:00
Alexei Starovoitov	b84fbfe2ce	Merge branch 'libbpf-api' Andrii Nakryiko says: ==================== Add bpf_object__open_file() and bpf_object__open_mem() APIs that use a new approach to providing future-proof non-ABI-breaking API changes. It relies on APIs accepting optional self-describing "opts" struct, containing its own size, filled out and provided by potentially outdated (as well as newer-than-libbpf) user application. A set of internal helper macros (OPTS_VALID, OPTS_HAS, and OPTS_GET) streamline and simplify a graceful handling forward and backward compatibility for user applications dynamically linked against different versions of libbpf shared library. Users of libbpf are provided with convenience macro LIBBPF_OPTS that takes care of populating correct structure size and zero-initializes options struct, which helps avoid obscure issues of unitialized padding. Uninitialized padding in a struct might turn into garbage-populated new fields understood by future versions of libbpf. Patch #1 removes enforcement of kern_version in libbpf and always populates correct one on behalf of users. Patch #2 defines necessary infrastructure for options and two new open APIs relying on it. Patch #3 fixes bug in bpf_object__name(). Patch #4 switches two of test_progs' tests to use new APIs as a validation that they work as expected. v2->v3: - fix LIBBPF_OPTS() to ensure zero-initialization of padded bytes; - pass through name override and relaxed maps flag for open_file() (Toke); - fix bpf_object__name() to actually return object name; - don't bother parsing and verifying version section (John); v1->v2: - use better approach for tracking last field in opts struct; - convert few tests to new APIs for validation; - fix bug with using offsetof(last_field) instead of offsetofend(last_field). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-05 18:09:49 -07:00
Andrii Nakryiko	928ca75e59	selftests/bpf: switch tests to new bpf_object__open_{file, mem}() APIs Verify new bpf_object__open_mem() and bpf_object__open_file() APIs work as expected by switching test_attach_probe test to use embedded BPF object and bpf_object__open_mem() and test_reference_tracking to bpf_object__open_file(). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-05 18:09:48 -07:00
Andrii Nakryiko	c9e4c3010c	libbpf: fix bpf_object__name() to actually return object name bpf_object__name() was returning file path, not name. Fix this. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-05 18:09:48 -07:00
Andrii Nakryiko	2ce8450ef5	libbpf: add bpf_object__open_{file, mem} w/ extensible opts Add new set of bpf_object__open APIs using new approach to optional parameters extensibility allowing simpler ABI compatibility approach. This patch demonstrates an approach to implementing libbpf APIs that makes it easy to extend existing APIs with extra optional parameters in such a way, that ABI compatibility is preserved without having to do symbol versioning and generating lots of boilerplate code to handle it. To facilitate succinct code for working with options, add OPTS_VALID, OPTS_HAS, and OPTS_GET macros that hide all the NULL, size, and zero checks. Additionally, newly added libbpf APIs are encouraged to follow similar pattern of having all mandatory parameters as formal function parameters and always have optional (NULL-able) xxx_opts struct, which should always have real struct size as a first field and the rest would be optional parameters added over time, which tune the behavior of existing API, if specified by user. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-05 18:09:47 -07:00
Andrii Nakryiko	5e61f27070	libbpf: stop enforcing kern_version, populate it for users Kernel version enforcement for kprobes/kretprobes was removed from 5.0 kernel in 6c4fc209fcf9 ("bpf: remove useless version check for prog load"). Since then, BPF programs were specifying SEC("version") just to please libbpf. We should stop enforcing this in libbpf, if even kernel doesn't care. Furthermore, libbpf now will pre-populate current kernel version of the host system, in case we are still running on old kernel. This patch also removes __bpf_object__open_xattr from libbpf.h, as nothing in libbpf is relying on having it in that header. That function was never exported as LIBBPF_API and even name suggests its internal version. So this should be safe to remove, as it doesn't break ABI. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-10-05 18:09:47 -07:00
Andrii Nakryiko	a53ba15d81	libbpf: Fix BTF-defined map's __type macro handling of arrays Due to a quirky C syntax of declaring pointers to array or function prototype, existing __type() macro doesn't work with map key/value types that are array or function prototype. One has to create a typedef first and use it to specify key/value type for a BPF map. By using typeof(), pointer to type is now handled uniformly for all kinds of types. Convert one of self-tests as a demonstration. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191004040211.2434033-1-andriin@fb.com	2019-10-05 18:03:12 -07:00
Daniel Borkmann	4bbbf164f1	bpf: Add loop test case with 32 bit reg comparison against 0 Add a loop test with 32 bit register against 0 immediate: # ./test_verifier 631 #631/p taken loop with back jump to 1st insn, 2 OK Disassembly: [...] 1b: test %edi,%edi 1d: jne 0x0000000000000014 [...] Pretty much similar to prior "taken loop with back jump to 1st insn" test case just as jmp32 variant. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com>	2019-10-04 12:27:36 -07:00
Daniel Borkmann	38f51c0705	bpf, x86: Small optimization in comparing against imm0 Replace 'cmp reg, 0' with 'test reg, reg' for comparisons against zero. Saves 1 byte of instruction encoding per occurrence. The flag results of test 'reg, reg' are identical to 'cmp reg, 0' in all cases except for AF which we don't use/care about. In terms of macro-fusibility in combination with a subsequent conditional jump instruction, both have the same properties for the jumps used in the JIT translation. For example, same JITed Cilium program can shrink a bit from e.g. 12,455 to 12,317 bytes as tests with 0 are used quite frequently. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com>	2019-10-04 12:26:51 -07:00
Ivan Khoronzhuk	c588146378	selftests/bpf: Correct path to include msg + path The "path" buf is supposed to contain path + printf msg up to 24 bytes. It will be cut anyway, but compiler generates truncation warns like: " samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c: In function ‘setup_cgroup_environment’: samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:34: warning: ‘/cgroup.controllers’ directive output may be truncated writing 19 bytes into a region of size between 1 and 4097 [-Wformat-truncation=] snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path); ^~~~~~~~~~~~~~~~~~~ samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:2: note: ‘snprintf’ output between 20 and 4116 bytes into a destination of size 4097 snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:34: warning: ‘/cgroup.subtree_control’ directive output may be truncated writing 23 bytes into a region of size between 1 and 4097 [-Wformat-truncation=] snprintf(path, sizeof(path), "%s/cgroup.subtree_control", ^~~~~~~~~~~~~~~~~~~~~~~ cgroup_path); samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:2: note: ‘snprintf’ output between 24 and 4120 bytes into a destination of size 4097 snprintf(path, sizeof(path), "%s/cgroup.subtree_control", cgroup_path); " In order to avoid warns, lets decrease buf size for cgroup workdir on 24 bytes with assumption to include also "/cgroup.subtree_control" to the address. The cut will never happen anyway. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191002120404.26962-3-ivan.khoronzhuk@linaro.org	2019-10-03 17:21:57 +02:00
Ivan Khoronzhuk	fb27dcd290	selftests/bpf: Add static to enable_all_controllers() Add static to enable_all_controllers() to get rid from annoying warning during samples/bpf build: samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:44:5: warning: no previous prototype for ‘enable_all_controllers’ [-Wmissing-prototypes] int enable_all_controllers(char *cgroup_path) Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191002120404.26962-2-ivan.khoronzhuk@linaro.org	2019-10-03 17:21:35 +02:00
Andrii Nakryiko	03bd4773d8	libbpf: Bump current version to v0.0.6 New release cycle started, let's bump to v0.0.6 proactively. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20190930222503.519782-1-andriin@fb.com	2019-10-02 01:05:10 +02:00
Simon Horman	37a2fce090	dt-bindings: sh_eth convert bindings to json-schema Convert Renesas Electronics SH EtherMAC bindings documentation to json-schema. Also name bindings documentation file according to the compat string being documented. Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 15:31:33 -07:00
Peter Fink	9fb137aef3	net: usb: ax88179_178a: allow optionally getting mac address from device tree Adopt and integrate the feature to pass the MAC address via device tree from asix_device.c (03fc5d4) also to other ax88179 based asix chips. E.g. the bootloader fills in local-mac-address and the driver will then pick up and use this MAC address. Signed-off-by: Peter Fink <pfink@christ-es.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 15:19:28 -07:00
Nicolas Dichtel	0d7982ce6e	ipv6: minor code reorg in inet6_fill_ifla6_attrs() Just put related code together to ease code reading: the memcpy() is related to the nla_reserve(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:59:03 -07:00
David S. Miller	7a56493f06	Merge branch 'netdev-altnames' Jiri Pirko says: ==================== net: introduce alternative names for network interfaces In the past, there was repeatedly discussed the IFNAMSIZ (16) limit for netdevice name length. Now when we have PF and VF representors with port names like "pfXvfY", it became quite common to hit this limit: 0123456789012345 enp131s0f1npf0vf6 enp131s0f1npf0vf22 Udev cannot rename these interfaces out-of-the-box and user needs to create custom rules to handle them. Also, udev has multiple schemes of netdev names. From udev code: * Type of names: * b<number> - BCMA bus core number * c<bus_id> - bus id of a grouped CCW or CCW device, * with all leading zeros stripped [s390] * o<index>[n<phys_port_name>\|d<dev_port>] * - on-board device index number * s<slot>[f<function>][n<phys_port_name>\|d<dev_port>] * - hotplug slot index number * x<MAC> - MAC address * [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>\|d<dev_port>] * - PCI geographical location * [P<domain>]p<bus>s<slot>[f<function>][u<port>][..][c<config>][i<interface>] * - USB port number chain * v<slot> - VIO slot number (IBM PowerVM) * a<vendor><model>i<instance> - Platform bus ACPI instance id * i<addr>n<phys_port_name> - Netdevsim bus address and port name One device can be often renamed by multiple patterns at the same time (e.g. pci address/mac). This patchset introduces alternative names for network interfaces. Main goal is to: 1) Overcome the IFNAMSIZ limitation (altname limitation is 128 bytes) 2) Allow to have multiple names at the same time (multiple udev patterns) 3) Allow to use alternative names as handle for commands The patchset introduces two new commands to add/delete list of properties. Currently only alternative names are implemented but the ifrastructure could be easily extended later on. This is very similar to the list of vlan and tunnels being added/deleted to/from bridge ports. See following examples. $ ip link 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff -> Add alternative names for dummy0: $ ip link prop add dummy0 altname someothername $ ip link prop add dummy0 altname someotherveryveryveryverylongname $ ip link 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname someothername altname someotherveryveryveryverylongname $ ip link show someotherveryveryveryverylongname 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname someothername altname someotherveryveryveryverylongname -> Add bridge brx, add it's alternative name and use alternative names to do enslavement. $ ip link add name brx type bridge $ ip link prop add brx altname mypersonalsuperspecialbridge $ ip link set someotherveryveryveryverylongname master mypersonalsuperspecialbridge $ ip link 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname someothername altname someotherveryveryveryverylongname 3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname mypersonalsuperspecialbridge -> Add ipv4 address to the bridge using alternative name: $ ip addr add 192.168.0.1/24 dev mypersonalsuperspecialbridge $ ip addr show mypersonalsuperspecialbridge 3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname mypersonalsuperspecialbridge inet 192.168.0.1/24 scope global brx valid_lft forever preferred_lft forever -> Delete one of dummy0 alternative names: $ ip link prop del dummy0 altname someotherveryveryveryverylongname $ ip link 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname someothername 3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname mypersonalsuperspecialbridge -> Add multiple alternative names at once $ ip link prop add dummy0 altname a altname b altname c altname d $ ip link 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname someothername altname a altname b altname c altname d 3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff altname mypersonalsuperspecialbridge ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:47:19 -07:00
Jiri Pirko	76c9ac0ee8	net: rtnetlink: add possibility to use alternative names as message handle Extend the basic rtnetlink commands to use alternative interface names as a handle instead of ifindex and ifname. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:47:19 -07:00
Jiri Pirko	cc6090e985	net: rtnetlink: introduce helper to get net_device instance by ifname Introduce helper function rtnl_get_dev() that gets net_device structure instance pointer according to passed ifname or ifname attribute. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:47:19 -07:00
Jiri Pirko	7af12cba4e	net: rtnetlink: unify the code in __rtnl_newlink get dev with the rest __rtnl_newlink() code flow is a bit different around tb[IFLA_IFNAME] processing comparing to the other places. Change that to be unified with the rest. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:47:19 -07:00
Jiri Pirko	88f4fb0c74	net: rtnetlink: put alternative names to getlink message Extend exiting getlink info message with list of properties. Now the only ones are alternative names. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:47:19 -07:00
Jiri Pirko	36fbf1e52b	net: rtnetlink: add linkprop commands to add and delete alternative ifnames Add two commands to add and delete list of link properties. Implement the first property type along - alternative ifnames. Each net device can have multiple alternative names. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:47:19 -07:00
Jiri Pirko	ff92741270	net: introduce name_node struct to be used in hashlist Introduce name_node structure to hold name of device and put it into hashlist instead of putting there struct net_device directly. Add a necessary infrastructure to manipulate the hashlist. This prepares the code to use the same hashlist for alternative names introduced later in this set. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:47:19 -07:00
Jiri Pirko	6958c97a48	net: procfs: use index hashlist instead of name hashlist Name hashlist is going to be used for more than just dev->name, so use rather index hashlist for iteration over net_device instances. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 14:47:19 -07:00
Eric Dumazet	be2644aac3	tcp: add ipv6_addr_v4mapped_loopback() helper tcp_twsk_unique() has a hard coded assumption about ipv4 loopback being 127/8 Lets instead use the standard ipv4_is_loopback() method, in a new ipv6_addr_v4mapped_loopback() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 13:07:53 -07:00
Julio Faracco	5be5515a8e	net: core: dev: replace state xoff flag comparison by netif_xmit_stopped method Function netif_schedule_queue() has a hardcoded comparison between queue state and any xoff flag. This comparison does the same thing as method netif_xmit_stopped(). In terms of code clarity, it is better. See other methods like: generic_xdp_tx() and dev_direct_xmit(). Signed-off-by: Julio Faracco <jcfaracco@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 09:26:03 -07:00
Prashant Malani	5f71c84038	r8152: Factor out OOB link list waits The same for-loop check for the LINK_LIST_READY bit of an OOB_CTRL register is used in several places. Factor these out into a single function to reduce the lines of code. Change-Id: I20e8f327045a72acc0a83e2d145ae2993ab62915 Signed-off-by: Prashant Malani <pmalani@chromium.org> Reviewed-by: Grant Grundler <grundler@chromium.org> Acked-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-01 09:14:37 -07:00
Linus Torvalds	02dc96ef6c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Sanity check URB networking device parameters to avoid divide by zero, from Oliver Neukum. 2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6 don't work properly. Longer term this needs a better fix tho. From Vijay Khemka. 3) Small fixes to selftests (use ping when ping6 is not present, etc.) from David Ahern. 4) Bring back rt_uses_gateway member of struct rtable, it's semantics were not well understood and trying to remove it broke things. From David Ahern. 5) Move usbnet snaity checking, ignore endpoints with invalid wMaxPacketSize. From Bjørn Mork. 6) Missing Kconfig deps for sja1105 driver, from Mao Wenan. 7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel, Alex Vesker, and Yevgeny Kliteynik 8) Missing CAP_NET_RAW checks in various places, from Ori Nimron. 9) Fix crash when removing sch_cbs entry while offloading is enabled, from Vinicius Costa Gomes. 10) Signedness bug fixes, generally in looking at the result given by of_get_phy_mode() and friends. From Dan Crapenter. 11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet. 12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern. 13) Fix quantization code in tcp_bbr, from Kevin Yang. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits) net: tap: clean up an indentation issue nfp: abm: fix memory leak in nfp_abm_u32_knode_replace tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state sk_buff: drop all skb extensions on free and skb scrubbing tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions Documentation: Clarify trap's description mlxsw: spectrum: Clear VLAN filters during port initialization net: ena: clean up indentation issue NFC: st95hf: clean up indentation issue net: phy: micrel: add Asym Pause workaround for KSZ9021 net: socionext: ave: Avoid using netdev_err() before calling register_netdev() ptp: correctly disable flags on old ioctls lib: dimlib: fix help text typos net: dsa: microchip: Always set regmap stride to 1 nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled net: sched: sch_sfb: don't call qdisc_put() while holding tree lock ...	2019-09-28 17:47:33 -07:00
Linus Torvalds	edf445ad7c	Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes) Merge hugepage allocation updates from David Rientjes: "We (mostly Linus, Andrea, and myself) have been discussing offlist how to implement a sane default allocation strategy for hugepages on NUMA platforms. With these reverts in place, the page allocator will happily allocate a remote hugepage immediately rather than try to make a local hugepage available. This incurs a substantial performance degradation when memory compaction would have otherwise made a local hugepage available. This series reverts those reverts and attempts to propose a more sane default allocation strategy specifically for hugepages. Andrea acknowledges this is likely to fix the swap storms that he originally reported that resulted in the patches that removed __GFP_THISNODE from hugepage allocations. The immediate goal is to return 5.3 to the behavior the kernel has implemented over the past several years so that remote hugepages are not immediately allocated when local hugepages could have been made available because the increased access latency is untenable. The next goal is to introduce a sane default allocation strategy for hugepages allocations in general regardless of the configuration of the system so that we prevent thrashing of local memory when compaction is unlikely to succeed and can prefer remote hugepages over remote native pages when the local node is low on memory." Note on timing: this reverts the hugepage VM behavior changes that got introduced fairly late in the 5.3 cycle, and that fixed a huge performance regression for certain loads that had been around since 4.18. Andrea had this note: "The regression of 4.18 was that it was taking hours to start a VM where 3.10 was only taking a few seconds, I reported all the details on lkml when it was finally tracked down in August 2018. https://lore.kernel.org/linux-mm/20180820032640.9896-2-aarcange@redhat.com/ __GFP_THISNODE in MADV_HUGEPAGE made the above enterprise vfio workload degrade like in the "current upstream" above. And it still would have been that bad as above until 5.3-rc5" where the bad behavior ends up happening as you fill up a local node, and without that change, you'd get into the nasty swap storm behavior due to compaction working overtime to make room for more memory on the nodes. As a result 5.3 got the two performance fix reverts in rc5. However, David Rientjes then noted that those performance fixes in turn regressed performance for other loads - although not quite to the same degree. He suggested reverting the reverts and instead replacing them with two small changes to how hugepage allocations are done (patch descriptions rephrased by me): - "avoid expensive reclaim when compaction may not succeed": just admit that the allocation failed when you're trying to allocate a huge-page and compaction wasn't successful. - "allow hugepage fallback to remote nodes when madvised": when that node-local huge-page allocation failed, retry without forcing the local node. but by then I judged it too late to replace the fixes for a 5.3 release. So 5.3 was released with behavior that harked back to the pre-4.18 logic. But now we're in the merge window for 5.4, and we can see if this alternate model fixes not just the horrendous swap storm behavior, but also restores the performance regression that the late reverts caused. Fingers crossed. * emailed patches from David Rientjes <rientjes@google.com>: mm, page_alloc: allow hugepage fallback to remote nodes when madvised mm, page_alloc: avoid expensive reclaim when compaction may not succeed Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" Revert "Revert "mm, thp: restore node-local hugepage allocations""	2019-09-28 14:26:47 -07:00
David Rientjes	76e654cc91	mm, page_alloc: allow hugepage fallback to remote nodes when madvised For systems configured to always try hard to allocate transparent hugepages (thp defrag setting of "always") or for memory that has been explicitly madvised to MADV_HUGEPAGE, it is often better to fallback to remote memory to allocate the hugepage if the local allocation fails first. The point is to allow the initial call to __alloc_pages_node() to attempt to defragment local memory to make a hugepage available, if possible, rather than immediately fallback to remote memory. Local hugepages will always have a better access latency than remote (huge)pages, so an attempt to make a hugepage available locally is always preferred. If memory compaction cannot be successful locally, however, it is likely better to fallback to remote memory. This could take on two forms: either allow immediate fallback to remote memory or do per-zone watermark checks. It would be possible to fallback only when per-zone watermarks fail for order-0 memory, since that would require local reclaim for all subsequent faults so remote huge allocation is likely better than thrashing the local zone for large workloads. In this case, it is assumed that because the system is configured to try hard to allocate hugepages or the vma is advised to explicitly want to try hard for hugepages that remote allocation is better when local allocation and memory compaction have both failed. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-09-28 14:05:38 -07:00

1 2 3 4 5 ...

870910 Commits