linux

iv/linux

Author	SHA1	Message	Date
Joe Stringer	a610b665ec	Documentation: Describe bpf reference tracking Document the new pointer types in the verifier and how the pointer ID tracking works to ensure that references which are taken are later released. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:48 +02:00
Joe Stringer	de375f4e91	selftests/bpf: Add C tests for reference tracking Add some tests that demonstrate and test the balanced lookup/free nature of socket lookup. Section names that start with "fail" represent programs that are expected to fail verification; all others should succeed. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:48 +02:00
Joe Stringer	29cd77f416	libbpf: Support loading individual progs Allow the individual program load to be invoked. This will help with testing, where a single ELF may contain several sections, some of which denote subprograms that are expected to fail verification, along with some which are expected to pass verification. By allowing programs to be iterated and individually loaded, each program can be independently checked against its expected verification result. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:48 +02:00
Joe Stringer	b584ab8840	selftests/bpf: Add tests for reference tracking reference tracking: leak potential reference reference tracking: leak potential reference on stack reference tracking: leak potential reference on stack 2 reference tracking: zero potential reference reference tracking: copy and zero potential references reference tracking: release reference without check reference tracking: release reference reference tracking: release reference twice reference tracking: release reference twice inside branch reference tracking: alloc, check, free in one subbranch reference tracking: alloc, check, free in both subbranches reference tracking in call: free reference in subprog reference tracking in call: free reference in subprog and outside reference tracking in call: alloc & leak reference in subprog reference tracking in call: alloc in subprog, release outside reference tracking in call: sk_ptr leak into caller stack reference tracking in call: sk_ptr spill into caller stack reference tracking: allow LD_ABS reference tracking: forbid LD_ABS while holding reference reference tracking: allow LD_IND reference tracking: forbid LD_IND while holding reference reference tracking: check reference or tail call reference tracking: release reference then tail call reference tracking: leak possible reference over tail call reference tracking: leak checked reference over tail call reference tracking: mangle and release sock_or_null reference tracking: mangle and release sock reference tracking: access member reference tracking: write to member reference tracking: invalid 64-bit access of member reference tracking: access after release reference tracking: direct access for lookup unpriv: spill/fill of different pointers stx - ctx and sock unpriv: spill/fill of different pointers stx - leak sock unpriv: spill/fill of different pointers stx - sock and ctx (read) unpriv: spill/fill of different pointers stx - sock and ctx (write) Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:48 +02:00
Joe Stringer	0c586079f8	selftests/bpf: Generalize dummy program types Don't hardcode the dummy program types to SOCKET_FILTER type, as this prevents testing bpf_tail_call in conjunction with other program types. Instead, use the program type specified in the test case. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:48 +02:00
Joe Stringer	6acc9b432e	bpf: Add helper to retrieve socket in BPF This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and bpf_sk_lookup_udp() which allows BPF programs to find out if there is a socket listening on this host, and returns a socket pointer which the BPF program can then access to determine, for instance, whether to forward or drop traffic. bpf_sk_lookup_xxx() may take a reference on the socket, so when a BPF program makes use of this function, it must subsequently pass the returned pointer into the newly added sk_release() to return the reference. By way of example, the following pseudocode would filter inbound connections at XDP if there is no corresponding service listening for the traffic: struct bpf_sock_tuple tuple; struct bpf_sock_ops *sk; populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0); if (!sk) { // Couldn't find a socket listening for this traffic. Drop. return TC_ACT_SHOT; } bpf_sk_release(sk, 0); return TC_ACT_OK; Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:47 +02:00
Joe Stringer	fd978bf7fd	bpf: Add reference tracking to verifier Allow helper functions to acquire a reference and return it into a register. Specific pointer types such as the PTR_TO_SOCKET will implicitly represent such a reference. The verifier must ensure that these references are released exactly once in each path through the program. To achieve this, this commit assigns an id to the pointer and tracks it in the 'bpf_func_state', then when the function or program exits, verifies that all of the acquired references have been freed. When the pointer is passed to a function that frees the reference, it is removed from the 'bpf_func_state` and all existing copies of the pointer in registers are marked invalid. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:47 +02:00
Joe Stringer	84dbf35073	bpf: Macrofy stack state copy An upcoming commit will need very similar copy/realloc boilerplate, so refactor the existing stack copy/realloc functions into macros to simplify it. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:47 +02:00
Joe Stringer	c64b798328	bpf: Add PTR_TO_SOCKET verifier type Teach the verifier a little bit about a new type of pointer, a PTR_TO_SOCKET. This pointer type is accessed from BPF through the 'struct bpf_sock' structure. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:47 +02:00
Joe Stringer	840b9615d6	bpf: Generalize ptr_or_null regs check This check will be reused by an upcoming commit for conditional jump checks for sockets. Refactor it a bit to simplify the later commit. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:47 +02:00
Joe Stringer	9d2be44a7f	bpf: Reuse canonical string formatter for ctx errs The array "reg_type_str" provides canonical formatting of register types, however a couple of places would previously check whether a register represented the context and write the name "context" directly. An upcoming commit will add another pointer type to these statements, so to provide more accurate error messages in the verifier, update these error messages to use "reg_type_str" instead. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:47 +02:00
Joe Stringer	aad2eeaf46	bpf: Simplify ptr_min_max_vals adjustment An upcoming commit will add another two pointer types that need very similar behaviour, so generalise this function now. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:47 +02:00
Joe Stringer	f3709f69b7	bpf: Add iterator for spilled registers Add this iterator for spilled registers, it concentrates the details of how to get the current frame's spilled registers into a single macro while clarifying the intention of the code which is calling the macro. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-03 02:53:46 +02:00
Daniel Borkmann	940656fb3f	Merge branch 'bpf-big-map-entries' Jakub Kicinski says: ==================== This series makes the control message parsing for interacting with BPF maps more flexible. Up until now we had a hard limit in the ABI for key and value size to be 64B at most. Using TLV capability allows us to support large map entries. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-02 14:40:00 +02:00
Jakub Kicinski	0c9864c05f	nfp: bpf: allow control message sizing for map ops In current ABI the size of the messages carrying map elements was statically defined to at most 16 words of key and 16 words of value (NFP word is 4 bytes). We should not make this assumption and use the max key and value sizes from the BPF capability instead. To make sure old kernels don't get surprised with larger (or smaller) messages bump the FW ABI version to 3 when key/value size is different than 16 words. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-02 14:39:59 +02:00
Jakub Kicinski	9bbdd41b8a	nfp: allow apps to request larger MTU on control vNIC Some apps may want to have higher MTU on the control vNIC/queue. Allow them to set the requested MTU at init time. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-02 14:39:59 +02:00
Jakub Kicinski	28264eb227	nfp: bpf: parse global BPF ABI version capability Up until now we only had per-vNIC BPF ABI version capabilities, which are slightly awkward to use because bulk of the resources and configuration does not relate to any particular vNIC. Add a new capability for global ABI version and check the per-vNIC version are equal to it. Assume the ABI version 2 if no explicit version capability is present. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-02 14:39:58 +02:00
Daniel Borkmann	cb86d0f878	Merge branch 'bpf-per-cpu-cgroup-storage' Roman Gushchin says: ==================== This patchset implements per-cpu cgroup local storage and provides an example how per-cpu and shared cgroup local storage can be used for efficient accounting of network traffic. v4->v3: 1) incorporated Alexei's feedback v3->v2: 1) incorporated Song's feedback 2) rebased on top of current bpf-next v2->v1: 1) added a selftest implementing network counters 2) added a missing free() in cgroup local storage selftest ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:35 +02:00
Roman Gushchin	371e4fcc9d	selftests/bpf: cgroup local storage-based network counters This commit adds a bpf kselftest, which demonstrates how percpu and shared cgroup local storage can be used for efficient lookup-free network accounting. Cgroup local storage provides generic memory area with a very efficient lookup free access. To avoid expensive atomic operations for each packet, per-cpu cgroup local storage is used. Each packet is initially charged to a per-cpu counter, and only if the counter reaches certain value (32 in this case), the charge is moved into the global atomic counter. This allows to amortize atomic operations, keeping reasonable accuracy. The test also implements a naive network traffic throttling, mostly to demonstrate the possibility of bpf cgroup--based network bandwidth control. Expected output: ./test_netcnt test_netcnt:PASS Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:33 +02:00
Roman Gushchin	5fcbd29b37	samples/bpf: extend test_cgrp2_attach2 test to use per-cpu cgroup storage This commit extends the test_cgrp2_attach2 test to cover per-cpu cgroup storage. Bpf program will use shared and per-cpu cgroup storages simultaneously, so a better coverage of corresponding core code will be achieved. Expected output: $ ./test_cgrp2_attach2 Attached DROP prog. This ping in cgroup /foo should fail... ping: sendmsg: Operation not permitted Attached DROP prog. This ping in cgroup /foo/bar should fail... ping: sendmsg: Operation not permitted Attached PASS prog. This ping in cgroup /foo/bar should pass... Detached PASS from /foo/bar while DROP is attached to /foo. This ping in cgroup /foo/bar should fail... ping: sendmsg: Operation not permitted Attached PASS from /foo/bar and detached DROP from /foo. This ping in cgroup /foo/bar should pass... ### override:PASS ### multi:PASS Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:33 +02:00
Roman Gushchin	919646d2a3	selftests/bpf: extend the storage test to test per-cpu cgroup storage This test extends the cgroup storage test to use per-cpu flavor of the cgroup storage as well. The test initializes a per-cpu cgroup storage to some non-zero initial value (1000), and then simple bumps a per-cpu counter each time the shared counter is atomically incremented. Then it reads all per-cpu areas from the userspace side, and checks that the sum of values adds to the expected sum. Expected output: $ ./test_cgroup_storage test_cgroup_storage:PASS Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:33 +02:00
Roman Gushchin	a3c6054f81	selftests/bpf: add verifier per-cpu cgroup storage tests This commits adds verifier tests covering per-cpu cgroup storage functionality. There are 6 new tests, which are exactly the same as for shared cgroup storage, but do use per-cpu cgroup storage map. Expected output: $ ./test_verifier #0/u add+sub+mul OK #0/p add+sub+mul OK ... #286/p invalid cgroup storage access 6 OK #287/p valid per-cpu cgroup storage access OK #288/p invalid per-cpu cgroup storage access 1 OK #289/p invalid per-cpu cgroup storage access 2 OK #290/p invalid per-cpu cgroup storage access 3 OK #291/p invalid per-cpu cgroup storage access 4 OK #292/p invalid per-cpu cgroup storage access 5 OK #293/p invalid per-cpu cgroup storage access 6 OK #294/p multiple registers share map_lookup_elem result OK ... #662/p mov64 src == dst OK #663/p mov64 src != dst OK Summary: 914 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:33 +02:00
Roman Gushchin	e54870924f	bpftool: add support for PERCPU_CGROUP_STORAGE maps This commit adds support for BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map type. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:33 +02:00
Roman Gushchin	25025e0aab	bpf: sync include/uapi/linux/bpf.h to tools/include/uapi/linux/bpf.h The sync is required due to the appearance of a new map type: BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, which implements per-cpu cgroup local storage. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:33 +02:00
Roman Gushchin	c6fdcd6e0c	bpf: don't allow create maps of per-cpu cgroup local storages Explicitly forbid creating map of per-cpu cgroup local storages. This behavior matches the behavior of shared cgroup storages. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:33 +02:00
Roman Gushchin	b741f16303	bpf: introduce per-cpu cgroup local storage This commit introduced per-cpu cgroup local storage. Per-cpu cgroup local storage is very similar to simple cgroup storage (let's call it shared), except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which don't require neither lookups, neither atomic operations. >From userspace's point of view, accessing a per-cpu cgroup storage is similar to other per-cpu map types (e.g. per-cpu hashmaps and arrays). Writing to a per-cpu cgroup storage is not atomic, but is performed by copying longs, so some minimal atomicity is here, exactly as with other per-cpu maps. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:32 +02:00
Roman Gushchin	f294b37ec7	bpf: rework cgroup storage pointer passing To simplify the following introduction of per-cpu cgroup storage, let's rework a bit a mechanism of passing a pointer to a cgroup storage into the bpf_get_local_storage(). Let's save a pointer to the corresponding bpf_cgroup_storage structure, instead of a pointer to the actual buffer. It will help us to handle per-cpu storage later, which has a different way of accessing to the actual data. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:32 +02:00
Roman Gushchin	8bad74f984	bpf: extend cgroup bpf core to allow multiple cgroup storage types In order to introduce per-cpu cgroup storage, let's generalize bpf cgroup core to support multiple cgroup storage types. Potentially, per-node cgroup storage can be added later. This commit is mostly a formal change that replaces cgroup_storage pointer with a array of cgroup_storage pointers. It doesn't actually introduce a new storage type, it will be done later. Each bpf program is now able to have one cgroup storage of each type. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-01 16:18:32 +02:00
Yonghong Song	5bf7a60b8e	bpf: permit CGROUP_DEVICE programs accessing helper bpf_get_current_cgroup_id() Currently, helper bpf_get_current_cgroup_id() is not permitted for CGROUP_DEVICE type of programs. If the helper is used in such cases, the verifier will log the following error: 0: (bf) r6 = r1 1: (69) r7 = (u16 )(r6 +0) 2: (85) call bpf_get_current_cgroup_id#80 unknown func bpf_get_current_cgroup_id#80 The bpf_get_current_cgroup_id() is useful for CGROUP_DEVICE type of programs in order to customize action based on cgroup id. This patch added such a support. Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-28 14:15:19 +02:00
Daniel Borkmann	78e6e5c11a	Merge branch 'bpf-libbpf-attach-by-name' Andrey Ignatov says: ==================== This patch set introduces libbpf_attach_type_by_name function in libbpf to identify attach type by section name. This is useful to avoid writing same logic over and over again in user space applications that leverage libbpf. Patch 1 has more details on the new function and problem being solved. Patches 2 and 3 add support for new section names. Patch 4 uses new function in a selftest. Patch 5 adds selftest for libbpf_{prog,attach}_type_by_name. As a side note there are a lot of inconsistencies now between names used by libbpf and bpftool (e.g. cgroup/skb vs cgroup_skb, cgroup_device and device vs cgroup/dev, sockops vs sock_ops, etc). This patch set does not address it but it tries not to make it harder to address it in the future. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-27 21:19:34 +02:00
Andrey Ignatov	370920c47b	selftests/bpf: Test libbpf_{prog,attach}_type_by_name Add selftest for libbpf functions libbpf_prog_type_by_name and libbpf_attach_type_by_name. Example of output: % ./tools/testing/selftests/bpf/test_section_names Summary: 35 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-27 21:14:59 +02:00
Andrey Ignatov	c9bf507d0a	selftests/bpf: Use libbpf_attach_type_by_name in test_socket_cookie Use newly introduced libbpf_attach_type_by_name in test_socket_cookie selftest. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-27 21:14:59 +02:00
Andrey Ignatov	c6f6851b28	libbpf: Support sk_skb/stream_{parser, verdict} section names Add section names for BPF_SK_SKB_STREAM_PARSER and BPF_SK_SKB_STREAM_VERDICT attach types to be able to identify them in libbpf_attach_type_by_name. "stream_parser" and "stream_verdict" are used instead of simple "parser" and "verdict" just to avoid possible confusion in a place where attach type is used alone (e.g. in bpftool's show sub-commands) since there is another attach point that can be named as "verdict": BPF_SK_MSG_VERDICT. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-27 21:14:59 +02:00
Andrey Ignatov	bafa7afe63	libbpf: Support cgroup_skb/{e,in}gress section names Add section names for BPF_CGROUP_INET_INGRESS and BPF_CGROUP_INET_EGRESS attach types to be able to identify them in libbpf_attach_type_by_name. "cgroup_skb" is used instead of "cgroup/skb" mostly to easy possible unifying of how libbpf and bpftool works with section names: * bpftool uses "cgroup_skb" to in "prog list" sub-command; * bpftool uses "ingress" and "egress" in "cgroup list" sub-command; * having two parts instead of three in a string like "cgroup_skb/ingress" can be leveraged to split it to prog_type part and attach_type part, or vise versa: use two parts to make a section name. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-27 21:14:59 +02:00
Andrey Ignatov	956b620fcf	libbpf: Introduce libbpf_attach_type_by_name There is a common use-case when ELF object contains multiple BPF programs and every program has its own section name. If it's cgroup-bpf then programs have to be 1) loaded and 2) attached to a cgroup. It's convenient to have information necessary to load BPF program together with program itself. This is where section name works fine in conjunction with libbpf_prog_type_by_name that identifies prog_type and expected_attach_type and these can be used with BPF_PROG_LOAD. But there is currently no way to identify attach_type by section name and it leads to messy code in user space that reinvents guessing logic every time it has to identify attach type to use with BPF_PROG_ATTACH. The patch introduces libbpf_attach_type_by_name that guesses attach type by section name if a program can be attached. The difference between expected_attach_type provided by libbpf_prog_type_by_name and attach_type provided by libbpf_attach_type_by_name is the former is used at BPF_PROG_LOAD time and can be zero if a program of prog_type X has only one corresponding attach type Y whether the latter provides specific attach type to use with BPF_PROG_ATTACH. No new section names were added to section_names array. Only existing ones were reorganized and attach_type was added where appropriate. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-27 21:14:59 +02:00
Song Liu	100811936f	bpf: test_bpf: add init_net to dev for flow_dissector Latest changes in __skb_flow_dissect() assume skb->dev has valid nd_net. However, this is not true for test_bpf. As a result, test_bpf.ko crashes the system with the following stack trace: [ 1133.716622] BUG: unable to handle kernel paging request at 0000000000001030 [ 1133.716623] PGD 8000001fbf7ee067 [ 1133.716624] P4D 8000001fbf7ee067 [ 1133.716624] PUD 1f6c1cf067 [ 1133.716625] PMD 0 [ 1133.716628] Oops: 0000 [#1] SMP PTI [ 1133.716630] CPU: 7 PID: 40473 Comm: modprobe Kdump: loaded Not tainted 4.19.0-rc5-00805-gca11cc92ccd2 #1167 [ 1133.716631] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM12.5 12/06/2017 [ 1133.716638] RIP: 0010:__skb_flow_dissect+0x83/0x1680 [ 1133.716639] Code: 04 00 00 41 0f b7 44 24 04 48 85 db 4d 8d 14 07 0f 84 01 02 00 00 48 8b 43 10 48 85 c0 0f 84 e5 01 00 00 48 8b 80 a8 04 00 00 <48> 8b 90 30 10 00 00 48 85 d2 0f 84 dd 01 00 00 31 c0 b9 05 00 00 [ 1133.716640] RSP: 0018:ffffc900303c7a80 EFLAGS: 00010282 [ 1133.716642] RAX: 0000000000000000 RBX: ffff881fea0b7400 RCX: 0000000000000000 [ 1133.716643] RDX: ffffc900303c7bb4 RSI: ffffffff8235c3e0 RDI: ffff881fea0b7400 [ 1133.716643] RBP: ffffc900303c7b80 R08: 0000000000000000 R09: 000000000000000e [ 1133.716644] R10: ffffc900303c7bb4 R11: ffff881fb6840400 R12: ffffffff8235c3e0 [ 1133.716645] R13: 0000000000000008 R14: 000000000000001e R15: ffffc900303c7bb4 [ 1133.716646] FS: 00007f54e75d3740(0000) GS:ffff881fff5c0000(0000) knlGS:0000000000000000 [ 1133.716648] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1133.716649] CR2: 0000000000001030 CR3: 0000001f6c226005 CR4: 00000000003606e0 [ 1133.716649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1133.716650] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1133.716651] Call Trace: [ 1133.716660] ? sched_clock_cpu+0xc/0xa0 [ 1133.716662] ? sched_clock_cpu+0xc/0xa0 [ 1133.716665] ? log_store+0x1b5/0x260 [ 1133.716667] ? up+0x12/0x60 [ 1133.716669] ? skb_get_poff+0x4b/0xa0 [ 1133.716674] ? __kmalloc_reserve.isra.47+0x2e/0x80 [ 1133.716675] skb_get_poff+0x4b/0xa0 [ 1133.716680] bpf_skb_get_pay_offset+0xa/0x10 [ 1133.716686] ? test_bpf_init+0x578/0x1000 [test_bpf] [ 1133.716690] ? netlink_broadcast_filtered+0x153/0x3d0 [ 1133.716695] ? free_pcppages_bulk+0x324/0x600 [ 1133.716696] ? 0xffffffffa0279000 [ 1133.716699] ? do_one_initcall+0x46/0x1bd [ 1133.716704] ? kmem_cache_alloc_trace+0x144/0x1a0 [ 1133.716709] ? do_init_module+0x5b/0x209 [ 1133.716712] ? load_module+0x2136/0x25d0 [ 1133.716715] ? __do_sys_finit_module+0xba/0xe0 [ 1133.716717] ? __do_sys_finit_module+0xba/0xe0 [ 1133.716719] ? do_syscall_64+0x48/0x100 [ 1133.716724] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch fixes tes_bpf by using init_net in the dummy dev. Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") Reported-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-27 21:09:45 +02:00
Andrey Ignatov	53d6eb08e9	bpftool: Fix bpftool net output Print `bpftool net` output to stdout instead of stderr. Only errors should be printed to stderr. Regular output should go to stdout and this is what all other subcommands of bpftool do, including --json and --pretty formats of `bpftool net` itself. Fixes: commit f6f3bac08ff9 ("tools/bpf: bpftool: add net support") Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-09-27 21:07:44 +02:00
Maciej Żenczykowski	1042caa79e	net-ipv4: remove 2 always zero parameters from ipv4_redirect() (the parameters in question are mark and flow_flags) Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:30:55 -07:00
Maciej Żenczykowski	d888f39666	net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu() (the parameters in question are mark and flow_flags) Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:30:55 -07:00
Maxime Chevallier	da58a931f2	net: mvneta: Add support for 2500Mbps SGMII The mvneta controller can handle speeds up to 2500Mbps on the SGMII interface. This relies on serdes configuration, the lane must be configured at 3.125Gbps and we can't use in-band autoneg at that speed. The main issue when supporting that speed on this particular controller is that the link partner can send ethernet frames with a shortened preamble, which if not explicitly enabled in the controller will cause unexpected behaviours. This was tested on Armada 385, with the comphy configuration done in bootloader. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:27:09 -07:00
David S. Miller	c09c1474d8	Merge branch 'net-vhost-improve-performance-when-enable-busyloop' Tonghao Zhang says: ==================== net: vhost: improve performance when enable busyloop This patches improve the guest receive performance. On the handle_tx side, we poll the sock receive queue at the same time. handle_rx do that in the same way. For more performance report, see patch 4 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:25:55 -07:00
Tonghao Zhang	441abde4cd	net: vhost: add rx busy polling in tx path This patch improves the guest receive performance. On the handle_tx side, we poll the sock receive queue at the same time. handle_rx do that in the same way. We set the poll-us=100us and use the netperf to test throughput and mean latency. When running the tests, the vhost-net kthread of that VM, is alway 100% CPU. The commands are shown as below. Rx performance is greatly improved by this patch. There is not notable performance change on tx with this series though. This patch is useful for bi-directional traffic. netperf -H IP -t TCP_STREAM -l 20 -- -O "THROUGHPUT, THROUGHPUT_UNITS, MEAN_LATENCY" Topology: [Host] ->linux bridge -> tap vhost-net ->[Guest] TCP_STREAM: * Without the patch: 19842.95 Mbps, 6.50 us mean latency * With the patch: 37598.20 Mbps, 3.43 us mean latency Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:25:55 -07:00
Tonghao Zhang	dc151282bb	net: vhost: factor out busy polling logic to vhost_net_busy_poll() Factor out generic busy polling logic and will be used for in tx path in the next patch. And with the patch, qemu can set differently the busyloop_timeout for rx queue. To avoid duplicate codes, introduce the helper functions: * sock_has_rx_data(changed from sk_has_rx_data) * vhost_net_busy_poll_try_queue Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:25:55 -07:00
Tonghao Zhang	a6a67a2f34	net: vhost: replace magic number of lock annotation Use the VHOST_NET_VQ_XXX as a subclass for mutex_lock_nested. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:25:55 -07:00
Tonghao Zhang	78139c94dc	net: vhost: lock the vqs one by one This patch changes the way that lock all vqs at the same, to lock them one by one. It will be used for next patch to avoid the deadlock. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:25:54 -07:00
Yafang Shao	af4325ecc2	tcp: expose sk_state in tcp_retransmit_skb tracepoint After sk_state exposed, we can get in which state this retransmission occurs. That could give us more detail for dignostic. For example, if this retransmission occurs in SYN_SENT state, it may also indicates that the syn packet may be dropped on the remote peer due to syn backlog queue full and then we could check the remote peer. BTW,SYNACK retransmission is traced in tcp_retransmit_synack tracepoint. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 20:07:19 -07:00
YueHaibing	0a71515665	net: faraday: fix return type of ndo_start_xmit function The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', which is a typedef for an enum type, so make sure the implementation in this driver has returns 'netdev_tx_t' value, and change the function return type to netdev_tx_t. Found by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:18:08 -07:00
YueHaibing	6323d57f33	net: smsc: fix return type of ndo_start_xmit function The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', which is a typedef for an enum type, so make sure the implementation in this driver has returns 'netdev_tx_t' value, and change the function return type to netdev_tx_t. Found by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:15:17 -07:00
zhong jiang	880e1b2111	net: liquidio: list usage cleanup Trival cleanup, list_move_tail will implement the same function that list_del() + list_add_tail() will do. hence just replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:12:10 -07:00
zhong jiang	631e871edc	net: qed: list usage cleanup Trival cleanup, list_move_tail will implement the same function that list_del() + list_add_tail() will do. hence just replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-26 10:11:36 -07:00

1 2 3 4 5 ...

783660 Commits