Commit Graph

873523 Commits

Author SHA1 Message Date
Andre Guedes
110b2263db samples/bpf: Remove duplicate option from xdpsock
The '-f' option is shown twice in the usage(). This patch removes the
outdated version.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191114162847.221770-1-andre.guedes@intel.com
2019-11-15 22:29:17 +01:00
Ilya Leoshkevich
fcf3513139 s390/bpf: Make sure JIT passes do not increase code size
The upcoming s390 branch length extension patches rely on "passes do
not increase code size" property in order to consistently choose between
short and long branches. Currently this property does not hold between
the first and the second passes for register save/restore sequences, as
well as various code fragments that depend on SEEN_* flags.

Generate the code during the first pass conservatively: assume register
save/restore sequences have the maximum possible length, and that all
SEEN_* flags are set.

Also refuse to JIT if this happens anyway (e.g. due to a bug), as this
might lead to verifier bypass once long branches are introduced.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191114151820.53222-1-iii@linux.ibm.com
2019-11-15 22:25:54 +01:00
Ilya Leoshkevich
b7b3fc8dd9 bpf: Support doubleword alignment in bpf_jit_binary_alloc
Currently passing alignment greater than 4 to bpf_jit_binary_alloc does
not work: in such cases it silently aligns only to 4 bytes.

On s390, in order to load a constant from memory in a large (>512k) BPF
program, one must use lgrl instruction, whose memory operand must be
aligned on an 8-byte boundary.

This patch makes it possible to request 8-byte alignment from
bpf_jit_binary_alloc, and also makes it issue a warning when an
unsupported alignment is requested.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191115123722.58462-1-iii@linux.ibm.com
2019-11-15 22:25:00 +01:00
Anders Roxell
e47a179997 bpf, testing: Add missing object file to TEST_FILES
When installing kselftests to its own directory and run the
test_lwt_ip_encap.sh it will complain that test_lwt_ip_encap.o can't be
found. Same with the test_tc_edt.sh test it will complain that
test_tc_edt.o can't be found.

  $ ./test_lwt_ip_encap.sh
  starting egress IPv4 encap test
  Error opening object test_lwt_ip_encap.o: No such file or directory
  Object hashing failed!
  Cannot initialize ELF context!
  Failed to parse eBPF program: Invalid argument

Rework to add test_lwt_ip_encap.o and test_tc_edt.o to TEST_FILES so the
object file gets installed when installing kselftest.

Fixes: 74b5a5968f ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191111161728.8854-1-anders.roxell@linaro.org
2019-11-11 22:35:23 +01:00
Yonghong Song
b7a0d65d80 bpf, testing: Workaround a verifier failure for test_progs
With latest llvm compiler, running test_progs will have the following
verifier failure for test_sysctl_loop1.o:

  libbpf: load bpf program failed: Permission denied
  libbpf: -- BEGIN DUMP LOG ---
  libbpf:
  invalid indirect read from stack var_off (0x0; 0xff)+196 size 7
  ...
  libbpf: -- END LOG --
  libbpf: failed to load program 'cgroup/sysctl'
  libbpf: failed to load object 'test_sysctl_loop1.o'

The related bytecode looks as below:

  0000000000000308 LBB0_8:
      97:       r4 = r10
      98:       r4 += -288
      99:       r4 += r7
     100:       w8 &= 255
     101:       r1 = r10
     102:       r1 += -488
     103:       r1 += r8
     104:       r2 = 7
     105:       r3 = 0
     106:       call 106
     107:       w1 = w0
     108:       w1 += -1
     109:       if w1 > 6 goto -24 <LBB0_5>
     110:       w0 += w8
     111:       r7 += 8
     112:       w8 = w0
     113:       if r7 != 224 goto -17 <LBB0_8>

And source code:

     for (i = 0; i < ARRAY_SIZE(tcp_mem); ++i) {
             ret = bpf_strtoul(value + off, MAX_ULONG_STR_LEN, 0,
                               tcp_mem + i);
             if (ret <= 0 || ret > MAX_ULONG_STR_LEN)
                     return 0;
             off += ret & MAX_ULONG_STR_LEN;
     }

Current verifier is not able to conclude that register w0 before '+'
at insn 110 has a range of 1 to 7 and thinks it is from 0 - 255. This
leads to more conservative range for w8 at insn 112, and later verifier
complaint.

Let us workaround this issue until we found a compiler and/or verifier
solution. The workaround in this patch is to make variable 'ret' volatile,
which will force a reload and then '&' operation to ensure better value
range. With this patch, I got the below byte code for the loop:

  0000000000000328 LBB0_9:
     101:       r4 = r10
     102:       r4 += -288
     103:       r4 += r7
     104:       w8 &= 255
     105:       r1 = r10
     106:       r1 += -488
     107:       r1 += r8
     108:       r2 = 7
     109:       r3 = 0
     110:       call 106
     111:       *(u32 *)(r10 - 64) = r0
     112:       r1 = *(u32 *)(r10 - 64)
     113:       if w1 s< 1 goto -28 <LBB0_5>
     114:       r1 = *(u32 *)(r10 - 64)
     115:       if w1 s> 7 goto -30 <LBB0_5>
     116:       r1 = *(u32 *)(r10 - 64)
     117:       w1 &= 7
     118:       w1 += w8
     119:       r7 += 8
     120:       w8 = w1
     121:       if r7 != 224 goto -21 <LBB0_9>

Insn 117 did the '&' operation and we got more precise value range
for 'w8' at insn 120. The test is happy then:

  #3/17 test_sysctl_loop1.o:OK

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191107170045.2503480-1-yhs@fb.com
2019-11-11 14:03:10 +01:00
Alexei Starovoitov
0d2ec5b51d Merge branch 'share-umem'
Magnus Karlsson says:

====================
This patch set extends libbpf and the xdpsock sample program to
demonstrate the shared umem mode (XDP_SHARED_UMEM) as well as Rx-only
and Tx-only sockets. This in order for users to have an example to use
as a blue print and also so that these modes will be exercised more
frequently.

Note that the user needs to supply an XDP program with the
XDP_SHARED_UMEM mode that distributes the packets over the sockets
according to some policy. There is an example supplied with the
xdpsock program, but there is no default one in libbpf similarly to
when XDP_SHARED_UMEM is not used. The reason for this is that I felt
that supplying one that would work for all users in this mode is
futile. There are just tons of ways to distribute packets, so whatever
I come up with and build into libbpf would be wrong in most cases.

This patch has been applied against commit 30ee348c12 ("Merge branch 'bpf-libbpf-fixes'")

Structure of the patch set:

Patch 1: Adds shared umem support to libbpf
Patch 2: Shared umem support and example XPD program added to xdpsock sample
Patch 3: Adds Rx-only and Tx-only support to libbpf
Patch 4: Uses Rx-only sockets for rxdrop and Tx-only sockets for txpush in
         the xdpsock sample
Patch 5: Add documentation entries for these two features
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-10 19:30:51 -08:00
Magnus Karlsson
57afa8b0cf xsk: Extend documentation for Rx|Tx-only sockets and shared umems
Add more documentation about the new Rx-only and Tx-only sockets in
libbpf and also how libbpf can now support shared umems. Also found
two pieces that could be improved in the text, that got fixed in this
commit.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/1573148860-30254-6-git-send-email-magnus.karlsson@intel.com
2019-11-10 19:30:46 -08:00
Magnus Karlsson
661842c46d samples/bpf: Use Rx-only and Tx-only sockets in xdpsock
Use Rx-only sockets for the rxdrop sample and Tx-only sockets for the
txpush sample in the xdpsock application. This so that we exercise and
show case these socket types too.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/1573148860-30254-5-git-send-email-magnus.karlsson@intel.com
2019-11-10 19:30:46 -08:00
Magnus Karlsson
a68977d269 libbpf: Allow for creating Rx or Tx only AF_XDP sockets
The libbpf AF_XDP code is extended to allow for the creation of Rx
only or Tx only sockets. Previously it returned an error if the socket
was not initialized for both Rx and Tx.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/1573148860-30254-4-git-send-email-magnus.karlsson@intel.com
2019-11-10 19:30:46 -08:00
Magnus Karlsson
2e5d72c15f samples/bpf: Add XDP_SHARED_UMEM support to xdpsock
Add support for the XDP_SHARED_UMEM mode to the xdpsock sample
application. As libbpf does not have a built in XDP program for this
mode, we use an explicitly loaded XDP program. This also serves as an
example on how to write your own XDP program that can route to an
AF_XDP socket.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/1573148860-30254-3-git-send-email-magnus.karlsson@intel.com
2019-11-10 19:30:45 -08:00
Magnus Karlsson
cbf07409d0 libbpf: Support XDP_SHARED_UMEM with external XDP program
Add support in libbpf to create multiple sockets that share a single
umem. Note that an external XDP program need to be supplied that
routes the incoming traffic to the desired sockets. So you need to
supply the libbpf_flag XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD and load
your own XDP program.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/1573148860-30254-2-git-send-email-magnus.karlsson@intel.com
2019-11-10 19:30:45 -08:00
Alexei Starovoitov
472aeb386e Merge branch 'map-pinning'
Toke Høiland-Jørgensen says:

====================
This series fixes a few bugs in libbpf that I discovered while playing around
with the new auto-pinning code, and writing the first utility in xdp-tools[0]:

- If object loading fails, libbpf does not clean up the pinnings created by the
  auto-pinning mechanism.
- EPERM is not propagated to the caller on program load
- Netlink functions write error messages directly to stderr

In addition, libbpf currently only has a somewhat limited getter function for
XDP link info, which makes it impossible to discover whether an attached program
is in SKB mode or not. So the last patch in the series adds a new getter for XDP
link info which returns all the information returned via netlink (and which can
be extended later).

Finally, add a getter for BPF program size, which can be used by the caller to
estimate the amount of locked memory needed to load a program.

A selftest is added for the pinning change, while the other features were tested
in the xdp-filter tool from the xdp-tools repo. The 'new-libbpf-features' branch
contains the commits that make use of the new XDP getter and the corrected EPERM
error code.

[0] https://github.com/xdp-project/xdp-tools

Changelog:

v4:
  - Don't do any size checks on struct xdp_info, just copy (and/or zero)
    whatever size the caller supplied.

v3:
  - Pass through all kernel error codes on program load (instead of just EPERM).
  - No new bpf_object__unload() variant, just do the loop at the caller
  - Don't reject struct xdp_info sizes that are bigger than what we expect.
  - Add a comment noting that bpf_program__size() returns the size in bytes

v2:
  - Keep function names in libbpf.map sorted properly
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-10 19:26:35 -08:00
Toke Høiland-Jørgensen
1a734efe06 libbpf: Add getter for program size
This adds a new getter for the BPF program size (in bytes). This is useful
for a caller that is trying to predict how much memory will be locked by
loading a BPF object into the kernel.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/157333185272.88376.10996937115395724683.stgit@toke.dk
2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen
473f4e133a libbpf: Add bpf_get_link_xdp_info() function to get more XDP information
Currently, libbpf only provides a function to get a single ID for the XDP
program attached to the interface. However, it can be useful to get the
full set of program IDs attached, along with the attachment mode, in one
go. Add a new getter function to support this, using an extendible
structure to carry the information. Express the old bpf_get_link_id()
function in terms of the new function.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157333185164.88376.7520653040667637246.stgit@toke.dk
2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen
b6e99b010e libbpf: Use pr_warn() when printing netlink errors
The netlink functions were using fprintf(stderr, ) directly to print out
error messages, instead of going through the usual logging macros. This
makes it impossible for the calling application to silence or redirect
those error messages. Fix this by switching to pr_warn() in nlattr.c and
netlink.c.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/157333185055.88376.15999360127117901443.stgit@toke.dk
2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen
4f33ddb4e3 libbpf: Propagate EPERM to caller on program load
When loading an eBPF program, libbpf overrides the return code for EPERM
errors instead of returning it to the caller. This makes it hard to figure
out what went wrong on load.

In particular, EPERM is returned when the system rlimit is too low to lock
the memory required for the BPF program. Previously, this was somewhat
obscured because the rlimit error would be hit on map creation (which does
return it correctly). However, since maps can now be reused, object load
can proceed all the way to loading programs without hitting the error;
propagating it even in this case makes it possible for the caller to react
appropriately (and, e.g., attempt to raise the rlimit before retrying).

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/157333184946.88376.11768171652794234561.stgit@toke.dk
2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen
9c4e395a1e selftests/bpf: Add tests for automatic map unpinning on load failure
This add tests for the different variations of automatic map unpinning on
load failure.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/157333184838.88376.8243704248624814775.stgit@toke.dk
2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen
ec6d5f47bf libbpf: Unpin auto-pinned maps if loading fails
Since the automatic map-pinning happens during load, it will leave pinned
maps around if the load fails at a later stage. Fix this by unpinning any
pinned maps on cleanup. To avoid unpinning pinned maps that were reused
rather than newly pinned, add a new boolean property on struct bpf_map to
keep track of whether that map was reused or not; and only unpin those maps
that were not reused.

Fixes: 57a00f4164 ("libbpf: Add auto-pinning of maps when loading BPF objects")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/157333184731.88376.9992935027056165873.stgit@toke.dk
2019-11-10 19:26:30 -08:00
Daniel T. Lee
451d1dc886 samples: bpf: update map definition to new syntax BTF-defined map
Since, the new syntax of BTF-defined map has been introduced,
the syntax for using maps under samples directory are mixed up.
For example, some are already using the new syntax, and some are using
existing syntax by calling them as 'legacy'.

As stated at commit abd29c9314 ("libbpf: allow specifying map
definitions using BTF"), the BTF-defined map has more compatablility
with extending supported map definition features.

The commit doesn't replace all of the map to new BTF-defined map,
because some of the samples still use bpf_load instead of libbpf, which
can't properly create BTF-defined map.

This will only updates the samples which uses libbpf API for loading bpf
program. (ex. bpf_prog_load_xattr)

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-08 19:21:52 -08:00
Daniel T. Lee
afbe3c27d9 samples: bpf: Update outdated error message
Currently, under samples, several methods are being used to load bpf
program.

Since using libbpf is preferred solution, lots of previously used
'load_bpf_file' from bpf_load are replaced with 'bpf_prog_load_xattr'
from libbpf.

But some of the error messages still show up as 'load_bpf_file' instead
of 'bpf_prog_load_xattr'.

This commit fixes outdated errror messages under samples and fixes some
code style issues.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191107005153.31541-2-danieltimlee@gmail.com
2019-11-08 19:19:43 -08:00
Martin KaFai Lau
ed5941af3f bpf: Add cb access in kfree_skb test
Access the skb->cb[] in the kfree_skb test.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191107180905.4097871-1-kafai@fb.com
2019-11-07 10:59:08 -08:00
Martin KaFai Lau
7e3617a72d bpf: Add array support to btf_struct_access
This patch adds array support to btf_struct_access().
It supports array of int, array of struct and multidimensional
array.

It also allows using u8[] as a scratch space.  For example,
it allows access the "char cb[48]" with size larger than
the array's element "char".  Another potential use case is
"u64 icsk_ca_priv[]" in the tcp congestion control.

btf_resolve_size() is added to resolve the size of any type.
It will follow the modifier if there is any.  Please
see the function comment for details.

This patch also adds the "off < moff" check at the beginning
of the for loop.  It is to reject cases when "off" is pointing
to a "hole" in a struct.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191107180903.4097702-1-kafai@fb.com
2019-11-07 10:59:08 -08:00
Daniel Borkmann
30ee348c12 Merge branch 'bpf-libbpf-fixes'
Andrii Nakryiko says:

====================
Github's mirror of libbpf got LGTM and Coverity statis analysis running
against it and spotted few real bugs and few potential issues. This patch
series fixes found issues.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-11-07 16:20:41 +01:00
Andrii Nakryiko
98e527af30 libbpf: Improve handling of corrupted ELF during map initialization
If we get ELF file with "maps" section, but no symbols pointing to it, we'll
end up with division by zero. Add check against this situation and exit early
with error. Found by Coverity scan against Github libbpf sources.

Fixes: bf82927125 ("libbpf: refactor map initialization")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107020855.3834758-6-andriin@fb.com
2019-11-07 16:20:38 +01:00
Andrii Nakryiko
994021a7e0 libbpf: Make btf__resolve_size logic always check size error condition
Perform size check always in btf__resolve_size. Makes the logic a bit more
robust against corrupted BTF and silences LGTM/Coverity complaining about
always true (size < 0) check.

Fixes: 69eaab04c6 ("btf: extract BTF type size calculation")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107020855.3834758-5-andriin@fb.com
2019-11-07 16:20:38 +01:00
Andrii Nakryiko
dd3ab12637 libbpf: Fix another potential overflow issue in bpf_prog_linfo
Fix few issues found by Coverity and LGTM.

Fixes: b053b439b7 ("bpf: libbpf: bpftool: Print bpf_line_info during prog dump")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107020855.3834758-4-andriin@fb.com
2019-11-07 16:20:38 +01:00
Andrii Nakryiko
4ee1135615 libbpf: Fix potential overflow issue
Fix a potential overflow issue found by LGTM analysis, based on Github libbpf
source code.

Fixes: 3d65014146 ("bpf: libbpf: Add btf_line_info support to libbpf")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107020855.3834758-3-andriin@fb.com
2019-11-07 16:20:37 +01:00
Andrii Nakryiko
3dc5e05982 libbpf: Fix memory leak/double free issue
Coverity scan against Github libbpf code found the issue of not freeing memory and
leaving already freed memory still referenced from bpf_program. Fix it by
re-assigning successfully reallocated memory sooner.

Fixes: 2993e0515b ("tools/bpf: add support to read .BTF.ext sections")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107020855.3834758-2-andriin@fb.com
2019-11-07 16:20:37 +01:00
Andrii Nakryiko
9656b346b2 libbpf: Fix negative FD close() in xsk_setup_xdp_prog()
Fix issue reported by static analysis (Coverity). If bpf_prog_get_fd_by_id()
fails, xsk_lookup_bpf_maps() will fail as well and clean-up code will attempt
close() with fd=-1. Fix by checking bpf_prog_get_fd_by_id() return result and
exiting early.

Fixes: 10a13bb40e ("libbpf: remove qidconf and better support external bpf programs.")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107054059.313884-1-andriin@fb.com
2019-11-07 16:15:27 +01:00
Ilya Leoshkevich
dab2e9eb18 s390/bpf: Remove unused SEEN_RET0, SEEN_REG_AX and ret0_ip
We don't need them since commit e1cf4befa2 ("bpf, s390x: remove
ld_abs/ld_ind") and commit a3212b8f15 ("bpf, s390x: remove obsolete
exception handling from div/mod").

Also, use BIT(n) instead of 1 << n, because checkpatch says so.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107114033.90505-1-iii@linux.ibm.com
2019-11-07 16:10:01 +01:00
Ilya Leoshkevich
6ad2e1a007 s390/bpf: Wrap JIT macro parameter usages in parentheses
This change does not alter JIT behavior; it only makes it possible to
safely invoke JIT macros with complex arguments in the future.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107113211.90105-1-iii@linux.ibm.com
2019-11-07 16:07:28 +01:00
Ilya Leoshkevich
166f11d11f s390/bpf: Use kvcalloc for addrs array
A BPF program may consist of 1m instructions, which means JIT
instruction-address mapping can be as large as 4m. s390 has
FORCE_MAX_ZONEORDER=9 (for memory hotplug reasons), which means maximum
kmalloc size is 1m. This makes it impossible to JIT programs with more
than 256k instructions.

Fix by using kvcalloc, which falls back to vmalloc for larger
allocations. An alternative would be to use a radix tree, but that is
not supported by bpf_prog_fill_jited_linfo.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107141838.92202-1-iii@linux.ibm.com
2019-11-07 16:05:55 +01:00
Ilya Leoshkevich
7e22077d0c tools, bpf_asm: Warn when jumps are out of range
When compiling larger programs with bpf_asm, it's possible to
accidentally exceed jt/jf range, in which case it won't complain, but
rather silently emit a truncated offset, leading to a "happy debugging"
situation.

Add a warning to help detecting such issues. It could be made an error
instead, but this might break compilation of existing code (which might
be working by accident).

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107100349.88976-1-iii@linux.ibm.com
2019-11-07 16:01:34 +01:00
Martin KaFai Lau
85d31dd070 bpf: Account for insn->off when doing bpf_probe_read_kernel
In the bpf interpreter mode, bpf_probe_read_kernel is used to read
from PTR_TO_BTF_ID's kernel object.  It currently missed considering
the insn->off.  This patch fixes it.

Fixes: 2a02759ef5 ("bpf: Add support for BTF pointers to interpreter")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191107014640.384083-1-kafai@fb.com
2019-11-06 21:52:52 -08:00
Andrii Nakryiko
ed57802121 libbpf: Simplify BPF_CORE_READ_BITFIELD_PROBED usage
Streamline BPF_CORE_READ_BITFIELD_PROBED interface to follow
BPF_CORE_READ_BITFIELD (direct) and BPF_CORE_READ, in general, i.e., just
return read result or 0, if underlying bpf_probe_read() failed.

In practice, real applications rarely check bpf_probe_read() result, because
it has to always work or otherwise it's a bug. So propagating internal
bpf_probe_read() error from this macro hurts usability without providing real
benefits in practice. This patch fixes the issue and simplifies usage,
noticeable even in selftest itself.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20191106201500.2582438-1-andriin@fb.com
2019-11-06 13:54:59 -08:00
Andrii Nakryiko
65a052d537 selftests/bps: Clean up removed ints relocations negative tests
As part of 42765ede5c ("selftests/bpf: Remove too strict field offset relo
test cases"), few ints relocations negative (supposed to fail) tests were
removed, but not completely. Due to them being negative, some leftovers in
prog_tests/core_reloc.c went unnoticed. Clean them up.

Fixes: 42765ede5c ("selftests/bpf: Remove too strict field offset relo test cases")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191106173659.1978131-1-andriin@fb.com
2019-11-06 13:45:06 -08:00
Daniel Borkmann
f23c7ce341 Merge branch 'bpf-libbpf-bitfield-size-relo'
Andrii Nakryiko says:

====================
This patch set adds support for reading bitfields in a relocatable manner
through a set of relocations emitted by Clang, corresponding libbpf support
for those relocations, as well as abstracting details into
BPF_CORE_READ_BITFIELD/BPF_CORE_READ_BITFIELD_PROBED macro.

We also add support for capturing relocatable field size, so that BPF program
code can adjust its logic to actual amount of data it needs to operate on,
even if it changes between kernels. New convenience macro is added to
bpf_core_read.h (bpf_core_field_size(), in the same family of macro as
bpf_core_read() and bpf_core_field_exists()). Corresponding set of selftests
are added to excercise this logic and validate correctness in a variety of
scenarios.

Some of the overly strict logic of matching fields is relaxed to support wider
variety of scenarios. See patch #1 for that.

Patch #1 removes few overly strict test cases.
Patch #2 adds support for bitfield-related relocations.
Patch #3 adds some further adjustments to support generic field size
relocations and introduces bpf_core_field_size() macro.
Patch #4 tests bitfield reading.
Patch #5 tests field size relocations.

v1 -> v2:
  - added direct memory read-based macro and tests for bitfield reads.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-11-04 16:07:02 +01:00
Andrii Nakryiko
0b163565b9 selftests/bpf: Add field size relocation tests
Add test verifying correctness and logic of field size relocation support in
libbpf.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101222810.1246166-6-andriin@fb.com
2019-11-04 16:06:56 +01:00
Andrii Nakryiko
8b1cb1c960 selftest/bpf: Add relocatable bitfield reading tests
Add a bunch of selftests verifying correctness of relocatable bitfield reading
support in libbpf. Both bpf_probe_read()-based and direct read-based bitfield
macros are tested. core_reloc.c "test_harness" is extended to support raw
tracepoint and new typed raw tracepoints as test BPF program types.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101222810.1246166-5-andriin@fb.com
2019-11-04 16:06:56 +01:00
Andrii Nakryiko
94f060e984 libbpf: Add support for field size relocations
Add bpf_core_field_size() macro, capturing a relocation against field size.
Adjust bits of internal libbpf relocation logic to allow capturing size
relocations of various field types: arrays, structs/unions, enums, etc.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101222810.1246166-4-andriin@fb.com
2019-11-04 16:06:56 +01:00
Andrii Nakryiko
ee26dade0e libbpf: Add support for relocatable bitfields
Add support for the new field relocation kinds, necessary to support
relocatable bitfield reads. Provide macro for abstracting necessary code doing
full relocatable bitfield extraction into u64 value. Two separate macros are
provided:
- BPF_CORE_READ_BITFIELD macro for direct memory read-enabled BPF programs
(e.g., typed raw tracepoints). It uses direct memory dereference to extract
bitfield backing integer value.
- BPF_CORE_READ_BITFIELD_PROBED macro for cases where bpf_probe_read() needs
to be used to extract same backing integer value.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101222810.1246166-3-andriin@fb.com
2019-11-04 16:06:56 +01:00
Andrii Nakryiko
42765ede5c selftests/bpf: Remove too strict field offset relo test cases
As libbpf is going to gain support for more field relocations, including field
size, some restrictions about exact size match are going to be lifted. Remove
test cases that explicitly test such failures.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101222810.1246166-2-andriin@fb.com
2019-11-04 16:06:56 +01:00
David S. Miller
1574cf83c7 mlx5-updates-2019-11-01
Misc updates for mlx5 netdev and core driver
 
 1) Steering Core: Replace CRC32 internal implementation with standard
    kernel lib.
 2) Steering Core: Support IPv4 and IPv6 mixed matcher.
 3) Steering Core: Lockless FTE read lookups
 4) TC: Bit sized fields rewrite support.
 5) FPGA: Standalone FPGA support.
 6) SRIOV: Reset VF parameters configurations on SRIOV disable.
 7) netdev: Dump WQs wqe descriptors on CQE with error events.
 8) MISC Cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl28qcYACgkQSD+KveBX
 +j4+tgf9HM1EpNeeX/kfZGECMtxHgKuNC4NwpOExKU/OpvUxCNBZb1KXDjaeraE9
 8fLvgA/T2cEHfNojJ+S9rRb64KxGw/96ieYimc9aYkIH4L5YEXYcUHj44RRMNYpI
 miWUQgcWABRvO5JvDMXqG+NIr7ctyodA7Qb3vlFWJvSB8KLNViPfPfPHn+oVDnxR
 ZBp1CWZXnlwO1ZMYQ2TDY1l0csDIx+awxkYXL3SFqJKBzheDItmQ4Ybw/yh+Mfv3
 eS5DJo2rRADHsTDTKSbyaBzcTY1UEWhJW+stYlN0SP9YkH1y2Q3lDeNKQLfK4xkc
 YwYaCp8ZvLmOtLxwoujAIl2R5sjDpQ==
 =8kXy
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-11-01

Misc updates for mlx5 netdev and core driver

1) Steering Core: Replace CRC32 internal implementation with standard
   kernel lib.
2) Steering Core: Support IPv4 and IPv6 mixed matcher.
3) Steering Core: Lockless FTE read lookups
4) TC: Bit sized fields rewrite support.
5) FPGA: Standalone FPGA support.
6) SRIOV: Reset VF parameters configurations on SRIOV disable.
7) netdev: Dump WQs wqe descriptors on CQE with error events.
8) MISC Cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 19:23:49 -08:00
YueHaibing
a37ac8ae66 mISDN: remove unused variable 'faxmodulation_s'
drivers/isdn/hardware/mISDN/mISDNisar.c:30:17:
 warning: faxmodulation_s defined but not used [-Wunused-const-variable=]

It is never used, so can be removed.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 19:10:30 -08:00
Vincent Cheng
3a6ba7dc77 ptp: Add a ptp clock driver for IDT ClockMatrix.
The IDT ClockMatrix (TM) family includes integrated devices that provide
eight PLL channels.  Each PLL channel can be independently configured as a
frequency synthesizer, jitter attenuator, digitally controlled
oscillator (DCO), or a digital phase lock loop (DPLL).  Typically
these devices are used as timing references and clock sources for PTP
applications.  This patch adds support for the device.

Co-developed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:35:40 -08:00
Vincent Cheng
5c5e7aac63 dt-bindings: ptp: Add device tree binding for IDT ClockMatrix based PTP clock
Add device tree binding doc for the IDT ClockMatrix PTP clock.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:35:40 -08:00
Francesco Ruggeri
fac6fce9bd net: icmp6: provide input address for traceroute6
traceroute6 output can be confusing, in that it shows the address
that a router would use to reach the sender, rather than the address
the packet used to reach the router.
Consider this case:

        ------------------------ N2
         |                    |
       ------              ------  N3  ----
       | R1 |              | R2 |------|H2|
       ------              ------      ----
         |                    |
        ------------------------ N1
                  |
                 ----
                 |H1|
                 ----

where H1's default route is through R1, and R1's default route is
through R2 over N2.
traceroute6 from H1 to H2 shows R2's address on N1 rather than on N2.

The script below can be used to reproduce this scenario.

traceroute6 output without this patch:

traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets
 1  2000:101::1 (2000:101::1)  0.036 ms  0.008 ms  0.006 ms
 2  2000:101::2 (2000:101::2)  0.011 ms  0.008 ms  0.007 ms
 3  2000:103::4 (2000:103::4)  0.013 ms  0.010 ms  0.009 ms

traceroute6 output with this patch:

traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets
 1  2000:101::1 (2000:101::1)  0.056 ms  0.019 ms  0.006 ms
 2  2000:102::2 (2000:102::2)  0.013 ms  0.008 ms  0.008 ms
 3  2000:103::4 (2000:103::4)  0.013 ms  0.009 ms  0.009 ms

#!/bin/bash
#
#        ------------------------ N2
#         |                    |
#       ------              ------  N3  ----
#       | R1 |              | R2 |------|H2|
#       ------              ------      ----
#         |                    |
#        ------------------------ N1
#                  |
#                 ----
#                 |H1|
#                 ----
#
# N1: 2000:101::/64
# N2: 2000:102::/64
# N3: 2000:103::/64
#
# R1's host part of address: 1
# R2's host part of address: 2
# H1's host part of address: 3
# H2's host part of address: 4
#
# For example:
# the IPv6 address of R1's interface on N2 is 2000:102::1/64
#
# Nets are implemented by macvlan interfaces (bridge mode) over
# dummy interfaces.
#

# Create net namespaces
ip netns add host1
ip netns add host2
ip netns add rtr1
ip netns add rtr2

# Create nets
ip link add net1 type dummy; ip link set net1 up
ip link add net2 type dummy; ip link set net2 up
ip link add net3 type dummy; ip link set net3 up

# Add interfaces to net1, move them to their nemaspaces
ip link add link net1 dev host1net1 type macvlan mode bridge
ip link set host1net1 netns host1
ip link add link net1 dev rtr1net1 type macvlan mode bridge
ip link set rtr1net1 netns rtr1
ip link add link net1 dev rtr2net1 type macvlan mode bridge
ip link set rtr2net1 netns rtr2

# Add interfaces to net2, move them to their nemaspaces
ip link add link net2 dev rtr1net2 type macvlan mode bridge
ip link set rtr1net2 netns rtr1
ip link add link net2 dev rtr2net2 type macvlan mode bridge
ip link set rtr2net2 netns rtr2

# Add interfaces to net3, move them to their nemaspaces
ip link add link net3 dev rtr2net3 type macvlan mode bridge
ip link set rtr2net3 netns rtr2
ip link add link net3 dev host2net3 type macvlan mode bridge
ip link set host2net3 netns host2

# Configure interfaces and routes in host1
ip netns exec host1 ip link set lo up
ip netns exec host1 ip link set host1net1 up
ip netns exec host1 ip -6 addr add 2000:101::3/64 dev host1net1
ip netns exec host1 ip -6 route add default via 2000:101::1

# Configure interfaces and routes in rtr1
ip netns exec rtr1 ip link set lo up
ip netns exec rtr1 ip link set rtr1net1 up
ip netns exec rtr1 ip -6 addr add 2000:101::1/64 dev rtr1net1
ip netns exec rtr1 ip link set rtr1net2 up
ip netns exec rtr1 ip -6 addr add 2000:102::1/64 dev rtr1net2
ip netns exec rtr1 ip -6 route add default via 2000:102::2
ip netns exec rtr1 sysctl net.ipv6.conf.all.forwarding=1

# Configure interfaces and routes in rtr2
ip netns exec rtr2 ip link set lo up
ip netns exec rtr2 ip link set rtr2net1 up
ip netns exec rtr2 ip -6 addr add 2000:101::2/64 dev rtr2net1
ip netns exec rtr2 ip link set rtr2net2 up
ip netns exec rtr2 ip -6 addr add 2000:102::2/64 dev rtr2net2
ip netns exec rtr2 ip link set rtr2net3 up
ip netns exec rtr2 ip -6 addr add 2000:103::2/64 dev rtr2net3
ip netns exec rtr2 sysctl net.ipv6.conf.all.forwarding=1

# Configure interfaces and routes in host2
ip netns exec host2 ip link set lo up
ip netns exec host2 ip link set host2net3 up
ip netns exec host2 ip -6 addr add 2000:103::4/64 dev host2net3
ip netns exec host2 ip -6 route add default via 2000:103::2

# Ping host2 from host1
ip netns exec host1 ping6 -c5 2000:103::4

# Traceroute host2 from host1
ip netns exec host1 traceroute6 2000:103::4

# Delete nets
ip link del net3
ip link del net2
ip link del net1

# Delete namespaces
ip netns del rtr2
ip netns del rtr1
ip netns del host2
ip netns del host1

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Original-patch-by: Honggang Xu <hxu@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:26:53 -08:00
Tuong Lien
06e7c70c6e tipc: improve message bundling algorithm
As mentioned in commit e95584a889 ("tipc: fix unlimited bundling of
small messages"), the current message bundling algorithm is inefficient
that can generate bundles of only one payload message, that causes
unnecessary overheads for both the sender and receiver.

This commit re-designs the 'tipc_msg_make_bundle()' function (now named
as 'tipc_msg_try_bundle()'), so that when a message comes at the first
place, we will just check & keep a reference to it if the message is
suitable for bundling. The message buffer will be put into the link
backlog queue and processed as normal. Later on, when another one comes
we will make a bundle with the first message if possible and so on...
This way, a bundle if really needed will always consist of at least two
payload messages. Otherwise, we let the first buffer go its way without
any need of bundling, so reduce the overheads to zero.

Moreover, since now we have both the messages in hand, we can even
optimize the 'tipc_msg_bundle()' function, make bundle of a very large
(size ~ MSS) and small messages which is not with the current algorithm
e.g. [1400-byte message] + [10-byte message] (MTU = 1500).

Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:26:15 -08:00
Francesco Ruggeri
2adf81c0f7 net: icmp: use input address in traceroute
Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the
primary address of the interface the packet was received on, even if
the path goes through a secondary address. In the example:

                    1.0.3.1/24
 ---- 1.0.1.3/24    1.0.1.1/24 ---- 1.0.2.1/24    1.0.2.4/24 ----
 |H1|--------------------------|R1|--------------------------|H2|
 ----            N1            ----            N2            ----

where 1.0.3.1/24 is R1's primary address on N1, traceroute from
H1 to H2 returns:

traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
 1  1.0.3.1 (1.0.3.1)  0.018 ms  0.006 ms  0.006 ms
 2  1.0.2.4 (1.0.2.4)  0.021 ms  0.007 ms  0.007 ms

After applying this patch, it returns:

traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
 1  1.0.1.1 (1.0.1.1)  0.033 ms  0.007 ms  0.006 ms
 2  1.0.2.4 (1.0.2.4)  0.011 ms  0.007 ms  0.007 ms

Original-patch-by: Bill Fenner <fenner@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:25:18 -08:00
David S. Miller
c219a16622 Merge branch 'optimize-openvswitch-flow-looking-up'
Tonghao Zhang says:

====================
optimize openvswitch flow looking up

This series patch optimize openvswitch for performance or simplify
codes.

Patch 1, 2, 4: Port Pravin B Shelar patches to
linux upstream with little changes.

Patch 5, 6, 7: Optimize the flow looking up and
simplify the flow hash.

Patch 8, 9: are bugfix.

The performance test is on Intel Xeon E5-2630 v4.
The test topology is show as below:

+-----------------------------------+
|   +---------------------------+   |
|   | eth0   ovs-switch    eth1 |   | Host0
|   +---------------------------+   |
+-----------------------------------+
      ^                       |
      |                       |
      |                       |
      |                       |
      |                       v
+-----+----+             +----+-----+
| netperf  | Host1       | netserver| Host2
+----------+             +----------+

We use netperf send the 64B packets, and insert 255+ flow-mask:
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
...
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
$
$ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18

* Without series patch, throughput 8.28Mbps
* With series patch, throughput 46.05Mbps

v6:
some coding style fixes

v5:
rewrite patch 8, release flow-mask when freeing flow

v4:
access ma->count with READ_ONCE/WRITE_ONCE API. More information,
see patch 5 comments.

v3:
update ma point when realloc mask_array in patch 5

v2:
simplify codes. e.g. use kfree_rcu instead of call_rcu
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:04 -08:00