IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Tests for BPF_CMPXCHG with both word and double word operands. As with
the tests for other atomic operations, these tests only check the result
of the arithmetic operation. The atomicity of the operations is not tested.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210809091829.810076-14-johan.almbladh@anyfinetworks.com
Tests for each atomic arithmetic operation and BPF_XCHG, derived from
old BPF_XADD tests. The tests include BPF_W/DW and BPF_FETCH variants.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210809091829.810076-13-johan.almbladh@anyfinetworks.com
Some JITs may need to convert a conditional jump instruction to
to short PC-relative branch and a long unconditional jump, if the
PC-relative offset exceeds offset field width in the CPU instruction.
This test triggers such branch conversion on the 32-bit MIPS JIT.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210809091829.810076-11-johan.almbladh@anyfinetworks.com
A double word (64-bit) load/store may be implemented as two successive
32-bit operations, one for each word. Check that the order of those
operations is consistent with the machine endianness.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210809091829.810076-10-johan.almbladh@anyfinetworks.com
32-bit JITs may implement complex ALU64 instructions using function calls.
The new tests check aspects related to this, such as register clobbering
and register argument re-ordering.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210809091829.810076-9-johan.almbladh@anyfinetworks.com
This patch adds BPF_MUL tests for 64x32 and 64x64 multiply. Mainly
testing 32-bit JITs that implement ALU64 operations with two 32-bit
CPU registers per operand.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210809091829.810076-8-johan.almbladh@anyfinetworks.com
This patch adds a number of tests for BPF_LSH, BPF_RSH amd BPF_ARSH
ALU64 operations with values that may trigger different JIT code paths.
Mainly testing 32-bit JITs that implement ALU64 operations with two
32-bit CPU registers per operand.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210809091829.810076-7-johan.almbladh@anyfinetworks.com
This patch adds more tests of ALU32 shift operations BPF_LSH and BPF_RSH,
including the special case of a zero immediate. Also add corresponding
BPF_ARSH tests which were missing for ALU32.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210809091829.810076-6-johan.almbladh@anyfinetworks.com
This patch adds tests of BPF_AND, BPF_OR and BPF_XOR with different
magnitude of the immediate value. Mainly checking 32-bit JIT sub-word
handling and zero/sign extension.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210809091829.810076-5-johan.almbladh@anyfinetworks.com
This patch corrects the test description in a number of cases where
the description differed from what was actually tested and expected.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210809091829.810076-4-johan.almbladh@anyfinetworks.com
Tests for ALU32 and ALU64 MOV with different sizes of the immediate
value. Depending on the immediate field width of the native CPU
instructions, a JIT may generate code differently depending on the
immediate value. Test that zero or sign extension is performed as
expected. Mainly for JIT testing.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210809091829.810076-3-johan.almbladh@anyfinetworks.com
An eBPF JIT may implement JMP32 operations in a different way than JMP,
especially on 32-bit architectures. This patch adds a series of tests
for JMP32 operations, mainly for testing JITs.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210809091829.810076-2-johan.almbladh@anyfinetworks.com
A codeblock for handling nested vlan trips newbies into thinking it as
duplicate code. Explicitly add a comment to clarify.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210809070046.32142-1-falakreyaz@gmail.com
Add a test suite to test XDP bonding implementation over a pair of
veth devices.
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210731055738.16820-8-joamaki@gmail.com
The program type cannot be deduced from 'tx' which causes an invalid
argument error when trying to load xdp_tx.o using the skeleton.
Rename the section name to "xdp" so that libbpf can deduce the type.
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210731055738.16820-7-joamaki@gmail.com
For the XDP bonding slave lookup to work in the NAPI poll context in which
the redudant rcu_read_lock() has been removed we have to follow the same
approach as in 694cea395fde ("bpf: Allow RCU-protected lookups to happen
from bh context") and modify the WARN_ON to also check rcu_read_lock_bh_held().
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210731055738.16820-6-joamaki@gmail.com
XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.
Semantics of XDP_TX when attached to a bond are equivalent in such
setting to the case when a tc/BPF program would be attached to the
bond, meaning transmitting the packet out of the bond itself using one
of the bond's configured xmit methods to select a slave device (rather
than XDP_TX on the slave itself). Handling of XDP_TX to transmit
using the configured bonding mechanism is therefore implemented by
rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
performance impact this check is guarded by a static key, which is
incremented when a XDP program is loaded onto a bond device. This
approach was chosen to avoid changes to drivers implementing XDP. If
the slave device does not match the receive device, then XDP_REDIRECT
is transparently used to perform the redirection in order to have
the network driver release the packet from its RX ring. The bonding
driver hashing functions have been refactored to allow reuse with
xdp_buff's to avoid code duplication.
The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters. An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
is rather a cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.
Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter (see patch
2/3). The statistics were collected using "sar -n dev -u 1 10". On top
of that, for ice, further work is in progress on improving the XDP_TX
numbers [4].
-----------------------| CPU |--| rxpck/s |--| txpck/s |----
without patch (1 dev):
XDP_DROP: 3.15% 48.6Mpps
XDP_TX: 3.12% 18.3Mpps 18.3Mpps
XDP_DROP (RSS): 9.47% 116.5Mpps
XDP_TX (RSS): 9.67% 25.3Mpps 24.2Mpps
-----------------------
with patch, bond (1 dev):
XDP_DROP: 3.14% 46.7Mpps
XDP_TX: 3.15% 13.9Mpps 13.9Mpps
XDP_DROP (RSS): 10.33% 117.2Mpps
XDP_TX (RSS): 10.64% 25.1Mpps 24.0Mpps
-----------------------
with patch, bond (2 devs):
XDP_DROP: 6.27% 92.7Mpps
XDP_TX: 6.26% 17.6Mpps 17.5Mpps
XDP_DROP (RSS): 11.38% 117.2Mpps
XDP_TX (RSS): 14.30% 28.7Mpps 27.4Mpps
--------------------------------------------------------------
RSS: Receive Side Scaling, e.g. the packets were sent to a range of
destination IPs.
[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
[4]: https://lore.kernel.org/bpf/20210805230046.28715-1-maciej.fijalkowski@intel.com/T/#t
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20210731055738.16820-4-joamaki@gmail.com
This adds the ndo_xdp_get_xmit_slave hook for transforming XDP_TX
into XDP_REDIRECT after BPF program run when the ingress device
is a bond slave.
The dev_xdp_prog_count is exposed so that slave devices can be checked
for loaded XDP programs in order to avoid the situation where both
bond master and slave have programs loaded according to xdp_state.
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Link: https://lore.kernel.org/bpf/20210731055738.16820-3-joamaki@gmail.com
In preparation for adding XDP support to the bonding driver
refactor the packet hashing functions to be able to work with
any linear data buffer without an skb.
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Link: https://lore.kernel.org/bpf/20210731055738.16820-2-joamaki@gmail.com
Simon Horman says:
====================
This short series provides minor enhancements to the
sample code in samples/bpf/xdpsock_user.c.
Each change is explained more fully in its own commit message.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
There is a forward declaration of ip_fast_csum() just before its
implementation, remove the unneeded forward declaration.
While at it mark the implementation as static inline.
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/bpf/20210806122855.26115-3-simon.horman@corigine.com
The xdpsock sample application is a useful base for experiment's around
AF_XDP sockets. Compiling the sample outside of the kernel tree is made
harder then it has to be as the sample includes two headers and that are
not installed by 'make install_header' nor are usually part of
distributions kernel headers.
The first header asm/barrier.h is not used and can just be dropped.
The second linux/compiler.h are only needed for the decorator __force
and are only used in ip_fast_csum(), csum_fold() and
csum_tcpudp_nofold(). These functions are copied verbatim from
include/asm-generic/checksum.h and lib/checksum.c. While it's fine to
copy and use these functions in the sample application the decorator
brings no value and can be dropped together with the include.
With this change it's trivial to compile the xdpsock sample outside the
kernel tree from xdpsock_user.c and xdpsock.h.
$ gcc -o xdpsock xdpsock_user.c -lbpf -lpthread
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/bpf/20210806122855.26115-2-simon.horman@corigine.com
BPF programs for reference_tracking selftest use "fail_" prefix to notify that
they are expected to fail. This is really confusing and inconvenient when
trying to grep through test_progs output to find *actually* failed tests. So
rename the prefix from "fail_" to "err_".
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210805230734.437914-1-andrii@kernel.org
Currently, this test is incorrectly printing the destination port in
place of the destination IP.
Fixes: 2767c97765cb ("selftests/bpf: Implement sample tcp/tcp6 bpf_iter programs")
Signed-off-by: Jose Blanquicet <josebl@microsoft.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210805164044.527903-1-josebl@microsoft.com
Rewrite to skel and ASSERT macros as well while we are at it.
v3:
- replace -f with -A to make it work with busybox ping.
-A is available on both busybox and iputils, from the man page:
On networks with low RTT this mode is essentially equivalent to
flood mode.
v2:
- don't check result of bpf_map__fd (Yonghong Song)
- remove from .gitignore (Andrii Nakryiko)
- move ping_command into network_helpers (Andrii Nakryiko)
- remove assert() (Andrii Nakryiko)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210804205524.3748709-1-sdf@google.com
Commit ce4dade7f12a ("samples/bpf: xdp_redirect_cpu: Load a eBPF program
on cpumap") added the following option, but missed adding it to optstring:
- mprog-disable: disable loading XDP program on cpumap entries
Fix it and add the missing option character.
Fixes: ce4dade7f12a ("samples/bpf: xdp_redirect_cpu: Load a eBPF program on cpumap")
Signed-off-by: Matthew Cover <matthew.cover@stackpath.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210731005632.13228-1-matthew.cover@stackpath.com
During recent net into net-next merge ([0]) a piece of old logic ([1]) got
reintroduced accidentally while resolving merge conflict between bpf's [2]
and bpf-next's [3]. This check was removed in bpf-next tree to allow extra
ctx_in parameter passed for XDP test runs. Reinstating the check breaks
bpf_prog_test_run_xdp logic and causes a corresponding xdp_context_test_run
selftest failure. Fix by removing the check and allow ctx_in for XDP test
runs.
[0] 5af84df962dd ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
[1] 947e8b595b82 ("bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN")
[2] 5e21bb4e8125 ("bpf, test: fix NULL pointer dereference on invalid expected_attach_type")
[3] 47316f4a3053 ("bpf: Support input xdp_md context in BPF_PROG_TEST_RUN")
Fixes: 5af84df962dd ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
As of now, only AF_UNIX datagram socket supports sockmap. But
unix_proto is shared for all kinds of AF_UNIX sockets, so we
have to check the socket type in unix_bpf_update_proto() to
explicitly reject other types, otherwise they could be added
into sockmap, too.
Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210731195038.8084-1-xiyou.wangcong@gmail.com
Before, the interpreter allowed up to MAX_TAIL_CALL_CNT + 1 tail calls.
Now precisely MAX_TAIL_CALL_CNT is allowed, which is in line with the
behavior of the x86 JITs.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210728164741.350370-1-johan.almbladh@anyfinetworks.com
Don't populate the array states on the stack but instead it
static const. Makes the object code smaller by 79 bytes.
Before:
text data bss dec hex filename
21309 8304 192 29805 746d drivers/net/ethernet/mellanox/mlx4/qp.o
After:
text data bss dec hex filename
21166 8368 192 29726 741e drivers/net/ethernet/mellanox/mlx4/qp.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210801153742.147304-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate the array if_names on the stack but instead it
static const. Makes the object code smaller by 99 bytes.
Before:
text data bss dec hex filename
27886 10752 672 39310 998e ./drivers/net/ethernet/3com/3c509.o
After:
text data bss dec hex filename
27723 10816 672 39211 992b ./drivers/net/ethernet/3com/3c509.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801152650.146572-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate the array faf_bits on the stack but instead it
static const. Makes the object code smaller by 175 bytes.
Before:
text data bss dec hex filename
9645 4552 0 14197 3775 ../freescale/dpaa2/dpaa2-eth-devlink.o
After:
text data bss dec hex filename
9406 4616 0 14022 36c6 ../freescale/dpaa2/dpaa2-eth-devlink.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801152209.146359-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate the array random_data on the stack but instead it
static const. Makes the object code smaller by 66 bytes.
Before:
text data bss dec hex filename
52895 10976 0 63871 f97f ../qlogic/qlcnic/qlcnic_ethtool.o
After:
text data bss dec hex filename
52701 11104 0 63805 f93d ../qlogic//qlcnic/qlcnic_ethtool.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801151659.146113-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate the const array name on the stack but instead it
static. Makes the object code smaller by 28 bytes. Add a missing
const to clean up a checkpatch warning.
Before:
text data bss dec hex filename
124565 31565 384 156514 26362 drivers/net/ethernet/marvell/sky2.o
After:
text data bss dec hex filename
124441 31661 384 156486 26346 drivers/net/ethernet/marvell/sky2.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801150647.145728-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate the array match_all_mac on the stack but instead it
static const. Makes the object code smaller by 75 bytes.
Before:
text data bss dec hex filename
46701 8960 64 55725 d9ad ../chelsio/cxgb4/cxgb4_filter.o
After:
text data bss dec hex filename
46338 9120 192 55650 d962 ../chelsio/cxgb4/cxgb4_filter.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801151205.145924-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana reported a refcount warning when booting over NFS:
[ 5.042532] ------------[ cut here ]------------
[ 5.047184] refcount_t: addition on 0; use-after-free.
[ 5.052324] WARNING: CPU: 7 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0xa4/0x150
...
[ 5.167201] Call trace:
[ 5.169635] refcount_warn_saturate+0xa4/0x150
[ 5.174067] fib_create_info+0xc00/0xc90
[ 5.177982] fib_table_insert+0x8c/0x620
[ 5.181893] fib_magic.isra.0+0x110/0x11c
[ 5.185891] fib_add_ifaddr+0xb8/0x190
[ 5.189629] fib_inetaddr_event+0x8c/0x140
fib_treeref needs to be set after kzalloc. The old code had a ++ which
led to the confusion when the int was replaced by a refcount_t.
Fixes: 79976892f7ea ("net: convert fib_treeref from int to refcount_t")
Signed-off-by: David Ahern <dsahern@kernel.org>
Reported-by: Ioana Ciornei <ciorneiioana@gmail.com>
Cc: Yajun Deng <yajun.deng@linux.dev>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20210802160221.27263-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate arrays on the stack but instead them static const.
Makes the object code smaller by 280 bytes.
Before:
text data bss dec hex filename
24142 4368 192 28702 701e ./drivers/net/phy/mscc/mscc_ptp.o
After:
text data bss dec hex filename
23830 4400 192 28422 6f06 ./drivers/net/phy/mscc/mscc_ptp.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801070155.139057-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't populate the array res_ids on the stack but instead it
static const. Makes the object code smaller by 14 bytes.
Before:
text data bss dec hex filename
50833 8314 256 59403 e80b ./drivers/net/netdevsim/fib.o
After:
text data bss dec hex filename
50755 8378 256 59389 e7fd ./drivers/net/netdevsim/fib.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801065328.138906-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
Use an anonymous union with a couple of anonymous structs in order to
keep userspace unchanged:
$ pahole -C ip_msfilter net/ipv4/ip_sockglue.o
struct ip_msfilter {
union {
struct {
__be32 imsf_multiaddr_aux; /* 0 4 */
__be32 imsf_interface_aux; /* 4 4 */
__u32 imsf_fmode_aux; /* 8 4 */
__u32 imsf_numsrc_aux; /* 12 4 */
__be32 imsf_slist[1]; /* 16 4 */
}; /* 0 20 */
struct {
__be32 imsf_multiaddr; /* 0 4 */
__be32 imsf_interface; /* 4 4 */
__u32 imsf_fmode; /* 8 4 */
__u32 imsf_numsrc; /* 12 4 */
__be32 imsf_slist_flex[0]; /* 16 0 */
}; /* 0 16 */
}; /* 0 20 */
/* size: 20, cachelines: 1, members: 1 */
/* last cacheline: 20 bytes */
};
Also, refactor the code accordingly and make use of the struct_size()
and flex_array_size() helpers.
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nci_request() receives a callback function and unsigned long data
argument "opt" which is passed to the callback. Almost all of the
nci_request() callers pass pointer to a stack variable as data argument.
Only few pass scalar value (e.g. u8).
All such callbacks do not modify passed data argument and in previous
commit they were made as const. However passing pointers via unsigned
long removes the const annotation. The callback could simply cast
unsigned long to a pointer to writeable memory.
Use "const void *" as type of this "opt" argument to solve this and
prevent modifying the pointed contents. This is also consistent with
generic pattern of passing data arguments - via "void *". In few places
which pass scalar values, use casts via "unsigned long" to suppress any
warnings.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is desirable to reduce the surface of DSA_TAG_PROTO_NONE as much as
we can, because we now have options for switches without hardware
support for DSA tagging, and the occurrence in the mt7530 driver is in
fact quite gratuitout and easy to remove. Since ds->ops->get_tag_protocol()
is only called for CPU ports, the checks for a CPU port in
mtk_get_tag_protocol() are redundant and can be removed.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham says:
====================
cn10k: DWRR MTU and weights configuration
On OcteonTx2 DWRR quantum is directly configured into each of
the transmit scheduler queues. And PF/VF drivers were free to
config any value upto 2^24.
On CN10K, HW is modified, the quantum configuration at scheduler
queues is in terms of weight. And SW needs to setup a base DWRR MTU
at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
'DWRR MTU * weight' to get the quantum.
This patch series addresses this HW change on CN10K silicons,
both admin function and PF/VF drivers are modified.
Also added support to program DWRR MTU via devlink params.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Program SQ, MDQ, TL4 to TL2 transmit scheduler queues' DWRR
weight based on DWRR MTU programmed at NIX_AF_DWRR_RPM_MTU.
The DWRR MTU from admin function is retrieved via mbox.
On OcteaonTx2 silicon, admin function driver responds with DWRR
MTU as '1'. This helps to avoid silicon specific transmit
scheduler DWRR quantum/weight configuration logic.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On OcteonTx2 DWRR quantum is directly configured into each of
the transmit scheduler queues. And PF/VF drivers were free to
config any value upto 2^24.
On CN10K, HW is modified, the quantum configuration at scheduler
queues is in terms of weight. And SW needs to setup a base DWRR MTU
at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
'DWRR MTU * weight' to get the quantum. For LBK traffic, value
programmed into NIX_AF_DWRR_RPM_MTU register is considered as
DWRR MTU.
This patch programs a default DWRR MTU of 8192 into HW and also
provides a way to change this via devlink params.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removed the 'raw gso min size - 1' test which
always fails now:
./in_netns.sh ./psock_snd -v -c -g -l "${mss}"
raw gso min size - 1 (expected to fail)
tx: 1524
rx: 1472
OK
After commit 7c6d2ecbda83 ("net: be more gentle about silly
gso requests coming from user"), we relaxed the min gso_size
check in virtio_net_hdr_to_skb().
So when a packet which is smaller then the gso_size,
GSO for this packet will not be set, the packet will be
send/recv successfully.
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some time ago, I reported a calltrace issue
"did not find a suitable aggregator", please see[1].
After a period of analysis and reproduction, I find
that this problem is caused by concurrency.
Before the problem occurs, the bond structure is like follows:
bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1
\
port0
\
slaver1(eth1) - agg1.lag_ports -> NULL
\
port1
If we run 'ifenslave bond0 -d eth1', the process is like below:
excuting __bond_release_one()
|
bond_upper_dev_unlink()[step1]
| | |
| | bond_3ad_lacpdu_recv()
| | ->bond_3ad_rx_indication()
| | spin_lock_bh()
| | ->ad_rx_machine()
| | ->__record_pdu()[step2]
| | spin_unlock_bh()
| | |
| bond_3ad_state_machine_handler()
| spin_lock_bh()
| ->ad_port_selection_logic()
| ->try to find free aggregator[step3]
| ->try to find suitable aggregator[step4]
| ->did not find a suitable aggregator[step5]
| spin_unlock_bh()
| |
| |
bond_3ad_unbind_slave() |
spin_lock_bh()
spin_unlock_bh()
step1: already removed slaver1(eth1) from list, but port1 remains
step2: receive a lacpdu and update port0
step3: port0 will be removed from agg0.lag_ports. The struct is
"agg0.lag_ports -> port1" now, and agg0 is not free. At the
same time, slaver1/agg1 has been removed from the list by step1.
So we can't find a free aggregator now.
step4: can't find suitable aggregator because of step2
step5: cause a calltrace since port->aggregator is NULL
To solve this concurrency problem, put bond_upper_dev_unlink()
after bond_3ad_unbind_slave(). In this way, we can invalid the port
first and skip this port in bond_3ad_state_machine_handler(). This
eliminates the situation that the slaver has been removed from the
list but the port is still valid.
[1]https://lore.kernel.org/netdev/10374.1611947473@famine/
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>