1107105 Commits

Author SHA1 Message Date
Christophe JAILLET
5787db7c90 netfilter: ipvs: Use the bitmap API to allocate bitmaps
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-21 00:55:39 +02:00
James Yonan
9d2f00fb0a netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn *
nf_nat_initialized() doesn't modify passed struct nf_conn,
so declare as const.

This is helpful for code readability and makes it possible
to call nf_nat_initialized() with a const struct nf_conn *.

Signed-off-by: James Yonan <james@openvpn.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-14 00:24:06 +02:00
Florian Westphal
6b77205374 netfilter: nf_tables: move nft_cmp_fast_mask to where its used
... and cast result to u32 so sparse won't complain anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:46 +02:00
Florian Westphal
ffb3d9a30c netfilter: nf_tables: use correct integer types
Sparse tool complains about mixing of different endianess
types, so use the correct ones.

Add type casts where needed.

objdiff shows no changes except in nft_tunnel (type is changed).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:46 +02:00
Florian Westphal
7278b3c1e4 netfilter: nf_tables: add and use BE register load-store helpers
Same as the existing ones, no conversions. This is just for sparse sake
only so that we no longer mix be16/u16 and be32/u32 types.

Alternative is to add __force __beX in various places, but this
seems nicer.

objdiff shows no changes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:46 +02:00
Florian Westphal
d86473bf2f netfilter: nf_tables: use the correct get/put helpers
Switch to be16/32 and u16/32 respectively.  No code changes here,
the functions do the same thing, this is just for sparse checkers' sake.

objdiff shows no changes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:46 +02:00
Florian Westphal
168141f7e0 netfilter: x_tables: use correct integer types
Sparse complains because __be32 and u32 are mixed without
conversions.  Use the correct types, no code changes.

Furthermore, xt_DSCP generates a bit truncation warning:
"cast truncates bits from constant value (ffffff03 becomes 3)"

The truncation is fine (and wanted). Add a private definition and use that
instead.

objdiff shows no changes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:45 +02:00
Florian Westphal
ec6f2ff0a3 netfilter: nfnetlink: add missing __be16 cast
Sparse flags this as suspicious, because this compares
integer with a be16 with no conversion.

Its a compat check for old userspace that sends host byte order,
so force a be16 cast here.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:45 +02:00
Zhang Jiaming
f72547473f netfilter: nft_set_bitmap: Fix spelling mistake
Change 'succesful' to 'successful'.
Change 'transation' to 'transaction'.

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:37 +02:00
Florian Westphal
d3f2d0a292 netfilter: h323: merge nat hook pointers into one
sparse complains about incorrect rcu usage.

Code uses the correct rcu access primitives, but the function pointers
lack rcu annotations.

Collapse all of them into a single structure, then annotate the pointer.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:16 +02:00
Florian Westphal
e14575fa75 netfilter: nf_conntrack: use rcu accessors where needed
Sparse complains about direct access to the 'helper' and timeout members.
Both have __rcu annotation, so use the accessors.

xt_CT is fine, accesses occur before the structure is visible to other
cpus.  Switch to rcu accessors there as well to reduce noise.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:15 +02:00
Florian Westphal
6976890e89 netfilter: nf_conntrack: add missing __rcu annotations
Access to the hook pointers use correct helpers but the pointers lack
the needed __rcu annotation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:15 +02:00
Vlad Buslov
b038177636 netfilter: nf_flow_table: count pending offload workqueue tasks
To improve hardware offload debuggability count pending 'add', 'del' and
'stats' flow_table offload workqueue tasks. Counters are incremented before
scheduling new task and decremented when workqueue handler finishes
executing. These counters allow user to diagnose congestion on hardware
offload workqueues that can happen when either CPU is starved and workqueue
jobs are executed at lower rate than new ones are added or when
hardware/driver can't keep up with the rate.

Implement the described counters as percpu counters inside new struct
netns_ft which is stored inside struct net. Expose them via new procfs file
'/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack'
file.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:14 +02:00
Vlad Buslov
fc54d9065f net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
Following patches in series use the pointer to access flow table offload
debug variables.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:14 +02:00
Bill Wendling
b8acd43148 netfilter: conntrack: use correct format characters
When compiling with -Wformat, clang emits the following warnings:

net/netfilter/nf_conntrack_helper.c:168:18: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
                request_module(mod_name);
                               ^~~~~~~~

Use a string literal for the format string.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <isanbard@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:14 +02:00
Jackie Liu
6be7915612 netfilter: conntrack: use fallthrough to cleanup
These cases all use the same function. we can simplify the code through
fallthrough.

$ size net/netfilter/nf_conntrack_core.o

        text	   data	    bss	    dec	    hex	filename
before  81601	  81430	    768	 163799	  27fd7	net/netfilter/nf_conntrack_core.o
after   80361	  81430	    768	 162559	  27aff	net/netfilter/nf_conntrack_core.o

Arch: aarch64
Gcc : gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:13 +02:00
Jilin Yuan
edb2c3476d fddi/skfp: fix repeated words in comments
Delete the redundant word 'test'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11 14:12:54 +01:00
Jilin Yuan
1377a5b2d4 ethernet/via: fix repeated words in comments
Delete the redundant word 'driver'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11 14:12:54 +01:00
sewookseo
e22aa14866 net: Find dst with sk's xfrm policy not ctl_sk
If we set XFRM security policy by calling setsockopt with option
IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
struct. However tcp_v6_send_response doesn't look up dst_entry with the
actual socket but looks up with tcp control socket. This may cause a
problem that a RST packet is sent without ESP encryption & peer's TCP
socket can't receive it.
This patch will make the function look up dest_entry with actual socket,
if the socket has XFRM policy(sock_policy), so that the TCP response
packet via this function can be encrypted, & aligned on the encrypted
TCP socket.

Tested: We encountered this problem when a TCP socket which is encrypted
in ESP transport mode encryption, receives challenge ACK at SYN_SENT
state. After receiving challenge ACK, TCP needs to send RST to
establish the socket at next SYN try. But the RST was not encrypted &
peer TCP socket still remains on ESTABLISHED state.
So we verified this with test step as below.
[Test step]
1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
2. Client tries a new connection on the same TCP ports(src & dst).
3. Server will return challenge ACK instead of SYN,ACK.
4. Client will send RST to server to clear the SOCKET.
5. Client will retransmit SYN to server on the same TCP ports.
[Expected result]
The TCP connection should be established.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Sehee Lee <seheele@google.com>
Signed-off-by: Sewook Seo <sewookseo@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11 13:39:56 +01:00
Jakub Kicinski
0076cad301 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-07-09

We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).

The main changes are:

1) Add new way for performing BTF type queries to BPF, from Daniel Müller.

2) Add inlining of calls to bpf_loop() helper when its function callback is
   statically known, from Eduard Zingerman.

3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.

4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
   hooks, from Stanislav Fomichev.

5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.

6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.

7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
   selftests, from Magnus Karlsson & Maciej Fijalkowski.

8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.

9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.

10) Sockmap optimizations around throughput of UDP transmissions which have been
    improved by 61%, from Cong Wang.

11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.

12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.

13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.

14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
    macro, from James Hilliard.

15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.

16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
  selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
  bpf: Correctly propagate errors up from bpf_core_composites_match
  libbpf: Disable SEC pragma macro on GCC
  bpf: Check attach_func_proto more carefully in check_return_code
  selftests/bpf: Add test involving restrict type qualifier
  bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
  MAINTAINERS: Add entry for AF_XDP selftests files
  selftests, xsk: Rename AF_XDP testing app
  bpf, docs: Remove deprecated xsk libbpf APIs description
  selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
  libbpf, riscv: Use a0 for RC register
  libbpf: Remove unnecessary usdt_rel_ip assignments
  selftests/bpf: Fix few more compiler warnings
  selftests/bpf: Fix bogus uninitialized variable warning
  bpftool: Remove zlib feature test from Makefile
  libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
  libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
  libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
  selftests/bpf: Add type match test against kernel's task_struct
  selftests/bpf: Add nested type to type based tests
  ...
====================

Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-09 12:24:16 -07:00
Linus Walleij
877d4e3ced ixp4xx_eth: Set MAC address from device tree
If there is a MAC address specified in the device tree, then
use it. This is already perfectly legal to specify in accordance
with the generic ethernet-controller.yaml schema.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:34:18 +01:00
Linus Walleij
b3ba206ce8 ixp4xx_eth: Fall back to random MAC address
If the firmware does not provide a MAC address to the driver,
fall back to generating a random MAC address.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:34:17 +01:00
Eric Dumazet
44ac441a51 af_unix: fix unix_sysctl_register() error path
We want to kfree(table) if @table has been kmalloced,
ie for non initial network namespace.

Fixes: 849d5aa3a1d8 ("af_unix: Do not call kmemdup() for init_net's sysctl table.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:27:33 +01:00
David S. Miller
be587adbf8 Merge branch 'mptcp-selftest-improvements-and-header-tweak'
Mat Martineau says:

====================
mptcp: Self test improvements and a header tweak

Patch 1 moves a definition to a header so it can be used in a struct
declaration.

Patch 2 adjusts a time threshold for a selftest that runs much slower on
debug kernels (and even more on slow CI infrastructure), to reduce
spurious failures.

Patches 3 & 4 improve userspace PM test coverage.

Patches 5 & 6 clean up output from a test script and selftest helper
tool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Geliang Tang
65ebc6676d selftests: mptcp: update pm_nl_ctl usage header
The usage header of pm_nl_ctl command doesn't match with the context. So
this patch adds the missing userspace PM keywords 'ann', 'rem', 'csf',
'dsf', 'events' and 'listen' in it.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Geliang Tang
507719cd7c selftests: mptcp: avoid Terminated messages in userspace_pm
There're some 'Terminated' messages in the output of userspace pm tests
script after killing './pm_nl_ctl events' processes:

Created network namespaces ns1, ns2         			[OK]
./userspace_pm.sh: line 166: 13735 Terminated              ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1
./userspace_pm.sh: line 172: 13737 Terminated              ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1
Established IPv4 MPTCP Connection ns2 => ns1    		[OK]
./userspace_pm.sh: line 166: 13753 Terminated              ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1
./userspace_pm.sh: line 172: 13755 Terminated              ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1
Established IPv6 MPTCP Connection ns2 => ns1    		[OK]
ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token    		[OK]

This patch adds a helper kill_wait(), in it using 'wait $pid 2>/dev/null'
commands after 'kill $pid' to avoid printing out these Terminated messages.
Use this helper instead of using 'kill $pid'.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Geliang Tang
5e986ec468 selftests: mptcp: userspace pm subflow tests
This patch adds userspace pm subflow tests support for mptcp_join.sh
script. Add userspace pm create subflow and destroy test cases in
userspace_tests().

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Geliang Tang
97040cf980 selftests: mptcp: userspace pm address tests
This patch adds userspace pm tests support for mptcp_join.sh script. Add
userspace pm add_addr and rm_addr test cases in userspace_tests().

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Paolo Abeni
d0d9c8f2df selftests: mptcp: tweak simult_flows for debug kernels
The mentioned test measures the transfer run-time to verify
that the user-space program is able to use the full aggregate B/W.

Even on (virtual) link-speed-bound tests, debug kernel can slow
down the transfer enough to cause sporadic test failures.

Instead of unconditionally raising the maximum allowed run-time,
tweak when the running kernel is a debug one, and use some simple/
rough heuristic to guess such scenarios.

Note: this intentionally avoids looking for /boot/config-<version> as
the latter file is not always available in our reference CI
environments.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:23 +01:00
Geliang Tang
f7657ff4a7 mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h
Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to
include/net/mptcp.h.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:23 +01:00
Yang Yingliang
9f7cb73ef6 bcm63xx_enet: change the driver variables to static
bcm63xx_enetsw_driver and bcm63xx_enet_driver are only used in
bcm63xx_enet.c now, change them to static.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220707135801.1483941-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:31:42 -07:00
Russell King (Oracle)
6d1ce9c038 net: phylink: fix SGMII inband autoneg enable
When we are operating in SGMII inband mode, it implies that there is a
PHY connected, and the ethtool advertisement for autoneg applies to
the PHY, not the SGMII link. When in 1000base-X mode, then this applies
to the 802.3z link and needs to be applied to the PCS.

Fix this.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1o9Ng2-005Qbe-3H@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:19:19 -07:00
Antoine Tenart
40ad0a52ef Documentation: add a description for net.core.high_order_alloc_disable
A description is missing for the net.core.high_order_alloc_disable
option in admin-guide/sysctl/net.rst ; add it. The above sysctl option
was introduced by commit ce27ec60648d ("net: add high_order_alloc_disable
sysctl/static key").

Thanks to Eric for running again the benchmark cited in the above
commit, showing this knob is now mostly of historical importance.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220707080245.180525-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:15:59 -07:00
Justin Stitt
5b47d23646 net: rxrpc: fix clang -Wformat warning
When building with Clang we encounter this warning:
| net/rxrpc/rxkad.c:434:33: error: format specifies type 'unsigned short'
| but the argument has type 'u32' (aka 'unsigned int') [-Werror,-Wformat]
| _leave(" = %d [set %hx]", ret, y);

y is a u32 but the format specifier is `%hx`. Going from unsigned int to
short int results in a loss of data. This is surely not intended
behavior. If it is intended, the warning should be suppressed through
other means.

This patch should get us closer to the goal of enabling the -Wformat
flag for Clang builds.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220707182052.769989-1-justinstitt@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:15:11 -07:00
Jakub Kicinski
16bd188eae Merge branch 'tls-pad-strparser-internal-header-decrypt_ctx-etc'
Jakub Kicinski says:

====================
tls: pad strparser, internal header, decrypt_ctx etc.

A grab bag of non-functional refactoring to make the series
which will let us decrypt into a fresh skb smaller.

Patches in this series are not strictly required to get the
decryption into a fresh skb going, they are more in the "things
which had been annoying me for a while" category.
====================

Link: https://lore.kernel.org/r/20220708010314.1451462-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:49 -07:00
Jakub Kicinski
35560b7f06 tls: rx: make tls_wait_data() return an recvmsg retcode
tls_wait_data() sets the return code as an output parameter
and always returns ctx->recv_pkt on success.

Return the error code directly and let the caller read the skb
from the context. Use positive return code to indicate ctx->recv_pkt
is ready.

While touching the definition of the function rename it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:45 -07:00
Jakub Kicinski
5879031423 tls: create an internal header
include/net/tls.h is getting a little long, and is probably hard
for driver authors to navigate. Split out the internals into a
header which will live under net/tls/. While at it move some
static inlines with a single user into the source files, add
a few tls_ prefixes and fix spelling of 'proccess'.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:45 -07:00
Jakub Kicinski
03957d8405 tls: rx: coalesce exit paths in tls_decrypt_sg()
Jump to the free() call, instead of having to remember
to free the memory in multiple places.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:45 -07:00
Jakub Kicinski
b89fec54fd tls: rx: wrap decrypt params in a struct
The max size of iv + aad + tail is 22B. That's smaller
than a single sg entry (32B). Don't bother with the
memory packing, just create a struct which holds the
max size of those members.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:45 -07:00
Jakub Kicinski
50a07aa531 tls: rx: always allocate max possible aad size for decrypt
AAD size is either 5 or 13. Really no point complicating
the code for the 8B of difference. This will also let us
turn the chunked up buffer into a sane struct.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:45 -07:00
Jakub Kicinski
2d91ecace6 strparser: pad sk_skb_cb to avoid straddling cachelines
sk_skb_cb lives within skb->cb[]. skb->cb[] straddles
2 cache lines, each containing 24B of data.
The first cache line does not contain much interesting
information for users of strparser, so pad things a little.
Previously strp_msg->full_len would live in the first cache
line and strp_msg->offset in the second.

We need to reorder the 8 byte temp_reg with struct tls_msg
to prevent a 4B hole which would push the struct over 48B.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:44 -07:00
Maxim Mikityanskiy
24bdfdd2ec selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
When CONFIG_NF_CONNTRACK=m, struct bpf_ct_opts and enum member
BPF_F_CURRENT_NETNS are not exposed. This commit allows building the
xdp_synproxy selftest in such cases. Note that nf_conntrack must be
loaded before running the test if it's compiled as a module.

This commit also allows this selftest to be successfully compiled when
CONFIG_NF_CONNTRACK is disabled.

One unused local variable of type struct bpf_ct_opts is also removed.

Fixes: fb5cd0ce70d4 ("selftests/bpf: Add selftests for raw syncookie helpers")
Reported-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220708130319.1016294-1-maximmi@nvidia.com
2022-07-08 15:58:45 -07:00
Daniel Müller
06cd4e9d5d bpf: Correctly propagate errors up from bpf_core_composites_match
This change addresses a comment made earlier [0] about a missing return
of an error when __bpf_core_types_match is invoked from
bpf_core_composites_match, which could have let to us erroneously
ignoring errors.

Regarding the typedef name check pointed out in the same context, it is
not actually an issue, because callers of the function perform a name
check for the root type anyway. To make that more obvious, let's add
comments to the function (similar to what we have for
bpf_core_types_are_compat, which is called in pretty much the same
context).

[0]: https://lore.kernel.org/bpf/165708121449.4919.13204634393477172905.git-patchwork-notify@kernel.org/T/#m55141e8f8cfd2e8d97e65328fa04852870d01af6

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220707211931.3415440-1-deso@posteo.net
2022-07-08 15:31:43 -07:00
James Hilliard
18410251f6 libbpf: Disable SEC pragma macro on GCC
It seems the gcc preprocessor breaks with pragmas when surrounding
__attribute__.

Disable these pragmas on GCC due to upstream bugs see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90400

Fixes errors like:
error: expected identifier or '(' before '#pragma'
  106 | SEC("cgroup/bind6")
      | ^~~

error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma'
  114 | char _license[] SEC("license") = "GPL";
      | ^~~

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220706111839.1247911-1-james.hilliard1@gmail.com
2022-07-08 15:11:34 -07:00
Stanislav Fomichev
d1a6edecc1 bpf: Check attach_func_proto more carefully in check_return_code
Syzkaller reports the following crash:

  RIP: 0010:check_return_code kernel/bpf/verifier.c:10575 [inline]
  RIP: 0010:do_check kernel/bpf/verifier.c:12346 [inline]
  RIP: 0010:do_check_common+0xb3d2/0xd250 kernel/bpf/verifier.c:14610

With the following reproducer:

  bpf$PROG_LOAD_XDP(0x5, &(0x7f00000004c0)={0xd, 0x3, &(0x7f0000000000)=ANY=[@ANYBLOB="1800000000000019000000000000000095"], &(0x7f0000000300)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80)

Because we don't enforce expected_attach_type for XDP programs,
we end up in hitting 'if (prog->expected_attach_type == BPF_LSM_CGROUP'
part in check_return_code and follow up with testing
`prog->aux->attach_func_proto->type`, but `prog->aux->attach_func_proto`
is NULL.

Add explicit prog_type check for the "Note, BPF_LSM_CGROUP that
attach ..." condition. Also, don't skip return code check for
LSM/STRUCT_OPS.

The above actually brings an issue with existing selftest which
tries to return EPERM from void inet_csk_clone. Fix the
test (and move called_socket_clone to make sure it's not
incremented in case of an error) and add a new one to explicitly
verify this condition.

Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")
Reported-by: syzbot+5cc0730bd4b4d2c5f152@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220708175000.2603078-1-sdf@google.com
2022-07-08 23:01:26 +02:00
Sieng-Piaw Liew
67d7ebdeb2 net: ag71xx: switch to napi_build_skb() to reuse skbuff_heads
napi_build_skb() reuses NAPI skbuff_head cache in order to save some
cycles on freeing/allocating skbuff_heads on every new Rx or completed
Tx.
Use napi_consume_skb() to feed the cache with skbuff_heads of completed
Tx, so it's never empty. The budget parameter is added to indicate NAPI
context, as a value of zero can be passed in the case of netpoll.

Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 14:23:25 +01:00
Eric Dumazet
c2dd4059dc net: minor optimization in __alloc_skb()
TCP allocates 'fast clones' skbs for packets in tx queues.

Currently, __alloc_skb() initializes the companion fclone
field to SKB_FCLONE_CLONE, and leaves other fields untouched.

It makes sense to defer this init much later in skb_clone(),
because all fclone fields are copied and hot in cpu caches
at that time.

This removes one cache line miss in __alloc_skb(), cost seen
on an host with 256 cpus all competing on memory accesses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 14:21:08 +01:00
Hariprasad Kelam
8e15145792 octeontx2-af: Don't reset previous pfc config
Current implementation is such that driver first resets the
existing PFC config before applying new pfc configuration.
This creates a problem like once PF or VFs requests PFC config
previous pfc config by other PFVfs is getting reset.

This patch fixes the problem by removing unnecessary resetting
of PFC config. Also configure Pause quanta value to smaller as
current value is too high.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 14:05:22 +01:00
Daniel Müller
32e0d9b310 selftests/bpf: Add test involving restrict type qualifier
This change adds a type based test involving the restrict type qualifier
to the BPF selftests. On the btfgen path, this will verify that bpftool
correctly handles the corresponding RESTRICT BTF kind.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220706212855.1700615-3-deso@posteo.net
2022-07-08 14:27:03 +02:00
Daniel Müller
aad53f17f0 bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
This change adjusts bpftool's type marking logic, as used in conjunction
with TYPE_EXISTS relocations, to correctly recognize and handle the
RESTRICT BTF kind.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220623212205.2805002-1-deso@posteo.net/T/#m4c75205145701762a4b398e0cdb911d5b5305ffc
Link: https://lore.kernel.org/bpf/20220706212855.1700615-2-deso@posteo.net
2022-07-08 14:26:39 +02:00