IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Transform two checks in the 'ex' key parsing into netlink policies
removing extra if checks.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel will print several warnings in a short period of time
when it stalls. Like this:
First warning:
[ 7100.097547] ------------[ cut here ]------------
[ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out
[ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467
dev_watchdog+0x260/0x270
...
Second warning:
[ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000
softirq=367 3137/3673146 fqs=13844
[ 7147.756960] (t=60001 jiffies g=4322709 q=133381)
[ 7147.756962] NMI backtrace for cpu 24
...
We calculate that the transmit queue start stall should occur before
7095s according to watchdog_timeo, the rcu start stall at 7087s.
These two times are close together, it is difficult to confirm which
happened first.
To let users know the exact time the stall started, print msecs when
the transmit queue time out.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_xmit_get_vlan_info() calls __skb_vlan_pop() as the most
appropriate helper I could find which strips away a VLAN header.
That's all I need it to do, but __skb_vlan_pop() has more logic, which
will become incompatible with the future revert of commit 6d1ccff62780
("net: reset mac header in dev_start_xmit()").
Namely, it performs a sanity check on skb_mac_header(), which will stop
being set after the above revert, so it will return an error instead of
removing the VLAN tag.
ocelot_xmit_get_vlan_info() gets called in 2 circumstances:
(1) the port is under a VLAN-aware bridge and the bridge sends
VLAN-tagged packets
(2) the port is under a VLAN-aware bridge and somebody else (an 8021q
upper) sends VLAN-tagged packets (using a VID that isn't in the
bridge vlan tables)
In case (1), there is actually no bug to defend against, because
br_dev_xmit() calls skb_reset_mac_header() and things continue to work.
However, in case (2), illustrated using the commands below, it can be
seen that our intervention is needed, since __skb_vlan_pop() complains:
$ ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
$ ip link set $eth master br0 && ip link set $eth up
$ ip link add link $eth name $eth.100 type vlan id 100 && ip link set $eth.100 up
$ ip addr add 192.168.100.1/24 dev $eth.100
I could fend off the checks in __skb_vlan_pop() with some
skb_mac_header_was_set() calls, but seeing how few callers of
__skb_vlan_pop() there are from TX paths, that seems rather
unproductive.
As an alternative solution, extract the bare minimum logic to strip a
VLAN header, and move it to a new helper named vlan_remove_tag(), close
to the definition of vlan_insert_tag(). Document it appropriately and
make ocelot_xmit_get_vlan_info() call this smaller helper instead.
Seeing that it doesn't appear illegal to test skb->protocol in the TX
path, I guess it would be a good for vlan_remove_tag() to also absorb
the vlan_set_encap_proto() function call.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()")
will be reverted, it will no longer be true that skb->data points at
skb_mac_header(skb) - since the skb->mac_header will not be set - so
stop saying that, and just say that it points to the MAC header.
I've reviewed vlan_insert_tag() and it does not *actually* depend on
skb_mac_header(), so reword that to avoid the confusion.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a cosmetic patch which consolidates the code to use the helper
function offered by if_vlan.h.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_mac_header() will no longer be available in the TX path when
reverting commit 6d1ccff62780 ("net: reset mac header in
dev_start_xmit()"). As preparation for that, let's use
skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's
located at skb->data (assumption which holds true here).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_mac_header() will no longer be available in the TX path when
reverting commit 6d1ccff62780 ("net: reset mac header in
dev_start_xmit()"). As preparation for that, let's use skb_eth_hdr() to
get to the Ethernet header's MAC DA instead, helper which assumes this
header is located at skb->data (assumption which holds true here).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_mac_header() will no longer be available in the TX path when
reverting commit 6d1ccff62780 ("net: reset mac header in
dev_start_xmit()"). As preparation for that, let's use
skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's
located at skb->data (assumption which holds true here).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to skb_eth_hdr() introduced in commit 96cc4b69581d ("macvlan: do
not assume mac_header is set in macvlan_broadcast()"), let's introduce a
skb_vlan_eth_hdr() helper which can be used in TX-only code paths to get
to the VLAN header based on skb->data rather than based on the
skb_mac_header(skb).
We also consolidate the drivers that dereference skb->data to go through
this helper.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When converting from ASSERTCMP to WARN_ON, the tested condition must
be inverted, which was missed for this case.
This would cause an EIO error when trying to read an rxrpc token, for
instance when trying to display tokens with AuriStor's "tokens" command.
Fixes: 84924aac08a4 ("rxrpc: Fix checker warning")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead.
Transform zero-length array into flexible-array member in struct
rxrpc_ackpacket.
Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
net/rxrpc/call_event.c:149:38: warning: array subscript i is outside array bounds of ‘uint8_t[0]’ {aka ‘unsigned char[]’} [-Warray-bounds=]
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/263
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/r/ZAZT11n4q5bBttW0@work/
Signed-off-by: David S. Miller <davem@davemloft.net>
We use napi_threaded_poll() in order to reduce our softirq dependency.
We can add a followup of 821eba962d95 ("net: optimize napi_schedule_rps()")
to further remove the need of firing NET_RX_SOFTIRQ whenever
RPS/RFS are used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we call skb_defer_free_flush() from napi_threaded_poll(),
we can avoid to raise IPI from skb_attempt_defer_free()
when the list becomes too big.
This allows napi_threaded_poll() to rely less on softirqs,
and lowers latency caused by a too big list.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We plan using skb_defer_free_flush() from napi_threaded_poll()
in the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kfree_skb() can be called from hard irq handlers,
but skb_attempt_defer_free() is meant to be used
from process or BH contexts, and skb_defer_free_flush()
is meant to be called from BH contexts.
Not having to mask hard irq can save some cycles.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the rxrpc call set up by afs_make_call() receives an error whilst it is
transmitting the request, there's the possibility that it may get to the
point the rxrpc call is ended (after the error_kill_call label) just as the
call is queued for async processing.
This could manifest itself as call->rxcall being seen as NULL in
afs_deliver_to_call() when it tries to lock the call.
Fix this by splitting rxrpc_kernel_end_call() into a function to shut down
an rxrpc call and a function to release the caller's reference and calling
the latter only when we get to afs_put_call().
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing+fedora36_64checkkafs-build-306@auristor.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Like commit ea30388baebc ("ipv6: Fix an uninit variable access bug in
__ip6_make_skb()"). icmphdr does not in skb linear region under the
scenario of SOCK_RAW socket. Access icmp_hdr(skb)->type directly will
trigger the uninit variable access bug.
Use a local variable icmp_type to carry the correct value in different
scenarios.
Fixes: 96793b482540 ("[IPV4]: Add ICMPMsgStats MIB (RFC 4293)")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZELn8wAKCRDbK58LschI
g1khAQC1nmXPuKjM4EAfFK8Ysb3KoF8ADmpE97n+/HEDydCagwD/bX0+NABR75Nh
ueGcoU1TcfcbshDzrH0s+C95owZDZw4=
=BeZM
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-04-21
We've added 71 non-merge commits during the last 8 day(s) which contain
a total of 116 files changed, 13397 insertions(+), 8896 deletions(-).
The main changes are:
1) Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward,
from Florian Westphal.
2) Fix race between btf_put and btf_idr walk which caused a deadlock,
from Alexei Starovoitov.
3) Second big batch to migrate test_verifier unit tests into test_progs
for ease of readability and debugging, from Eduard Zingerman.
4) Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree, from Dave Marchevsky.
5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF
selftests into libbpf-provided bpf_helpers.h header and improve
kfunc handling, from Andrii Nakryiko.
6) Support 64-bit pointers to kfuncs needed for archs like s390x,
from Ilya Leoshkevich.
7) Support BPF progs under getsockopt with a NULL optval,
from Stanislav Fomichev.
8) Improve verifier u32 scalar equality checking in order to enable
LLVM transformations which earlier had to be disabled specifically
for BPF backend, from Yonghong Song.
9) Extend bpftool's struct_ops object loading to support links,
from Kui-Feng Lee.
10) Add xsk selftest follow-up fixes for hugepage allocated umem,
from Magnus Karlsson.
11) Support BPF redirects from tc BPF to ifb devices,
from Daniel Borkmann.
12) Add BPF support for integer type when accessing variable length
arrays, from Feng Zhou.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits)
selftests/bpf: verifier/value_ptr_arith converted to inline assembly
selftests/bpf: verifier/value_illegal_alu converted to inline assembly
selftests/bpf: verifier/unpriv converted to inline assembly
selftests/bpf: verifier/subreg converted to inline assembly
selftests/bpf: verifier/spin_lock converted to inline assembly
selftests/bpf: verifier/sock converted to inline assembly
selftests/bpf: verifier/search_pruning converted to inline assembly
selftests/bpf: verifier/runtime_jit converted to inline assembly
selftests/bpf: verifier/regalloc converted to inline assembly
selftests/bpf: verifier/ref_tracking converted to inline assembly
selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
selftests/bpf: verifier/map_in_map converted to inline assembly
selftests/bpf: verifier/lwt converted to inline assembly
selftests/bpf: verifier/loops1 converted to inline assembly
selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
selftests/bpf: verifier/direct_packet_access converted to inline assembly
selftests/bpf: verifier/d_path converted to inline assembly
selftests/bpf: verifier/ctx converted to inline assembly
selftests/bpf: verifier/btf_ctx_access converted to inline assembly
selftests/bpf: verifier/bpf_get_stack converted to inline assembly
...
====================
Link: https://lore.kernel.org/r/20230421211035.9111-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xfrm_alloc_dst() followed by xfrm4_dst_destroy(), without a
xfrm4_fill_dst() call in between, causes the following BUG:
BUG: spinlock bad magic on CPU#0, fbxhostapd/732
lock: 0x890b7668, .magic: 890b7668, .owner: <none>/-1, .owner_cpu: 0
CPU: 0 PID: 732 Comm: fbxhostapd Not tainted 6.3.0-rc6-next-20230414-00613-ge8de66369925-dirty #9
Hardware name: Marvell Kirkwood (Flattened Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x28/0x30
dump_stack_lvl from do_raw_spin_lock+0x20/0x80
do_raw_spin_lock from rt_del_uncached_list+0x30/0x64
rt_del_uncached_list from xfrm4_dst_destroy+0x3c/0xbc
xfrm4_dst_destroy from dst_destroy+0x5c/0xb0
dst_destroy from rcu_process_callbacks+0xc4/0xec
rcu_process_callbacks from __do_softirq+0xb4/0x22c
__do_softirq from call_with_stack+0x1c/0x24
call_with_stack from do_softirq+0x60/0x6c
do_softirq from __local_bh_enable_ip+0xa0/0xcc
Patch "net: dst: Prevent false sharing vs. dst_entry:: __refcnt" moved
rt_uncached and rt_uncached_list fields from rtable struct to dst
struct, so they are more zeroed by memset_after(xdst, 0, u.dst) in
xfrm_alloc_dst().
Note that rt_uncached (list_head) was never properly initialized at
alloc time, but xfrm[46]_dst_destroy() is written in such a way that
it was not an issue thanks to the memset:
if (xdst->u.rt.dst.rt_uncached_list)
rt_del_uncached_list(&xdst->u.rt);
The route code does it the other way around: rt_uncached_list is
assumed to be valid IIF rt_uncached list_head is not empty:
void rt_del_uncached_list(struct rtable *rt)
{
if (!list_empty(&rt->dst.rt_uncached)) {
struct uncached_list *ul = rt->dst.rt_uncached_list;
spin_lock_bh(&ul->lock);
list_del_init(&rt->dst.rt_uncached);
spin_unlock_bh(&ul->lock);
}
}
This patch adds mandatory rt_uncached list_head initialization in
generic dst_init(), and adapt xfrm[46]_dst_destroy logic to match the
rest of the code.
Fixes: d288a162dd1c ("net: dst: Prevent false sharing vs. dst_entry:: __refcnt")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202304162125.18b7bcdd-oliver.sang@intel.com
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
CC: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Link: https://lore.kernel.org/r/20230420182508.2417582-1-mbizon@freebox.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Function tcf_exts_init_ex() sets exts->miss_cookie_node ptr only
when use_action_miss is true so it assumes in other case that
the field is set to NULL by the caller. If not then the field
contains garbage and subsequent tcf_exts_destroy() call results
in a crash.
Ensure that the field .miss_cookie_node pointer is NULL when
use_action_miss parameter is false to avoid this potential scenario.
Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230420183634.1139391-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
if sch_fq is configured with "initial quantum" having values greater than
INT_MAX, the first assignment of "credit" does signed integer overflow to
a very negative value.
In this situation, the syzkaller script provided by Cristoph triggers the
CPU soft-lockup warning even with few sockets. It's not an infinite loop,
but "credit" wasn't probably meant to be minus 2Gb for each new flow.
Capping "initial quantum" to INT_MAX proved to fix the issue.
v2: validation of "initial quantum" is done in fq_policy, instead of open
coding in fq_change() _ suggested by Jakub Kicinski
Reported-by: Christoph Paasch <cpaasch@apple.com>
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/377
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/7b3a3c7e36d03068707a021760a194a8eb5ad41a.1682002300.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Relax netdev chain creation to allow for loading the ruleset, then
adding/deleting devices at a later stage. Hardware offload does not
support for this feature yet.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Rename nft_flowtable_hooks_destroy() by nft_hooks_destroy() to prepare
for netdev chain device updates.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In most cases, table, name and handle is sufficient for userspace to
identify an object that has been deleted. Skipping unneeded fields in
the netlink attributes in the message saves bandwidth (ie. less chances
of hitting ENOBUFS).
Rules are an exception: the existing userspace monitor code relies on
the rule definition. This exception can be removed by implementing a
rule cache in userspace, this is already supported by the tracing
infrastructure.
Regarding flowtables, incremental deletion of devices is possible.
Skipping a full notification allows userspace to differentiate between
flowtable removal and incremental removal of devices.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Flowtable and netdev chains are bound to one or several netdevice,
extend netlink error reporting to specify the the netdevice that
triggers the error.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Remove EnterFunction and LeaveFunction.
These debugging macros seem well past their use-by date. And seem to
have little value these days. Removing them allows some trivial cleanup
of some exit paths for some functions. These are also included in this
patch. There is likely scope for further cleanup of both debugging and
unwind paths. But let's leave that for another day.
Only intended to change debug output, and only when CONFIG_IP_VS_DEBUG
is enabled. Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Consistently use array_size() to calculate the size of ip_vs_conn_tab
in bytes.
Flagged by Coccinelle:
WARNING: array_size is already used (line 1498) to compute the same size
No functional change intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options.
That structure looks like this:
struct ip_vs_sync_conn_options {
struct ip_vs_seq in_seq;
struct ip_vs_seq out_seq;
};
The source of the copy is the in_seq field of struct ip_vs_conn. Whose
type is struct ip_vs_seq. Thus we can see that the source - is not as
wide as the amount of data copied, which is the width of struct
ip_vs_sync_conn_option.
The copy is safe because the next field in is another struct ip_vs_seq.
Make use of struct_group() to annotate this.
Flagged by gcc-13 as:
In file included from ./include/linux/string.h:254,
from ./include/linux/bitmap.h:11,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/timex.h:5,
from ./include/linux/timex.h:67,
from ./include/linux/time32.h:13,
from ./include/linux/time.h:60,
from ./include/linux/stat.h:19,
from ./include/linux/module.h:13,
from net/netfilter/ipvs/ip_vs_sync.c:38:
In function 'fortify_memcpy_chk',
inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3:
./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
529 | __read_overflow2_field(q_size_field, size);
|
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
pass it as argument instead. This reduces size of traceinfo to
16 bytes. Total stack usage:
nf_tables_core.c:252 nft_do_chain 304 static
While its possible to also pass basechain as argument, doing so
increases nft_do_chaininfo function size.
Unlike pktinfo/verdict/rule the basechain info isn't used in
the expression evaluation path. gcc places it on the stack, which
results in extra push/pop when it gets passed to the trace helpers
as argument rather than as part of the traceinfo structure.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Just pass it as argument to nft_trace_notify. Stack is reduced by 8 bytes:
nf_tables_core.c:256 nft_do_chain 312 static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
pass it as argument. No change in object size.
stack usage decreases by 8 byte:
nf_tables_core.c:254 nft_do_chain 320 static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This helper is inlined, so keep it as small as possible.
If the static key is true, there is only a very small chance
that info->trace is false:
1. tracing was enabled at this very moment, the static key was
updated to active right after nft_do_table was called.
2. tracing was disabled at this very moment.
trace->info is already false, the static key is about to
be patched to false soon.
In both cases, no event will be sent because info->trace
is false (checked in noinline slowpath). info->nf_trace is irrelevant.
The nf_trace update is redunant in this case, but this will only
happen for short duration, when static key flips.
text data bss dec hex filename
old: 2980 192 32 3204 c84 nf_tables_core.o
new: 2964 192 32 3188 c74i nf_tables_core.o
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We only need to validate tables that saw changes in the current
transaction.
The existing code revalidates all tables, but this isn't needed as
cross-table jumps are not allowed (chains have table scope).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ->cleanup callback needs to be removed, this doesn't work anymore as
the transaction mutex is already released in the ->abort function.
Just do it after a successful validation pass, this either happens
from commit or abort phases where transaction mutex is held.
Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that the rule trailer/end marker and the rcu head reside in the
same structure, we no longer need to save/restore the chain pointer
when performing/returning from a jump.
We can simply let the trace infra walk the evaluated rule until it
hits the end marker and then fetch the chain pointer from there.
When the rule is NULL (policy tracing), then chain and basechain
pointers were already identical, so just use the basechain.
This cuts size of jumpstack in half, from 256 to 128 bytes in 64bit,
scripts/stackusage says:
nf_tables_core.c:251 nft_do_chain 328 static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Walk the rule headers until the trailer one (last_bit flag set) instead
of stopping at last_rule address.
This avoids the need to store the address when jumping to another chain.
This cuts size of jumpstack array by one third, on 64bit from
384 to 256 bytes. Still, stack usage is still quite large:
scripts/stackusage:
nf_tables_core.c:258 nft_do_chain 496 static
Next patch will also remove chain pointer.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In order to free the rules in a chain via call_rcu, the rule array used
to stash a rcu_head and space for a pointer at the end of the rule array.
When the current nft_rule_dp blob format got added in
2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout"), this results
in a double-trailer:
size (unsigned long)
struct nft_rule_dp
struct nft_expr
...
struct nft_rule_dp
struct nft_expr
...
struct nft_rule_dp (is_last=1) // Trailer
The trailer, struct nft_rule_dp (is_last=1), is not accounted for in size,
so it can be located via start_addr + size.
Because the rcu_head is stored after 'start+size' as well this means the
is_last trailer is *aliased* to the rcu_head (struct nft_rules_old).
This is harmless, because at this time the nft_do_chain function never
evaluates/accesses the trailer, it only checks the address boundary:
for (; rule < last_rule; rule = nft_rule_next(rule)) {
...
But this way the last_rule address has to be stashed in the jump
structure to restore it after returning from a chain.
nft_do_chain stack usage has become way too big, so put it on a diet.
Without this patch is impossible to use
for (; !rule->is_last; rule = nft_rule_next(rule)) {
... because on free, the needed update of the rcu_head will clobber the
nft_rule_dp is_last bit.
Furthermore, also stash the chain pointer in the trailer, this allows
to recover the original chain structure from nf_tables_trace infra
without a need to place them in the jump struct.
After this patch it is trivial to diet the jump stack structure,
done in the next two patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
add glue code so a bpf program can be run using userspace-provided
netfilter state and packet/skb.
Default is to use ipv4:output hook point, but this can be overridden by
userspace. Userspace provided netfilter state is restricted, only hook and
protocol families can be overridden and only to ipv4/ipv6.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-7-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This is just to avoid ordering issues between multiple bpf programs,
this could be removed later in case it turns out to be too cautious.
bpf prog could still be shared with non-bpf hook, otherwise we'd have to
make conntrack hook registration fail just because a bpf program has
same priority.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-5-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This allows userspace ("nft list hooks") to show which bpf program
is attached to which hook.
Without this, user only knows bpf prog is attached at prio
x, y, z at INPUT and FORWARD, but can't tell which program is where.
v4: kdoc fixups (Simon Horman)
Link: https://lore.kernel.org/bpf/ZEELzpNCnYJuZyod@corigine.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-4-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs
that will be invoked via the NF_HOOK() points in the ip stack.
Invocation incurs an indirect call. This is not a necessity: Its
possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the
program invocation with the same method already done for xdp progs.
This isn't done here to keep the size of this chunk down.
Verifier restricts verdicts to either DROP or ACCEPT.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add bpf_link support skeleton. To keep this reviewable, no bpf program
can be invoked yet, if a program is attached only a c-stub is called and
not the actual bpf program.
Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig.
Uapi example usage:
union bpf_attr attr = { };
attr.link_create.prog_fd = progfd;
attr.link_create.attach_type = 0; /* unused */
attr.link_create.netfilter.pf = PF_INET;
attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN;
attr.link_create.netfilter.priority = -128;
err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
... this would attach progfd to ipv4:input hook.
Such hook gets removed automatically if the calling program exits.
BPF_NETFILTER program invocation is added in followup change.
NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it
allows to tell userspace which program is attached at the given hook
when user runs 'nft hook list' command rather than just the priority
and not-very-helpful 'this hook runs a bpf prog but I can't tell which
one'.
Will also be used to disallow registration of two bpf programs with
same priority in a followup patch.
v4: arm32 cmpxchg only supports 32bit operand
s/prio/priority/
v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if
more use cases pop up (arptables, ebtables, netdev ingress/egress etc).
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmRCa3oACgkQ1V2XiooU
IOTw2BAAsMQ3tjhZqvQ4zxZbb770n9qawW5vR1Pjq/C63x07qQiXZf+VL41Ff77s
6acsyD5vYod3SDTR7Flx1xxFEgrQ4i7/81Jjt4VFqFmD3owKH6lm9R3H9Ae3v//5
uDDNrbVDwjP4ZPZx96A1Z2SLwlo8+K/IJb98rLlS2v/8IjvPZMy17Oh9/FThsxNw
LXuNuK6HTd+s2MkT8pUv3QWb/20Sb/ZVOEY6cUx2mD8DedmZBtUbzhisnnIVA+AX
tP/VK3cPW/PKoNCiDwXMnqpgKUk31L2fpaTGHW0CWsDq0qIEZkmEMxN7WWFMVK/G
7cq//ZGuQU+O09sM1rG51ImjG+2QISveUAKu99kb/mnlRtxNIhfQkWQDuO9LFqxI
kd31C2m4oJwsmWfRvitSmKmQs6i7Js+PL25FqxAXxq6VnDTG4Cg9ryBvicRkGhvO
+Ks2syeWflSQ9NVWWad30NsDyogQ0xy+7Lk/QIzb0hEWR0vGDWfyHNgu2z3QtQjd
ftcAw6u9LtLYV/01XFOEYMptpi8Ecdot8+rX4hz7NSPNQm1WnpbEinT4sBoDGNxe
9PoByIJ9lBeQgpWlbe7PTXBSIYF6p8gXg44N/LOpmaUXGs/h9IrNLXoakGlljcH8
uYLiHOh3lzgwY5Ex+UAMJlAjWoGVIVKyeRf08Bz4PBG5jBL30Qk=
=ck6R
-----END PGP SIGNATURE-----
Merge tag 'nf-23-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Set on IPS_CONFIRMED before change_status() otherwise EBUSY is
bogusly hit. This bug was introduced in the 6.3 release cycle.
2) Fix nfnetlink_queue conntrack support: Set/dump timeout
accordingly for unconfirmed conntrack entries. Make sure this
is done after IPS_CONFIRMED is set on. This is an old bug, it
happens since the introduction of this feature.
* tag 'nf-23-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: fix wrong ct->timeout value
netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()
====================
Link: https://lore.kernel.org/r/20230421105700.325438-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Most likely the last -next pull request for v6.4. We have changes all
over. rtw88 now supports SDIO bus and iwlwifi continues to work on
Wi-Fi 7 support. Not much stack changes this time.
Major changes:
cfg80211/mac80211
* fix some Fine Time Measurement (FTM) frames not being bufferable
* flush frames before key removal to avoid potential unencrypted
transmission depending on the hardware design
iwlwifi
* preparation for Wi-Fi 7 EHT and multi-link support
rtw88
* SDIO bus support
* RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support
rtw89
* framework firmware backwards compatibility
brcmfmac
* Cypress 43439 SDIO support
mt76
* mt7921 P2P support
* mt7996 mesh A-MSDU support
* mt7996 EHT support
* mt7996 coredump support
wcn36xx
* support for pronto v3 hardware
ath11k
* PCIe DeviceTree bindings
* WCN6750: enable SAR support
ath10k
* convert DeviceTree bindings to YAML
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmRCaTURHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZvcRwf+NcLS4HbmqGZhBxl2LZVZ6AFCBM4ijDlO
pxdMiC4UxT+UApY1/9YXo0VS97M7paDJH+R/g1HcTvvKURHCmsdhYHm+R1MH+/uD
r8RfvJg4VtNnlUpsJh9jxt+e697KP15M7DF0sFlQzdIoTUl13Hp7YhI76zunAbAN
u1FBcVVJiCcJWbLolMzqAeBMUWUEG+GtHF6Zn5kChVU/p1nmwJMPUG3Qvb61a7Yc
BM1pQX8jQ8PBj+VrGPGvqX0BOdbxq0evauYScq2oTOhQ1fzTNWOsI1yI7AwApptR
itwQ2t1UK/C/EWpvWIBSd0nit1uwSx0Zsu/nSZlbKbrvIFwd5XnfwQ==
=Irrd
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.4
Most likely the last -next pull request for v6.4. We have changes all
over. rtw88 now supports SDIO bus and iwlwifi continues to work on
Wi-Fi 7 support. Not much stack changes this time.
Major changes:
cfg80211/mac80211
- fix some Fine Time Measurement (FTM) frames not being bufferable
- flush frames before key removal to avoid potential unencrypted
transmission depending on the hardware design
iwlwifi
- preparation for Wi-Fi 7 EHT and multi-link support
rtw88
- SDIO bus support
- RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support
rtw89
- framework firmware backwards compatibility
brcmfmac
- Cypress 43439 SDIO support
mt76
- mt7921 P2P support
- mt7996 mesh A-MSDU support
- mt7996 EHT support
- mt7996 coredump support
wcn36xx
- support for pronto v3 hardware
ath11k
- PCIe DeviceTree bindings
- WCN6750: enable SAR support
ath10k
- convert DeviceTree bindings to YAML
* tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (261 commits)
wifi: rtw88: Update spelling in main.h
wifi: airo: remove ISA_DMA_API dependency
wifi: rtl8xxxu: Simplify setting the initial gain
wifi: rtl8xxxu: Add rtl8xxxu_write{8,16,32}_{set,clear}
wifi: rtl8xxxu: Don't print the vendor/product/serial
wifi: rtw88: Fix memory leak in rtw88_usb
wifi: rtw88: call rtw8821c_switch_rf_set() according to chip variant
wifi: rtw88: set pkg_type correctly for specific rtw8821c variants
wifi: rtw88: rtw8821c: Fix rfe_option field width
wifi: rtw88: usb: fix priority queue to endpoint mapping
wifi: rtw88: 8822c: add iface combination
wifi: rtw88: handle station mode concurrent scan with AP mode
wifi: rtw88: prevent scan abort with other VIFs
wifi: rtw88: refine reserved page flow for AP mode
wifi: rtw88: disallow PS during AP mode
wifi: rtw88: 8822c: extend reserved page number
wifi: rtw88: add port switch for AP mode
wifi: rtw88: add bitmap for dynamic port settings
wifi: rtw89: mac: use regular int as return type of DLE buffer request
wifi: mac80211: remove return value check of debugfs_create_dir()
...
====================
Link: https://lore.kernel.org/r/20230421104726.800BCC433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Packet sockets, like tap, can be used as the backend for kernel vhost.
In packet sockets, virtio net header size is currently hardcoded to be
the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
always the case: some virtio features, such as mrg_rxbuf, need virtio
net header to be 12-byte long.
Mergeable buffers, as a virtio feature, is worthy of supporting: packets
that are larger than one-mbuf size will be dropped in vhost worker's
handle_rx if mrg_rxbuf feature is not used, but large packets
cannot be avoided and increasing mbuf's size is not economical.
With this virtio feature enabled by virtio-user, packet sockets with
hardcoded 10-byte virtio net header will parse mac head incorrectly in
packet_snd by taking the last two bytes of virtio net header as part of
mac header.
This incorrect mac header parsing will cause packet to be dropped due to
invalid ether head checking in later under-layer device packet receiving.
By adding extra field vnet_hdr_sz with utilizing holes in struct
packet_sock to record currently used virtio net header size and supporting
extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet
sockets can know the exact length of virtio net header that virtio user
gives.
In packet_snd, tpacket_snd and packet_recvmsg, instead of using
hardcoded virtio net header size, it can get the exact vnet_hdr_sz from
corresponding packet_sock, and parse mac header correctly based on this
information to avoid the packets being mistakenly dropped.
Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com>
Co-developed-by: Anqi Shen <amy.saq@antgroup.com>
Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new bridge port attribute that allows user space to enable
per-{Port, VLAN} neighbor suppression. Example:
# bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
false
# bridge link set dev swp1 neigh_vlan_suppress on
# bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
true
# bridge link set dev swp1 neigh_vlan_suppress off
# bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
false
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>