IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
While investigating a related syzbot report,
I found that whenever call to tcf_exts_init()
from u32_init_knode() is failing, we end up
with an elevated refcount on ht->refcnt
To avoid that, only increase the refcount after
all possible errors have been evaluated.
Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For some unknown reason qdisc_reset() is using
a convoluted way of freeing two lists of skbs.
Use __skb_queue_purge() instead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220414011004.2378350-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplify the mentioned helper function by removing ternary operator. The
expression that is there outputs the boolean value by itself.
This helper might be used in the hot path so this simplification can
also be considered as micro optimization.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-15-maciej.fijalkowski@intel.com
Inspired by patch that made xdp_do_redirect() return values for XSKMAP
more meaningful, return -ENXIO instead of -EINVAL for socket being
unbound in xsk_rcv_check() as this is the usual value that is returned
for such event. In turn, it is now possible to easily distinguish what
went wrong, which is a bit harder when for both cases checked, -EINVAL
was returned.
Return codes can be counted in a nice way via bpftrace oneliner that
Jesper has shown:
bpftrace -e 'tracepoint:xdp:xdp_redirect* {@err[-args->err] = count();}'
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20220413153015.453864-3-maciej.fijalkowski@intel.com
The error codes returned by xdp_do_redirect() when redirecting a frame
to an AF_XDP socket has not been very useful. A driver could not
distinguish between different errors. Prior this change the following
codes where used:
Socket not bound or incorrect queue/netdev: EINVAL
XDP frame/AF_XDP buffer size mismatch: ENOSPC
Could not allocate buffer (copy mode): ENOSPC
AF_XDP Rx buffer full: ENOSPC
After this change:
Socket not bound or incorrect queue/netdev: EINVAL
XDP frame/AF_XDP buffer size mismatch: ENOSPC
Could not allocate buffer (copy mode): ENOMEM
AF_XDP Rx buffer full: ENOBUFS
An AF_XDP zero-copy driver can now potentially determine if the
failure was due to a full Rx buffer, and if so stop processing more
frames, yielding to the userland AF_XDP application.
Signed-off-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20220413153015.453864-2-maciej.fijalkowski@intel.com
Currently these two checks in ethnl_set_rings are added after rtnl_lock()
which will do useless works if the request is invalid.
So this patch moves these checks before the rtnl_lock() to avoid these
costs.
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently tx push is a standard driver feature which controls use of a fast
path descriptor push. So this patch extends the ringparam APIs and data
structures to support set/get tx push by ethtool -G/g.
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Given a sufficiently large number of actions, while copying and
reserving memory for a new action of a new flow, if next_offset is
greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does
not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE
bytes increasing actions_len by req_size. This can then lead to an OOB
write access, especially when further actions need to be copied.
Fix it by rearranging the flow action size check.
KASAN splat below:
==================================================================
BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch]
Write of size 65360 at addr ffff888147e4001c by task handler15/836
CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27
...
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x5a
print_report.cold+0x5e/0x5db
? __lock_text_start+0x8/0x8
? reserve_sfa_size+0x1ba/0x380 [openvswitch]
kasan_report+0xb5/0x130
? reserve_sfa_size+0x1ba/0x380 [openvswitch]
kasan_check_range+0xf5/0x1d0
memcpy+0x39/0x60
reserve_sfa_size+0x1ba/0x380 [openvswitch]
__add_action+0x24/0x120 [openvswitch]
ovs_nla_add_action+0xe/0x20 [openvswitch]
ovs_ct_copy_action+0x29d/0x1130 [openvswitch]
? __kernel_text_address+0xe/0x30
? unwind_get_return_address+0x56/0xa0
? create_prof_cpu_mask+0x20/0x20
? ovs_ct_verify+0xf0/0xf0 [openvswitch]
? prep_compound_page+0x198/0x2a0
? __kasan_check_byte+0x10/0x40
? kasan_unpoison+0x40/0x70
? ksize+0x44/0x60
? reserve_sfa_size+0x75/0x380 [openvswitch]
__ovs_nla_copy_actions+0xc26/0x2070 [openvswitch]
? __zone_watermark_ok+0x420/0x420
? validate_set.constprop.0+0xc90/0xc90 [openvswitch]
? __alloc_pages+0x1a9/0x3e0
? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0
? unwind_next_frame+0x991/0x1e40
? __mod_node_page_state+0x99/0x120
? __mod_lruvec_page_state+0x2e3/0x470
? __kasan_kmalloc_large+0x90/0xe0
ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch]
ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch]
...
Cc: stable@vger.kernel.org
Fixes: f28cd2af22a0 ("openvswitch: fix flow actions reallocation")
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Feng reported an skb_under_panic BUG triggered by running
test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:
[ 82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
<...>
[ 82.607380] Call Trace:
[ 82.609389] <TASK>
[ 82.611136] skb_push.cold.109+0x10/0x10
[ 82.614289] __gre6_xmit+0x41e/0x590
[ 82.617169] ip6gre_tunnel_xmit+0x344/0x3f0
[ 82.620526] dev_hard_start_xmit+0xf1/0x330
[ 82.623882] sch_direct_xmit+0xe4/0x250
[ 82.626961] __dev_queue_xmit+0x720/0xfe0
<...>
[ 82.633431] packet_sendmsg+0x96a/0x1cb0
[ 82.636568] sock_sendmsg+0x30/0x40
<...>
The following sequence of events caused the BUG:
1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
calculated based on old flags (see ip6gre_calc_hlen());
2. packet_snd() reserves header room for skb A, assuming
tunnel->tun_hlen is 4;
3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
4. __gre6_xmit() detects the new tunnel key, and recalculates
"tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
TUNNEL_SEQ);
5. gre_build_header() calls skb_push() with insufficient reserved header
room, triggering the BUG.
As sugguested by Cong, fix it by moving the call to skb_cow_head() after
the recalculation of tun_hlen.
Reproducer:
OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o
ip netns add at_ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns at_ns0
ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip link set dev veth1 up mtu 1500
ip addr add dev veth1 172.16.1.200/24
ip netns exec at_ns0 ip addr add ::11/96 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip addr add dev veth1 ::22/96
ip link set dev veth1 up
ip netns exec at_ns0 \
ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
local ::11 remote ::22
ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
ip netns exec at_ns0 ip link set dev ip6gretap00 up
ip link add dev ip6gretap11 type ip6gretap external
ip addr add dev ip6gretap11 10.1.1.200/24
ip addr add dev ip6gretap11 fc80::200/24
ip link set dev ip6gretap11 up
tc qdisc add dev ip6gretap11 clsact
tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel
ping6 -c 3 -w 10 -q ::11
Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode")
Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not update tunnel->tun_hlen in data plane code. Use a local variable
instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit().
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2022-04-14
1) Fix the output interface for VRF cases in xfrm_dst_lookup.
From David Ahern.
2) Fix write out of bounds by doing COW on esp output when the
packet size is larger than a page.
From Sabrina Dubroca.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
packet_sock xmit could be dev_queue_xmit, which also returns negative
errors. So only checking positive errors is not enough, or userspace
sendmsg may return success while packet is not send out.
Move the net_xmit_errno() assignment in the braces as checkpatch.pl said
do not use assignment in if condition.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown
and fallback"), for a fallback connection, __smc_release() does not call
sock_put() if its state is already SMC_CLOSED.
When calling smc_shutdown() after falling back, its state is set to
SMC_CLOSED but does not call sock_put(), so this patch calls it.
Reported-and-tested-by: syzbot+6e29a053eb165bd50de5@syzkaller.appspotmail.com
Fixes: e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown and fallback")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent patch[1] from Eric Dumazet flipped the order in which the
keepalive timer and the keepalive worker were cancelled in order to fix a
syzbot reported issue[2]. Unfortunately, this enables the mirror image bug
whereby the timer races with rxrpc_exit_net(), restarting the worker after
it has been cancelled:
CPU 1 CPU 2
=============== =====================
if (rxnet->live)
<INTERRUPT>
rxnet->live = false;
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_queue_work(&rxnet->peer_keepalive_work);
del_timer_sync(&rxnet->peer_keepalive_timer);
Fix this by restoring the removed del_timer_sync() so that we try to remove
the timer twice. If the timer runs again, it should see ->live == false
and not restart the worker.
Fixes: 1946014ca3b1 ("rxrpc: fix a race in rxrpc_exit_net()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce page_pool APIs to report stats through ethtool and reduce
duplicated code in each driver.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter.
Current release - regressions:
- smc: fix af_ops of child socket pointing to released memory
- wifi: ath9k: fix usage of driver-private space in tx_info
Previous releases - regressions:
- ipv6: fix panic when forwarding a pkt with no in6 dev
- sctp: use the correct skb for security_sctp_assoc_request
- smc: fix NULL pointer dereference in smc_pnet_find_ib()
- sched: fix initialization order when updating chain 0 head
- phy: don't defer probe forever if PHY IRQ provider is missing
- dsa: revert "net: dsa: setup master before ports"
- dsa: felix: fix tagging protocol changes with multiple CPU ports
- eth: ice:
- fix use-after-free when freeing @rx_cpu_rmap
- revert "iavf: fix deadlock occurrence during resetting VF interface"
- eth: lan966x: stop processing the MAC entry is port is wrong
Previous releases - always broken:
- sched
- flower: fix parsing of ethertype following VLAN header
- taprio: check if socket flags are valid
- nfc: add flush_workqueue to prevent uaf
- veth: ensure eth header is in skb's linear part
- eth: stmmac: fix altr_tse_pcs function when using a fixed-link
- eth: macb: restart tx only if queue pointer is lagging
- eth: macvlan: fix leaking skb in source mode with nodst option
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmJX6CoSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkMRIP/0CKmGetL0i1WQ0LD8regv+E6NizyXDB
+YCchHMMgYJ/aIRVps9GSWqD6ncU2gCSW27aCxHsN+Esw/HytrmsaHfS1SWRgIfb
6hn1CB3/Ojd1eXZOdXgVrlayhJKj//c0KdoHQoY+sQaLKNvULx5VfTq4y1yjyXy2
upfwW4JaQaUlkaTbdhjRhq5TuRY2JCwscc7EmJbwrqmqWG7nSEngrLU0XLHs8tWr
X/gQbI/wDepf/KidE59rLV7yMYlovCkoZBVUrN4oLZRqxUIBtU0a8nZNX5NFWiVD
KwTahrUh8eOizzSMEzjO0HpuGBWQFrmyC0eOb8KwixMJrRfDtqAWJKwHfxZtzHr/
JpUHlgvCN9iQoPYY0LZ8uU2CLFJM+p4hn1sOt/swoo23iAvCiJWWbRoAvsvwLbZ3
4pAsof8w6SRb34J/6Lcu78LECD/y62TJ27Ay82vdjrYbz1g8wb1wGlAMAbJ7K7sE
NQ4iB6wzd6UVF3Qm4qo7kRJT5TTY4TxZMQUKl1gj/OEV9hrJ8zF1MjRhxNWRC2R9
+A8Rw3weL8zYKoezCEEbogaDp+h7IdeNlbZE8r19p2CMiJ/Y+NQd//p4rSJ0Oqcw
1tOtg5x7LVVMiSz38uHk0HbiKJ9UbHLJJZ35N5oi/IDPg+LWFfar23I4BfuII2kB
Wo5Ii/0DMozb
=Ha9M
-----END PGP SIGNATURE-----
Merge tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wireless and netfilter.
Current release - regressions:
- smc: fix af_ops of child socket pointing to released memory
- wifi: ath9k: fix usage of driver-private space in tx_info
Previous releases - regressions:
- ipv6: fix panic when forwarding a pkt with no in6 dev
- sctp: use the correct skb for security_sctp_assoc_request
- smc: fix NULL pointer dereference in smc_pnet_find_ib()
- sched: fix initialization order when updating chain 0 head
- phy: don't defer probe forever if PHY IRQ provider is missing
- dsa: revert "net: dsa: setup master before ports"
- dsa: felix: fix tagging protocol changes with multiple CPU ports
- eth: ice:
- fix use-after-free when freeing @rx_cpu_rmap
- revert "iavf: fix deadlock occurrence during resetting VF
interface"
- eth: lan966x: stop processing the MAC entry is port is wrong
Previous releases - always broken:
- sched:
- flower: fix parsing of ethertype following VLAN header
- taprio: check if socket flags are valid
- nfc: add flush_workqueue to prevent uaf
- veth: ensure eth header is in skb's linear part
- eth: stmmac: fix altr_tse_pcs function when using a fixed-link
- eth: macb: restart tx only if queue pointer is lagging
- eth: macvlan: fix leaking skb in source mode with nodst option"
* tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
net: dsa: felix: fix tagging protocol changes with multiple CPU ports
tun: annotate access to queue->trans_start
nfc: nci: add flush_workqueue to prevent uaf
net: dsa: realtek: don't parse compatible string for RTL8366S
net: dsa: realtek: fix Kconfig to assure consistent driver linkage
net: ftgmac100: access hardware register after clock ready
Revert "net: dsa: setup master before ports"
macvlan: Fix leaking skb in source mode with nodst option
netfilter: nf_tables: nft_parse_register can return a negative value
net: lan966x: Stop processing the MAC entry is port is wrong.
net: lan966x: Fix when a port's upper is changed.
net: lan966x: Fix IGMP snooping when frames have vlan tag
net: lan966x: Update lan966x_ptp_get_nominal_value
sctp: Initialize daddr on peeled off socket
net/smc: Fix af_ops of child socket pointing to released memory
net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
net/smc: use memcpy instead of snprintf to avoid out of bounds read
net: macb: Restart tx only if queue pointer is lagging
...
When L3 stats are disabled, rtnl_offload_xstats_get_size_stats() returns
size of 0, which is supposed to be an indication that the corresponding
attribute should not be emitted. However, instead, the current code
reserves a 0-byte attribute.
The reason this does not show up as a citation on a kasan kernel is that
netdev_offload_xstats_get(), which is supposed to fill in the data, never
ends up getting called, because rtnl_offload_xstats_get_stats() notices
that the stats are not actually used and skips the call.
Thus a zero-length IFLA_OFFLOAD_XSTATS_L3_STATS attribute ends up in a
response, confusing the userspace.
Fix by skipping the L3-stats related block in rtnl_offload_xstats_fill().
Fixes: 0e7788fd7622 ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/591b58e7623edc3eb66dd1fcfa8c8f133d090974.1649794741.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
First set of fixes for v5.18. Maintainers file updates, two
compilation warning fixes, one revert for ath11k and smaller fixes to
drivers and stack. All the usual stuff.
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmJWb5kRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZvaZQgAlgXICEUZSq711XeKLMZ5Rv6z532ejcmT
hrgL8Dzru8NEx/+tdPwX/uhdUAzkwk0C7QwkbpzifE1nrQaqMwVFlKr2q0K43qeJ
xZoHCiMdShsClBlK9a/jZsa5/fnHFF9Pa0rNxCHGjQ46gRhkkwzf6fv722z/Zay9
MOKOv9+17Vrtpel2GlaMPsxRSQmzM4YSAbSfIz6HiUwt4VkuB2x9qxalHz7OfD27
fatWhwK1ZJui1tPRsB7dfrE8/0sg0t3Z2d7GarraFMCsaiCZRffYWz1OWy6P5ITo
8x4kgIewiggXBzpcsa8LQ9cleOE0gLZUdihJHgjWma3YX/tnns5Akg==
=hTYE
-----END PGP SIGNATURE-----
Merge tag 'wireless-2022-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v5.18
First set of fixes for v5.18. Maintainers file updates, two
compilation warning fixes, one revert for ath11k and smaller fixes to
drivers and stack. All the usual stuff.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() used in ip6_protocol_deliver_rcu() with
kfree_skb_reason().
No new reasons are added.
Some paths are ignored, as they are not common, such as encapsulation
on non-final protocol.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason().
No new drop reasons are added.
Seems now we use 'SKB_DROP_REASON_IP_INHDR' for too many case during
ipv6 header parse or check, just like what 'IPSTATS_MIB_INHDRERRORS'
do. Will it be too general and hard to know what happened?
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() used in TLV encoded option header parsing with
kfree_skb_reason(). Following functions are involved:
ip6_parse_tlv()
ipv6_hop_ra()
ipv6_hop_ioam()
ipv6_hop_jumbo()
ipv6_hop_calipso()
ipv6_dest_hao()
Most skb drops during this process are regarded as 'InHdrErrors',
as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails,
which make we use 'SKB_DROP_REASON_IP_INHDR' correspondingly.
However, 'IP_INHDR' is a relatively general reason. Therefore, we
can use other reasons with higher priority in some cases. For example,
'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two call chains for ipv6_hop_jumbo(). The first one is:
ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()
On this call chain, the drop statistics will be done in
ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
returns false.
The second call chain is:
ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()
And the drop statistics will also be done in ip6_rcv_core() with
'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.
Therefore, the statistics in ipv6_hop_jumbo() is redundant, which
means the drop is counted twice. The statistics in ipv6_hop_jumbo()
is almost the same as the outside, except the
'IPSTATS_MIB_INTRUNCATEDPKTS', which seems that we have to ignore it.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to add the skb drop reasons support to icmpv6_param_prob(),
introduce the function icmpv6_param_prob_reason() and make
icmpv6_param_prob() an inline call to it. This new function will be
used in the following patches.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() which is used in ip6_forward() and ip_forward()
with kfree_skb_reason().
The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for
the case that the length of the packet exceeds MTU and can't
fragment.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb() used in ip6_pkt_drop() with kfree_skb_reason().
No new reason is added.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eventually, I find out the handler function for inputting route lookup
fail: ip_error().
The drop reasons we used in ip_error() are almost corresponding to
IPSTATS_MIB_*, and following new reasons are introduced:
SKB_DROP_REASON_IP_INADDRERRORS
SKB_DROP_REASON_IP_INNOROUTES
Isn't the name SKB_DROP_REASON_IP_HOSTUNREACH and
SKB_DROP_REASON_IP_NETUNREACH more accurate? To make them corresponding
to IPSTATS_MIB_*, we keep their name still.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for fdb flush filtering based on destination ifindex and
vlan id. The ifindex must either match a port's device ifindex or the
bridge's. The vlan support is trivial since it's already validated by
rtnl_fdb_del, we just need to fill it in.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for fdb flush filtering based on ndm flags and state. NDM
state and flags are mapped to bridge-specific flags and matched
according to the specified masks. NTF_USE is used to represent
added_by_user flag since it sets it on fdb add and we don't have a 1:1
mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are
ignored.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ndm flags/state masks which will be used for bulk delete filtering.
All of these are used by the bridge and vxlan drivers. Also minimal attr
policy validation is added, it is up to ndo_fdb_del_bulk implementers to
further validate them.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ability to specify exactly which fdbs to be flushed. They are
described by a new structure - net_bridge_fdb_flush_desc. Currently it
can match on port/bridge ifindex, vlan id and fdb flags. It is used to
describe the existing dynamic fdb flush operation. Note that this flush
operation doesn't treat permanent entries in a special way (fdb_delete vs
fdb_delete_local), it will delete them regardless if any port is using
them, so currently it can't directly replace deletes which need to handle
that case, although we can extend it later for that too.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a minimal ndo_fdb_del_bulk implementation which flushes all entries.
Support for more fine-grained filtering will be added in the following
patches.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When NLM_F_BULK is specified in a fdb del message we need to handle it
differently. First since this is a new call we can strictly validate the
passed attributes, at first only ifindex and vlan are allowed as these
will be the initially supported filter attributes, any other attribute
is rejected. The mac address is no longer mandatory, but we use it
to error out in older kernels because it cannot be specified with bulk
request (the attribute is not allowed) and then we have to dispatch
the call to ndo_fdb_del_bulk if the device supports it. The del bulk
callback can do further validation of the attributes if necessary.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new rtnl flag (RTNL_FLAG_BULK_DEL_SUPPORTED) which is used to
verify that the delete operation allows bulk object deletion. Also emit
a warning if anyone tries to set it for non-delete kind.
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper which extracts the msg type's kind using the kind mask (0x3).
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rtnl kind names instead of using raw values. We'll need to
check for DEL kind later to validate bulk flag support.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 11fd667dac315ea3f2469961f6d2869271a46cae.
dsa_slave_change_mtu() updates the MTU of the DSA master and of the
associated CPU port, but only if it detects a change to the master MTU.
The blamed commit in the Fixes: tag below addressed a regression where
dsa_slave_change_mtu() would return early and not do anything due to
ds->ops->port_change_mtu() not being implemented.
However, that commit also had the effect that the master MTU got set up
to the correct value by dsa_master_setup(), but the associated CPU port's
MTU did not get updated. This causes breakage for drivers that rely on
the ->port_change_mtu() DSA call to account for the tagging overhead on
the CPU port, and don't set up the initial MTU during the setup phase.
Things actually worked before because they were in a fragile equilibrium
where dsa_slave_change_mtu() was called before dsa_master_setup() was.
So dsa_slave_change_mtu() could actually detect a change and update the
CPU port MTU too.
Restore the code to the way things used to work by reverting the reorder
of dsa_tree_setup_master() and dsa_tree_setup_ports(). That change did
not have a concrete motivation going for it anyway, it just looked
better.
Fixes: 066dfc429040 ("Revert "net: dsa: stop updating master MTU from master.c"")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Address the following coccicheck warning:
net/ipv6/exthdrs.c:620:44-45: WARNING opportunity for swap()
by using swap() for the swapping of variable values and drop
the tmp (`addr`) variable that is not needed any more.
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS 1.3 and ChaChaPoly don't carry IV in the packet.
The code before this change would copy out iv_size
worth of whatever followed the TLS header in the packet
and then for TLS 1.3 | ChaCha overwrite that with
the sequence number. Waste of cycles especially
with TLS 1.2 being close to dead and TLS 1.3 being
the common case.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IVs are 8 or 16 bytes, no point reading out the exact value
for quantities this small.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propagating EINPROGRESS thru multiple layers of functions is
error prone. Use darg->async as an in/out argument, like we
use darg->zc today. On input it tells the code if async is
allowed, on output if it took place.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
async crypto handler will report the socket error no need
to report it again. We can, however, let the data we already
copied be reported to user space but we need to make sure
the error will be reported next time around.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
process_rx_list() only fails if it can't copy data to user
space. There is no point recording the error onto sk->sk_err
or giving up on the data which was read partially. Treat
the return value like a normal socket partial read.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If crypto didn't always invoke our callback for async
we'd not be clearing skb->sk and would crash in the
skb core when freeing it. This if must be dead code.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Async crypto never worked with TLS 1.3 and was explicitly disabled in
commit 8497ded2d16c ("net/tls: Disable async decrytion for tls1.3").
There's no need for us to handle TLS 1.3 padding in the async cb.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>