IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Christian Theune says:
I upgraded from 6.1.38 to 6.1.55 this morning and it broke my traffic shaping script,
leaving me with a non-functional uplink on a remote router.
A 'rt' curve cannot be used as a inner curve (parent class), but we were
allowing such configurations since the qdisc was introduced. Such
configurations would trigger a UAF as Budimir explains:
The parent will have vttree_insert() called on it in init_vf(),
but will not have vttree_remove() called on it in update_vf()
because it does not have the HFSC_FSC flag set.
The qdisc always assumes that inner classes have the HFSC_FSC flag set.
This is by design as it doesn't make sense 'qdisc wise' for an 'rt'
curve to be an inner curve.
Budimir's original patch disallows users to add classes with a 'rt'
parent, but this is too strict as it breaks users that have been using
'rt' as a inner class. Another approach, taken by this patch, is to
upgrade the inner 'rt' into a 'sc', warning the user in the process.
It avoids the UAF reported by Budimir while also being more permissive
to bad scripts/users/code using 'rt' as a inner class.
Users checking the `tc class ls [...]` or `tc class get [...]` dumps would
observe the curve change and are potentially breaking with this change.
v1->v2: https://lore.kernel.org/all/20231013151057.2611860-1-pctammela@mojatatu.com/
- Correct 'Fixes' tag and merge with revert (Jakub)
Cc: Christian Theune <ct@flyingcircus.io>
Cc: Budimir Markovic <markovicbudimir@gmail.com>
Fixes: b3d26c5702 ("net/sched: sch_hfsc: Ensure inner classes have fsc curve")
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231017143602.3191556-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mii_bus API conversion to read_c45() and write_c45() did not cover
the mdio-mux driver before read() and write() were made C22-only.
This broke arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso.
The -EOPNOTSUPP from mdiobus_c45_read() is transformed by
get_phy_c45_devs_in_pkg() into -EIO, is further propagated to
of_mdiobus_register() and this makes the mdio-mux driver fail to probe
the entire child buses, not just the PHYs that cause access errors.
Fix the regression by introducing special c45 read and write accessors
to mdio-mux which forward the operation to the parent MDIO bus.
Fixes: db1a63aed8 ("net: phy: Remove fallback to old C45 method")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20231017143144.3212657-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In commit 75eefc6c59 ("tcp: tsq: add a shortcut in tcp_small_queue_check()")
we allowed to send an skb regardless of TSQ limits being hit if rtx queue
was empty or had a single skb, in order to better fill the pipe
when/if TX completions were slow.
Then later, commit 75c119afe1 ("tcp: implement rb-tree based
retransmit queue") accidentally removed the special case for
one skb in rtx queue.
Stefan Wahren reported a regression in single TCP flow throughput
using a 100Mbit fec link, starting from commit 65466904b0 ("tcp: adjust
TSO packet sizes based on min_rtt"). This last commit only made the
regression more visible, because it locked the TCP flow on a particular
behavior where TSQ prevented two skbs being pushed downstream,
adding silences on the wire between each TSO packet.
Many thanks to Stefan for his invaluable help !
Fixes: 75c119afe1 ("tcp: implement rb-tree based retransmit queue")
Link: https://lore.kernel.org/netdev/7f31ddc8-9971-495e-a1f6-819df542e0af@gmx.net/
Reported-by: Stefan Wahren <wahrenst@gmx.net>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20231017124526.4060202-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sometimes Tx is completed immediately after doorbell is updated, which
causes Tx completion routing to update completion bytes before the
same packet bytes are updated in sent bytes in transmit function, hence
hitting BUG_ON() in dql_completed(). To avoid this, update BQL
sent bytes before ringing doorbell.
Fixes: 37d79d0596 ("octeon_ep: add Tx/Rx processing and interrupt support")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://lore.kernel.org/r/20231017105030.2310966-1-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When CONFIG_IPV6=n, and building with W=1:
In file included from include/trace/define_trace.h:102,
from include/trace/events/neigh.h:255,
from net/core/net-traces.c:51:
include/trace/events/neigh.h: In function ‘trace_event_raw_event_neigh_create’:
include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable]
42 | struct in6_addr *pin6;
| ^~~~
include/trace/trace_events.h:402:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’
402 | { assign; } \
| ^~~~~~
include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
44 | PARAMS(assign), \
| ^~~~~~
include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’
23 | TRACE_EVENT(neigh_create,
| ^~~~~~~~~~~
include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’
41 | TP_fast_assign(
| ^~~~~~~~~~~~~~
In file included from include/trace/define_trace.h:103,
from include/trace/events/neigh.h:255,
from net/core/net-traces.c:51:
include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’:
include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable]
42 | struct in6_addr *pin6;
| ^~~~
include/trace/perf.h:51:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’
51 | { assign; } \
| ^~~~~~
include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
44 | PARAMS(assign), \
| ^~~~~~
include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’
23 | TRACE_EVENT(neigh_create,
| ^~~~~~~~~~~
include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’
41 | TP_fast_assign(
| ^~~~~~~~~~~~~~
Indeed, the variable pin6 is declared and initialized unconditionally,
while it is only used and needlessly re-initialized when support for
IPv6 is enabled.
Fix this by dropping the unused variable initialization, and moving the
variable declaration inside the existing section protected by a check
for CONFIG_IPV6.
Fixes: fc651001d2 ("neighbor: Add tracepoint to __neigh_create")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Runtime power management support breaks Intel LTE modem where dmesg dump
showes timeout errors:
```
[ 72.027442] iosm 0000:01:00.0: msg timeout
[ 72.531638] iosm 0000:01:00.0: msg timeout
[ 73.035414] iosm 0000:01:00.0: msg timeout
[ 73.540359] iosm 0000:01:00.0: msg timeout
```
Furthermore, when shutting down with `poweroff` and modem attached, the
system rebooted instead of powering down as expected. The modem works
again only after power cycling.
Revert runtime power management support for IOSM driver as introduced by
commit e4f5073d53 ("net: wwan: iosm: enable runtime pm support for
7560").
Fixes: e4f5073d53 ("net: wwan: iosm: enable runtime pm support for 7560")
Reported-by: Martin <mwolf@adiumentum.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217996
Link: https://lore.kernel.org/r/267abf02-4b60-4a2e-92cd-709e3da6f7d3@gmail.com/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device flags are displayed incorrectly:
1) The comparison (i == F_FLOW_SEQ) is always false, because F_FLOW_SEQ
is equal to (1 << FLOW_SEQ_SHIFT) == 2048, and the maximum value
of the 'i' variable is (NR_PKT_FLAG - 1) == 17. It should be compared
with FLOW_SEQ_SHIFT.
2) Similarly to the F_IPSEC flag.
3) Also add spaces to the print end of the string literal "spi:%u"
to prevent the output from merging with the flag that follows.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 99c6d3d20d ("pktgen: Remove brute-force printing of flags")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmUuRZ4ACgkQrB3Eaf9P
W7e7eg//fgQux/sJoq2Qf7T8cvVt3NSpOLXB43NRsIb9nUyMNN/cYTtypYM/+0FM
f0zxmwtUkj9Mx/IL2nNzK/h8lMtJeKfy14tt8lAX2ux5oJltCcTiLgdp3C4HWxOh
9DrXMGIryr0apedkcStSzhRoJBd8giFViAQZE8NwO5EKG8WLJNVvw0YzlNgM0WOk
5y/sN1uXeUmUCYDkOGcdt6yIyV+GIo/nEg/XOY8LkaQDzKveDypydj0dPGL/Dj8U
MhczTCf1WdbTHh0dmITIdX/yZ8/bPNfV3EzAtAaYgceVh1DSq8/F9buRXz2oxxvh
r8Igv+640+SoWEtIsVoXHx7KIZ7LckasrJv1IoKRXGVV25LjnR+bxGVQNyUjdd6+
Cb11Q/m66fp/k7B+++b6eFVlKvr9O4jBogb3kfGpqMiwm6LNM+CJicdAfM8/GEgM
ejnVNSEaChhdgGvrXVbKhI2ACatwbPrt6d4PN0d9fpUbZJnuBZl4QHElLNSBznjO
xJUF8LvPjFGx6vj3YFnBg5f3uNH35nR2jQxPsx+yWdBEFqbxBt7yKr/pftwP1gxD
ZLHF8JbOT4J+96ClDRFnlWFdY+Phq0fg5S5WhPhNGQ4rj9rsjAS8ZMcHWZeA7iZ2
0w5P/DlFmmnw6CC+Xr+wQJlgB1tZ2sdC8nAK6gxQJ3eFVv4wnhk=
=yHtO
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2023-10-17
1) Fix a slab-use-after-free in xfrm_policy_inexact_list_reinsert.
From Dong Chenchen.
2) Fix data-races in the xfrm interfaces dev->stats fields.
From Eric Dumazet.
3) Fix a data-race in xfrm_gen_index.
From Eric Dumazet.
4) Fix an inet6_dev refcount underflow.
From Zhang Changzhong.
5) Check the return value of pskb_trim in esp_remove_trailer
for esp4 and esp6. From Ma Ke.
6) Fix a data-race in xfrm_lookup_with_ifid.
From Eric Dumazet.
* tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix a data-race in xfrm_lookup_with_ifid()
net: ipv4: fix return value check in esp_remove_trailer
net: ipv6: fix return value check in esp_remove_trailer
xfrm6: fix inet6_dev refcount underflow problem
xfrm: fix a data-race in xfrm_gen_index()
xfrm: interface: use DEV_STATS_INC()
net: xfrm: skip policies marked as dead while reinserting policies
====================
Link: https://lore.kernel.org/r/20231017083723.1364940-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We discovered from packet traces of slow loss recovery on kernels with
the default HZ=250 setting (and min_rtt < 1ms) that after reordering,
when receiving a SACKed sequence range, the RACK reordering timer was
firing after about 16ms rather than the desired value of roughly
min_rtt/4 + 2ms. The problem is largely due to the RACK reorder timer
calculation adding in TCP_TIMEOUT_MIN, which is 2 jiffies. On kernels
with HZ=250, this is 2*4ms = 8ms. The TLP timer calculation has the
exact same issue.
This commit fixes the TLP transmit timer and RACK reordering timer
floor calculation to more closely match the intended 2ms floor even on
kernels with HZ=250. It does this by adding in a new
TCP_TIMEOUT_MIN_US floor of 2000 us and then converting to jiffies,
instead of the current approach of converting to jiffies and then
adding th TCP_TIMEOUT_MIN value of 2 jiffies.
Our testing has verified that on kernels with HZ=1000, as expected,
this does not produce significant changes in behavior, but on kernels
with the default HZ=250 the latency improvement can be large. For
example, our tests show that for HZ=250 kernels at low RTTs this fix
roughly halves the latency for the RACK reorder timer: instead of
mostly firing at 16ms it mostly fires at 8ms.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: bb4d991a28 ("tcp: adjust tail loss probe timeout")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231015174700.2206872-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The prefill function should have only removed the page count bias it
added. Fully freeing the page will cause gve_free_queue_page_list to
free a page the driver no longer owns.
Fixes: 82fd151d38 ("gve: Reduce alloc and copy costs in the GQ rx path")
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20231014014121.2843922-1-shailend@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
- Fix race when opening vhci device
- Avoid memcmp() out of bounds warning
- Correctly bounds check and pad HCI_MON_NEW_INDEX name
- Fix using memcmp when comparing keys
- Ignore error return for hci_devcd_register() in btrtl
- Always check if connection is alive before deleting
- Fix a refcnt underflow problem for hci_conn
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmUqBjIZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKeVmD/95NHJd61l0JpQw9yYNjDc+
8C2tAa4rDrbooqfhgS3PPy4SQ9X07KO9VRKaQVEjnVahnfIsbE++sptBWjMSQz94
8cRde2NJs2uUnl0Eovv9hcE/dgscXu5Lgt/fCCOyY6YwbsyFW5FmYcNUSzOLu83i
6KuPigt2n/bFC7pnB/VROmO3WusNVFX1J8unZa/MvMK5wn6DQkJeZgJlFlgJSfeg
Tou4SwI8ormUsDAjV3+dVl7vZVNpyX4mv5cR6RlwHEtvcnkFIj6+ZE96M1BwOBrm
zbc1OyzrXt5jXuSORlOQEAfjZUab+ygrv1tCgpVkQf+vnIiREsN/57NS6J/kmbc9
tbOqfJwYokRwjKtXsQ2bc0Jm5kvA0j/9Mrep4E3UZpTv2LwjGTaN/wXGNu1qW8+d
LTgzqwmgx/PwqC4oiAFrT7guPDwoHzFqSwnvdhkdr04dH19zk7IjIm8ZrTruPDDk
3wiMOijCtx31RBlRPmfb2njostkS+ysrW1bw3vQmfs04djvcgar0MqKdauXIuBvE
QMB+dyvPwCFQ8OYHQwfuwYfdHSgG7v+f35IEeflB4OISbZI4aFWkQnjvcloFpnmP
m/vo+y86ORlxJiHlJypgwqxZ42nAxpGUCwxxhiMDeiqp4GzxUYcwK5RteOcDkDTb
/ukSEQz/suO3Fya2kTz4mw==
=SY+z
-----END PGP SIGNATURE-----
Merge tag 'for-net-2023-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix race when opening vhci device
- Avoid memcmp() out of bounds warning
- Correctly bounds check and pad HCI_MON_NEW_INDEX name
- Fix using memcmp when comparing keys
- Ignore error return for hci_devcd_register() in btrtl
- Always check if connection is alive before deleting
- Fix a refcnt underflow problem for hci_conn
* tag 'for-net-2023-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sock: Correctly bounds check and pad HCI_MON_NEW_INDEX name
Bluetooth: avoid memcmp() out of bounds warning
Bluetooth: hci_sock: fix slab oob read in create_monitor_event
Bluetooth: btrtl: Ignore error return for hci_devcd_register()
Bluetooth: hci_event: Fix coding style
Bluetooth: hci_event: Fix using memcmp when comparing keys
Bluetooth: Fix a refcnt underflow problem for hci_conn
Bluetooth: hci_sync: always check if connection is alive before deleting
Bluetooth: Reject connection with the device which has same BD_ADDR
Bluetooth: hci_event: Ignore NULL link key
Bluetooth: ISO: Fix invalid context error
Bluetooth: vhci: Fix race when opening vhci device
====================
Link: https://lore.kernel.org/r/20231014031336.1664558-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the smc_listen_work(), if smc_listen_prfx_check() failed,
the real reason: SMC_CLC_DECL_DIFFPREFIX was dropped, and
SMC_CLC_DECL_NOSMCDEV was returned.
Althrough this is also kind of SMC_CLC_DECL_NOSMCDEV, but return
the real reason is much friendly for debugging.
Fixes: e49300a6bf ("net/smc: add listen processing for SMC-Rv2")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20231012123729.29307-1-dust.li@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
From: Aaron Conole <aconole@redhat.com>
To: netdev@vger.kernel.org
Cc: dev@openvswitch.org, linux-kselftest@vger.kernel.org,
linux-kernel@vger.kernel.org, Pravin B Shelar <pshelar@ovn.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Adrian Moreno <amorenoz@redhat.com>,
Eelco Chaudron <echaudro@redhat.com>,
shuah@kernel.org
Subject: [PATCH net v2 0/4] selftests: openvswitch: Minor fixes for some systems
Date: Wed, 11 Oct 2023 15:49:35 -0400 [thread overview]
Message-ID: <20231011194939.704565-1-aconole@redhat.com> (raw)
A number of corner cases were caught when trying to run the selftests on
older systems. Missed skip conditions, some error cases, and outdated
python setups would all report failures but the issue would actually be
related to some other condition rather than the selftest suite.
Address these individual cases.
====================
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ct_tuple v4 data structure decode / encode routines were using
the v6 IP address decode and relying on default encode. This could
cause exceptions during encode / decode depending on how a ct4
tuple would appear in a netlink message.
Caught during code review.
Fixes: e52b07aa1a ("selftests: openvswitch: add flow dump support")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernels that don't have support for openvswitch drop reasons also
won't have the drop counter reasons, so we should skip the test
completely. It previously wasn't possible to build a test case
for this without polluting the datapath, so we introduce a mechanism
to clear all the flows from a datapath allowing us to test for
explicit drop actions, and then clear the flows to build the
original test case.
Fixes: 4242029164 ("selftests: openvswitch: add explicit drop testcase")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of fatal signal, or early abort at least cleanup the current
test case.
Fixes: 25f16c873f ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni reports that on some systems the pyroute2 version isn't
new enough to run the test suite. Ensure that we support a minimum
version of 0.6 for all cases (which does include the existing ones).
The 0.6.1 version was released in May of 2021, so should be
propagated to most installations at this point.
The alternative that Paolo proposed was to only skip when the
add-flow is being run. This would be okay for most cases, except
if a future test case is added that needs to do flow dump without
an associated add (just guessing). In that case, it could also be
broken and we would need additional skip logic anyway. Just draw
a line in the sand now.
Fixes: 25f16c873f ("selftests: add openvswitch selftest suite")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Closes: https://lore.kernel.org/lkml/8470c431e0930d2ea204a9363a60937289b7fdbe.camel@redhat.com/
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot reported two new paths to hit an internal WARNING using the
new virtio gso type VIRTIO_NET_HDR_GSO_UDP_L4.
RIP: 0010:skb_checksum_help+0x4a2/0x600 net/core/dev.c:3260
skb len=64521 gso_size=344
and
RIP: 0010:skb_warn_bad_offload+0x118/0x240 net/core/dev.c:3262
Older virtio types have historically had loose restrictions, leading
to many entirely impractical fuzzer generated packets causing
problems deep in the kernel stack. Ideally, we would have had strict
validation for all types from the start.
New virtio types can have tighter validation. Limit UDP GSO packets
inserted via virtio to the same limits imposed by the UDP_SEGMENT
socket interface:
1. must use checksum offload
2. checksum offload matches UDP header
3. no more segments than UDP_MAX_SEGMENTS
4. UDP GSO does not take modifier flags, notably SKB_GSO_TCP_ECN
Fixes: 860b7f27b8 ("linux/virtio_net.h: Support USO offload in vnet header.")
Reported-by: syzbot+01cdbc31e9c0ae9b33ac@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/0000000000005039270605eb0b7f@google.com/
Reported-by: syzbot+c99d835ff081ca30f986@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/0000000000005426680605eb0b9f@google.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver allocates the LL2 rx buffers from kmalloc()
area to construct the skb using slab_build_skb()
The required size allocation seems to have overlooked
for accounting both skb_shared_info size and device
placement padding bytes which results into the below
panic when doing skb_put() for a standard MTU sized frame.
skbuff: skb_over_panic: text:ffffffffc0b0225f len:1514 put:1514
head:ff3dabceaf39c000 data:ff3dabceaf39c042 tail:0x62c end:0x566
dev:<NULL>
…
skb_panic+0x48/0x4a
skb_put.cold+0x10/0x10
qed_ll2b_complete_rx_packet+0x14f/0x260 [qed]
qed_ll2_rxq_handle_completion.constprop.0+0x169/0x200 [qed]
qed_ll2_rxq_completion+0xba/0x320 [qed]
qed_int_sp_dpc+0x1a7/0x1e0 [qed]
This patch fixes this by accouting skb_shared_info and device
placement padding size bytes when allocating the buffers.
Cc: David S. Miller <davem@davemloft.net>
Fixes: 0a7fb11c23 ("qed: Add Light L2 support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code pattern of memcpy(dst, src, strlen(src)) is almost always
wrong. In this case it is wrong because it leaves memory uninitialized
if it is less than sizeof(ni->name), and overflows ni->name when longer.
Normally strtomem_pad() could be used here, but since ni->name is a
trailing array in struct hci_mon_new_index, compilers that don't support
-fstrict-flex-arrays=3 can't tell how large this array is via
__builtin_object_size(). Instead, open-code the helper and use sizeof()
since it will work correctly.
Additionally mark ni->name as __nonstring since it appears to not be a
%NUL terminated C string.
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Edward AD <twuufnxlz@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Fixes: 18f547f3fc ("Bluetooth: hci_sock: fix slab oob read in create_monitor_event")
Link: https://lore.kernel.org/lkml/202310110908.F2639D3276@keescook/
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
bacmp() is a wrapper around memcpy(), which contain compile-time
checks for buffer overflow. Since the hci_conn_request_evt() also calls
bt_dev_dbg() with an implicit NULL pointer check, the compiler is now
aware of a case where 'hdev' is NULL and treats this as meaning that
zero bytes are available:
In file included from net/bluetooth/hci_event.c:32:
In function 'bacmp',
inlined from 'hci_conn_request_evt' at net/bluetooth/hci_event.c:3276:7:
include/net/bluetooth/bluetooth.h:364:16: error: 'memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
364 | return memcmp(ba1, ba2, sizeof(bdaddr_t));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Add another NULL pointer check before the bacmp() to ensure the compiler
understands the code flow enough to not warn about it. Since the patch
that introduced the warning is marked for stable backports, this one
should also go that way to avoid introducing build regressions.
Fixes: 1ffc6f8cc3 ("Bluetooth: Reject connection with the device which has same BD_ADDR")
Cc: Kees Cook <keescook@chromium.org>
Cc: "Lee, Chun-Yi" <jlee@suse.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When accessing hdev->name, the actual string length should prevail
Reported-by: syzbot+c90849c50ed209d77689@syzkaller.appspotmail.com
Fixes: dcda165706 ("Bluetooth: hci_core: Fix build warnings")
Signed-off-by: Edward AD <twuufnxlz@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
If CONFIG_DEV_COREDUMP was not set, it would return -EOPNOTSUPP for
hci_devcd_register().
In this commit, ignore error return for hci_devcd_register().
Otherwise Bluetooth initialization will be failed.
Fixes: 044014ce85 ("Bluetooth: btrtl: Add Realtek devcoredump support")
Cc: stable@vger.kernel.org
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Closes: https://lore.kernel.org/all/ZRyqIn0_qqEFBPdy@debian.me/T/
Signed-off-by: Max Chou <max.chou@realtek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes the following code style problem:
ERROR: that open brace { should be on the previous line
+ if (!bacmp(&hdev->bdaddr, &ev->bdaddr))
+ {
Fixes: 1ffc6f8cc3 ("Bluetooth: Reject connection with the device which has same BD_ADDR")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
memcmp is not consider safe to use with cryptographic secrets:
'Do not use memcmp() to compare security critical data, such as
cryptographic secrets, because the required CPU time depends on the
number of equal bytes.'
While usage of memcmp for ZERO_KEY may not be considered a security
critical data, it can lead to more usage of memcmp with pairing keys
which could introduce more security problems.
Fixes: 455c2ff0a5 ("Bluetooth: Fix BR/EDR out-of-band pairing with only initiator data")
Fixes: 33155c4aae ("Bluetooth: hci_event: Ignore NULL link key")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmUoNpsACgkQSD+KveBX
+j4Dfgf/dz6LIjdqdjaMoa2sJVZVTAA+OthlzAvLIak2iXrYB8z1f5y83f20Jg12
Mlah4qvUkg0WyGcAYrbW7pQ0/Mec64KrQ8Zce3C0UoDFjAYEB19gCMC6adx18wqD
/jRl9JdOgnojCF5WccUd7yjvhVT1hrCKisk0eI8UDEji9V3d8qqdcgLA9whCdsij
sQfyWO2mV3EAy4QJxbscT5mkz+iOZfMf26drqFNsPIzIhQ9FB6zjZkN3ubDmxy5n
NtSRZpIJiP2H+CxDRiK7c8ZBZQW3G6R8A9+u5QVr3yp7y+PJAvOHby+y00FeOQwS
bFA7jnBc25DtxiCIM0E+2Z3/rBM3Uw==
=fmVP
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2023-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2023-10-12
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix VF representors reporting zero counters to "ip -s" command
net/mlx5e: Don't offload internal port if filter device is out device
net/mlx5e: Take RTNL lock before triggering netdev notifiers
net/mlx5e: XDP, Fix XDP_REDIRECT mpwqe page fragment leaks on shutdown
net/mlx5e: RX, Fix page_pool allocation failure recovery for legacy rq
net/mlx5e: RX, Fix page_pool allocation failure recovery for striding rq
net/mlx5: Handle fw tracer change ownership event based on MTRC
net/mlx5: Bridge, fix peer entry ageing in LAG mode
net/mlx5: E-switch, register event handler before arming the event
net/mlx5: Perform DMA operations in the right locations
====================
Link: https://lore.kernel.org/r/20231012195127.129585-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller says:
====================
Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice)
This series contains fixes for the i40e and ice drivers.
Jesse adds handling to the ice driver which resetis the device when loading
on a crash kernel, preventing stale transactions from causing machine check
exceptions which could prevent capturing crash data.
Mateusz fixes a bug in the ice driver 'Safe mode' logic for handling the
device when the DDP is missing.
Michal fixes a crash when probing the i40e driver in the event that HW
registers are reporting invalid/unexpected values.
The following are changes since commit a950a5921d:
net/smc: Fix pos miscalculation in statistics
I'm covering for Tony Nguyen while he's out, and don't have access to create
a pull request branch on his net-queue, so these are sent via mail only.
====================
Link: https://lore.kernel.org/r/20231011233334.336092-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One thing is broken in the safe mode, that is
ice_deinit_features() is being executed even
that ice_init_features() was not causing stack
trace during pci_unregister_driver().
Add check on the top of the function.
Fixes: 5b246e533d ("ice: split probe into smaller functions")
Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Link: https://lore.kernel.org/r/20231011233334.336092-4-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the system boots into the crash dump kernel after a panic, the ice
networking device may still have pending transactions that can cause errors
or machine checks when the device is re-enabled. This can prevent the crash
dump kernel from loading the driver or collecting the crash data.
To avoid this issue, perform a function level reset (FLR) on the ice device
via PCIe config space before enabling it on the crash kernel. This will
clear any outstanding transactions and stop all queues and interrupts.
Restore the config space after the FLR, otherwise it was found in testing
that the driver wouldn't load successfully.
The following sequence causes the original issue:
- Load the ice driver with modprobe ice
- Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
- Trigger a crash with echo c > /proc/sysrq-trigger
- Load the ice driver again (or let it load automatically) with modprobe ice
- The system crashes again during pcim_enable_device()
Fixes: 837f08fdec ("ice: Add basic driver framework for Intel(R) E800 Series")
Reported-by: Vishal Agrawal <vagrawal@redhat.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Link: https://lore.kernel.org/r/20231011233334.336092-3-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The hardware provides the indexes of the first and the last available
queue and VF. From the indexes, the driver calculates the numbers of
queues and VFs. In theory, a faulty device might say the last index is
smaller than the first index. In that case, the driver's calculation
would underflow, it would attempt to write to non-existent registers
outside of the ioremapped range and crash.
I ran into this not by having a faulty device, but by an operator error.
I accidentally ran a QE test meant for i40e devices on an ice device.
The test used 'echo i40e > /sys/...ice PCI device.../driver_override',
bound the driver to the device and crashed in one of the wr32 calls in
i40e_clear_hw.
Add checks to prevent underflows in the calculations of num_queues and
num_vfs. With this fix, the wrong device probing reports errors and
returns a failure without crashing.
Fixes: 838d41d92a ("i40e: clear all queues and interrupts")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Link: https://lore.kernel.org/r/20231011233334.336092-2-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUnsR0NHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2AJaDD/9WqsWX8KK+QaXMNdhuY2TYicRUrzAObeCOkPzU
8M0S0pZiSsfZO4Tzlo30krSoLERnBYsedngEIzLHHlQ/V+65H8h9jY1p1oGZFKIg
5ui9ZQbcvWLqKWTOaCneZuEjsGxodWvVokVubSAWw9XExuYqbfo8WOOKBZzAX4mt
J7KznGlShGHithvssWrk8n2FQr8+ZqzRrJjpoNfQcKU1a+GEIiQvIIfm4f3P6baj
yiuP2dXWsUXuDK63JmMUK2k3w1e/XwjPpCZ+uyoNxPCggu0X0u//MpM71CNpGAZ4
g9V5dn+p6WsLh1gQ4QvWGgJqXPm8xU1mRux5WLnp5dHVwm+8bgGGy0O8QC+s9H63
15pZmeaGcTd6zp3FxbnCWcWFbcW1jsYoscLrMyqjtgDmGIOBxQfia2oCyaN3Z37M
vjRYtNFQbU6TbKZc0tKlh79geUykZL+ibtVuBiHkniZkAMOQXqUBkh3K1yfJ1j76
w/GjIm+wFKn0KKij27mzebwTJNLbXdXWem5bzUg6lhC0X57Mcf3+L8VbNWPv220x
drsI9uzw+YPE7neVbmXQ8hHyJKelG5cF0xgfykLT+6DnadiCpdrlrxwiMCP/POQ5
XIPkjgyi1T3k/5Sv4kab7cZxbq1A14URJHASidT/JN2NCYlMV1A1CaeE1vhrZ5Z4
BHgDRw==
=mZQY
-----END PGP SIGNATURE-----
Merge tag 'nf-23-10-12' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
Patch 1, from Pablo Neira Ayuso, fixes a performance regression
(since 6.4) when a large pending set update has to be canceled towards
the end of the transaction.
Patch 2 from myself, silences an incorrect compiler warning reported
with a few (older) compiler toolchains.
Patch 3, from Kees Cook, adds __counted_by annotation to
nft_pipapo set backend type. I took this for net instead of -next
given infra is already in place and no actual code change is made.
Patch 4, from Pablo Neira Ayso, disables timeout resets on
stateful element reset. The rest should only affect internal object
state, e.g. reset a quota or counter, but not affect a pending timeout.
Patches 5 and 6 fix NULL dereferences in 'inner header' match,
control plane doesn't test for netlink attribute presence before
accessing them. Broken since feature was added in 6.2, fixes from
Xingyuan Mo.
Last patch, from myself, fixes a bogus rule match when skb has
a 0-length mac header, in this case we'd fetch data from network
header instead of canceling rule evaluation. This is a day 0 bug,
present since nftables was merged in 3.13.
* tag 'nf-23-10-12' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_payload: fix wrong mac header matching
nf_tables: fix NULL pointer dereference in nft_expr_inner_parse()
nf_tables: fix NULL pointer dereference in nft_inner_init()
netfilter: nf_tables: do not refresh timeout when resetting element
netfilter: nf_tables: Annotate struct nft_pipapo_match with __counted_by
netfilter: nfnetlink_log: silence bogus compiler warning
netfilter: nf_tables: do not remove elements if set backend implements .abort
====================
Link: https://lore.kernel.org/r/20231012085724.15155-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ICSSG HW stats on TX side considers 8 preamble bytes as data bytes. Due
to this the tx_bytes of ICSSG interface doesn't match the rx_bytes of the
link partner. There is no public errata available yet.
As a workaround to fix this, decrease tx_bytes by 8 bytes for every tx
frame.
Fixes: c1e10d5dc7 ("net: ti: icssg-prueth: Add ICSSG Stats")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://lore.kernel.org/r/20231012064626.977466-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure that the command values used for replies are correct. This is
only affecting generated userspace helpers, no change on kernel code.
Fixes: 7199c86247 ("netlink: specs: devlink: add commands that do per-instance dump")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231012115811.298129-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the netdevice is within a container and communicates externally
through network technologies such as VxLAN, we won't be able to find
routing information in the init_net namespace. To address this issue,
we need to add a struct net parameter to the smc_ib_find_route function.
This allow us to locate the routing information within the corresponding
net namespace, ensuring the correct completion of the SMC CLC interaction.
Fixes: e5c4744cfb ("net/smc: add SMC-Rv2 connection establishment")
Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20231011074851.95280-1-huangjie.albert@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As reported by Tom, .NET and applications build on top of it rely
on connect(AF_UNSPEC) to async cancel pending I/O operations on TCP
socket.
The blamed commit below caused a regression, as such cancellation
can now fail.
As suggested by Eric, this change addresses the problem explicitly
causing blocking I/O operation to terminate immediately (with an error)
when a concurrent disconnect() is executed.
Instead of tracking the number of threads blocked on a given socket,
track the number of disconnect() issued on such socket. If such counter
changes after a blocking operation releasing and re-acquiring the socket
lock, error out the current operation.
Fixes: 4faeee0cf8 ("tcp: deny tcp_disconnect() when threads are waiting")
Reported-by: Tom Deseyn <tdeseyn@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886305
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f3b95e47e3dbed840960548aebaa8d954372db41.1697008693.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the introduction of the ice driver the code has been
double-shifting the RSS enabling field, because the define already has
shifts in it and can't have the regular pattern of "a << shiftval &
mask" applied.
Most places in the code got it right, but one line was still wrong. Fix
this one location for easy backports to stable. An in-progress patch
fixes the defines to "standard" and will be applied as part of the
regular -next process sometime after this one.
Fixes: d76a60ba7a ("ice: Add support for VLANs and offloads")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
CC: stable@vger.kernel.org
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231010203101.406248-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bcm_sf2_mdio_register(), the class_find_device() will call get_device()
to increment reference count for priv->master_mii_bus->dev if
of_mdio_find_bus() succeeds. If mdiobus_alloc() or mdiobus_register()
fails, it will call get_device() twice without decrement reference count
for the device. And it is the same if bcm_sf2_mdio_register() succeeds but
fails in bcm_sf2_sw_probe(), or if bcm_sf2_sw_probe() succeeds. If the
reference count has not decremented to zero, the dev related resource will
not be freed.
So remove the get_device() in bcm_sf2_mdio_register(), and call
put_device() if mdiobus_alloc() or mdiobus_register() fails and in
bcm_sf2_mdio_unregister() to solve the issue.
And as Simon suggested, unwind from errors for bcm_sf2_mdio_register() and
just return 0 if it succeeds to make it cleaner.
Fixes: 461cd1b03e ("net: dsa: bcm_sf2: Register our slave MDIO bus")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231011032419.2423290-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
selftests: fib_tests: Fixes for multipath list receive tests
Fix two issues in recently added FIB multipath list receive tests.
====================
Link: https://lore.kernel.org/r/20231010132113.3014691-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tests rely on the IPv{4,6} FIB trace points being triggered once for
each forwarded packet. If receive processing is deferred to the
ksoftirqd task these invocations will not be counted and the tests will
fail. Fix by specifying the '-a' flag to avoid perf from filtering on
the mausezahn task.
Before:
# ./fib_tests.sh -t ipv4_mpath_list
IPv4 multipath list receive tests
TEST: Multipath route hit ratio (.68) [FAIL]
# ./fib_tests.sh -t ipv6_mpath_list
IPv6 multipath list receive tests
TEST: Multipath route hit ratio (.27) [FAIL]
After:
# ./fib_tests.sh -t ipv4_mpath_list
IPv4 multipath list receive tests
TEST: Multipath route hit ratio (1.00) [ OK ]
# ./fib_tests.sh -t ipv6_mpath_list
IPv6 multipath list receive tests
TEST: Multipath route hit ratio (.99) [ OK ]
Fixes: 8ae9efb859 ("selftests: fib_tests: Add multipath list receive tests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Link: https://lore.kernel.org/r/20231010132113.3014691-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test relies on the fib:fib_table_lookup trace point being triggered
once for each forwarded packet. If RP filter is not disabled, the trace
point will be triggered twice for each packet (for source validation and
forwarding), potentially masking actual bugs. Fix by explicitly
disabling RP filter.
Before:
# ./fib_tests.sh -t ipv4_mpath_list
IPv4 multipath list receive tests
TEST: Multipath route hit ratio (1.99) [ OK ]
After:
# ./fib_tests.sh -t ipv4_mpath_list
IPv4 multipath list receive tests
TEST: Multipath route hit ratio (.99) [ OK ]
Fixes: 8ae9efb859 ("selftests: fib_tests: Add multipath list receive tests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Link: https://lore.kernel.org/r/20231010132113.3014691-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported a warning [0] introduced by commit c48ef9c4ae ("tcp: Fix
bind() regression for v4-mapped-v6 non-wildcard address.").
After the cited commit, a v4 socket's address matches the corresponding
v4-mapped-v6 tb2 in inet_bind2_bucket_match_addr(), not vice versa.
During X.X.X.X -> ::ffff:X.X.X.X order bind()s, the second bind() uses
bhash and conflicts properly without checking bhash2 so that we need not
check if a v4-mapped-v6 sk matches the corresponding v4 address tb2 in
inet_bind2_bucket_match_addr(). However, the repro shows that we need
to check that in a no-conflict case.
The repro bind()s two sockets to the 2-tuples using SO_REUSEPORT and calls
listen() for the first socket:
from socket import *
s1 = socket()
s1.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1)
s1.bind(('127.0.0.1', 0))
s2 = socket(AF_INET6)
s2.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1)
s2.bind(('::ffff:127.0.0.1', s1.getsockname()[1]))
s1.listen()
The second socket should belong to the first socket's tb2, but the second
bind() creates another tb2 bucket because inet_bind2_bucket_find() returns
NULL in inet_csk_get_port() as the v4-mapped-v6 sk does not match the
corresponding v4 address tb2.
bhash2[] -> tb2(::ffff:X.X.X.X) -> tb2(X.X.X.X)
Then, listen() for the first socket calls inet_csk_get_port(), where the
v4 address matches the v4-mapped-v6 tb2 and WARN_ON() is triggered.
To avoid that, we need to check if v4-mapped-v6 sk address matches with
the corresponding v4 address tb2 in inet_bind2_bucket_match().
The same checks are needed in inet_bind2_bucket_addr_match() too, so we
can move all checks there and call it from inet_bind2_bucket_match().
Note that now tb->family is just an address family of tb->(v6_)?rcv_saddr
and not of sockets in the bucket. This could be refactored later by
defining tb->rcv_saddr as tb->v6_rcv_saddr.s6_addr32[3] and prepending
::ffff: when creating v4 tb2.
[0]:
WARNING: CPU: 0 PID: 5049 at net/ipv4/inet_connection_sock.c:587 inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587
Modules linked in:
CPU: 0 PID: 5049 Comm: syz-executor288 Not tainted 6.6.0-rc2-syzkaller-00018-g2cf0f7156238 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
RIP: 0010:inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587
Code: 7c 24 08 e8 4c b6 8a 01 31 d2 be 88 01 00 00 48 c7 c7 e0 94 ae 8b e8 59 2e a3 f8 2e 2e 2e 31 c0 e9 04 fe ff ff e8 ca 88 d0 f8 <0f> 0b e9 0f f9 ff ff e8 be 88 d0 f8 49 8d 7e 48 e8 65 ca 5a 00 31
RSP: 0018:ffffc90003abfbf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888026429100 RCX: 0000000000000000
RDX: ffff88807edcbb80 RSI: ffffffff88b73d66 RDI: ffff888026c49f38
RBP: ffff888026c49f30 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9260f200
R13: ffff888026c49880 R14: 0000000000000000 R15: ffff888026429100
FS: 00005555557d5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000045ad50 CR3: 0000000025754000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
inet_csk_listen_start+0x155/0x360 net/ipv4/inet_connection_sock.c:1256
__inet_listen_sk+0x1b8/0x5c0 net/ipv4/af_inet.c:217
inet_listen+0x93/0xd0 net/ipv4/af_inet.c:239
__sys_listen+0x194/0x270 net/socket.c:1866
__do_sys_listen net/socket.c:1875 [inline]
__se_sys_listen net/socket.c:1873 [inline]
__x64_sys_listen+0x53/0x80 net/socket.c:1873
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f3a5bce3af9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc1a1c79e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000032
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a5bce3af9
RDX: 00007f3a5bce3af9 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007f3a5bd565f0 R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
</TASK>
Fixes: c48ef9c4ae ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.")
Reported-by: syzbot+71e724675ba3958edb31@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=71e724675ba3958edb31
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231010013814.70571-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since 429e3d123d ("bonding: Fix extraction of ports from the packet
headers"), header offsets used to compute a hash in bond_xmit_hash() are
relative to skb->data and not skb->head. If the tail of the header buffer
of an skb really needs to be advanced and the operation is successful, the
pointer to the data must be returned (and not a pointer to the head of the
buffer).
Fixes: 429e3d123d ("bonding: Fix extraction of ports from the packet headers")
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous releases - regressions:
- af_packet: fix fortified memcpy() without flex array.
- tcp: fix crashes trying to free half-baked MTU probes
- xdp: fix zero-size allocation warning in xskq_create()
- can: sja1000: always restart the tx queue after an overrun
- eth: mlx5e: again mutually exclude RX-FCS and RX-port-timestamp
- eth: nfp: avoid rmmod nfp crash issues
- eth: octeontx2-pf: fix page pool frag allocation warning
Previous releases - always broken:
- mctp: perform route lookups under a RCU read-side lock
- bpf: s390: fix clobbering the caller's backchain in the trampoline
- phy: lynx-28g: cancel the CDR check work item on the remove path
- dsa: qca8k: fix qca8k driver for Turris 1.x
- eth: ravb: fix use-after-free issue in ravb_tx_timeout_work()
- eth: ixgbe: fix crash with empty VF macvlan list
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmUnw0USHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkN0EP/RKl317fLqlm6ZzRUMVP169CNRAgMaBG
7FIwxlCv4hfO2Rx09Mxu2wjDp+tBQKqBKaxfcwh8tEdLMqqCymOW2K5+tWVty8C8
TJJS+zggqLAo7DjXbnT8GBm5owHPLKGNxW6vRmnw9xraCD/nuV1wqolI2+l4IxB+
kqfliltepnJSakg0uXg7/uwAE87slBzX5VgB6K5JKLiiDMD8tYoAUmZzH8bMJd0l
Cl7+L+ucRfQkj0DPfuZM/FncM0el7oFB6imnKd36hD6vfDfCNxpyNBYG1yZ/61/N
7H3E595Hr9PA+YBZjja3UvQGbFXkyMHloQdYxmq4s0T2WHqKwRyjLlwPayMXvavn
OTJh2VAs68ivtti0ry5Nbgz4viiNfr32PLyZr6XySwCZ1/TCLjV4Cq9IYnaP3YeM
KA+CIl3d0asQdZuMXTBivmtF65Buawt9UX/gJzUst2mNdcqhV1RTNWDNWoFLQ0qW
gz8XN68V5LhbaaOq/Lat80krWgNLNZIlTNmSsE/Ie799w7dAHn/xvT6h+h5pF1XX
dhng9NK7RL7KVcI/9walArOnhz9ksGWc2+JPMQohuPM/ITMHW11oOUOX6NwAre5m
hBJKh+Rz7ylLDLn33C4qowUhxnJlqqm+rDCVDTmoYngEFQvhEl19mfndSsC8P/K/
xXQJ+diS/Jug
=orAS
-----END PGP SIGNATURE-----
Merge tag 'net-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN and BPF.
We have a regression in TC currently under investigation, otherwise
the things that stand off most are probably the TCP and AF_PACKET
fixes, with both issues coming from 6.5.
Previous releases - regressions:
- af_packet: fix fortified memcpy() without flex array.
- tcp: fix crashes trying to free half-baked MTU probes
- xdp: fix zero-size allocation warning in xskq_create()
- can: sja1000: always restart the tx queue after an overrun
- eth: mlx5e: again mutually exclude RX-FCS and RX-port-timestamp
- eth: nfp: avoid rmmod nfp crash issues
- eth: octeontx2-pf: fix page pool frag allocation warning
Previous releases - always broken:
- mctp: perform route lookups under a RCU read-side lock
- bpf: s390: fix clobbering the caller's backchain in the trampoline
- phy: lynx-28g: cancel the CDR check work item on the remove path
- dsa: qca8k: fix qca8k driver for Turris 1.x
- eth: ravb: fix use-after-free issue in ravb_tx_timeout_work()
- eth: ixgbe: fix crash with empty VF macvlan list"
* tag 'net-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
rswitch: Fix imbalance phy_power_off() calling
rswitch: Fix renesas_eth_sw_remove() implementation
octeontx2-pf: Fix page pool frag allocation warning
nfc: nci: assert requested protocol is valid
af_packet: Fix fortified memcpy() without flex array.
net: tcp: fix crashes trying to free half-baked MTU probes
net/smc: Fix pos miscalculation in statistics
nfp: flower: avoid rmmod nfp crash issues
net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read
ethtool: Fix mod state of verbose no_mask bitset
net: nfc: fix races in nfc_llcp_sock_get() and nfc_llcp_sock_get_sn()
mctp: perform route lookups under a RCU read-side lock
net: skbuff: fix kernel-doc typos
s390/bpf: Fix unwinding past the trampoline
s390/bpf: Fix clobbering the caller's backchain in the trampoline
net/mlx5e: Again mutually exclude RX-FCS and RX-port-timestamp
net/smc: Fix dependency of SMC on ISM
ixgbe: fix crash with empty VF macvlan list
net/mlx5e: macsec: use update_pn flag instead of PN comparation
net: phy: mscc: macsec: reject PN update requests
...
AngeloGioacchino Del Regno is stepping in as co-maintainer for the
MediaTek SoC platform and starts by sending some dts fixes for
the mt8195 platform that had been pending for a while.
On the ixp4xx platform, Krzysztof Halasa steps down as co-maintainer,
reflecting that Linus Walleij has been handling this on his own
for the past few years.
Generic RISC-V kernels are now marked as incompatible with the
RZ/Five platform that requires custom hacks both for managing
its DMA bounce buffers and for addressing low virtual memory.
Finally, there is one bugfix for the AMDTEE firmware driver
to prevent a use-after-free bug.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmUn5QgACgkQYKtH/8kJ
UicWRw/+J+gYuPbjAO5A34KjcvE0/oHoX0CartiJLjGMSboXqjvlJOL2V37q9cTO
kt/all/wWYnyvr3L09jPKZY8J9stw6wgMpkPZpcAORkF/Vc8KNEvBBVVnTIZSlie
G6HSNW1S3qMPdt2mxjPWeO7aoKqq/lIuQoJDDAh3XQWYowy7++o6TreLs14UsGfv
+PRNm5dR+SGe5QC/vIJIn0U7bTD7PRQ7xEdv2LC+ANto+mbtdyVOKh16kcTnzO+2
NUHmBQvHqGS0Q1uN1hiXQocL9WA7vreVLk7ARbq/SLr1ccOsxJrxKj9LYPhoLq68
8oJCHR8RBAXxYInhiw2xR62KczTEVickNWlHR7aiWlQ+Bxha/YhpmUAzh/hrlvWg
edCBUSIxQW1CyLmbMxAqyHQn72F+sMM/LulhmftHuBcbF1YwNseAV67MKjoMSTr0
rjSiXpzdomCvgZxhJYujHLjugKh6jfLMRwPx+0P6qKebdm/y1a17kGtUf/NQ24bn
nDAeOAKWRRdEu4CjcoYkzVLgE6MlXUiSbSmpsPpDevge1qbcrfHgIATHech4oyDd
h2o8xIO37H4QB3s9w18g05OQRToRlBHPMxQhD+vlRy77Zd9BE7wZqKcwR9XjkyyX
+qPcNHVN0khxf+/NYiIE/Wn5Z57PL2vvgYoSp2L2Wi+UiYEZ0Ek=
=Ukoh
-----END PGP SIGNATURE-----
Merge tag 'soc-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"AngeloGioacchino Del Regno is stepping in as co-maintainer for the
MediaTek SoC platform and starts by sending some dts fixes for the
mt8195 platform that had been pending for a while.
On the ixp4xx platform, Krzysztof Halasa steps down as co-maintainer,
reflecting that Linus Walleij has been handling this on his own for
the past few years.
Generic RISC-V kernels are now marked as incompatible with the RZ/Five
platform that requires custom hacks both for managing its DMA bounce
buffers and for addressing low virtual memory.
Finally, there is one bugfix for the AMDTEE firmware driver to prevent
a use-after-free bug"
* tag 'soc-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
IXP4xx MAINTAINERS entries
arm64: dts: mediatek: mt8195: Set DSU PMU status to fail
arm64: dts: mediatek: fix t-phy unit name
arm64: dts: mediatek: mt8195-demo: update and reorder reserved memory regions
arm64: dts: mediatek: mt8195-demo: fix the memory size to 8GB
MAINTAINERS: Add Angelo as MediaTek SoC co-maintainer
soc: renesas: Make ARCH_R9A07G043 (riscv version) depend on NONPORTABLE
tee: amdtee: fix use-after-free vulnerability in amdtee_close_session