IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Driver doesn't support aRFS for encapsulated packets, return early error
in such a case.
Fixes: 1eb8c695bd ("net/mlx4_en: Add accelerated RFS support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The blamed commit made the fatally incorrect assumption that ports which
aren't in the FORWARDING STP state should not have packets forwarded
towards them, and that is all that needs to be done.
However, that logic alone permits BLOCKING ports to forward to
FORWARDING ports, which of course allows packet storms to occur when
there is an L2 loop.
The ocelot_get_bridge_fwd_mask should not only ask "what can the bridge
do for you", but "what can you do for the bridge". This way, only
FORWARDING ports forward to the other FORWARDING ports from the same
bridging domain, and we are still compatible with the idea of multiple
bridges.
Fixes: df291e54cc ("net: ocelot: support multiple bridges")
Suggested-by: Colin Foster <colin.foster@in-advantage.com>
Reported-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes multiple CLS_REPLACE calls are issued for the same connection.
rhashtable_insert_fast does not check for these duplicates, so multiple
hardware flow entries can be created.
Fix this by checking for an existing entry early
Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently autoloading for SPI devices does not use the DT ID table, it uses
SPI modalises. Supporting OF modalises is going to be difficult if not
impractical, an attempt was made but has been reverted, so ensure that
module autoloading works for this driver by adding the part name used in
the compatible to the list of SPI IDs.
Fixes: 96c8395e21 ("spi: Revert modalias changes")
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove myself as net/smc maintainer, as I am
leaving IBM soon and can not maintain net/smc anymore.
Cc: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzkaller discovered memory leaks [1] that can be reduced to the
following commands:
# ip nexthop add id 1 blackhole
# devlink dev reload pci/0000:06:00.0
As part of the reload flow, mlxsw will unregister its netdevs and then
unregister from the nexthop notification chain. Before unregistering
from the notification chain, mlxsw will receive delete notifications for
nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw
will not receive notifications for nexthops using netdevs that are not
dismantled as part of the reload flow. For example, the blackhole
nexthop above that internally uses the loopback netdev as its nexthop
device.
One way to fix this problem is to have listeners flush their nexthop
tables after unregistering from the notification chain. This is
error-prone as evident by this patch and also not symmetric with the
registration path where a listener receives a dump of all the existing
nexthops.
Therefore, fix this problem by replaying delete notifications for the
listener being unregistered. This is symmetric to the registration path
and also consistent with the netdev notification chain.
The above means that unregister_nexthop_notifier(), like
register_nexthop_notifier(), will have to take RTNL in order to iterate
over the existing nexthops and that any callers of the function cannot
hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN
driver. To avoid a deadlock, change the latter to unregister its nexthop
listener without holding RTNL, making it symmetric to the registration
path.
[1]
unreferenced object 0xffff88806173d600 (size 512):
comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s)
hex dump (first 32 bytes):
41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff A..`......sa....
08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00 ..sa............
backtrace:
[<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522
[<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline]
[<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline]
[<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231
[<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline]
[<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline]
[<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline]
[<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline]
[<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239
[<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83
[<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline]
[<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306
[<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244
[<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline]
[<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline]
[<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913
[<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572
[<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504
[<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590
[<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
[<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340
[<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929
[<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline]
[<ffffffff832fde84>] sock_sendmsg net/socket.c:724 [inline]
[<ffffffff832fde84>] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409
[<ffffffff83304a44>] ___sys_sendmsg+0x104/0x170 net/socket.c:2463
[<ffffffff83304c01>] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492
[<ffffffff83304d5d>] __do_sys_sendmsg net/socket.c:2501 [inline]
[<ffffffff83304d5d>] __se_sys_sendmsg net/socket.c:2499 [inline]
[<ffffffff83304d5d>] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499
Fixes: 2a014b200b ("mlxsw: spectrum_router: Add support for nexthop objects")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas explained in https://lore.kernel.org/r/87mtoeb4hb.ffs@tglx
that our handling of the hrtimer here is wrong: If the timer fires
late (e.g. due to vCPU scheduling, as reported by Dmitry/syzbot)
then it tries to actually rearm the timer at the next deadline,
which might be in the past already:
1 2 3 N N+1
| | | ... | |
^ intended to fire here (1)
^ next deadline here (2)
^ actually fired here
The next time it fires, it's later, but will still try to schedule
for the next deadline (now 3), etc. until it catches up with N,
but that might take a long time, causing stalls etc.
Now, all of this is simulation, so we just have to fix it, but
note that the behaviour is wrong even per spec, since there's no
value then in sending all those beacons unaligned - they should be
aligned to the TBTT (1, 2, 3, ... in the picture), and if we're a
bit (or a lot) late, then just resume at that point.
Therefore, change the code to use hrtimer_forward_now() which will
ensure that the next firing of the timer would be at N+1 (in the
picture), i.e. the next interval point after the current time.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com
Fixes: 01e59e467e ("mac80211_hwsim: hrtimer beacon")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210915112936.544f383472eb.I3f9712009027aa09244b65399bf18bf482a8c4f1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
WARNING: CPU: 1 PID: 9 at net/mac80211/sta_info.c:554
sta_info_insert_rcu+0x121/0x12a0
Modules linked in:
CPU: 1 PID: 9 Comm: kworker/u8:1 Not tainted 5.14.0-rc7+ #253
Workqueue: phy3 ieee80211_iface_work
RIP: 0010:sta_info_insert_rcu+0x121/0x12a0
...
Call Trace:
ieee80211_ibss_finish_sta+0xbc/0x170
ieee80211_ibss_work+0x13f/0x7d0
ieee80211_iface_work+0x37a/0x500
process_one_work+0x357/0x850
worker_thread+0x41/0x4d0
If an Ad-Hoc node receives packets with invalid source MAC address,
it hits a WARN_ON in sta_info_insert_check(), this can spam the log.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20210827144230.39944-1-yuehaibing@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In ieee80211_amsdu_aggregate() set a pointer frag_tail point to the
end of skb_shinfo(head)->frag_list, and use it to bind other skb in
the end of this function. But when execute ieee80211_amsdu_aggregate()
->ieee80211_amsdu_realloc_pad()->pskb_expand_head(), the address of
skb_shinfo(head)->frag_list will be changed. However, the
ieee80211_amsdu_aggregate() not update frag_tail after call
pskb_expand_head(). That will cause the second skb can't bind to the
head skb appropriately.So we update the address of frag_tail to fix it.
Fixes: 6e0456b545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20210830073240.12736-1-pkshih@realtek.com
[reword comment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit d333322361 ("mac80211: do not use low data rates for
data frames with no ack flag").
Returning false early in rate_control_send_low breaks sending broadcast
packets, since rate control will not select a rate for it.
Before re-introducing a fixed version of this patch, we should probably also
make some changes to rate control to be more conservative in selecting rates
for no-ack packets and also prevent using probing rates on them, since we won't
get any feedback.
Fixes: d333322361 ("mac80211: do not use low data rates for data frames with no ack flag")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210906083559.9109-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit f8ade8dddb ("xsurf100: drop include of lib8390.c") accidentally
changed init/main.c. Revert that part.
Fixes: f8ade8dddb ("xsurf100: drop include of lib8390.c")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konrad's new job role is putting a serious cramp on him
being a responsive maintainer and as such he is handing off
the reins to Juergen, Roger, and Stefano.
Thank you!
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konrad's new job role is putting a serious cramp on him
being a responsive maintainer and as such he is handing off
the reins to Christoph Hellwig.
Thank you!
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ley Foon has left Intel and will no longer be able to maintain NIOS2.
Update the MAINTAINER's entry to Dinh Nguyen.
Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As reported by Russell King the change to use OF style modaliases for DT
enumerated broke at least the spi-nor driver, the patch here reverts
that change to fix the regression. Sadly this will mean that anything
that started loading since the change to OF modaliases will run into
issues, there doesn't seem to be any approach which doesn't cause some
problems and thi seems like the least bad approach - gory details are in
the commit log for the change. I'm currently working through the SPI
drivers to add ID tables and missing IDs to tables which should address
things from the other end, this seems more straightforward and robust
than any other options.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmFLdIEACgkQJNaLcl1U
h9DRKgf/ZwDCBYGJuF25LRKxjGIOzKpkA+FGwBFvyQfvtp72HWc19c4UPnG5yQro
ZfzBNCX4nOFL7z1My2M0MRbPYJMQI5PGMu8di856y/+UHmS1PrQXRdq+b2F2UyXo
8qnhePU4sN+iP9RaQfCxBIistK0kNALvdWELlIVHBeMznVX3/touXNCzZKPSeW3/
uXK7Y9vinJ8jvSkCpc0XMAU1DcM0ldZzVYQ8v8bjblcHMHK/RYLpEbzklLWaFC+/
kp7oVUGTjcxT2IZ/t3IkfmOdUbnHLZhRlDkflmdesjS1adJR5KXj+fx+hFzkPZZd
ptwmajovs2OlM952KMXHa4Z1PcxOaA==
=dmjx
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi modalias fix from Mark Brown:
"Fix modalias issues
As reported by Russell King the change to use OF style modaliases for
DT enumerated broke at least the spi-nor driver, the patch here
reverts that change to fix the regression.
Sadly this will mean that anything that started loading since the
change to OF modaliases will run into issues, there doesn't seem to be
any approach which doesn't cause some problems and thi seems like the
least bad approach - gory details are in the commit log for the
change.
I'm currently working through the SPI drivers to add ID tables and
missing IDs to tables which should address things from the other end,
this seems more straightforward and robust than any other options"
* tag 'spi-fix-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Revert modalias changes
- Fix crash in NLM TEST procedure
- NFSv4.1+ backchannel not restored after PATH_DOWN
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmFLUKcACgkQM2qzM29m
f5fCAhAAp66o6n49/fxOLWo+MftFlT1EY8NtFjyTh1x/o4R9S74qxTy3RC3GzRvk
oGOnkFvuiToyjcoeyb9yumYxO00Qf75hrTJvsXqnsbrLZOAKVuITn9MkQXBOXjCi
GDxQSRFg8ihz0vG4YbE/brnZR1fIMr7KSzXLwdXOs8mKvro7JmiiB87JOGhw9yon
W9+bFcnN2TynYsqmtHu987LvaIUE79dFfhrfj6bIobNQ25oqJoG5e1/M48/1MJol
DFPiWoErJ/S1c0lA8rbjIvtzgbXs84U88EXmFUVsxSXhepGui3Uh/cA49vu46icH
vze8fwHs6q3qzF7gE6jbslrrdQ/H6AZ6arhe27h4cVxdh0AouDuBat2xLY2I4TP3
DckfLbEsOqTJhfzqYnk+8ckOaBMpkfyDqG6SodIKglPoknNCtCp0/7NuYF0yMLe5
I6pO7JDgz7ySrbpm27ZMOpdwkLqqA1i8V9MPvimUsKTYJqlVBsc2RsdldQhunNbd
50InJarWQ+japkEl3WK3aJ5rTluiIWjcePT7wA76wP3PnZmcjweOiQMc8uuLlzPw
tOLRlHdpdZzeM3hGuI6KKsg8ZRbDB7L8YiaLkSwxl2qwJwDSB0xo7/WWwVzkyfdf
zdQ2cR9z70I2Bgxq/1lAPB8tXq+SEvu1qCYDFSTo3I9c0Y2frs4=
=L0c1
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Critical bug fixes:
- Fix crash in NLM TEST procedure
- NFSv4.1+ backchannel not restored after PATH_DOWN"
* tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
NLM: Fix svcxdr_encode_owner()
Highlights:
- amd-pmc fix for some suspend/resume issues
- intel-hid fix to avoid false-positive SW_TABLET_MODE=1 reporting
- some build error/warning fixes
- various DMI quirk additions
The following is an automated git shortlog grouped by driver:
amd-pmc:
- Increase the response register timeout
dell:
- fix DELL_WMI_PRIVACY dependencies & build error
gigabyte-wmi:
- add support for B550I Aorus Pro AX
lg-laptop:
- Correctly handle dmi_get_system_info() returning NULL
platform/x86/intel:
- hid: Add DMI switches allow list
- punit_ipc: Drop wrong use of ACPI_PTR()
touchscreen_dmi:
- Update info for the Chuwi Hi10 Plus (CWI527) tablet
- Add info for the Chuwi HiBook (CWI514) tablet
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmFJ5SwUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9zuPwf/ZGt6pXKBZLPQqZ0MpjUgBgUtjk70
ISEplVXx7A3zBJGRJCHBVVT7VYnx7rnRyq/jIukOxTHG6EfbQS88+oDXePSoNzNp
xkrY17xPBWjny1yYWcmkRMVYdyRBmF6jKEC3oLVW/ilot5+BYG4m9QNgQbSKmUnQ
eKGZ8O4r0COBeUJVmcPa4Gcd9LuLQMMEE2yE6Hi1la4fo2civ+QzSTdbLw28QhAd
QlD79pw744lhvBdXqzNZzyHAUhbbBxWIl79YJZnwScPN2EEqht3BIqehtd+uRm3l
OCvR7x14xmEfTm96rjrGwQFxEKPR3HWtwqJmb7W0ZeLJ4bsxdxYb3wHfug==
=5l9q
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"The first round of bug-fixes for platform-drivers-x86 for 5.15,
highlights:
- amd-pmc fix for some suspend/resume issues
- intel-hid fix to avoid false-positive SW_TABLET_MODE=1 reporting
- some build error/warning fixes
- various DMI quirk additions"
* tag 'platform-drivers-x86-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: gigabyte-wmi: add support for B550I Aorus Pro AX
platform/x86/intel: hid: Add DMI switches allow list
platform/x86: dell: fix DELL_WMI_PRIVACY dependencies & build error
platform/x86: amd-pmc: Increase the response register timeout
platform/x86: touchscreen_dmi: Update info for the Chuwi Hi10 Plus (CWI527) tablet
platform/x86: touchscreen_dmi: Add info for the Chuwi HiBook (CWI514) tablet
lg-laptop: Correctly handle dmi_get_system_info() returning NULL
platform/x86/intel: punit_ipc: Drop wrong use of ACPI_PTR()
linux@prisktech.co.nz is defunct:
4.1.2 <linux@prisktech.co.nz>: Recipient address rejected: Domain not found
Remove it from MAINTAINERS and mark the ARM/VT8500 entry orphan.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the HW device is during recovery, the HW resources will never return,
hence we shouldn't wait for the CID (HW context ID) bitmaps to clear.
This fix speeds up the error recovery flow.
Fixes: 64515dc899 ("qed: Add infrastructure for error detection and recovery")
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann says:
====================
s390/qeth: fixes 2021-09-21
This brings two fixes for deadlocks when a device is removed while it
has certain types of async work pending. And one additional fix for a
missing NULL check in an error case.
====================
Link: https://lore.kernel.org/r/20210921145217.1584654-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 0b9902c1fc ("s390/qeth: fix deadlock during recovery") removed
taking discipline_mutex inside qeth_do_reset(), fixing potential
deadlocks. An error path was missed though, that still takes
discipline_mutex and thus has the original deadlock potential.
Intermittent deadlocks were seen when a qeth channel path is configured
offline, causing a race between qeth_do_reset and ccwgroup_remove.
Call qeth_set_offline() directly in the qeth_do_reset() error case and
then a new variant of ccwgroup_set_offline(), without taking
discipline_mutex.
Fixes: b41b554c1e ("s390/qeth: fix locking for discipline setup / removal")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Problem: qeth_close_dev_handler is a worker that tries to acquire
card->discipline_mutex via drv->set_offline() in ccwgroup_set_offline().
Since commit b41b554c1e
("s390/qeth: fix locking for discipline setup / removal")
qeth_remove_discipline() is called under card->discipline_mutex and
cancels the work and waits for it to finish.
STOPLAN reception with reason code IPA_RC_VEPA_TO_VEB_TRANSITION is the
only situation that schedules close_dev_work. In that situation scheduling
qeth recovery will also result in an offline interface, when resetting the
isolation mode fails, if the external switch is still set to VEB.
And since commit 0b9902c1fc ("s390/qeth: fix deadlock during recovery")
qeth recovery does not aquire card->discipline_mutex anymore.
So we accept the longer pathlength of qeth_schedule_recovery in this
error situation and re-use the existing function.
As a side-benefit this changes the hwtrap to behave like during recovery
instead of like during a user-triggered set_offline.
Fixes: b41b554c1e ("s390/qeth: fix locking for discipline setup / removal")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When qeth_set_online() calls qeth_clear_working_pool_list() to roll
back after an error exit from qeth_hardsetup_card(), we are at risk of
accessing card->qdio.in_q before it was allocated by
qeth_alloc_qdio_queues() via qeth_mpc_initialize().
qeth_clear_working_pool_list() then dereferences NULL, and by writing to
queue->bufs[i].pool_entry scribbles all over the CPU's lowcore.
Resulting in a crash when those lowcore areas are used next (eg. on
the next machine-check interrupt).
Such a scenario would typically happen when the device is first set
online and its queues aren't allocated yet. An early IO error or certain
misconfigs (eg. mismatched transport mode, bad portno) then cause us to
error out from qeth_hardsetup_card() with card->qdio.in_q still being
NULL.
Fix it by checking the pointer for NULL before accessing it.
Note that we also have (rare) paths inside qeth_mpc_initialize() where
a configuration change can cause us to free the existing queues,
expecting that subsequent code will allocate them again. If we then
error out before that re-allocation happens, the same bug occurs.
Fixes: eff73e16ee ("s390/qeth: tolerate pre-filled RX buffer")
Reported-by: Stefan Raspl <raspl@linux.ibm.com>
Root-caused-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During the v5.13 cycle we updated the SPI subsystem to generate OF style
modaliases for SPI devices, replacing the old Linux style modalises we
used to generate based on spi_device_id which are the DT style name with
the vendor removed. Unfortunately this means that we start only
reporting OF style modalises and not the old ones and there is nothing
that ensures that drivers list every possible OF compatible string in
their OF ID table. The result is that there are systems which have been
relying on loading modules based on the old style that are now broken,
as found by Russell King with spi-nor on Macchiatobin.
spi-nor is a particularly problematic case for this, it only lists a
single generic DT compatible jedec,spi-nor in the driver but supports a
huge raft of device specific compatibles, with a large set of part
numbers many of which are offered by multiple vendors. Russell's
searches of upstream device trees has turned up examples with vendor
names written in non-standard ways too. To make matters worse up until
8ff16cf77c ("Documentation: devicetree: m25p80: add "nor-jedec"
binding") the generic compatible was not part of the binding so there
are device trees out there written to that binding version which don't
list it all. The sheer number of parts supported together with our
previous approach of ignoring the vendor ID makes robustly fixing this
by adding compatibles to the spi-nor driver seem problematic, the
current DT binding document does not list all the parts supported by the
driver at the minute (further patches will fix this).
I've also investigated supporting both formats of modalias
simultaneously but that doesn't seem possible, especially without
breaking our userspace ABI which is obviously not viable.
Instead revert the relevant changes for now:
e09f2ab8ee ("spi: update modalias_show after of_device_uevent_modalias support")
3ce6c9e261 ("spi: add of_device_uevent_modalias support")
This will unfortunately mean that any system which had started having
modules autoload based on the OF compatibles for drivers that list
things there but not in the spi_device_ids will now not have those
modules load which is itself a regression. Since it affects a narrower
time window and the particularly problematic spi-nor driver may be
critical to system boot on smaller systems this seems the best of a
series of bad options. I will start an audit of SPI drivers to identify
and fix cases where things won't autoload using spi_device_id, this is
not great but seems to be the best way forward that anyone has been able
to identify.
Thanks to Russell for both his report and the additional diagnostic and
analysis work he has done here, the detailed research above was his
work.
Fixes: e09f2ab8ee ("spi: update modalias_show after of_device_uevent_modalias support")
Fixes: 3ce6c9e261 ("spi: add of_device_uevent_modalias support")
Reported-by: Russell King (Oracle) <linux@armlinux.org.uk>
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Andreas Schwab <schwab@suse.de>
Cc: Marco Felsch <m.felsch@pengutronix.de>
These issues can be used by an unprivileged local user to circumvent the verifier and gain root privileges.
v4.1+
s390/bpf: Fix optimizing out zero-extensions
s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
v5.5+
s390/bpf: Fix branch shortening during codegen pass
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmFDDAoACgkQjYWKoQLX
FBiK/ggAi9LxNSrxNI4u83PY/ubZ+13XIkB59BtvShHnzSSw/e0Sdbr7dnedljaF
qWv9MUF9s1P88DRlXw/gHIckvIH77uvT28i5Omg6JfUnBLB0LaVC4QiwhmfbqBbV
psew9zV1z0/2YFvagCsc/XdF4QdlyLm3GKO/bEAeFXG/jDigkzBVB3clffIv7zm7
mD4uXgELil8PNv9tliRonUwoGG5P+0tF7KviTxi3aU7NLXtZxJ1Qt0N+HjRaDpwR
kxvSaukn61yWafUWlgbmGL8HGqhFqCk4SQzHSAZ8m87dl6KjxDiQUjvccn5xQPF2
M7JQU7INFApvK2QZFgW66jnTeqvUcQ==
=Qmcn
-----END PGP SIGNATURE-----
Merge tag 's390-5.15-ebpf-jit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 eBPF fixes from Vasily Gorbik:
"Johan Almbladh has implemented a number of new testcases for eBPF [1],
which uncovered three miscompilation issues in the s390 eBPF JIT"
Link: https://lore.kernel.org/bpf/20210902185229.1840281-1-johan.almbladh@anyfinetworks.com/ [1]
* tag 's390-5.15-ebpf-jit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/bpf: Fix optimizing out zero-extensions
s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
s390/bpf: Fix branch shortening during codegen pass
In commit b7213ffa0e ("qnx4: avoid stringop-overread errors") I tried
to teach gcc about how the directory entry structure can be two
different things depending on a status flag. It made the code clearer,
and it seemed to make gcc happy.
However, Arnd points to a gcc bug, where despite using two different
members of a union, gcc then gets confused, and uses the size of one of
the members to decide if a string overrun happens. And not necessarily
the rigth one.
End result: with some configurations, gcc-11 will still complain about
the source buffer size being overread:
fs/qnx4/dir.c: In function 'qnx4_readdir':
fs/qnx4/dir.c:76:32: error: 'strnlen' specified bound [16, 48] exceeds source size 1 [-Werror=stringop-overread]
76 | size = strnlen(name, size);
| ^~~~~~~~~~~~~~~~~~~
fs/qnx4/dir.c:26:22: note: source object declared here
26 | char de_name;
| ^~~~~~~
because gcc will get confused about which union member entry is actually
getting accessed, even when the source code is very clear about it. Gcc
internally will have combined two "redundant" pointers (pointing to
different union elements that are at the same offset), and takes the
size checking from one or the other - not necessarily the right one.
This is clearly a gcc bug, but we can work around it fairly easily. The
biggest thing here is the big honking comment about why we do what we
do.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578#c6
Reported-and-tested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some devices, even non convertible ones, can send incorrect
SW_TABLET_MODE reports.
Add an allow list and accept such reports only from devices in it.
Bug reported for Dell XPS 17 9710 on:
https://gitlab.freedesktop.org/libinput/libinput/-/issues/662
Reported-by: Tobias Gurtzick <magic@wizardtales.com>
Suggested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Tobias Gurtzick <magic@wizardtales.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://lore.kernel.org/r/20210920160312.9787-1-jose.exposito89@gmail.com
[hdegoede@redhat.com: Check dmi_switches_auto_add_allow_list only once]
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
When DELL_WMI=y, DELL_WMI_PRIVACY=y, and LEDS_TRIGGER_AUDIO=m, there
is a linker error since the LEDS trigger code is built as a loadable
module. This happens because DELL_WMI_PRIVACY is a bool that depends
on a tristate (LEDS_TRIGGER_AUDIO=m), which can be dangerous.
ld: drivers/platform/x86/dell/dell-wmi-privacy.o: in function `dell_privacy_wmi_probe':
dell-wmi-privacy.c:(.text+0x3df): undefined reference to `ledtrig_audio_get'
Fixes: 8af9fa37b8 ("platform/x86: dell-privacy: Add support for Dell hardware privacy")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Perry Yuan <Perry.Yuan@dell.com>
Cc: Dell.Client.Kernel@dell.com
Cc: platform-driver-x86@vger.kernel.org
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mark Gross <mgross@linux.intel.com>
Link: https://lore.kernel.org/r/20210918044829.19222-1-rdunlap@infradead.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Vladimir Oltean says:
====================
Fix mdiobus users with devres
Commit ac3a68d566 ("net: phy: don't abuse devres in
devm_mdiobus_register()") by Bartosz Golaszewski has introduced two
classes of potential bugs by making the devres callback of
devm_mdiobus_alloc stop calling mdiobus_unregister.
The exact buggy circumstances are presented in the individual commit
messages. I have searched the tree for other occurrences, but at the
moment:
- for issue (a) I have no concrete proof that other buses except SPI and
I2C suffer from it, and the only SPI or I2C device drivers that call
of_mdiobus_alloc are the DSA drivers that leave a NULL
ds->slave_mii_bus and a non-NULL ds->ops->phy_read, aka ksz9477,
ksz8795, lan9303_i2c, vsc73xx-spi.
- for issue (b), all drivers which call of_mdiobus_alloc either use
of_mdiobus_register too, or call mdiobus_unregister sometime within
the ->remove path.
Although at this point I've seen enough strangeness caused by this
"device_del during ->shutdown" that I'm just going to copy the SPI and
I2C subsystem maintainers to this patch series, to get their feedback
whether they've had reports about things like this before. I don't think
other buses behave in this way, it forces SPI and I2C devices to have to
protect themselves from a really strange set of issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:
spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
-> spi_unregister_device(to_spi_device(dev));
-> device_del(&spi->dev);
So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:
i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
-> i2c_unregister_device(client);
-> device_unregister(&client->dev);
The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).
So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.
This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:
void mdiobus_free(struct mii_bus *bus)
{
/* For compatibility with error handling in drivers. */
if (bus->state == MDIOBUS_ALLOCATED) {
kfree(bus);
return;
}
BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
bus->state = MDIOBUS_RELEASED;
put_device(&bus->dev);
}
In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.
I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:
(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
located on a bus that unregisters its children on shutdown. After
the blamed patch, either both the alloc and the register should use
devres, or none should.
(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
mdiobus_unregister at all in the remove path. After the blamed
patch, nobody unregisters the MDIO bus anymore, so this is even more
buggy than the previous case which needs a specific bus
configuration to be seen, this one is an unconditional bug.
In this case, the Realtek drivers fall under category (b). To solve it,
we can register the MDIO bus under devres too, which restores the
previous behavior.
Fixes: ac3a68d566 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:
spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
-> spi_unregister_device(to_spi_device(dev));
-> device_del(&spi->dev);
So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:
i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
-> i2c_unregister_device(client);
-> device_unregister(&client->dev);
The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).
So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.
This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:
void mdiobus_free(struct mii_bus *bus)
{
/* For compatibility with error handling in drivers. */
if (bus->state == MDIOBUS_ALLOCATED) {
kfree(bus);
return;
}
BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
bus->state = MDIOBUS_RELEASED;
put_device(&bus->dev);
}
In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.
I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:
(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
located on a bus that unregisters its children on shutdown. After
the blamed patch, either both the alloc and the register should use
devres, or none should.
(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
mdiobus_unregister at all in the remove path. After the blamed
patch, nobody unregisters the MDIO bus anymore, so this is even more
buggy than the previous case which needs a specific bus
configuration to be seen, this one is an unconditional bug.
In this case, DSA falls into category (a), it tries to be helpful and
registers an MDIO bus on behalf of the switch, which might be on such a
bus. I've no idea why it does it under devres.
It does this on probe:
if (!ds->slave_mii_bus && ds->ops->phy_read)
alloc and register mdio bus
and this on remove:
if (ds->slave_mii_bus && ds->ops->phy_read)
unregister mdio bus
I _could_ imagine using devres because the condition used on remove is
different than the condition used on probe. So strictly speaking, DSA
cannot determine whether the ds->slave_mii_bus it sees on remove is the
ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
have solved that problem. But nonetheless, the existing code already
proceeds to unregister the MDIO bus, even though it might be
unregistering an MDIO bus it has never registered. So I can only guess
that no driver that implements ds->ops->phy_read also allocates and
registers ds->slave_mii_bus itself.
So in that case, if unregistering is fine, freeing must be fine too.
Stop using devres and free the MDIO bus manually. This will make devres
stop attempting to free a still registered MDIO bus on ->shutdown.
Fixes: ac3a68d566 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the blamed commit, dsa_tree_teardown_switches() was split into two
smaller functions, dsa_tree_teardown_switches and dsa_tree_teardown_ports.
However, the error path of dsa_tree_setup stopped calling dsa_tree_teardown_ports.
Fixes: a57d8c217a ("net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul says:
====================
net/smc: fixes 2021-09-20
Please apply the following patches for smc to netdev's net tree.
The first patch adds a missing error check, and the second patch
fixes a possible leak of a lock in a worker.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The abort_work is scheduled when a connection was detected to be
out-of-sync after a link failure. The work calls smc_conn_kill(),
which calls smc_close_active_abort() and that might end up calling
smc_close_cancel_work().
smc_close_cancel_work() cancels any pending close_work and tx_work but
needs to release the sock_lock before and acquires the sock_lock again
afterwards. So when the sock_lock was NOT acquired before then it may
be held after the abort_work completes. Thats why the sock_lock is
acquired before the call to smc_conn_kill() in __smc_lgr_terminate(),
but this is missing in smc_conn_abort_work().
Fix that by acquiring the sock_lock first and release it after the
call to smc_conn_kill().
Fixes: b286a0651e ("net/smc: handle incoming CDC validation message")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot was able to trigger the following warning [1]
No repro found by syzbot yet but I was able to trigger similar issue
by having 2 scripts running in parallel, changing conntrack hash sizes,
and:
for j in `seq 1 1000` ; do unshare -n /bin/true >/dev/null ; done
It would take more than 5 minutes for net_namespace structures
to be cleaned up.
This is because nf_ct_iterate_cleanup() has to restart everytime
a resize happened.
By adding a mutex, we can serialize hash resizes and cleanups
and also make get_next_corpse() faster by skipping over empty
buckets.
Even without resizes in the picture, this patch considerably
speeds up network namespace dismantles.
[1]
INFO: task syz-executor.0:8312 can't die for more than 144 seconds.
task:syz-executor.0 state:R running task stack:25672 pid: 8312 ppid: 6573 flags:0x00004006
Call Trace:
context_switch kernel/sched/core.c:4955 [inline]
__schedule+0x940/0x26f0 kernel/sched/core.c:6236
preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6408
preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
__local_bh_enable_ip+0x109/0x120 kernel/softirq.c:390
local_bh_enable include/linux/bottom_half.h:32 [inline]
get_next_corpse net/netfilter/nf_conntrack_core.c:2252 [inline]
nf_ct_iterate_cleanup+0x15a/0x450 net/netfilter/nf_conntrack_core.c:2275
nf_conntrack_cleanup_net_list+0x14c/0x4f0 net/netfilter/nf_conntrack_core.c:2469
ops_exit_list+0x10d/0x160 net/core/net_namespace.c:171
setup_net+0x639/0xa30 net/core/net_namespace.c:349
copy_net_ns+0x319/0x760 net/core/net_namespace.c:470
create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:226
ksys_unshare+0x445/0x920 kernel/fork.c:3128
__do_sys_unshare kernel/fork.c:3202 [inline]
__se_sys_unshare kernel/fork.c:3200 [inline]
__x64_sys_unshare+0x2d/0x40 kernel/fork.c:3200
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f63da68e739
RSP: 002b:00007f63d7c05188 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f63da792f80 RCX: 00007f63da68e739
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000
RBP: 00007f63da6e8cc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f63da792f80
R13: 00007fff50b75d3f R14: 00007f63d7c05300 R15: 0000000000022000
Showing all locks held in the system:
1 lock held by khungtaskd/27:
#0: ffffffff8b980020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
2 locks held by kworker/u4:2/153:
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2268
#1: ffffc9000140fdb0 ((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2272
1 lock held by systemd-udevd/2970:
1 lock held by in:imklog/6258:
#0: ffff88807f970ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990
3 locks held by kworker/1:6/8158:
1 lock held by syz-executor.0/8312:
2 locks held by kworker/u4:13/9320:
1 lock held by syz-executor.5/10178:
1 lock held by syz-executor.4/10217:
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
iptables/nftables has two types of log modules:
1. backend, e.g. nf_log_syslog, which implement the functionality
2. frontend, e.g. xt_LOG or nft_log, which call the functionality
provided by backend based on nf_tables or xtables rule set.
Problem is that the request_module() call to load the backed in
nf_logger_find_get() might happen with nftables transaction mutex held
in case the call path is via nf_tables/nft_compat.
This can cause deadlocks (see 'Fixes' tags for details).
The chosen solution as to let modprobe deal with this by adding 'pre: '
soft dep tag to xt_LOG (to load the syslog backend) and xt_NFLOG (to
load nflog backend).
Eric reports that this breaks on systems with older modprobe that
doesn't support softdeps.
Another, similar issue occurs when someone either insmods xt_(NF)LOG
directly or unloads the backend module (possible if no log frontend
is in use): because the frontend module is already loaded, modprobe is
not invoked again so the softdep isn't evaluated.
Add a workaround: If nf_logger_find_get() returns -ENOENT and call
is not via nft_compat, load the backend explicitly and try again.
Else, let nft_compat ask for deferred request_module via nf_tables
infra.
Softdeps are kept in-place, so with newer modprobe the dependencies
are resolved from userspace.
Fixes: cefa31a9d4 ("netfilter: nft_log: perform module load from nf_tables")
Fixes: a38b5b56d6 ("netfilter: nf_log: add module softdeps")
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a leftover from the times when this function was wired up via
pernet_operations. Now its called when userspace asks for the table.
With CONFIG_NET_NS=n, iptable_raw_table_init memory has been discarded
already and we get a kernel crash.
Other tables are fine, __net_init annotation was removed already.
Fixes: fdacd57c79 ("netfilter: x_tables: never register tables by default")
Reported-by: youling 257 <youling257@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ipv4 and device notifiers are called with RTNL mutex held.
The table walk can take some time, better not block other RTNL users.
'ip a' has been reported to block for up to 20 seconds when conntrack table
has many entries and device down events are frequent (e.g., PPP).
Reported-and-tested-by: Martin Zaharinov <micron10@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
masq_inet6_event is called asynchronously from system work queue,
because the inet6 notifier is atomic and nf_iterate_cleanup can sleep.
The ipv4 and device notifiers call nf_iterate_cleanup directly.
This is legal, but these notifiers are called with RTNL mutex held.
A large conntrack table with many devices coming and going will have severe
impact on the system usability, with 'ip a' blocking for several seconds.
This change places the defer code into a helper and makes it more
generic so ipv4 and ifdown notifiers can be converted to defer the
cleanup walk as well in a follow patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
syzbot reports following UAF:
BUG: KASAN: use-after-free in memcmp+0x18f/0x1c0 lib/string.c:955
nla_strcmp+0xf2/0x130 lib/nlattr.c:836
nft_table_lookup.part.0+0x1a2/0x460 net/netfilter/nf_tables_api.c:570
nft_table_lookup net/netfilter/nf_tables_api.c:4064 [inline]
nf_tables_getset+0x1b3/0x860 net/netfilter/nf_tables_api.c:4064
nfnetlink_rcv_msg+0x659/0x13f0 net/netfilter/nfnetlink.c:285
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
Problem is that all get operations are lockless, so the commit_mutex
held by nft_rcv_nl_event() isn't enough to stop a parallel GET request
from doing read-accesses to the table object even after synchronize_rcu().
To avoid this, unlink the table first and store the table objects in
on-stack scratch space.
Fixes: 6001a930ce ("netfilter: nftables: introduce table ownership")
Reported-and-tested-by: syzbot+f31660cf279b0557160c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add 20k entries to the connection tracking table, once from the
data plane, once via ctnetlink.
In both cases, each entry lives in a different conntrack zone
and addresses/ports are identical.
Expectation is that insertions work and occurs in constant time:
PASS: added 10000 entries in 1215 ms (now 10000 total, loop 1)
PASS: added 10000 entries in 1214 ms (now 20000 total, loop 2)
PASS: inserted 20000 entries from packet path in 2434 ms total
PASS: added 10000 entries in 57631 ms (now 10000 total)
PASS: added 10000 entries in 58572 ms (now 20000 total)
PASS: inserted 20000 entries via ctnetlink in 116205 ms
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a script to exercise NAT port clash resolution with directional zones.
Add net namespaces that use the same IP address and connect them to a
gateway.
Gateway uses policy routing based on iif/mark and conntrack zones to
isolate the client namespaces. In server direction, same zone with NAT
to single address is used.
Then, connect to a server from each client netns, using identical
connection id, i.e. saddr:sport -> daddr:dport.
Expectation is for all connections to succeeed: NAT gatway is
supposed to do port reallocation for each of the (clashing) connections.
This is based on the description/use case provided in the commit message of
deedb59039 ("netfilter: nf_conntrack: add direction support for zones").
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Similar to the conntrack change, also use the zone id for the nat source
lists if the zone id is valid in both directions.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>