IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to
free p_hwfn->p_cxt_mngr->ilt_shadow on error. However,
qed_cxt_tables_alloc() accesses the freed pointer on failure
of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(),
which may lead to use-after-free. Fix this issue by setting
p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free().
Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When compiling with gcc version 14.0.0 20231129 (experimental) and
CONFIG_FORTIFY_SOURCE=y, I've noticed the following warning:
...
In function 'fortify_memcpy_chk',
inlined from 'ax88796c_tx_fixup' at drivers/net/ethernet/asix/ax88796c_main.c:287:2:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
This call to 'memcpy()' is interpreted as an attempt to copy TX_OVERHEAD
(which is 8) bytes from 4-byte 'sop' field of 'struct tx_pkt_info' and
thus overread warning is issued. Since we actually want to copy both
'sop' and 'seg' fields at once, use the convenient 'struct_group()' here.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Łukasz Stelmach <l.stelmach@samsung.com>
Link: https://lore.kernel.org/r/20231211090535.9730-1-dmantipov@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use pcie_capability_read_word() for reading LNKSTA and remove the
custom define that matches to PCI_EXP_LNKSTA.
As only single user for cap_offset remains, replace it with a call to
pci_pcie_cap(). Instead of e1000_adapter, make local variable out of
pci_dev because both users are interested in it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
WARN_ON, and a BUG.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmV4tMwACgkQ8vlZVpUN
gaNZRAf/ejQZne9iZck8SSV62mkR9E7EwN9J2+gkWFrlsyurErZlVsBA5yRB+i9A
V1v6DRGDnYFwKFNHJhR/RW9NhEwpYkX9Vo3miksSCq8rsAB1kjSs3xVrTBIYi/8c
ztw4ncyxW7RRFRmruzFfUEKriiyJzxJYx+EqbNsQHcl5ET6Y2/5zM0bChV9MwuN3
iS1Rm98RbHVrylzKbGG562MaGdJyUYvQ+mnRCgma1mTu6K9SWLJg211icLTsDhHg
XEB/QGWji2O7xOudcry8wLIpoR6rYPAhWfbkLekW1K9hjV3iXuJoVjj7eB9LctMf
FAXr8u0FKJI0iIQyrQrEEqIuh+jKBA==
=4zQL
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix various bugs / regressions for ext4, including a soft lockup, a
WARN_ON, and a BUG"
* tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix soft lockup in journal_finish_inode_data_buffers()
ext4: fix warning in ext4_dio_write_end_io()
jbd2: increase the journal IO's priority
jbd2: correct the printing of write_flags in jbd2_write_superblock()
ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
Make the flow for pci shutdown be the same to the pci remove.
iavf_shutdown was implementing an incomplete version
of iavf_remove. It misses several calls to the kernel like
iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap
that might break the system on reboot or hibernation.
Implement the call of iavf_remove directly in iavf_shutdown to
close this gap.
Fixes below error messages (dmesg) during shutdown stress tests -
[685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for
VF 0
[685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for
VF 0
Reproduction:
1. Create one VF interface:
echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs
2. Run live dmesg on the host:
dmesg -wH
3. On SUT, script below steps into vf_namespace_assignment.sh
<#!/bin/sh> // Remove <>. Git removes # line
if=<VF name> (edit this per VF name)
loop=0
while true; do
echo test round $loop
let loop++
ip netns add ns$loop
ip link set dev $if up
ip link set dev $if netns ns$loop
ip netns exec ns$loop ip link set dev $if up
ip netns exec ns$loop ip link set dev $if netns 1
ip netns delete ns$loop
done
4. Run the script for at least 1000 iterations on SUT:
./vf_namespace_assignment.sh
Expected result:
No errors in dmesg.
Fixes: 129cf89e5856 ("iavf: rename functions and structs to new name")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com>
Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ntuple-filter feature on/off:
Default is on. If turned off, the filters will be removed from both
PF and iavf list. The removal is irrespective of current filter state.
Steps to reproduce:
-------------------
1. Ensure ntuple is on.
ethtool -K enp8s0 ntuple-filters on
2. Create a filter to receive the traffic into non-default rx-queue like 15
and ensure traffic is flowing into queue into 15.
Now, turn off ntuple. Traffic should not flow to configured queue 15.
It should flow to default RX queue.
Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
New states introduced:
IAVF_FDIR_FLTR_DIS_REQUEST
IAVF_FDIR_FLTR_DIS_PENDING
IAVF_FDIR_FLTR_INACTIVE
Current FDIR state machines (SM) are not adequate to handle a few
scenarios in the link DOWN/UP event, reset event and ntuple-feature.
For example, when VF link goes DOWN and comes back UP administratively,
the expectation is that previously installed filters should also be
restored. But with current SM, filters are not restored.
So with new SM, during link DOWN filters are marked as INACTIVE in
the iavf list but removed from PF. After link UP, SM will transition
from INACTIVE to ADD_REQUEST to restore the filter.
Similarly, with VF reset, filters will be removed from the PF, but
marked as INACTIVE in the iavf list. Filters will be restored after
reset completion.
Steps to reproduce:
-------------------
1. Create a VF. Here VF is enp8s0.
2. Assign IP addresses to VF and link partner and ping continuously
from remote. Here remote IP is 1.1.1.1.
3. Check default RX Queue of traffic.
ethtool -S enp8s0 | grep -E "rx-[[:digit:]]+\.packets"
4. Add filter - change default RX Queue (to 15 here)
ethtool -U ens8s0 flow-type ip4 src-ip 1.1.1.1 action 15 loc 5
5. Ensure filter gets added and traffic is received on RX queue 15 now.
Link event testing:
-------------------
6. Bring VF link down and up. If traffic flows to configured queue 15,
test is success, otherwise it is a failure.
Reset event testing:
--------------------
7. Reset the VF. If traffic flows to configured queue 15, test is success,
otherwise it is a failure.
Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZXhQgQAKCRDh3BK/laaZ
PHL9AQC0y7A+HLH6oXM8uI8rqC8e78qGdoGGl+Ppapae+BhO8gD+Or5B4yR0MZKR
Z/j0zMe57mmRMplxcwz/LXXCqeE9+w0=
=cn6g
-----END PGP SIGNATURE-----
Merge tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix a couple of potential crashes, one introduced in 6.6 and one
in 5.10
- Fix misbehavior of virtiofs submounts on memory pressure
- Clarify naming in the uAPI for a recent feature
* tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
fuse: share lookup state between submount and its parent
docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
e1000e has own copy of PCI Negotiated Link Width field defines. Use the
ones from include/uapi/linux/pci_regs.h instead of the custom ones and
remove the custom ones and convert to FIELD_GET().
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of
custom masking and shifting.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmV3pUoACgkQiiy9cAdy
T1GQCgv/YURd8zz5k+GSOvUF2tCl6zW6h0NJQbWRIjgl4i7eGHZwIslgCI6kZIN1
AFrSyUj4tZQmFvh0aVWLZeWsoKETbSkOYkz2dC4X/lC8LJD3VAy3vzAhu4oSAWva
+pItQVlOTG0CcmMSTANSfw0sSsCwC2BHAUJnnu7ypgERI3wllOPtxE1xN9mT/8Bf
NxJDZa3jtZd2hC4Cda1NTYYEfaSGufEOzPZIW9/h5ftpRo0qtEZkKh9TPddBMGm4
yMnt1sSp4DHoW6xOyGOt+7kJAGA5NtP3/voLSjirG558Bb4HjWhBT+Dkxe6dUiXn
i9gi1bFJ/8gRulv1cTdOxTFGE+i9Wr4PzpG2g82qugYRTl3LqLoJBa8NH+WzKz+q
AX8EySFdlJtE++wTMNZB5hgFuJNGkzRi3YbjrQjvHFDQvaSVHvtayyhuEN+UcqAe
gWuj1PTDKy6cfkxFYPDEBtMgp1u4+72nWOxoYUE5LyvzkLCLjfgMKCDX03RlAvfZ
zB76cU/3
=yMkH
-----END PGP SIGNATURE-----
Merge tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Memory leak fix (in lock error path)
- Two fixes for create with allocation size
- FIx for potential UAF in lease break error path
- Five directory lease (caching) fixes found during additional recent
testing
* tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE
ksmbd: fix wrong allocation size update in smb2_open()
ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
ksmbd: lazy v2 lease break on smb2_write()
ksmbd: send v2 lease break notification for directory
ksmbd: downgrade RWH lease caching state to RH for directory
ksmbd: set v2 lease capability
ksmbd: set epoch in create context v2 lease
ksmbd: fix memory leak in smb2_lock()
There's issue when do io test:
WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170]
CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G OE
Call trace:
dump_backtrace+0x0/0x1a0
show_stack+0x24/0x30
dump_stack+0xb0/0x100
watchdog_timer_fn+0x254/0x3f8
__hrtimer_run_queues+0x11c/0x380
hrtimer_interrupt+0xfc/0x2f8
arch_timer_handler_phys+0x38/0x58
handle_percpu_devid_irq+0x90/0x248
generic_handle_irq+0x3c/0x58
__handle_domain_irq+0x68/0xc0
gic_handle_irq+0x90/0x320
el1_irq+0xcc/0x180
queued_spin_lock_slowpath+0x1d8/0x320
jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2]
kjournald2+0xec/0x2f0 [jbd2]
kthread+0x134/0x138
ret_from_fork+0x10/0x18
Analyzed informations from vmcore as follows:
(1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list';
(2) Now is processing the 855th jbd2_inode;
(3) JBD2 task has TIF_NEED_RESCHED flag;
(4) There's no pags in address_space around the 855th jbd2_inode;
(5) There are some process is doing drop caches;
(6) Mounted with 'nodioread_nolock' option;
(7) 128 CPUs;
According to informations from vmcore we know 'journal->j_list_lock' spin lock
competition is fierce. So journal_finish_inode_data_buffers() maybe process
slowly. Theoretically, there is scheduling point in the filemap_fdatawait_range_keep_errors().
However, if inode's address_space has no pages which taged with PAGECACHE_TAG_WRITEBACK,
will not call cond_resched(). So may lead to soft lockup.
journal_finish_inode_data_buffers
filemap_fdatawait_range_keep_errors
__filemap_fdatawait_range
while (index <= end)
nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
if (!nr_pages)
break; --> If 'nr_pages' is equal zero will break, then will not call cond_resched()
for (i = 0; i < nr_pages; i++)
wait_on_page_writeback(page);
cond_resched();
To solve above issue, add scheduling point in the journal_finish_inode_data_buffers();
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
JingZao(京造) WKB603 keyboard is a rebranded product of Jamesdonkey RS2
keyboard, identified as "hfd.cn WKB603" in wired mode, "WKB603" in bluetooth
mode. Adding them to the list of non-apple keyboards fixes function key.
Signed-off-by: Yan Jun <jerrysteve1101@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Commit 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and
stop applying workaround") introduced a regression for ThinkPad
TrackPoint Keyboard II which has similar quirks to cptkbd (so it uses
the same workarounds) but slightly different so that there are
false-positives during detecting well-behaving firmware. This commit
restricts detecting well-behaving firmware to the only model which
known to have one and have stable enough quirks to not cause
false-positives.
Fixes: 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and stop applying workaround")
Link: https://lore.kernel.org/linux-input/ZXRiiPsBKNasioqH@jekhomev/
Link: https://bbs.archlinux.org/viewtopic.php?pid=2135468#p2135468
Signed-off-by: Mikhail Khvainitski <me@khvoinitsky.org>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Linus Walleij says:
====================
net: dsa: realtek: Two RTL8366RB fixes
These minor fixes were found while digging into other
issues: a weirdly named variable and bogus MTU handling.
Fix it up.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
====================
Link: https://lore.kernel.org/r/20231209-rtl8366rb-mtu-fix-v1-0-df863e2b2b2a@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The MTU callbacks are in layer 1 size, so for example 1500
bytes is a normal setting. Cache this size, and only add
the layer 2 framing right before choosing the setting. On
the CPU port this will however include the DSA tag since
this is transmitted from the parent ethernet interface!
Add the layer 2 overhead such as ethernet and VLAN framing
and FCS before selecting the size in the register.
This will make the code easier to understand.
The rtl8366rb_max_mtu() callback returns a bogus MTU
just subtracting the CPU tag, which is the only thing
we should NOT subtract. Return the correct layer 1
max MTU after removing headers and checksum.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Because rose_ioctl() accesses sk->sk_receive_queue
without holding a sk->sk_receive_queue.lock, it can
cause a race with rose_accept().
A use-after-free for skb occurs with the following flow.
```
rose_ioctl() -> skb_peek()
rose_accept() -> skb_dequeue() -> kfree_skb()
```
Add sk->sk_receive_queue.lock to rose_ioctl() to fix this issue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Link: https://lore.kernel.org/r/20231209100538.GA407321@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Because do_vcc_ioctl() accesses sk->sk_receive_queue
without holding a sk->sk_receive_queue.lock, it can
cause a race with vcc_recvmsg().
A use-after-free for skb occurs with the following flow.
```
do_vcc_ioctl() -> skb_peek()
vcc_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
```
Add sk->sk_receive_queue.lock to do_vcc_ioctl() to fix this issue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Link: https://lore.kernel.org/r/20231209094210.GA403126@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The driver is using iowriteXX()/ioreadXX() APIs which are LE IO
accessors simplified as
1. Convert given value _from_ CPU _to_ LE
2. Write it to the device as is
The dev_addr is a byte stream, but because the driver uses 16-bit
IO accessors, it wants to perform double conversion on BE CPUs,
but it took it wrong, as it effectivelly does two times _from_ CPU
_to_ LE. What it has to do is to consider dev_addr as an array of
LE16 and hence do _from_ LE _to_ CPU conversion, followed by implied
_from_ CPU _to_ LE in the iowrite16().
To achieve that, use get_unaligned_le16(). This will make it correct
and allows to avoid sparse warning as reported by LKP.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312030058.hfZPTXd7-lkp@intel.com/
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20231208153327.3306798-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela says:
====================
net/sched: conditional notification of events for cls and act
This is an optimization we have been leveraging on P4TC but we believe
it will benefit rtnl users in general.
It's common to allocate an skb, build a notification message and then
broadcast an event. In the absence of any user space listeners, these
resources (cpu and memory operations) are wasted. In cases where the subsystem
is lockless (such as in tc-flower) this waste is more prominent. For the
scenarios where the rtnl_lock is held it is not as prominent.
The idea is simple. Build and send the notification iif:
- The user requests via NLM_F_ECHO or
- Someone is listening to the rtnl group (tc mon)
On a simple test with tc-flower adding 1M entries, using just a single core,
there's already a noticeable difference in the cycles spent in tc_new_tfilter
with this patchset.
before:
- 43.68% tc_new_tfilter
+ 31.73% fl_change
+ 6.35% tfilter_notify
+ 1.62% nlmsg_notify
0.66% __tcf_qdisc_find.part.0
0.64% __tcf_chain_get
0.54% fl_get
+ 0.53% tcf_proto_lookup_ops
after:
- 39.20% tc_new_tfilter
+ 34.58% fl_change
0.69% __tcf_qdisc_find.part.0
0.67% __tcf_chain_get
+ 0.61% tcf_proto_lookup_ops
Note, the above test is using iproute2:tc which execs a shell.
We expect people using netlink directly to observe even greater
reductions.
The qdisc side needs some refactoring of the notification routines to fit in
this new model, so they will be sent in a later patchset.
====================
Link: https://lore.kernel.org/r/20231208192847.714940-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As of today tc-filter/chain events are unconditionally built and sent to
RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check
before-hand if they are really needed. This will help to alleviate
system pressure when filters are concurrently added without the rtnl
lock as in tc-flower.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231208192847.714940-8-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This argument is never called while set to true, so remove it as there's
no need for it.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231208192847.714940-7-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As of today tc-action events are unconditionally built and sent to
RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check
before-hand if they are really needed.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231208192847.714940-6-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use max() in a couple of places that are open coding it with the
ternary operator.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231208192847.714940-5-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a convenience helper for routines handling conditional rtnl
events, that is code that might send a notification depending on
rtnl_has_listeners/rtnl_notify_needed.
Instead of:
if (skb)
rtnetlink_send(...)
Use:
rtnetlink_maybe_send(...)
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231208192847.714940-4-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Building on the rtnl_has_listeners helper, add the rtnl_notify_needed
helper to check if we can bail out early in the notification routines.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231208192847.714940-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As of today, rtnl code creates a new skb and unconditionally fills and
broadcasts it to the relevant group. For most operations this is okay
and doesn't waste resources in general.
When operations are done without the rtnl_lock, as in tc-flower, such
skb allocation, message fill and no-op broadcasting can happen in all
cores of the system, which contributes to system pressure and wastes
precious cpu cycles when no one will receive the built message.
Introduce this helper so rtnetlink operations can simply check if someone
is listening and then proceed if necessary.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231208192847.714940-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fix a rare emergency shutdown path bug: dropping journal pins after
the filesystem has mostly been torn down is not what we want.
- Fix some concurrency issues with the btree write buffer and journal
replay by not using the btree write buffer until journal replay is
finished
- A fixup from the prior patch to kill journal pre-reservations: at the
start of the btree update path, where previously we took a
pre-reservation, we do at least want to check the journal watermark.
- Fix a race between dropping device metadata and btree node writes,
which would re-add a pointer to a device that had just been dropped
- Fix one of the SCRU lock warnings, in
bch2_compression_stats_to_text().
- Partial fix for a rare transaction paths overflow, when indirect
extents had been split by background tasks, by not running certain
triggers when they're not needed.
- Fix for creating a snapshot with implicit source in a subdirectory of
the containing subvolume
- Don't unfreeze when we're emergency read-only
- Fix for rebalance spinning trying to compress unwritten extentns
- Another deleted_inodes fix, for directories
- Fix a rare deadlock (usually just an unecessary wait) when flushing
the journal with an open journal entry.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmV2T4EACgkQE6szbY3K
bnay1w/+PyH5qwE2gOy17rno6cWSNyKJELUkcqVNqrSTZpuA+TbMbcV8+oOeBnG1
9/ShwKRvwwNC4HVk6KySoTMo9lRkaZ5wX6DpEsOqxoN8aCp6kqiCUxr0inAAyVdu
O8FktP83eSX/vERWNlCeGLdi1KsCK0BWVbVMpkiVEO9QhLpS9eo1C8btstDIjbsv
TVGvKO7IpVgibSBwymQPKpZa6BGN4d6emLlgKStdpVVR1RwJW3eLJwi1EV2hSp1f
LBnTI5eD64pu+phEb4zE83JX932XAbxdBWaHlN1y3i4l6+sJDu63Y4R8bkbW+rnJ
cbiyYM5IuAH6MFbbh9rIW8kEIvjrX13mY94oGlK8ClCI9WX129jD5538tEH624U5
KnhCZpkuzeGC5CVXNAzdJ8NP/Aj9qtKvSyssG6R5ZTitQ1FnTZ391Wb2pIRgj9pm
yVfpJ/Q4cizVfSsKBvtr0U5I444zq50z+brKwegIoH8uMuGHKXcIgTUOu4q5pKDD
znjS9eFrQTN2li2HB3LMxuS94yUmozqwgxClMptynLsHVknQH7F3cAdD+mYbwW5Q
GUOd/QTlpskBYAUfBS8ewllowRjLGDJyrGvbR9Mvitk8CxOLRgoDipdh1K13jDMS
zCmG1eQgdbtPHTM6fqif8Bu8xtgK7p2r099dcBhhiWmRyLPo5Qw=
=l5sa
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2023-12-10' of https://evilpiepirate.org/git/bcachefs
Pull more bcachefs bugfixes from Kent Overstreet:
- Fix a rare emergency shutdown path bug: dropping journal pins after
the filesystem has mostly been torn down is not what we want.
- Fix some concurrency issues with the btree write buffer and journal
replay by not using the btree write buffer until journal replay is
finished
- A fixup from the prior patch to kill journal pre-reservations: at the
start of the btree update path, where previously we took a
pre-reservation, we do at least want to check the journal watermark.
- Fix a race between dropping device metadata and btree node writes,
which would re-add a pointer to a device that had just been dropped
- Fix one of the SCRU lock warnings, in
bch2_compression_stats_to_text().
- Partial fix for a rare transaction paths overflow, when indirect
extents had been split by background tasks, by not running certain
triggers when they're not needed.
- Fix for creating a snapshot with implicit source in a subdirectory of
the containing subvolume
- Don't unfreeze when we're emergency read-only
- Fix for rebalance spinning trying to compress unwritten extentns
- Another deleted_inodes fix, for directories
- Fix a rare deadlock (usually just an unecessary wait) when flushing
the journal with an open journal entry.
* tag 'bcachefs-2023-12-10' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Close journal entry if necessary when flushing all pins
bcachefs: Fix uninitialized var in bch2_journal_replay()
bcachefs: Fix deleted inode check for dirs
bcachefs: rebalance shouldn't attempt to compress unwritten extents
bcachefs: don't attempt rw on unfreeze when shutdown
bcachefs: Fix creating snapshot with implict source
bcachefs: Don't run indirect extent trigger unless inserting/deleting
bcachefs: Convert compression_stats to for_each_btree_key2
bcachefs: Fix bch2_extent_drop_ptrs() call
bcachefs: Fix a journal deadlock in replay
bcachefs; Don't use btree write buffer until journal replay is finished
bcachefs: Don't drop journal pins in exit path
If an AFS cell that has an unreachable (eg. ENETUNREACH) server listed (VL
server or fileserver), an asynchronous probe to one of its addresses may
fail immediately because sendmsg() returns an error. When this happens, a
refcount underflow can happen if certain events hit a very small window.
The way this occurs is:
(1) There are two levels of "call" object, the afs_call and the
rxrpc_call. Each of them can be transitioned to a "completed" state
in the event of success or failure.
(2) Asynchronous afs_calls are self-referential whilst they are active to
prevent them from evaporating when they're not being processed. This
reference is disposed of when the afs_call is completed.
Note that an afs_call may only be completed once; once completed
completing it again will do nothing.
(3) When a call transmission is made, the app-side rxrpc code queues a Tx
buffer for the rxrpc I/O thread to transmit. The I/O thread invokes
sendmsg() to transmit it - and in the case of failure, it transitions
the rxrpc_call to the completed state.
(4) When an rxrpc_call is completed, the app layer is notified. In this
case, the app is kafs and it schedules a work item to process events
pertaining to an afs_call.
(5) When the afs_call event processor is run, it goes down through the
RPC-specific handler to afs_extract_data() to retrieve data from rxrpc
- and, in this case, it picks up the error from the rxrpc_call and
returns it.
The error is then propagated to the afs_call and that is completed
too. At this point the self-reference is released.
(6) If the rxrpc I/O thread manages to complete the rxrpc_call within the
window between rxrpc_send_data() queuing the request packet and
checking for call completion on the way out, then
rxrpc_kernel_send_data() will return the error from sendmsg() to the
app.
(7) Then afs_make_call() will see an error and will jump to the error
handling path which will attempt to clean up the afs_call.
(8) The problem comes when the error handling path in afs_make_call()
tries to unconditionally drop an async afs_call's self-reference.
This self-reference, however, may already have been dropped by
afs_extract_data() completing the afs_call
(9) The refcount underflows when we return to afs_do_probe_vlserver() and
that tries to drop its reference on the afs_call.
Fix this by making afs_make_call() attempt to complete the afs_call rather
than unconditionally putting it. That way, if afs_extract_data() manages
to complete the call first, afs_make_call() won't do anything.
The bug can be forced by making do_udp_sendmsg() return -ENETUNREACH and
sticking an msleep() in rxrpc_send_data() after the 'success:' label to
widen the race window.
The error message looks something like:
refcount_t: underflow; use-after-free.
WARNING: CPU: 3 PID: 720 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
...
RIP: 0010:refcount_warn_saturate+0xba/0x110
...
afs_put_call+0x1dc/0x1f0 [kafs]
afs_fs_get_capabilities+0x8b/0xe0 [kafs]
afs_fs_probe_fileserver+0x188/0x1e0 [kafs]
afs_lookup_server+0x3bf/0x3f0 [kafs]
afs_alloc_server_list+0x130/0x2e0 [kafs]
afs_create_volume+0x162/0x400 [kafs]
afs_get_tree+0x266/0x410 [kafs]
vfs_get_tree+0x25/0xc0
fc_mount+0xe/0x40
afs_d_automount+0x1b3/0x390 [kafs]
__traverse_mounts+0x8f/0x210
step_into+0x340/0x760
path_openat+0x13a/0x1260
do_filp_open+0xaf/0x160
do_sys_openat2+0xaf/0x170
or something like:
refcount_t: underflow; use-after-free.
...
RIP: 0010:refcount_warn_saturate+0x99/0xda
...
afs_put_call+0x4a/0x175
afs_send_vl_probes+0x108/0x172
afs_select_vlserver+0xd6/0x311
afs_do_cell_detect_alias+0x5e/0x1e9
afs_cell_detect_alias+0x44/0x92
afs_validate_fc+0x9d/0x134
afs_get_tree+0x20/0x2e6
vfs_get_tree+0x1d/0xc9
fc_mount+0xe/0x33
afs_d_automount+0x48/0x9d
__traverse_mounts+0xe0/0x166
step_into+0x140/0x274
open_last_lookups+0x1c1/0x1df
path_openat+0x138/0x1c3
do_filp_open+0x55/0xb4
do_sys_openat2+0x6c/0xb6
Fixes: 34fa47612bfe ("afs: Fix race in async call refcounting")
Reported-by: Bill MacAllister <bill@ca-zephyr.org>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052304
Suggested-by: Jeffrey E Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/2633992.1702073229@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
River reports boot hangs with v6.6 and v6.7, and the bisect points to
commit
a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
which moves the memory allocation and kernel decompression from the
legacy decompressor (which executes *after* ExitBootServices()) to the
EFI stub, using boot services for allocating the memory. The memory
allocation succeeds but the subsequent call to decompress_kernel() never
returns, resulting in a failed boot and a hanging system.
As it turns out, this issue only occurs when physical address
randomization (KASLR) is enabled, and given that this is a feature we
can live without (virtual KASLR is much more important), let's disable
the physical part of KASLR when booting on AMI UEFI firmware claiming to
implement revision v2.0 of the specification (which was released in
2006), as this is the version these systems advertise.
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218173
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Eric Dumazet says:
====================
ipv6: more data-race annotations
Small follow up series, taking care of races around
np->mcast_oif and np->ucast_oif.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
np->ucast_oif is read locklessly in some contexts.
Make all accesses to this field lockless, adding appropriate
annotations.
This also makes setsockopt( IPV6_UNICAST_IF ) lockless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
np->mcast_oif is read locklessly in some contexts.
Make all accesses to this field lockless, adding appropriate
annotations.
This also makes setsockopt( IPV6_MULTICAST_IF ) lockless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit b8dbbbc535a9 ("net: rtnetlink: remove local list
in __linkwatch_run_queue()"). It's evidently broken when there's a
non-urgent work that gets added back, and then the loop can never
finish.
While reverting, add a note about that.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: b8dbbbc535a9 ("net: rtnetlink: remove local list in __linkwatch_run_queue()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation's default Pause Forward setting is causing
unnecessary network traffic. This patch disables Pause Forward to
address this issue.
Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The efi_relocate_kernel() may load the PIE kernel to anywhere, the
loaded address may not be equal to link address or
EFI_KIMG_PREFERRED_ADDRESS.
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Wang Yao <wangyao@lemote.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Hariprasad Kelam says:
====================
octeontx2: Fix issues with promisc/allmulti mode
When interface is configured in promisc/all multi mode, low network
performance observed. This series patches address the same.
Patch1: Change the promisc/all multi mcam entry action to unicast if
there are no trusted vfs associated with PF.
Patch2: Configures RSS flow algorithm in promisc/all multi mcam entries
to address flow distribution issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The RSS flow algorithm is not set up correctly for promiscuous or all
multi MCAM entries. This has an impact on flow distribution.
This patch fixes the issue by updating flow algorithm index in above
mentioned MCAM entries.
Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation is such that, promisc mcam entry action
is set as multicast even when there are no trusted VFs. multicast
action causes the hardware to copy packet data, which reduces
the performance.
This patch fixes this issue by setting the promisc mcam entry action to
unicast instead of multicast when there are no trusted VFs. The same
change is made for the 'allmulti' mcam entry action.
Fixes: ffd2f89ad05c ("octeontx2-pf: Enable promisc/allmulti match MCAM entries.")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device supports UDP hardware segmentation offload, which helps
improving the performance. Thus, this patch adds support for UDP
segmentation offload from the driver side.
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware ready value is 1, and get firmware ready status
function should explicitly test for that value. The firmware
ready value read will be 2 after driver load, and on unbind
till firmware rewrites the firmware ready back to 0, the value
seen by driver will be 2, which should be regarded as not ready.
Fixes: 10c073e40469 ("octeon_ep: defer probe if firmware not ready")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since outstanding journal buffers hold a journal pin, when flushing all
pins we need to close the current journal entry if necessary so its pin
can be released.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Allow jumbo frames by changing maximum MTU size and number of RX queues.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the driver would like to transmit a jumbo frame like 2KiB or more,
it should be split into multiple queues. In the near future, to support
this, add handling specific descriptor types F{START,MID,END}. However,
such jumbo frames will not happen yet because the maximum MTU size is
still default for now.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>