linux

iv/linux

Author	SHA1	Message	Date
Greg Kroah-Hartman	b704883aa8	Linux 5.4.141 Link: https://lore.kernel.org/r/20210813150523.364549385@linuxfoundation.org Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> v5.4.141	2021-08-15 13:08:06 +02:00
Nikolay Borisov	983d6a6b7e	btrfs: don't flush from btrfs_delayed_inode_reserve_metadata commit 4d14c5cde5c268a2bc26addecf09489cb953ef64 upstream Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070eaca9 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277a49e6 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_timeout at ffffffffa52a1bdd #3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held #4 start_delalloc_inodes at ffffffffc0380db5 #5 btrfs_start_delalloc_snapshot at ffffffffc0393836 #6 try_flush_qgroup at ffffffffc03f04b2 #7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. #8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex #9 btrfs_update_inode at ffffffffc0385ba8 #10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED #11 touch_atime at ffffffffa4cf0000 #12 generic_file_read_iter at ffffffffa4c1f123 #13 new_sync_read at ffffffffa4ccdc8a #14 vfs_read at ffffffffa4cd0849 #15 ksys_read at ffffffffa4cd0bd1 #16 do_syscall_64 at ffffffffa4a052eb #17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_preempt_disabled at ffffffffa529e80a #3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. #4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex #5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding #6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 #7 cow_file_range at ffffffffc03879c1 #8 btrfs_run_delalloc_range at ffffffffc038894c #9 writepage_delalloc at ffffffffc03a3c8f #10 __extent_writepage at ffffffffc03a4c01 #11 extent_write_cache_pages at ffffffffc03a500b #12 extent_writepages at ffffffffc03a6de2 #13 do_writepages at ffffffffa4c277eb #14 __filemap_fdatawrite_range at ffffffffa4c1e5bb #15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes #16 normal_work_helper at ffffffffc03b706c #17 process_one_work at ffffffffa4aba4e4 #18 worker_thread at ffffffffa4aba6fd #19 kthread at ffffffffa4ac0a3d #20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:06 +02:00
Nikolay Borisov	ea13f678a3	btrfs: export and rename qgroup_reserve_meta commit 80e9baed722c853056e0c5374f51524593cb1031 upstream Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:06 +02:00
Qu Wenruo	41a9b8f36d	btrfs: qgroup: don't commit transaction when we already hold the handle commit 6f23277a49e68f8a9355385c846939ad0b1261e7 upstream [BUG] When running the following script, btrfs will trigger an ASSERT(): #/bin/bash mkfs.btrfs -f $dev mount $dev $mnt xfs_io -f -c "pwrite 0 1G" $mnt/file sync btrfs quota enable $mnt btrfs quota rescan -w $mnt # Manually set the limit below current usage btrfs qgroup limit 512M $mnt $mnt # Crash happens touch $mnt/file The dmesg looks like this: assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3230! invalid opcode: 0000 [#1] SMP PTI RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs] btrfs_commit_transaction.cold+0x11/0x5d [btrfs] try_flush_qgroup+0x67/0x100 [btrfs] __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs] btrfs_delayed_update_inode+0xaa/0x350 [btrfs] btrfs_update_inode+0x9d/0x110 [btrfs] btrfs_dirty_inode+0x5d/0xd0 [btrfs] touch_atime+0xb5/0x100 iterate_dir+0xf1/0x1b0 __x64_sys_getdents64+0x78/0x110 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fb5afe588db [CAUSE] In try_flush_qgroup(), we assume we don't hold a transaction handle at all. This is true for data reservation and mostly true for metadata. Since data space reservation always happens before we start a transaction, and for most metadata operation we reserve space in start_transaction(). But there is an exception, btrfs_delayed_inode_reserve_metadata(). It holds a transaction handle, while still trying to reserve extra metadata space. When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we will join current transaction and commit, while we still have transaction handle from qgroup code. [FIX] Let's check current->journal before we join the transaction. If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means we are not holding a transaction, thus are able to join and then commit transaction. If current->journal is a valid transaction handle, we avoid committing transaction and just end it This is less effective than committing current transaction, as it won't free metadata reserved space, but we may still free some data space before new data writes. Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634 Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:05 +02:00
YueHaibing	38b8485b72	net: xilinx_emaclite: Do not print real IOMEM pointer commit d0d62baa7f505bd4c59cd169692ff07ec49dde37 upstream. Printing kernel pointers is discouraged because they might leak kernel memory layout. This fixes smatch warning: drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn: argument 4 to %08lX specifier is cast from pointer Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:05 +02:00
Filipe Manana	654c19a7e8	btrfs: fix lockdep splat when enabling and disabling qgroups commit a855fbe69229078cd8aecd8974fb996a5ca651e6 upstream When running test case btrfs/017 from fstests, lockdep reported the following splat: [ 1297.067385] ====================================================== [ 1297.067708] WARNING: possible circular locking dependency detected [ 1297.068022] 5.10.0-rc4-btrfs-next-73 #1 Not tainted [ 1297.068322] ------------------------------------------------------ [ 1297.068629] btrfs/189080 is trying to acquire lock: [ 1297.068929] ffff9f2725731690 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.069274] but task is already holding lock: [ 1297.069868] ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs] [ 1297.070219] which lock already depends on the new lock. [ 1297.071131] the existing dependency chain (in reverse order) is: [ 1297.071721] -> #1 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}: [ 1297.072375] lock_acquire+0xd8/0x490 [ 1297.072710] __mutex_lock+0xa3/0xb30 [ 1297.073061] btrfs_qgroup_inherit+0x59/0x6a0 [btrfs] [ 1297.073421] create_subvol+0x194/0x990 [btrfs] [ 1297.073780] btrfs_mksubvol+0x3fb/0x4a0 [btrfs] [ 1297.074133] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs] [ 1297.074498] btrfs_ioctl_snap_create+0x58/0x80 [btrfs] [ 1297.074872] btrfs_ioctl+0x1a90/0x36f0 [btrfs] [ 1297.075245] __x64_sys_ioctl+0x83/0xb0 [ 1297.075617] do_syscall_64+0x33/0x80 [ 1297.075993] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.076380] -> #0 (sb_internal#2){.+.+}-{0:0}: [ 1297.077166] check_prev_add+0x91/0xc60 [ 1297.077572] __lock_acquire+0x1740/0x3110 [ 1297.077984] lock_acquire+0xd8/0x490 [ 1297.078411] start_transaction+0x3c5/0x760 [btrfs] [ 1297.078853] btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.079323] btrfs_ioctl+0x2c60/0x36f0 [btrfs] [ 1297.079789] __x64_sys_ioctl+0x83/0xb0 [ 1297.080232] do_syscall_64+0x33/0x80 [ 1297.080680] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.081139] other info that might help us debug this: [ 1297.082536] Possible unsafe locking scenario: [ 1297.083510] CPU0 CPU1 [ 1297.084005] ---- ---- [ 1297.084500] lock(&fs_info->qgroup_ioctl_lock); [ 1297.084994] lock(sb_internal#2); [ 1297.085485] lock(&fs_info->qgroup_ioctl_lock); [ 1297.085974] lock(sb_internal#2); [ 1297.086454] * DEADLOCK * [ 1297.087880] 3 locks held by btrfs/189080: [ 1297.088324] #0: ffff9f2725731470 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0xa73/0x36f0 [btrfs] [ 1297.088799] #1: ffff9f2702b60cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1f4d/0x36f0 [btrfs] [ 1297.089284] #2: ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs] [ 1297.089771] stack backtrace: [ 1297.090662] CPU: 5 PID: 189080 Comm: btrfs Not tainted 5.10.0-rc4-btrfs-next-73 #1 [ 1297.091132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 1297.092123] Call Trace: [ 1297.092629] dump_stack+0x8d/0xb5 [ 1297.093115] check_noncircular+0xff/0x110 [ 1297.093596] check_prev_add+0x91/0xc60 [ 1297.094076] ? kvm_clock_read+0x14/0x30 [ 1297.094553] ? kvm_sched_clock_read+0x5/0x10 [ 1297.095029] __lock_acquire+0x1740/0x3110 [ 1297.095510] lock_acquire+0xd8/0x490 [ 1297.095993] ? btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.096476] start_transaction+0x3c5/0x760 [btrfs] [ 1297.096962] ? btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.097451] btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.097941] ? btrfs_ioctl+0x1f4d/0x36f0 [btrfs] [ 1297.098429] btrfs_ioctl+0x2c60/0x36f0 [btrfs] [ 1297.098904] ? do_user_addr_fault+0x20c/0x430 [ 1297.099382] ? kvm_clock_read+0x14/0x30 [ 1297.099854] ? kvm_sched_clock_read+0x5/0x10 [ 1297.100328] ? sched_clock+0x5/0x10 [ 1297.100801] ? sched_clock_cpu+0x12/0x180 [ 1297.101272] ? __x64_sys_ioctl+0x83/0xb0 [ 1297.101739] __x64_sys_ioctl+0x83/0xb0 [ 1297.102207] do_syscall_64+0x33/0x80 [ 1297.102673] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.103148] RIP: 0033:0x7f773ff65d87 This is because during the quota enable ioctl we lock first the mutex qgroup_ioctl_lock and then start a transaction, and starting a transaction acquires a fs freeze semaphore (at the VFS level). However, every other code path, except for the quota disable ioctl path, we do the opposite: we start a transaction and then lock the mutex. So fix this by making the quota enable and disable paths to start the transaction without having the mutex locked, and then, after starting the transaction, lock the mutex and check if some other task already enabled or disabled the quotas, bailing with success if that was the case. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Conflicts: fs/btrfs/qgroup.c Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:05 +02:00
Qu Wenruo	c55442cdfd	btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT commit adca4d945c8dca28a85df45c5b117e6dac2e77f1 upstream commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT") tries to reduce the early EDQUOT problems by checking the qgroup free against threshold and tries to wake up commit kthread to free some space. The problem of that mechanism is, it can only free qgroup per-trans metadata space, can't do anything to data, nor prealloc qgroup space. Now since we have the ability to flush qgroup space, and implemented retry-after-EDQUOT behavior, such mechanism can be completely replaced. So this patch will cleanup such mechanism in favor of retry-after-EDQUOT. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:05 +02:00
Qu Wenruo	fdaf6a322f	btrfs: transaction: Cleanup unused TRANS_STATE_BLOCKED commit 3296bf562443a8ca35aaad959a76a49e9b412760 upstream The state was introduced in commit 4a9d8bdee368 ("Btrfs: make the state of the transaction more readable"), then in commit 302167c50b32 ("btrfs: don't end the transaction for delayed refs in throttle") the state is completely removed. So we can just clean up the state since it's only compared but never set. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:05 +02:00
Qu Wenruo	36af2de520	btrfs: qgroup: try to flush qgroup space when we get -EDQUOT commit c53e9653605dbf708f5be02902de51831be4b009 upstream [PROBLEM] There are known problem related to how btrfs handles qgroup reserved space. One of the most obvious case is the the test case btrfs/153, which do fallocate, then write into the preallocated range. # btrfs/153 1s ... - output mismatch (see xfstests-dev/results//btrfs/153.out.bad) # --- tests/btrfs/153.out 2019-10-22 15:18:14.068965341 +0800 # +++ xfstests-dev/results//btrfs/153.out.bad 2020-07-01 20:24:40.730000089 +0800 # @@ -1,2 +1,5 @@ # QA output created by 153 # +pwrite: Disk quota exceeded # +/mnt/scratch/testfile2: Disk quota exceeded # +/mnt/scratch/testfile2: Disk quota exceeded # Silence is golden # ... # (Run 'diff -u xfstests-dev/tests/btrfs/153.out xfstests-dev/results//btrfs/153.out.bad' to see the entire diff) [CAUSE] Since commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to"), we always reserve space no matter if it's COW or not. Such behavior change is mostly for performance, and reverting it is not a good idea anyway. For preallcoated extent, we reserve qgroup data space for it already, and since we also reserve data space for qgroup at buffered write time, it needs twice the space for us to write into preallocated space. This leads to the -EDQUOT in buffered write routine. And we can't follow the same solution, unlike data/meta space check, qgroup reserved space is shared between data/metadata. The EDQUOT can happen at the metadata reservation, so doing NODATACOW check after qgroup reservation failure is not a solution. [FIX] To solve the problem, we don't return -EDQUOT directly, but every time we got a -EDQUOT, we try to flush qgroup space: - Flush all inodes of the root NODATACOW writes will free the qgroup reserved at run_dealloc_range(). However we don't have the infrastructure to only flush NODATACOW inodes, here we flush all inodes anyway. - Wait for ordered extents This would convert the preallocated metadata space into per-trans metadata, which can be freed in later transaction commit. - Commit transaction This will free all per-trans metadata space. Also we don't want to trigger flush multiple times, so here we introduce a per-root wait list and a new root status, to ensure only one thread starts the flushing. Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:05 +02:00
Qu Wenruo	5c79287c2b	btrfs: qgroup: allow to unreserve range without releasing other ranges commit 263da812e87bac4098a4778efaa32c54275641db upstream [PROBLEM] Before this patch, when btrfs_qgroup_reserve_data() fails, we free all reserved space of the changeset. For example: ret = btrfs_qgroup_reserve_data(inode, changeset, 0, SZ_1M); ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_1M, SZ_1M); ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_2M, SZ_1M); If the last btrfs_qgroup_reserve_data() failed, it will release the entire [0, 3M) range. This behavior is kind of OK for now, as when we hit -EDQUOT, we normally go error handling and need to release all reserved ranges anyway. But this also means the following call is not possible: ret = btrfs_qgroup_reserve_data(); if (ret == -EDQUOT) { /* Do something to free some qgroup space */ ret = btrfs_qgroup_reserve_data(); } As if the first btrfs_qgroup_reserve_data() fails, it will free all reserved qgroup space. [CAUSE] This is because we release all reserved ranges when btrfs_qgroup_reserve_data() fails. [FIX] This patch will implement a new function, qgroup_unreserve_range(), to iterate through the ulist nodes, to find any nodes in the failure range, and remove the EXTENT_QGROUP_RESERVED bits from the io_tree, and decrease the extent_changeset::bytes_changed, so that we can revert to previous state. This allows later patches to retry btrfs_qgroup_reserve_data() if EDQUOT happens. Suggested-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:05 +02:00
Nikolay Borisov	b7a722fd75	btrfs: make btrfs_qgroup_reserve_data take btrfs_inode commit 7661a3e033ab782366e0e1f4b6aad0df3555fcbd upstream There's only a single use of vfs_inode in a tracepoint so let's take btrfs_inode directly. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:04 +02:00
Nikolay Borisov	dfadea4061	btrfs: make qgroup_free_reserved_data take btrfs_inode commit df2cfd131fd33dbef1ce33be8b332b1f3d645f35 upstream It only uses btrfs_inode so can just as easily take it as an argument. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:04 +02:00
Miklos Szeredi	812f39ed5b	ovl: prevent private clone if bind mount is not allowed commit 427215d85e8d1476da1a86b8d67aceb485eb3631 upstream. Add the following checks from __do_loopback() to clone_private_mount() as well: - verify that the mount is in the current namespace - verify that there are no locked children Reported-by: Alois Wohlschlager <alois1@gmx-topmail.de> Fixes: c771d683a62e ("vfs: introduce clone_private_mount()") Cc: <stable@vger.kernel.org> # v3.18 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:04 +02:00
Pali Rohár	eeb4742501	ppp: Fix generating ppp unit id when ifname is not specified commit 3125f26c514826077f2a4490b75e9b1c7a644c42 upstream. When registering new ppp interface via PPPIOCNEWUNIT ioctl then kernel has to choose interface name as this ioctl API does not support specifying it. Kernel in this case register new interface with name "ppp<id>" where <id> is the ppp unit id, which can be obtained via PPPIOCGUNIT ioctl. This applies also in the case when registering new ppp interface via rtnl without supplying IFLA_IFNAME. PPPIOCNEWUNIT ioctl allows to specify own ppp unit id which will kernel assign to ppp interface, in case this ppp id is not already used by other ppp interface. In case user does not specify ppp unit id then kernel choose the first free ppp unit id. This applies also for case when creating ppp interface via rtnl method as it does not provide a way for specifying own ppp unit id. If some network interface (does not have to be ppp) has name "ppp<id>" with this first free ppp id then PPPIOCNEWUNIT ioctl or rtnl call fails. And registering new ppp interface is not possible anymore, until interface which holds conflicting name is renamed. Or when using rtnl method with custom interface name in IFLA_IFNAME. As list of allocated / used ppp unit ids is not possible to retrieve from kernel to userspace, userspace has no idea what happens nor which interface is doing this conflict. So change the algorithm how ppp unit id is generated. And choose the first number which is not neither used as ppp unit id nor in some network interface with pattern "ppp<id>". This issue can be simply reproduced by following pppd call when there is no ppp interface registered and also no interface with name pattern "ppp<id>": pppd ifname ppp1 +ipv6 noip noauth nolock local nodetach pty "pppd +ipv6 noip noauth nolock local nodetach notty" Or by creating the one ppp interface (which gets assigned ppp unit id 0), renaming it to "ppp1" and then trying to create a new ppp interface (which will always fails as next free ppp unit id is 1, but network interface with name "ppp1" exists). This patch fixes above described issue by generating new and new ppp unit id until some non-conflicting id with network interfaces is generated. Signed-off-by: Pali Rohár <pali@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:04 +02:00
Luke D Jones	3460f3959d	ALSA: hda: Add quirk for ASUS Flow x13 commit 739d0959fbed23838a96c48fbce01dd2f6fb2c5f upstream. The ASUS GV301QH sound appears to work well with the quirk for ALC294_FIXUP_ASUS_DUAL_SPK. Signed-off-by: Luke D Jones <luke@ljones.dev> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210807025805.27321-1-luke@ljones.dev Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:04 +02:00
Longfang Liu	81d1a3f976	USB:ehci:fix Kunpeng920 ehci hardware problem commit 26b75952ca0b8b4b3050adb9582c8e2f44d49687 upstream. Kunpeng920's EHCI controller does not have SBRN register. Reading the SBRN register when the controller driver is initialized will get 0. When rebooting the EHCI driver, ehci_shutdown() will be called. if the sbrn flag is 0, ehci_shutdown() will return directly. The sbrn flag being 0 will cause the EHCI interrupt signal to not be turned off after reboot. this interrupt that is not closed will cause an exception to the device sharing the interrupt. Therefore, the EHCI controller of Kunpeng920 needs to skip the read operation of the SBRN register. Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/1617958081-17999-1-git-send-email-liulongfang@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:04 +02:00
Lai Jiangshan	d28adaabbb	KVM: X86: MMU: Use the correct inherited permissions to get shadow page commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream. When computing the access permissions of a shadow page, use the effective permissions of the walk up to that point, i.e. the logic AND of its parents' permissions. Two guest PxE entries that point at the same table gfn need to be shadowed with different shadow pages if their parents' permissions are different. KVM currently uses the effective permissions of the last non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have full ("uwx") permissions, and the effective permissions are recorded only in role.access and merged into the leaves, this can lead to incorrect reuse of a shadow page and eventually to a missing guest protection page fault. For example, here is a shared pagetable: pgd[] pud[] pmd[] virtual address pointers /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--) /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-) pgd-\| (shared pmd[] as above) \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--) \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--) pud1 and pud2 point to the same pmd table, so: - ptr1 and ptr3 points to the same page. - ptr2 and ptr4 points to the same page. (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries) - First, the guest reads from ptr1 first and KVM prepares a shadow page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1. "u--" comes from the effective permissions of pgd, pud1 and pmd1, which are stored in pt->access. "u--" is used also to get the pagetable for pud1, instead of "uw-". - Then the guest writes to ptr2 and KVM reuses pud1 which is present. The hypervisor set up a shadow page for ptr2 with pt->access is "uw-" even though the pud1 pmd (because of the incorrect argument to kvm_mmu_get_page in the previous step) has role.access="u--". - Then the guest reads from ptr3. The hypervisor reuses pud1's shadow pmd for pud2, because both use "u--" for their permissions. Thus, the shadow pmd already includes entries for both pmd1 and pmd2. - At last, the guest writes to ptr4. This causes no vmexit or pagefault, because pud1's shadow page structures included an "uw-" page even though its role.access was "u--". Any kind of shared pagetable might have the similar problem when in virtual machine without TDP enabled if the permissions are different from different ancestors. In order to fix the problem, we change pt->access to be an array, and any access in it will not include permissions ANDed from child ptes. The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/ Remember to test it with TDP disabled. The problem had existed long before the commit 41074d07c78b ("KVM: MMU: Fix inherited permissions for emulated guest pte updates"), and it is hard to find which is the culprit. So there is no fixes tag here. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com> Cc: stable@vger.kernel.org Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm - apply documentation changes to Documentation/virt/kvm/mmu.txt - adjusted context in arch/x86/kvm/paging_tmpl.h] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:04 +02:00
Wesley Cheng	5f4ab7e25f	usb: dwc3: gadget: Avoid runtime resume if disabling pullup [ Upstream commit cb10f68ad8150f243964b19391711aaac5e8ff42 ] If the device is already in the runtime suspended state, any call to the pullup routine will issue a runtime resume on the DWC3 core device. If the USB gadget is disabling the pullup, then avoid having to issue a runtime resume, as DWC3 gadget has already been halted/stopped. This fixes an issue where the following condition occurs: usb_gadget_remove_driver() -->usb_gadget_disconnect() -->dwc3_gadget_pullup(0) -->pm_runtime_get_sync() -> ret = 0 -->pm_runtime_put() [async] -->usb_gadget_udc_stop() -->dwc3_gadget_stop() -->dwc->gadget_driver = NULL ... dwc3_suspend_common() -->dwc3_gadget_suspend() -->DWC3 halt/stop routine skipped, driver_data == NULL This leads to a situation where the DWC3 gadget is not properly stopped, as the runtime resume would have re-enabled EP0 and event interrupts, and since we avoided the DWC3 gadget suspend, these resources were never disabled. Fixes: 77adb8bdf422 ("usb: dwc3: gadget: Allow runtime suspend if UDC unbinded") Cc: stable <stable@vger.kernel.org> Acked-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1628058245-30692-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:03 +02:00
Wesley Cheng	1782c4af6b	usb: dwc3: gadget: Disable gadget IRQ during pullup disable [ Upstream commit 8212937305f84ef73ea81036dafb80c557583d4b ] Current sequence utilizes dwc3_gadget_disable_irq() alongside synchronize_irq() to ensure that no further DWC3 events are generated. However, the dwc3_gadget_disable_irq() API only disables device specific events. Endpoint events can still be generated. Briefly disable the interrupt line, so that the cleanup code can run to prevent device and endpoint events. (i.e. __dwc3_gadget_stop() and dwc3_stop_active_transfers() respectively) Without doing so, it can lead to both the interrupt handler and the pullup disable routine both writing to the GEVNTCOUNT register, which will cause an incorrect count being read from future interrupts. Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1621571037-1424-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:03 +02:00
Wesley Cheng	54b7022f28	usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable [ Upstream commit 5aef629704ad4d983ecf5c8a25840f16e45b6d59 ] Ensure that dep->flags are cleared until after stop active transfers is completed. Otherwise, the ENDXFER command will not be executed during ep disable. Fixes: f09ddcfcb8c5 ("usb: dwc3: gadget: Prevent EP queuing while stopping transfers") Cc: stable <stable@vger.kernel.org> Reported-and-tested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1616610664-16495-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:03 +02:00
Wesley Cheng	e36245a68e	usb: dwc3: gadget: Prevent EP queuing while stopping transfers [ Upstream commit f09ddcfcb8c569675066337adac2ac205113471f ] In the situations where the DWC3 gadget stops active transfers, once calling the dwc3_gadget_giveback(), there is a chance where a function driver can queue a new USB request in between the time where the dwc3 lock has been released and re-aquired. This occurs after we've already issued an ENDXFER command. When the stop active transfers continues to remove USB requests from all dep lists, the newly added request will also be removed, while controller still has an active TRB for it. This can lead to the controller accessing an unmapped memory address. Fix this by ensuring parameters to prevent EP queuing are set before calling the stop active transfers API. Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1615507142-23097-1-git-send-email-wcheng@codeaurora.org Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:03 +02:00
Wesley Cheng	823f692508	usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup [ Upstream commit a1383b3537a7bea1c213baa7878ccc4ecf4413b5 ] usb_gadget_deactivate/usb_gadget_activate does not execute the UDC start operation, which may leave EP0 disabled and event IRQs disabled when re-activating the function. Move the enabling/disabling of USB EP0 and device event IRQs to be performed in the pullup routine. Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Tested-by: Michael Tretter <m.tretter@pengutronix.de> Cc: stable <stable@vger.kernel.org> Reported-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1609282837-21666-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:03 +02:00
Wesley Cheng	25a0625fa9	usb: dwc3: gadget: Allow runtime suspend if UDC unbinded [ Upstream commit 77adb8bdf4227257e26b7ff67272678e66a0b250 ] The DWC3 runtime suspend routine checks for the USB connected parameter to determine if the controller can enter into a low power state. The connected state is only set to false after receiving a disconnect event. However, in the case of a device initiated disconnect (i.e. UDC unbind), the controller is halted and a disconnect event is never generated. Set the connected flag to false if issuing a device initiated disconnect to allow the controller to be suspended. Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1609283136-22140-2-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:03 +02:00
Wesley Cheng	5f081a928d	usb: dwc3: Stop active transfers before halting the controller [ Upstream commit ae7e86108b12351028fa7e8796a59f9b2d9e1774 ] In the DWC3 databook, for a device initiated disconnect or bus reset, the driver is required to send dependxfer commands for any pending transfers. In addition, before the controller can move to the halted state, the SW needs to acknowledge any pending events. If the controller is not halted properly, there is a chance the controller will continue accessing stale or freed TRBs and buffers. Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Reviewed-by: Thinh Nguyen <thinhn@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:03 +02:00
Masami Hiramatsu	396f29ea0c	tracing: Reject string operand in the histogram expression commit a9d10ca4986571bffc19778742d508cc8dd13e02 upstream. Since the string type can not be the target of the addition / subtraction operation, it must be rejected. Without this fix, the string type silently converted to digits. Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:02 +02:00
Alexandre Courbot	28276c280f	media: v4l2-mem2mem: always consider OUTPUT queue during poll commit 566463afdbc43c7744c5a1b89250fc808df03833 upstream. If poll() is called on a m2m device with the EPOLLOUT event after the last buffer of the CAPTURE queue is dequeued, any buffer available on OUTPUT queue will never be signaled because v4l2_m2m_poll_for_data() starts by checking whether dst_q->last_buffer_dequeued is set and returns EPOLLIN in this case, without looking at the state of the OUTPUT queue. Fix this by not early returning so we keep checking the state of the OUTPUT queue afterwards. Signed-off-by: Alexandre Courbot <gnurou@gmail.com> Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-15 13:08:02 +02:00
Sumit Garg	236aca7092	tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag [ Upstream commit 376e4199e327a5cf29b8ec8fb0f64f3d8b429819 ] Currently TEE_SHM_DMA_BUF flag has been inappropriately used to not register shared memory allocated for private usage by underlying TEE driver: OP-TEE in this case. So rather add a new flag as TEE_SHM_PRIV that can be utilized by underlying TEE drivers for private allocation and usage of shared memory. With this corrected, allow tee_shm_alloc_kernel_buf() to allocate a shared memory region without the backing of dma-buf. Cc: stable@vger.kernel.org Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Co-developed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org> Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-15 13:08:02 +02:00
Sean Christopherson	5b774238e8	KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB [ Upstream commit 179c6c27bf487273652efc99acd3ba512a23c137 ] Use the raw ASID, not ASID-1, when nullifying the last used VMCB when freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by the raw ASID, thus KVM could get a false negative when checking for a different VMCB if KVM manages to reallocate the same ASID+VMCB combo for a new VM. Note, this cannot cause a functional issue _in the current code_, as pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is guaranteed to mismatch on the first VMRUN. However, prior to commit 8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the last_cpu variable. Thus it's theoretically possible that older versions of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID and VMCB exactly match those of a prior VM. Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-15 13:08:02 +02:00
Greg Kroah-Hartman	a998faa9c4	Linux 5.4.140 Link: https://lore.kernel.org/r/20210810172948.192298392@linuxfoundation.org Tested-by: Hulk Robot <hulkrobot@huawei.com> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> v5.4.140	2021-08-12 13:21:05 +02:00
Mark Rutland	3c197fdd07	arm64: fix compat syscall return truncation commit e30e8d46cf605d216a799a28c77b8a41c328613a upstream. Due to inconsistencies in the way we manipulate compat GPRs, we have a few issues today: * For audit and tracing, where error codes are handled as a (native) long, negative error codes are expected to be sign-extended to the native 64-bits, or they may fail to be matched correctly. Thus a syscall which fails with an error may erroneously be identified as failing. * For ptrace, all compat return values should be sign-extended for consistency with 32-bit arm, but we currently only do this for negative return codes. * As we may transiently set the upper 32 bits of some compat GPRs while in the kernel, these can be sampled by perf, which is somewhat confusing. This means that where a syscall returns a pointer above 2G, this will be sign-extended, but will not be mistaken for an error as error codes are constrained to the inclusive range [-4096, -1] where no user pointer can exist. To fix all of these, we must consistently use helpers to get/set the compat GPRs, ensuring that we never write the upper 32 bits of the return code, and always sign-extend when reading the return code. This patch does so, with the following changes: * We re-organise syscall_get_return_value() to always sign-extend for compat tasks, and reimplement syscall_get_error() atop. We update syscall_trace_exit() to use syscall_get_return_value(). * We consistently use syscall_set_return_value() to set the return value, ensureing the upper 32 bits are never set unexpectedly. * As the core audit code currently uses regs_return_value() rather than syscall_get_return_value(), we special-case this for compat_user_mode(regs) such that this will do the right thing. Going forward, we should try to move the core audit code over to syscall_get_return_value(). Cc: <stable@vger.kernel.org> Reported-by: He Zhe <zhe.he@windriver.com> Reported-by: weiyuchen <weiyuchen3@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210802104200.21390-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> [Mark: trivial conflict resolution for v5.4.y] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:05 +02:00
Letu Ren	72fcaf6952	net/qla3xxx: fix schedule while atomic in ql_wait_for_drvr_lock and ql_adapter_reset [ Upstream commit 92766c4628ea349c8ddab0cd7bd0488f36e5c4ce ] When calling the 'ql_wait_for_drvr_lock' and 'ql_adapter_reset', the driver has already acquired the spin lock, so the driver should not call 'ssleep' in atomic context. This bug can be fixed by using 'mdelay' instead of 'ssleep'. Reported-by: Letu Ren <fantasquex@gmail.com> Signed-off-by: Letu Ren <fantasquex@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-12 13:21:05 +02:00
Prarit Bhargava	742e85fa9e	alpha: Send stop IPI to send to online CPUs [ Upstream commit caace6ca4e06f09413fb8f8a63319594cfb7d47d ] This issue was noticed while debugging a shutdown issue where some secondary CPUs are not being shutdown correctly. A fix for that [1] requires that secondary cpus be offlined using the cpu_online_mask so that the stop operation is a no-op if CPU HOTPLUG is disabled. I, like the author in [1] looked at the architectures and found that alpha is one of two architectures that executes smp_send_stop() on all possible CPUs. On alpha, smp_send_stop() sends an IPI to all possible CPUs but only needs to send them to online CPUs. Send the stop IPI to only the online CPUs. [1] https://lkml.org/lkml/2020/1/10/250 Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-12 13:21:05 +02:00
Matteo Croce	26946d2139	virt_wifi: fix error on connect [ Upstream commit 17109e9783799be2a063b2bd861a508194b0a487 ] When connecting without first doing a scan, the BSS list is empty and __cfg80211_connect_result() generates this warning: $ iw dev wlan0 connect -w VirtWifi [ 15.371989] ------------[ cut here ]------------ [ 15.372179] WARNING: CPU: 0 PID: 92 at net/wireless/sme.c:756 __cfg80211_connect_result+0x402/0x440 [ 15.372383] CPU: 0 PID: 92 Comm: kworker/u2:2 Not tainted 5.13.0-kvm #444 [ 15.372512] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014 [ 15.372597] Workqueue: cfg80211 cfg80211_event_work [ 15.372756] RIP: 0010:__cfg80211_connect_result+0x402/0x440 [ 15.372818] Code: 48 2b 04 25 28 00 00 00 75 59 48 8b 3b 48 8b 76 10 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d 49 8d 65 f0 41 5d e9 d0 d4 fd ff 0f 0b <0f> 0b e9 f6 fd ff ff e8 f2 4a b4 ff e9 ec fd ff ff 0f 0b e9 19 fd [ 15.372966] RSP: 0018:ffffc900005cbdc0 EFLAGS: 00010246 [ 15.373022] RAX: 0000000000000000 RBX: ffff8880028e2400 RCX: ffff8880028e2472 [ 15.373088] RDX: 0000000000000002 RSI: 00000000fffffe01 RDI: ffffffff815335ba [ 15.373149] RBP: ffffc900005cbe00 R08: 0000000000000008 R09: ffff888002bdf8b8 [ 15.373209] R10: ffff88803ec208f0 R11: ffffffffffffe9ae R12: ffff88801d687d98 [ 15.373280] R13: ffff88801b5fe000 R14: ffffc900005cbdc0 R15: dead000000000100 [ 15.373330] FS: 0000000000000000(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 15.373382] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.373425] CR2: 000056421c468958 CR3: 000000001b458001 CR4: 0000000000170eb0 [ 15.373478] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 15.373529] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 15.373580] Call Trace: [ 15.373611] ? cfg80211_process_wdev_events+0x10e/0x170 [ 15.373743] cfg80211_process_wdev_events+0x10e/0x170 [ 15.373783] cfg80211_process_rdev_events+0x21/0x40 [ 15.373846] cfg80211_event_work+0x20/0x30 [ 15.373892] process_one_work+0x1e9/0x340 [ 15.373956] worker_thread+0x4b/0x3f0 [ 15.374017] ? process_one_work+0x340/0x340 [ 15.374053] kthread+0x11f/0x140 [ 15.374089] ? set_kthread_struct+0x30/0x30 [ 15.374153] ret_from_fork+0x1f/0x30 [ 15.374187] ---[ end trace 321ef0cb7e9c0be1 ]--- wlan0 (phy #0): connected to 00:00:00:00:00:00 Add the fake bss just before the connect so that cfg80211_get_bss() finds the virtual network. As some code was duplicated, move it in a common function. Signed-off-by: Matteo Croce <mcroce@microsoft.com> Link: https://lore.kernel.org/r/20210706154423.11065-1-mcroce@linux.microsoft.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-12 13:21:05 +02:00
Shreyansh Chouhan	17d7c9c940	reiserfs: check directory items on read from disk [ Upstream commit 13d257503c0930010ef9eed78b689cec417ab741 ] While verifying the leaf item that we read from the disk, reiserfs doesn't check the directory items, this could cause a crash when we read a directory item from the disk that has an invalid deh_location. This patch adds a check to the directory items read from the disk that does a bounds check on deh_location for the directory entries. Any directory entry header with a directory entry offset greater than the item length is considered invalid. Link: https://lore.kernel.org/r/20210709152929.766363-1-chouhan.shreyansh630@gmail.com Reported-by: syzbot+c31a48e6702ccb3d64c9@syzkaller.appspotmail.com Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-12 13:21:05 +02:00
Yu Kuai	bcad6ece2a	reiserfs: add check for root_inode in reiserfs_fill_super [ Upstream commit 2acf15b94d5b8ea8392c4b6753a6ffac3135cd78 ] Our syzcaller report a NULL pointer dereference: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 116e95067 P4D 116e95067 PUD 1080b5067 PMD 0 Oops: 0010 [#1] SMP KASAN CPU: 7 PID: 592 Comm: a.out Not tainted 5.13.0-next-20210629-dirty #67 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-p4 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffff888114e779b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff110229cef39 RCX: ffffffffaa67e1aa RDX: 0000000000000000 RSI: ffff88810a58ee00 RDI: ffff8881233180b0 RBP: ffffffffac38e9c0 R08: ffffffffaa67e17e R09: 0000000000000001 R10: ffffffffb91c5557 R11: fffffbfff7238aaa R12: ffff88810a58ee00 R13: ffff888114e77aa0 R14: 0000000000000000 R15: ffff8881233180b0 FS: 00007f946163c480(0000) GS:ffff88839f1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000001099c1000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __lookup_slow+0x116/0x2d0 ? page_put_link+0x120/0x120 ? __d_lookup+0xfc/0x320 ? d_lookup+0x49/0x90 lookup_one_len+0x13c/0x170 ? __lookup_slow+0x2d0/0x2d0 ? reiserfs_schedule_old_flush+0x31/0x130 reiserfs_lookup_privroot+0x64/0x150 reiserfs_fill_super+0x158c/0x1b90 ? finish_unfinished+0xb10/0xb10 ? bprintf+0xe0/0xe0 ? __mutex_lock_slowpath+0x30/0x30 ? __kasan_check_write+0x20/0x30 ? up_write+0x51/0xb0 ? set_blocksize+0x9f/0x1f0 mount_bdev+0x27c/0x2d0 ? finish_unfinished+0xb10/0xb10 ? reiserfs_kill_sb+0x120/0x120 get_super_block+0x19/0x30 legacy_get_tree+0x76/0xf0 vfs_get_tree+0x49/0x160 ? capable+0x1d/0x30 path_mount+0xacc/0x1380 ? putname+0x97/0xd0 ? finish_automount+0x450/0x450 ? kmem_cache_free+0xf8/0x5a0 ? putname+0x97/0xd0 do_mount+0xe2/0x110 ? path_mount+0x1380/0x1380 ? copy_mount_options+0x69/0x140 __x64_sys_mount+0xf0/0x190 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae This is because 'root_inode' is initialized with wrong mode, and it's i_op is set to 'reiserfs_special_inode_operations'. Thus add check for 'root_inode' to fix the problem. Link: https://lore.kernel.org/r/20210702040743.1918552-1-yukuai3@huawei.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-12 13:21:04 +02:00
Christoph Hellwig	e30a88f1f5	libata: fix ata_pio_sector for CONFIG_HIGHMEM [ Upstream commit ecef6a9effe49e8e2635c839020b9833b71e934c ] Data transfers are not required to be block aligned in memory, so they span two pages. Fix this by splitting the call to >sff_data_xfer into two for that case. This has been broken since the initial libata import before the damn of git, but was uncovered by the legacy ide driver removal. Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210709130237.3730959-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-12 13:21:04 +02:00
Daniel Borkmann	a2671d96a3	bpf, selftests: Adjust few selftest result_unpriv outcomes commit 1bad6fd52be4ce12d207e2820ceb0f29ab31fc53 upstream. Given we don't need to simulate the speculative domain for registers with immediates anymore since the verifier uses direct imm-based rewrites instead of having to mask, we can also lift a few cases that were previously rejected. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> [OP: backport to 5.4, small context adjustment in stack_ptr.c] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:04 +02:00
Like Xu	4892b4f324	perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guest commit df51fe7ea1c1c2c3bfdb81279712fdd2e4ea6c27 upstream. If we use "perf record" in an AMD Milan guest, dmesg reports a #GP warning from an unchecked MSR access error on MSR_F15H_PERF_CTLx: [] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000110076) at rIP: 0xffffffff8106ddb4 (native_write_msr+0x4/0x20) [] Call Trace: [] amd_pmu_disable_event+0x22/0x90 [] x86_pmu_stop+0x4c/0xa0 [] x86_pmu_del+0x3a/0x140 The AMD64_EVENTSEL_HOSTONLY bit is defined and used on the host, while the guest perf driver should avoid such use. Fixes: 1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabled") Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lkml.kernel.org/r/20210802070850.35295-1-likexu@tencent.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:04 +02:00
Arnd Bergmann	d6cf5342fa	soc: ixp4xx/qmgr: fix invalid __iomem access commit a8eee86317f11e97990d755d4615c1c0db203d08 upstream. Sparse reports a compile time warning when dereferencing an __iomem pointer: drivers/soc/ixp4xx/ixp4xx-qmgr.c:149:37: warning: dereference of noderef expression drivers/soc/ixp4xx/ixp4xx-qmgr.c:153:40: warning: dereference of noderef expression drivers/soc/ixp4xx/ixp4xx-qmgr.c:154:40: warning: dereference of noderef expression drivers/soc/ixp4xx/ixp4xx-qmgr.c:174:38: warning: dereference of noderef expression drivers/soc/ixp4xx/ixp4xx-qmgr.c:174:44: warning: dereference of noderef expression Use __raw_readl() here for consistency with the rest of the file. This should really get converted to some proper accessor, as the __raw functions are not meant to be used in drivers, but the driver has used these since the start, so for the moment, let's only fix the warning. Reported-by: kernel test robot <lkp@intel.com> Fixes: d4c9e9fc9751 ("IXP42x: Add QMgr support for IXP425 rev. A0 processors.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:04 +02:00
Dongliang Mu	a5bf7ef13e	spi: meson-spicc: fix memory leak in meson_spicc_remove commit 8311ee2164c5cd1b63a601ea366f540eae89f10e upstream. In meson_spicc_probe, the error handling code needs to clean up master by calling spi_master_put, but the remove function does not have this function call. This will lead to memory leak of spicc->master. Reported-by: Dongliang Mu <mudongliangabcd@gmail.com> Fixes: 454fa271bc4e("spi: Add Meson SPICC driver") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210720100116.1438974-1-mudongliangabcd@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:04 +02:00
Arnd Bergmann	27991c78d6	soc: ixp4xx: fix printing resources commit 8861452b2097bb0b5d0081a1c137fb3870b0a31f upstream. When compile-testing with 64-bit resource_size_t, gcc reports an invalid printk format string: In file included from include/linux/dma-mapping.h:7, from drivers/soc/ixp4xx/ixp4xx-npe.c:15: drivers/soc/ixp4xx/ixp4xx-npe.c: In function 'ixp4xx_npe_probe': drivers/soc/ixp4xx/ixp4xx-npe.c:694:18: error: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Werror=format=] dev_info(dev, "NPE%d at 0x%08x-0x%08x not available\n", Use the special %pR format string to print the resources. Fixes: 0b458d7b10f8 ("soc: ixp4xx: npe: Pass addresses as resources") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:04 +02:00
Will Deacon	07fd256d53	arm64: vdso: Avoid ISB after reading from cntvct_el0 commit 77ec462536a13d4b428a1eead725c4818a49f0b1 upstream. We can avoid the expensive ISB instruction after reading the counter in the vDSO gettime functions by creating a fake address hazard against a dummy stack read, just like we do inside the kernel. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20210318170738.7756-5-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chanho Park <chanho61.park@samsung.com>	2021-08-12 13:21:04 +02:00
Sean Christopherson	90e498ef3f	KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds commit d5aaad6f83420efb8357ac8e11c868708b22d0a9 upstream. Take a signed 'long' instead of an 'unsigned long' for the number of pages to add/subtract to the total number of pages used by the MMU. This fixes a zero-extension bug on 32-bit kernels that effectively corrupts the per-cpu counter used by the shrinker. Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus an unsigned 32-bit value on 32-bit kernels. As a result, the value used to adjust the per-cpu counter is zero-extended (unsigned -> signed), not sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to 4294967295 and effectively corrupts the counter. This was found by a staggering amount of sheer dumb luck when running kvm-unit-tests on a 32-bit KVM build. The shrinker just happened to kick in while running tests and do_shrink_slab() logged an error about trying to free a negative number of objects. The truly lucky part is that the kernel just happened to be a slightly stale build, as the shrinker no longer yells about negative objects as of commit 18bb473e5031 ("mm: vmscan: shrink deferred objects proportional to priority"). vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460 Fixes: bc8a3d8925a8 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210804214609.1096003-1-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:03 +02:00
Paolo Bonzini	2e1a80b934	KVM: Do not leak memory for duplicate debugfs directories commit 85cd39af14f498f791d8aab3fbd64cd175787f1a upstream. KVM creates a debugfs directory for each VM in order to store statistics about the virtual machine. The directory name is built from the process pid and a VM fd. While generally unique, it is possible to keep a file descriptor alive in a way that causes duplicate directories, which manifests as these messages: [ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present! Even though this should not happen in practice, it is more or less expected in the case of KVM for testcases that call KVM_CREATE_VM and close the resulting file descriptor repeatedly and in parallel. When this happens, debugfs_create_dir() returns an error but kvm_create_vm_debugfs() goes on to allocate stat data structs which are later leaked. The slow memory leak was spotted by syzkaller, where it caused OOM reports. Since the issue only affects debugfs, do a lookup before calling debugfs_create_dir, so that the message is downgraded and rate-limited. While at it, ensure kvm->debugfs_dentry is NULL rather than an error if it is not created. This fixes kvm_destroy_vm_debugfs, which was not checking IS_ERR_OR_NULL correctly. Cc: stable@vger.kernel.org Fixes: 536a6f88c49d ("KVM: Create debugfs dir and stat files for each VM") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:03 +02:00
Paolo Bonzini	43486cd739	KVM: x86: accept userspace interrupt only if no event is injected commit fa7a549d321a4189677b0cea86e58d9db7977f7b upstream. Once an exception has been injected, any side effects related to the exception (such as setting CR2 or DR6) have been taked place. Therefore, once KVM sets the VM-entry interruption information field or the AMD EVENTINJ field, the next VM-entry must deliver that exception. Pending interrupts are processed after injected exceptions, so in theory it would not be a problem to use KVM_INTERRUPT when an injected exception is present. However, DOSEMU is using run->ready_for_interrupt_injection to detect interrupt windows and then using KVM_SET_SREGS/KVM_SET_REGS to inject the interrupt manually. For this to work, the interrupt window must be delayed after the completion of the previous event injection. Cc: stable@vger.kernel.org Reported-by: Stas Sergeev <stsp2@yandex.ru> Tested-by: Stas Sergeev <stsp2@yandex.ru> Fixes: 71cc849b7093 ("KVM: x86: Fix split-irqchip vs interrupt injection window request") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:03 +02:00
Wei Shuyu	1b7b9713a5	md/raid10: properly indicate failure when ending a failed write request commit 5ba03936c05584b6f6f79be5ebe7e5036c1dd252 upstream. Similar to [1], this patch fixes the same bug in raid10. Also cleanup the comments. [1] commit 2417b9869b81 ("md/raid1: properly indicate failure when ending a failed write request") Cc: stable@vger.kernel.org Fixes: 7cee6d4e6035 ("md/raid10: end bio when the device faulty") Signed-off-by: Wei Shuyu <wsy@dogben.com> Acked-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:03 +02:00
Zheyu Ma	790cb68d35	pcmcia: i82092: fix a null pointer dereference bug commit e39cdacf2f664b09029e7c1eb354c91a20c367af upstream. During the driver loading process, the 'dev' field was not assigned, but the 'dev' field was referenced in the subsequent 'i82092aa_set_mem_map' function. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> CC: <stable@vger.kernel.org> [linux@dominikbrodowski.net: shorten commit message, add Cc to stable] Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:03 +02:00
Thomas Gleixner	42ac2c6348	timers: Move clearing of base::timer_running under base:: Lock commit bb7262b295472eb6858b5c49893954794027cd84 upstream. syzbot reported KCSAN data races vs. timer_base::timer_running being set to NULL without holding base::lock in expire_timers(). This looks innocent and most reads are clearly not problematic, but Frederic identified an issue which is: int data = 0; void timer_func(struct timer_list *t) { data = 1; } CPU 0 CPU 1 ------------------------------ -------------------------- base = lock_timer_base(timer, &flags); raw_spin_unlock(&base->lock); if (base->running_timer != timer) call_timer_fn(timer, fn, baseclk); ret = detach_if_pending(timer, base, true); base->running_timer = NULL; raw_spin_unlock_irqrestore(&base->lock, flags); raw_spin_lock(&base->lock); x = data; If the timer has previously executed on CPU 1 and then CPU 0 can observe base->running_timer == NULL and returns, assuming the timer has completed, but it's not guaranteed on all architectures. The comment for del_timer_sync() makes that guarantee. Moving the assignment under base->lock prevents this. For non-RT kernel it's performance wise completely irrelevant whether the store happens before or after taking the lock. For an RT kernel moving the store under the lock requires an extra unlock/lock pair in the case that there is a waiter for the timer, but that's not the end of the world. Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com Fixes: 030dcdd197d7 ("timers: Prepare support for PREEMPT_RT") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:03 +02:00
Mario Kleiner	8211bb20da	serial: 8250_pci: Avoid irq sharing for MSI(-X) interrupts. commit 341abd693d10e5f337a51f140ae3e7a1ae0febf6 upstream. This attempts to fix a bug found with a serial port card which uses an MCS9922 chip, one of the 4 models for which MSI-X interrupts are currently supported. I don't possess such a card, and i'm not experienced with the serial subsystem, so this patch is based on what i think i found as a likely reason for failure, based on walking the user who actually owns the card through some diagnostic. The user who reported the problem finds the following in his dmesg output for the relevant ttyS4 and ttyS5: [ 0.580425] serial 0000:02:00.0: enabling device (0000 -> 0003) [ 0.601448] 0000:02:00.0: ttyS4 at I/O 0x3010 (irq = 125, base_baud = 115200) is a ST16650V2 [ 0.603089] serial 0000:02:00.1: enabling device (0000 -> 0003) [ 0.624119] 0000:02:00.1: ttyS5 at I/O 0x3000 (irq = 126, base_baud = 115200) is a ST16650V2 ... [ 6.323784] genirq: Flags mismatch irq 128. 00000080 (ttyS5) vs. 00000000 (xhci_hcd) [ 6.324128] genirq: Flags mismatch irq 128. 00000080 (ttyS5) vs. 00000000 (xhci_hcd) ... Output of setserial -a: /dev/ttyS4, Line 4, UART: 16650V2, Port: 0x3010, IRQ: 127 Baud_base: 115200, close_delay: 50, divisor: 0 closing_wait: 3000 Flags: spd_normal skip_test This suggests to me that the serial driver wants to register and share a MSI/MSI-X irq 128 with the xhci_hcd driver, whereas the xhci driver does not want to share the irq, as flags 0x00000080 (== IRQF_SHARED) from the serial port driver means to share the irq, and this mismatch ends in some failed irq init? With this setup, data reception works very unreliable, with dropped data, already at a transmission rate of only a 16 Bytes chunk every 1/120th of a second, ie. 1920 Bytes/sec, presumably due to rx fifo overflow due to mishandled or not used at all rx irq's? See full discussion thread with attempted diagnosis at: https://psychtoolbox.discourse.group/t/issues-with-iscan-serial-port-recording/3886 Disabling the use of MSI interrupts for the serial port pci card did fix the reliability problems. The user executed the following sequence of commands to achieve this: echo 0000:02:00.0 \| sudo tee /sys/bus/pci/drivers/serial/unbind echo 0000:02:00.1 \| sudo tee /sys/bus/pci/drivers/serial/unbind echo 0 \| sudo tee /sys/bus/pci/devices/0000:02:00.0/msi_bus echo 0 \| sudo tee /sys/bus/pci/devices/0000:02:00.1/msi_bus echo 0000:02:00.0 \| sudo tee /sys/bus/pci/drivers/serial/bind echo 0000:02:00.1 \| sudo tee /sys/bus/pci/drivers/serial/bind This resulted in the following log output: [ 82.179021] pci 0000:02:00.0: MSI/MSI-X disallowed for future drivers [ 87.003031] pci 0000:02:00.1: MSI/MSI-X disallowed for future drivers [ 98.537010] 0000:02:00.0: ttyS4 at I/O 0x3010 (irq = 17, base_baud = 115200) is a ST16650V2 [ 103.648124] 0000:02:00.1: ttyS5 at I/O 0x3000 (irq = 18, base_baud = 115200) is a ST16650V2 This patch attempts to fix the problem by disabling irq sharing when using MSI irq's. Note that all i know for sure is that disabling MSI irq's fixed the problem for the user, so this patch could be wrong and is untested. Please review with caution, keeping this in mind. Fixes: 8428413b1d14 ("serial: 8250_pci: Implement MSI(-X) support") Cc: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de> Cc: stable <stable@vger.kernel.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Link: https://lore.kernel.org/r/20210729043306.18528-1-mario.kleiner.de@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:03 +02:00
Andy Shevchenko	f73dcb5d63	serial: 8250_pci: Enumerate Elkhart Lake UARTs via dedicated driver commit 7f0909db761535aefafa77031062603a71557267 upstream. Elkhart Lake UARTs are PCI enumerated Synopsys DesignWare v4.0+ UART integrated with Intel iDMA 32-bit DMA controller. There is a specific driver to handle them, i.e. 8250_lpss. Hence, disable 8250_pci enumeration for these UARTs. Fixes: 1b91d97c66ef ("serial: 8250_lpss: Add ->setup() for Elkhart Lake ports") Fixes: 4f912b898dc2 ("serial: 8250_lpss: Enable HS UART on Elkhart Lake") Cc: stable <stable@vger.kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210713101739.36962-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:21:02 +02:00

1 2 3 4 5 ...

888198 Commits