linux

iv/linux

Author	SHA1	Message	Date
Alexander Stein	2db26fa7b9	ARM: dts: imx6ul: fix csi node compatible [ Upstream commit e0aca931a2c7c29c88ebf37f9c3cd045e083483d ] "fsl,imx6ul-csi" was never listed as compatible to "fsl,imx7-csi", neither in yaml bindings, nor previous txt binding. Remove the imx7 part. Fixes the dt schema check warning: csi@21c4000: compatible: 'oneOf' conditional failed, one must be fixed: ['fsl,imx6ul-csi', 'fsl,imx7-csi'] is too long Additional items are not allowed ('fsl,imx7-csi' was unexpected) 'fsl,imx8mm-csi' was expected Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:02 +02:00
Alexander Stein	5357c3b24c	ARM: dts: imx6ul: fix keypad compatible [ Upstream commit 7d15e0c9a515494af2e3199741cdac7002928a0e ] According to binding, the compatible shall only contain imx6ul and imx21 compatibles. Fixes the dt_binding_check warning: keypad@20b8000: compatible: 'oneOf' conditional failed, one must be fixed: ['fsl,imx6ul-kpp', 'fsl,imx6q-kpp', 'fsl,imx21-kpp'] is too long Additional items are not allowed ('fsl,imx6q-kpp', 'fsl,imx21-kpp' were unexpected) Additional items are not allowed ('fsl,imx21-kpp' was unexpected) 'fsl,imx21-kpp' was expected Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:02 +02:00
Alexander Stein	1351555856	ARM: dts: imx6ul: change operating-points to uint32-matrix [ Upstream commit edb67843983bbdf61b4c8c3c50618003d38bb4ae ] operating-points is a uint32-matrix as per opp-v1.yaml. Change it accordingly. While at it, change fsl,soc-operating-points as well, although there is no bindings file (yet). But they should have the same format. Fixes the dt_binding_check warning: cpu@0: operating-points:0: [696000, 1275000, 528000, 1175000, 396000, 1025000, 198000, 950000] is too long cpu@0: operating-points:0: Additional items are not allowed (528000, 1175000, 396000, 1025000, 198000, 950000 were unexpected) Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:01 +02:00
Alexander Stein	ca367834a7	ARM: dts: imx6ul: add missing properties for sram [ Upstream commit 5655699cf5cff9f4c4ee703792156bdd05d1addf ] All 3 properties are required by sram.yaml. Fixes the dtbs_check warning: sram@900000: '#address-cells' is a required property sram@900000: '#size-cells' is a required property sram@900000: 'ranges' is a required property Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:01 +02:00
Juri Lelli	8df06a2753	wait: Fix __wait_event_hrtimeout for RT/DL tasks [ Upstream commit cceeeb6a6d02e7b9a74ddd27a3225013b34174aa ] Changes to hrtimer mode (potentially made by __hrtimer_init_sleeper on PREEMPT_RT) are not visible to hrtimer_start_range_ns, thus not accounted for by hrtimer_start_expires call paths. In particular, __wait_event_hrtimeout suffers from this problem as we have, for example: fs/aio.c::read_events wait_event_interruptible_hrtimeout __wait_event_hrtimeout hrtimer_init_sleeper_on_stack <- this might "mode \|= HRTIMER_MODE_HARD" on RT if task runs at RT/DL priority hrtimer_start_range_ns WARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard) fires since the latter doesn't see the change of mode done by init_sleeper Fix it by making __wait_event_hrtimeout call hrtimer_sleeper_start_expires, which is aware of the special RT/DL case, instead of hrtimer_start_range_ns. Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20220627095051.42470-1-juri.lelli@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:01 +02:00
William Dean	f0d66189d1	irqchip/mips-gic: Check the return value of ioremap() in gic_of_init() [ Upstream commit 71349cc85e5930dce78ed87084dee098eba24b59 ] The function ioremap() in gic_of_init() can fail, so its return value should be checked. Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: William Dean <williamsukatube@163.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220723100128.2964304-1-williamsukatube@163.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:01 +02:00
John Keeping	f066e01582	sched/core: Always flush pending blk_plug [ Upstream commit 401e4963bf45c800e3e9ea0d3a0289d738005fd4 ] With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two normal priority tasks (SCHED_OTHER, nice level zero): INFO: task kworker/u8:0:8 blocked for more than 491 seconds. Not tainted 5.15.49-rt46 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u8:0 state:D stack: 0 pid: 8 ppid: 2 flags:0x00000000 Workqueue: writeback wb_workfn (flush-7:0) [<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134) [<c08a3d84>] (schedule) from [<c08a65a0>] (rt_mutex_slowlock_block.constprop.0+0xb8/0x174) [<c08a65a0>] (rt_mutex_slowlock_block.constprop.0) from [<c08a6708>] +(rt_mutex_slowlock.constprop.0+0xac/0x174) [<c08a6708>] (rt_mutex_slowlock.constprop.0) from [<c0374d60>] (fat_write_inode+0x34/0x54) [<c0374d60>] (fat_write_inode) from [<c0297304>] (__writeback_single_inode+0x354/0x3ec) [<c0297304>] (__writeback_single_inode) from [<c0297998>] (writeback_sb_inodes+0x250/0x45c) [<c0297998>] (writeback_sb_inodes) from [<c0297c20>] (__writeback_inodes_wb+0x7c/0xb8) [<c0297c20>] (__writeback_inodes_wb) from [<c0297f24>] (wb_writeback+0x2c8/0x2e4) [<c0297f24>] (wb_writeback) from [<c0298c40>] (wb_workfn+0x1a4/0x3e4) [<c0298c40>] (wb_workfn) from [<c0138ab8>] (process_one_work+0x1fc/0x32c) [<c0138ab8>] (process_one_work) from [<c0139120>] (worker_thread+0x22c/0x2d8) [<c0139120>] (worker_thread) from [<c013e6e0>] (kthread+0x16c/0x178) [<c013e6e0>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38) Exception stack(0xc10e3fb0 to 0xc10e3ff8) 3fa0: 00000000 00000000 00000000 00000000 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 INFO: task tar:2083 blocked for more than 491 seconds. Not tainted 5.15.49-rt46 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:tar state:D stack: 0 pid: 2083 ppid: 2082 flags:0x00000000 [<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134) [<c08a3d84>] (schedule) from [<c08a41b0>] (io_schedule+0x14/0x24) [<c08a41b0>] (io_schedule) from [<c08a455c>] (bit_wait_io+0xc/0x30) [<c08a455c>] (bit_wait_io) from [<c08a441c>] (__wait_on_bit_lock+0x54/0xa8) [<c08a441c>] (__wait_on_bit_lock) from [<c08a44f4>] (out_of_line_wait_on_bit_lock+0x84/0xb0) [<c08a44f4>] (out_of_line_wait_on_bit_lock) from [<c0371fb0>] (fat_mirror_bhs+0xa0/0x144) [<c0371fb0>] (fat_mirror_bhs) from [<c0372a68>] (fat_alloc_clusters+0x138/0x2a4) [<c0372a68>] (fat_alloc_clusters) from [<c0370b14>] (fat_alloc_new_dir+0x34/0x250) [<c0370b14>] (fat_alloc_new_dir) from [<c03787c0>] (vfat_mkdir+0x58/0x148) [<c03787c0>] (vfat_mkdir) from [<c0277b60>] (vfs_mkdir+0x68/0x98) [<c0277b60>] (vfs_mkdir) from [<c027b484>] (do_mkdirat+0xb0/0xec) [<c027b484>] (do_mkdirat) from [<c0100060>] (ret_fast_syscall+0x0/0x1c) Exception stack(0xc2e1bfa8 to 0xc2e1bff0) bfa0: 01ee42f0 01ee4208 01ee42f0 000041ed 00000000 00004000 bfc0: 01ee42f0 01ee4208 00000000 00000027 01ee4302 00000004 000dcb00 01ee4190 bfe0: 000dc368 bed11924 0006d4b0 b6ebddfc Here the kworker is waiting on msdos_sb_info::s_lock which is held by tar which is in turn waiting for a buffer which is locked waiting to be flushed, but this operation is plugged in the kworker. The lock is a normal struct mutex, so tsk_is_pi_blocked() will always return false on !RT and thus the behaviour changes for RT. It seems that the intent here is to skip blk_flush_plug() in the case where a non-preemptible lock (such as a spinlock) has been converted to a rtmutex on RT, which is the case covered by the SM_RTLOCK_WAIT schedule flag. But sched_submit_work() is only called from schedule() which is never called in this scenario, so the check can simply be deleted. Looking at the history of the -rt patchset, in fact this change was present from v5.9.1-rt20 until being dropped in v5.13-rt1 as it was part of a larger patch [1] most of which was replaced by commit b4bfa3fcfe3b ("sched/core: Rework the __schedule() preempt argument"). As described in [1]: The schedule process must distinguish between blocking on a regular sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock and rwlock): - rwsem and mutex must flush block requests (blk_schedule_flush_plug()) even if blocked on a lock. This can not deadlock because this also happens for non-RT. There should be a warning if the scheduling point is within a RCU read section. - spinlock and rwlock must not flush block requests. This will deadlock if the callback attempts to acquire a lock which is already acquired. Similarly to being preempted, there should be no warning if the scheduling point is within a RCU read section. and with the tsk_is_pi_blocked() in the scheduler path, we hit the first issue. [1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0022-locking-rtmutex-Use-custom-scheduling-function-for-s.patch?h=linux-5.10.y-rt-patches Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20220708162702.1758865-1-john@metanate.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:01 +02:00
Samuel Holland	f116c621dd	genirq: GENERIC_IRQ_IPI depends on SMP [ Upstream commit 0f5209fee90b4544c58b4278d944425292789967 ] The generic IPI code depends on the IRQ affinity mask being allocated and initialized. This will not be the case if SMP is disabled. Fix up the remaining driver that selected GENERIC_IRQ_IPI in a non-SMP config. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220701200056.46555-3-samuel@sholland.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:01 +02:00
Samuel Holland	00ffa95ed6	irqchip/mips-gic: Only register IPI domain when SMP is enabled [ Upstream commit 8190cc572981f2f13b6ffc26c7cfa7899e5d3ccc ] The MIPS GIC irqchip driver may be selected in a uniprocessor configuration, but it unconditionally registers an IPI domain. Limit the part of the driver dealing with IPIs to only be compiled when GENERIC_IRQ_IPI is enabled, which corresponds to an SMP configuration. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220701200056.46555-2-samuel@sholland.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:01 +02:00
Antonio Borneo	f9842ec683	genirq: Don't return error on missing optional irq_request_resources() [ Upstream commit 95001b756467ecc9f5973eb5e74e97699d9bbdf1 ] Function irq_chip::irq_request_resources() is reported as optional in the declaration of struct irq_chip. If the parent irq_chip does not implement it, we should ignore it and return. Don't return error if the functions is missing. Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220512160544.13561-1-antonio.borneo@foss.st.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Chen Yu	079651c6cf	sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg [ Upstream commit 70fb5ccf2ebb09a0c8ebba775041567812d45f86 ] [Problem Statement] select_idle_cpu() might spend too much time searching for an idle CPU, when the system is overloaded. The following histogram is the time spent in select_idle_cpu(), when running 224 instances of netperf on a system with 112 CPUs per LLC domain: @usecs: [0] 533 \| \| [1] 5495 \| \| [2, 4) 12008 \| \| [4, 8) 239252 \| \| [8, 16) 4041924 \|@@@@@@@@@@@@@@ \| [16, 32) 12357398 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [32, 64) 14820255 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [64, 128) 13047682 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [128, 256) 8235013 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [256, 512) 4507667 \|@@@@@@@@@@@@@@@ \| [512, 1K) 2600472 \|@@@@@@@@@ \| [1K, 2K) 927912 \|@@@ \| [2K, 4K) 218720 \| \| [4K, 8K) 98161 \| \| [8K, 16K) 37722 \| \| [16K, 32K) 6715 \| \| [32K, 64K) 477 \| \| [64K, 128K) 7 \| \| netperf latency usecs: ======= case load Lat_99th std% TCP_RR thread-224 257.39 ( 0.21) The time spent in select_idle_cpu() is visible to netperf and might have a negative impact. [Symptom analysis] The patch [1] from Mel Gorman has been applied to track the efficiency of select_idle_sibling. Copy the indicators here: SIS Search Efficiency(se_eff%): A ratio expressed as a percentage of runqueues scanned versus idle CPUs found. A 100% efficiency indicates that the target, prev or recent CPU of a task was idle at wakeup. The lower the efficiency, the more runqueues were scanned before an idle CPU was found. SIS Domain Search Efficiency(dom_eff%): Similar, except only for the slower SIS patch. SIS Fast Success Rate(fast_rate%): Percentage of SIS that used target, prev or recent CPUs. SIS Success rate(success_rate%): Percentage of scans that found an idle CPU. The test is based on Aubrey's schedtests tool, including netperf, hackbench, schbench and tbench. Test on vanilla kernel: schedstat_parse.py -f netperf_vanilla.log case load se_eff% dom_eff% fast_rate% success_rate% TCP_RR 28 threads 99.978 18.535 99.995 100.000 TCP_RR 56 threads 99.397 5.671 99.964 100.000 TCP_RR 84 threads 21.721 6.818 73.632 100.000 TCP_RR 112 threads 12.500 5.533 59.000 100.000 TCP_RR 140 threads 8.524 4.535 49.020 100.000 TCP_RR 168 threads 6.438 3.945 40.309 99.999 TCP_RR 196 threads 5.397 3.718 32.320 99.982 TCP_RR 224 threads 4.874 3.661 25.775 99.767 UDP_RR 28 threads 99.988 17.704 99.997 100.000 UDP_RR 56 threads 99.528 5.977 99.970 100.000 UDP_RR 84 threads 24.219 6.992 76.479 100.000 UDP_RR 112 threads 13.907 5.706 62.538 100.000 UDP_RR 140 threads 9.408 4.699 52.519 100.000 UDP_RR 168 threads 7.095 4.077 44.352 100.000 UDP_RR 196 threads 5.757 3.775 35.764 99.991 UDP_RR 224 threads 5.124 3.704 28.748 99.860 schedstat_parse.py -f schbench_vanilla.log (each group has 28 tasks) case load se_eff% dom_eff% fast_rate% success_rate% normal 1 mthread 99.152 6.400 99.941 100.000 normal 2 mthreads 97.844 4.003 99.908 100.000 normal 3 mthreads 96.395 2.118 99.917 99.998 normal 4 mthreads 55.288 1.451 98.615 99.804 normal 5 mthreads 7.004 1.870 45.597 61.036 normal 6 mthreads 3.354 1.346 20.777 34.230 normal 7 mthreads 2.183 1.028 11.257 21.055 normal 8 mthreads 1.653 0.825 7.849 15.549 schedstat_parse.py -f hackbench_vanilla.log (each group has 28 tasks) case load se_eff% dom_eff% fast_rate% success_rate% process-pipe 1 group 99.991 7.692 99.999 100.000 process-pipe 2 groups 99.934 4.615 99.997 100.000 process-pipe 3 groups 99.597 3.198 99.987 100.000 process-pipe 4 groups 98.378 2.464 99.958 100.000 process-pipe 5 groups 27.474 3.653 89.811 99.800 process-pipe 6 groups 20.201 4.098 82.763 99.570 process-pipe 7 groups 16.423 4.156 77.398 99.316 process-pipe 8 groups 13.165 3.920 72.232 98.828 process-sockets 1 group 99.977 5.882 99.999 100.000 process-sockets 2 groups 99.927 5.505 99.996 100.000 process-sockets 3 groups 99.397 3.250 99.980 100.000 process-sockets 4 groups 79.680 4.258 98.864 99.998 process-sockets 5 groups 7.673 2.503 63.659 92.115 process-sockets 6 groups 4.642 1.584 58.946 88.048 process-sockets 7 groups 3.493 1.379 49.816 81.164 process-sockets 8 groups 3.015 1.407 40.845 75.500 threads-pipe 1 group 99.997 0.000 100.000 100.000 threads-pipe 2 groups 99.894 2.932 99.997 100.000 threads-pipe 3 groups 99.611 4.117 99.983 100.000 threads-pipe 4 groups 97.703 2.624 99.937 100.000 threads-pipe 5 groups 22.919 3.623 87.150 99.764 threads-pipe 6 groups 18.016 4.038 80.491 99.557 threads-pipe 7 groups 14.663 3.991 75.239 99.247 threads-pipe 8 groups 12.242 3.808 70.651 98.644 threads-sockets 1 group 99.990 6.667 99.999 100.000 threads-sockets 2 groups 99.940 5.114 99.997 100.000 threads-sockets 3 groups 99.469 4.115 99.977 100.000 threads-sockets 4 groups 87.528 4.038 99.400 100.000 threads-sockets 5 groups 6.942 2.398 59.244 88.337 threads-sockets 6 groups 4.359 1.954 49.448 87.860 threads-sockets 7 groups 2.845 1.345 41.198 77.102 threads-sockets 8 groups 2.871 1.404 38.512 74.312 schedstat_parse.py -f tbench_vanilla.log case load se_eff% dom_eff% fast_rate% success_rate% loopback 28 threads 99.976 18.369 99.995 100.000 loopback 56 threads 99.222 7.799 99.934 100.000 loopback 84 threads 19.723 6.819 70.215 100.000 loopback 112 threads 11.283 5.371 55.371 99.999 loopback 140 threads 0.000 0.000 0.000 0.000 loopback 168 threads 0.000 0.000 0.000 0.000 loopback 196 threads 0.000 0.000 0.000 0.000 loopback 224 threads 0.000 0.000 0.000 0.000 According to the test above, if the system becomes busy, the SIS Search Efficiency(se_eff%) drops significantly. Although some benchmarks would finally find an idle CPU(success_rate% = 100%), it is doubtful whether it is worth it to search the whole LLC domain. [Proposal] It would be ideal to have a crystal ball to answer this question: How many CPUs must a wakeup path walk down, before it can find an idle CPU? Many potential metrics could be used to predict the number. One candidate is the sum of util_avg in this LLC domain. The benefit of choosing util_avg is that it is a metric of accumulated historic activity, which seems to be smoother than instantaneous metrics (such as rq->nr_running). Besides, choosing the sum of util_avg would help predict the load of the LLC domain more precisely, because SIS_PROP uses one CPU's idle time to estimate the total LLC domain idle time. In summary, the lower the util_avg is, the more select_idle_cpu() should scan for idle CPU, and vice versa. When the sum of util_avg in this LLC domain hits 85% or above, the scan stops. The reason to choose 85% as the threshold is that this is the imbalance_pct(117) when a LLC sched group is overloaded. Introduce the quadratic function: y = SCHED_CAPACITY_SCALE - p * x^2 and y'= y / SCHED_CAPACITY_SCALE x is the ratio of sum_util compared to the CPU capacity: x = sum_util / (llc_weight * SCHED_CAPACITY_SCALE) y' is the ratio of CPUs to be scanned in the LLC domain, and the number of CPUs to scan is calculated by: nr_scan = llc_weight * y' Choosing quadratic function is because: [1] Compared to the linear function, it scans more aggressively when the sum_util is low. [2] Compared to the exponential function, it is easier to calculate. [3] It seems that there is no accurate mapping between the sum of util_avg and the number of CPUs to be scanned. Use heuristic scan for now. For a platform with 112 CPUs per LLC, the number of CPUs to scan is: sum_util% 0 5 15 25 35 45 55 65 75 85 86 ... scan_nr 112 111 108 102 93 81 65 47 25 1 0 ... For a platform with 16 CPUs per LLC, the number of CPUs to scan is: sum_util% 0 5 15 25 35 45 55 65 75 85 86 ... scan_nr 16 15 15 14 13 11 9 6 3 0 0 ... Furthermore, to minimize the overhead of calculating the metrics in select_idle_cpu(), borrow the statistics from periodic load balance. As mentioned by Abel, on a platform with 112 CPUs per LLC, the sum_util calculated by periodic load balance after 112 ms would decay to about 0.5 * 0.5 * 0.5 * 0.7 = 8.75%, thus bringing a delay in reflecting the latest utilization. But it is a trade-off. Checking the util_avg in newidle load balance would be more frequent, but it brings overhead - multiple CPUs write/read the per-LLC shared variable and introduces cache contention. Tim also mentioned that, it is allowed to be non-optimal in terms of scheduling for the short-term variations, but if there is a long-term trend in the load behavior, the scheduler can adjust for that. When SIS_UTIL is enabled, the select_idle_cpu() uses the nr_scan calculated by SIS_UTIL instead of the one from SIS_PROP. As Peter and Mel suggested, SIS_UTIL should be enabled by default. This patch is based on the util_avg, which is very sensitive to the CPU frequency invariance. There is an issue that, when the max frequency has been clamp, the util_avg would decay insanely fast when the CPU is idle. Commit addca285120b ("cpufreq: intel_pstate: Handle no_turbo in frequency invariance") could be used to mitigate this symptom, by adjusting the arch_max_freq_ratio when turbo is disabled. But this issue is still not thoroughly fixed, because the current code is unaware of the user-specified max CPU frequency. [Test result] netperf and tbench were launched with 25% 50% 75% 100% 125% 150% 175% 200% of CPU number respectively. Hackbench and schbench were launched by 1, 2 ,4, 8 groups. Each test lasts for 100 seconds and repeats 3 times. The following is the benchmark result comparison between baseline:vanilla v5.19-rc1 and compare:patched kernel. Positive compare% indicates better performance. Each netperf test is a: netperf -4 -H 127.0.1 -t TCP/UDP_RR -c -C -l 100 netperf.throughput ======= case load baseline(std%) compare%( std%) TCP_RR 28 threads 1.00 ( 0.34) -0.16 ( 0.40) TCP_RR 56 threads 1.00 ( 0.19) -0.02 ( 0.20) TCP_RR 84 threads 1.00 ( 0.39) -0.47 ( 0.40) TCP_RR 112 threads 1.00 ( 0.21) -0.66 ( 0.22) TCP_RR 140 threads 1.00 ( 0.19) -0.69 ( 0.19) TCP_RR 168 threads 1.00 ( 0.18) -0.48 ( 0.18) TCP_RR 196 threads 1.00 ( 0.16) +194.70 ( 16.43) TCP_RR 224 threads 1.00 ( 0.16) +197.30 ( 7.85) UDP_RR 28 threads 1.00 ( 0.37) +0.35 ( 0.33) UDP_RR 56 threads 1.00 ( 11.18) -0.32 ( 0.21) UDP_RR 84 threads 1.00 ( 1.46) -0.98 ( 0.32) UDP_RR 112 threads 1.00 ( 28.85) -2.48 ( 19.61) UDP_RR 140 threads 1.00 ( 0.70) -0.71 ( 14.04) UDP_RR 168 threads 1.00 ( 14.33) -0.26 ( 11.16) UDP_RR 196 threads 1.00 ( 12.92) +186.92 ( 20.93) UDP_RR 224 threads 1.00 ( 11.74) +196.79 ( 18.62) Take the 224 threads as an example, the SIS search metrics changes are illustrated below: vanilla patched 4544492 +237.5% 15338634 sched_debug.cpu.sis_domain_search.avg 38539 +39686.8% 15333634 sched_debug.cpu.sis_failed.avg 128300000 -87.9% 15551326 sched_debug.cpu.sis_scanned.avg 5842896 +162.7% 15347978 sched_debug.cpu.sis_search.avg There is -87.9% less CPU scans after patched, which indicates lower overhead. Besides, with this patch applied, there is -13% less rq lock contention in perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested .try_to_wake_up.default_wake_function.woken_wake_function. This might help explain the performance improvement - Because this patch allows the waking task to remain on the previous CPU, rather than grabbing other CPUs' lock. Each hackbench test is a: hackbench -g $job --process/threads --pipe/sockets -l 1000000 -s 100 hackbench.throughput ========= case load baseline(std%) compare%( std%) process-pipe 1 group 1.00 ( 1.29) +0.57 ( 0.47) process-pipe 2 groups 1.00 ( 0.27) +0.77 ( 0.81) process-pipe 4 groups 1.00 ( 0.26) +1.17 ( 0.02) process-pipe 8 groups 1.00 ( 0.15) -4.79 ( 0.02) process-sockets 1 group 1.00 ( 0.63) -0.92 ( 0.13) process-sockets 2 groups 1.00 ( 0.03) -0.83 ( 0.14) process-sockets 4 groups 1.00 ( 0.40) +5.20 ( 0.26) process-sockets 8 groups 1.00 ( 0.04) +3.52 ( 0.03) threads-pipe 1 group 1.00 ( 1.28) +0.07 ( 0.14) threads-pipe 2 groups 1.00 ( 0.22) -0.49 ( 0.74) threads-pipe 4 groups 1.00 ( 0.05) +1.88 ( 0.13) threads-pipe 8 groups 1.00 ( 0.09) -4.90 ( 0.06) threads-sockets 1 group 1.00 ( 0.25) -0.70 ( 0.53) threads-sockets 2 groups 1.00 ( 0.10) -0.63 ( 0.26) threads-sockets 4 groups 1.00 ( 0.19) +11.92 ( 0.24) threads-sockets 8 groups 1.00 ( 0.08) +4.31 ( 0.11) Each tbench test is a: tbench -t 100 $job 127.0.0.1 tbench.throughput ====== case load baseline(std%) compare%( std%) loopback 28 threads 1.00 ( 0.06) -0.14 ( 0.09) loopback 56 threads 1.00 ( 0.03) -0.04 ( 0.17) loopback 84 threads 1.00 ( 0.05) +0.36 ( 0.13) loopback 112 threads 1.00 ( 0.03) +0.51 ( 0.03) loopback 140 threads 1.00 ( 0.02) -1.67 ( 0.19) loopback 168 threads 1.00 ( 0.38) +1.27 ( 0.27) loopback 196 threads 1.00 ( 0.11) +1.34 ( 0.17) loopback 224 threads 1.00 ( 0.11) +1.67 ( 0.22) Each schbench test is a: schbench -m $job -t 28 -r 100 -s 30000 -c 30000 schbench.latency_90%_us ======== case load baseline(std%) compare%( std%) normal 1 mthread 1.00 ( 31.22) -7.36 ( 20.25)* normal 2 mthreads 1.00 ( 2.45) -0.48 ( 1.79) normal 4 mthreads 1.00 ( 1.69) +0.45 ( 0.64) normal 8 mthreads 1.00 ( 5.47) +9.81 ( 14.28) Consider the Standard Deviation, this -7.36% regression might not be valid. Also, a OLTP workload with a commercial RDBMS has been tested, and there is no significant change. There were concerns that unbalanced tasks among CPUs would cause problems. For example, suppose the LLC domain is composed of 8 CPUs, and 7 tasks are bound to CPU0~CPU6, while CPU7 is idle: CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 util_avg 1024 1024 1024 1024 1024 1024 1024 0 Since the util_avg ratio is 87.5%( = 7/8 ), which is higher than 85%, select_idle_cpu() will not scan, thus CPU7 is undetected during scan. But according to Mel, it is unlikely the CPU7 will be idle all the time because CPU7 could pull some tasks via CPU_NEWLY_IDLE. lkp(kernel test robot) has reported a regression on stress-ng.sock on a very busy system. According to the sched_debug statistics, it might be caused by SIS_UTIL terminates the scan and chooses a previous CPU earlier, and this might introduce more context switch, especially involuntary preemption, which impacts a busy stress-ng. This regression has shown that, not all benchmarks in every scenario benefit from idle CPU scan limit, and it needs further investigation. Besides, there is slight regression in hackbench's 16 groups case when the LLC domain has 16 CPUs. Prateek mentioned that we should scan aggressively in an LLC domain with 16 CPUs. Because the cost to search for an idle one among 16 CPUs is negligible. The current patch aims to propose a generic solution and only considers the util_avg. Something like the below could be applied on top of the current patch to fulfill the requirement: if (llc_weight <= 16) nr_scan = nr_scan 32 / llc_weight; For LLC domain with 16 CPUs, the nr_scan will be expanded to 2 times large. The smaller the CPU number this LLC domain has, the larger nr_scan will be expanded. This needs further investigation. There is also ongoing work[2] from Abel to filter out the busy CPUs during wakeup, to further speed up the idle CPU scan. And it could be a following-up optimization on top of this change. Suggested-by: Tim Chen <tim.c.chen@intel.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yicong Yang <yangyicong@hisilicon.com> Tested-by: Mohini Narkhede <mohini.narkhede@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20220612163428.849378-1-yu.c.chen@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Jan Kara	96b18d3a1b	ext2: Add more validity checks for inode counts [ Upstream commit fa78f336937240d1bc598db817d638086060e7e9 ] Add checks verifying number of inodes stored in the superblock matches the number computed from number of inodes per group. Also verify we have at least one block worth of inodes per group. This prevents crashes on corrupted filesystems. Reported-by: syzbot+d273f7d7f58afd93be48@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Catalin Marinas	add4bc9281	arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags" [ Upstream commit 20794545c14692094a882d2221c251c4573e6adf ] This reverts commit e5b8d9218951e59df986f627ec93569a0d22149b. Pages mapped in user-space with PROT_MTE have the allocation tags either zeroed or copied/restored to some user values. In order for the kernel to access such pages via page_address(), resetting the tag in page->flags was necessary. This tag resetting was deferred to set_pte_at() -> mte_sync_page_tags() but it can race with another CPU reading the flags (via page_to_virt()): P0 (mte_sync_page_tags): P1 (memcpy from virt_to_page): Rflags!=0xff Wflags=0xff DMB (doesn't help) Wtags=0 Rtags=0 // fault Since now the post_alloc_hook() function resets the page->flags tag when unpoisoning is skipped for user pages (including the __GFP_ZEROTAGS case), revert the arm64 commit calling page_kasan_tag_reset(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Peter Collingbourne <pcc@google.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20220610152141.2148929-5-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
haibinzhang (张海斌)	cc69ef9598	arm64: fix oops in concurrently setting insn_emulation sysctls [ Upstream commit af483947d472eccb79e42059276c4deed76f99a6 ] emulation_proc_handler() changes table->data for proc_dointvec_minmax and can generate the following Oops if called concurrently with itself: \| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 \| Internal error: Oops: 96000006 [#1] SMP \| Call trace: \| update_insn_emulation_mode+0xc0/0x148 \| emulation_proc_handler+0x64/0xb8 \| proc_sys_call_handler+0x9c/0xf8 \| proc_sys_write+0x18/0x20 \| __vfs_write+0x20/0x48 \| vfs_write+0xe4/0x1d0 \| ksys_write+0x70/0xf8 \| __arm64_sys_write+0x20/0x28 \| el0_svc_common.constprop.0+0x7c/0x1c0 \| el0_svc_handler+0x2c/0xa0 \| el0_svc+0x8/0x200 To fix this issue, keep the table->data as &insn->current_mode and use container_of() to retrieve the insn pointer. Another mutex is used to protect against the current_mode update but not for retrieving insn_emulation as table->data is no longer changing. Co-developed-by: hewenliang <hewenliang4@huawei.com> Signed-off-by: hewenliang <hewenliang4@huawei.com> Signed-off-by: Haibin Zhang <haibinzhang@tencent.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220128090324.2727688-1-hewenliang4@huawei.com Link: https://lore.kernel.org/r/9A004C03-250B-46C5-BF39-782D7551B00E@tencent.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Francis Laniel	42eede3ae0	arm64: Do not forget syscall when starting a new thread. [ Upstream commit de6921856f99c11d3986c6702d851e1328d4f7f6 ] Enable tracing of the execve*() system calls with the syscalls:sys_exit_execve tracepoint by removing the call to forget_syscall() when starting a new thread and preserving the value of regs->syscallno across exec. Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com> Link: https://lore.kernel.org/r/20220608162447.666494-2-flaniel@linux.microsoft.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Mark Rutland	d1e812beae	arch: make TRACE_IRQFLAGS_NMI_SUPPORT generic [ Upstream commit 4510bffb4d0246cdcc1f14c7367c026b807a862d ] On most architectures, IRQ flag tracing is disabled in NMI context, and architectures need to define and select TRACE_IRQFLAGS_NMI_SUPPORT in order to enable this. Commit: 859d069ee1ddd878 ("lockdep: Prepare for NMI IRQ state tracking") Permitted IRQ flag tracing in NMI context, allowing lockdep to work in NMI context where an architecture had suitable entry logic. At the time, most architectures did not have such suitable entry logic, and this broke lockdep on such architectures. Thus, this was partially disabled in commit: ed00495333ccc80f ("locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs") ... with architectures needing to select TRACE_IRQFLAGS_NMI_SUPPORT to enable IRQ flag tracing in NMI context. Currently TRACE_IRQFLAGS_NMI_SUPPORT is defined under arch/x86/Kconfig.debug. Move it to arch/Kconfig so architectures can select it without having to provide their own definition. Since the regular TRACE_IRQFLAGS_SUPPORT is selected by arch/x86/Kconfig, the select of TRACE_IRQFLAGS_NMI_SUPPORT is moved there too. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220511131733.4074499-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Wyes Karny	932b5e6524	x86: Handle idle=nomwait cmdline properly for x86_idle [ Upstream commit 8bcedb4ce04750e1ccc9a6b6433387f6a9166a56 ] When kernel is booted with idle=nomwait do not use MWAIT as the default idle state. If the user boots the kernel with idle=nomwait, it is a clear direction to not use mwait as the default idle state. However, the current code does not take this into consideration while selecting the default idle state on x86. Fix it by checking for the idle=nomwait boot option in prefer_mwait_c1_over_halt(). Also update the documentation around idle=nomwait appropriately. [ dhansen: tweak commit message ] Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lkml.kernel.org/r/fdc2dc2d0a1bc21c2f53d989ea2d2ee3ccbc0dbe.1654538381.git-series.wyes.karny@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Benjamin Segall	9ccb5d391c	epoll: autoremove wakers even more aggressively commit a16ceb13961068f7209e34d7984f8e42d2c06159 upstream. If a process is killed or otherwise exits while having active network connections and many threads waiting on epoll_wait, the threads will all be woken immediately, but not removed from ep->wq. Then when network traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will not remove the entries from the list. This means that the cost of the wakeup attempt is far higher than usual, does not decrease, and this also competes with the dying threads trying to actually make progress and remove themselves from the wq. Handle this by removing visited epoll wq entries unconditionally, rather than only when the wakeup succeeds - the structure of ep_poll means that the only potential loss is the timed_out->eavail heuristic, which now can race and result in a redundant ep_send_events attempt. (But only when incoming data and a timeout actually race, not on every timeout) Shakeel added: : We are seeing this issue in production with real workloads and it has : caused hard lockups. Particularly network heavy workloads with a lot : of threads in epoll_wait() can easily trigger this issue if they get : killed (oom-killed in our case). Link: https://lkml.kernel.org/r/xm26fsjotqda.fsf@google.com Signed-off-by: Ben Segall <bsegall@google.com> Tested-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Heiher <r@hev.cc> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:59 +02:00
Florian Westphal	8a2df34b5b	netfilter: nf_tables: fix null deref due to zeroed list head commit 580077855a40741cf511766129702d97ff02f4d9 upstream. In nf_tables_updtable, if nf_tables_table_enable returns an error, nft_trans_destroy is called to free the transaction object. nft_trans_destroy() calls list_del(), but the transaction was never placed on a list -- the list head is all zeroes, this results in a null dereference: BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59 Call Trace: nft_trans_destroy+0x26/0x59 nf_tables_newtable+0x4bc/0x9bc [..] Its sane to assume that nft_trans_destroy() can be called on the transaction object returned by nft_trans_alloc(), so make sure the list head is initialised. Fixes: 55dd6f93076b ("netfilter: nf_tables: use new transaction infrastructure to handle table") Reported-by: mingi cho <mgcho.minic@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:59 +02:00
Thadeu Lima de Souza Cascardo	257b944464	netfilter: nf_tables: do not allow RULE_ID to refer to another chain commit 36d5b2913219ac853908b0f1c664345e04313856 upstream. When doing lookups for rules on the same batch by using its ID, a rule from a different chain can be used. If a rule is added to a chain but tries to be positioned next to a rule from a different chain, it will be linked to chain2, but the use counter on chain1 would be the one to be incremented. When looking for rules by ID, use the chain that was used for the lookup by name. The chain used in the context copied to the transaction needs to match that same chain. That way, struct nft_rule does not need to get enlarged with another member. Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute") Fixes: 75dd48e2e420 ("netfilter: nf_tables: Support RULE_ID reference in new rule") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:59 +02:00
Thadeu Lima de Souza Cascardo	9150151301	netfilter: nf_tables: do not allow CHAIN_ID to refer to another table commit 95f466d22364a33d183509629d0879885b4f547e upstream. When doing lookups for chains on the same batch by using its ID, a chain from a different table can be used. If a rule is added to a table but refers to a chain in a different table, it will be linked to the chain in table2, but would have expressions referring to objects in table1. Then, when table1 is removed, the rule will not be removed as its linked to a chain in table2. When expressions in the rule are processed or removed, that will lead to a use-after-free. When looking for chains by ID, use the table that was used for the lookup by name, and only return chains belonging to that same table. Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:59 +02:00
Thadeu Lima de Souza Cascardo	faafd9286f	netfilter: nf_tables: do not allow SET_ID to refer to another table commit 470ee20e069a6d05ae549f7d0ef2bdbcee6a81b2 upstream. When doing lookups for sets on the same batch by using its ID, a set from a different table can be used. Then, when the table is removed, a reference to the set may be kept after the set is freed, leading to a potential use-after-free. When looking for sets by ID, use the table that was used for the lookup by name, and only return sets belonging to that same table. This fixes CVE-2022-2586, also reported as ZDI-CAN-17470. Reported-by: Team Orca of Sea Security (@seasecresponse) Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:59 +02:00
Michael Grzeschik	5ea18ddc17	usb: dwc3: gadget: fix high speed multiplier setting commit 8affe37c525d800a2628c4ecfaed13b77dc5634a upstream. For High-Speed Transfers the prepare_one_trb function is calculating the multiplier setting for the trb based on the length parameter of the trb currently prepared. This assumption is wrong. For trbs with a sg list, the length of the actual request has to be taken instead. Fixes: 40d829fb2ec6 ("usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets") Cc: stable <stable@kernel.org> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Link: https://lore.kernel.org/r/20220704141812.1532306-3-m.grzeschik@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:59 +02:00
Michael Grzeschik	332a8c027a	usb: dwc3: gadget: refactor dwc3_repare_one_trb commit 23385cec5f354794dadced7f28c31da7ae3eb54c upstream. The function __dwc3_prepare_one_trb has many parameters. Since it is only used in dwc3_prepare_one_trb there is no point in keeping the function. We merge both functions and get rid of the big list of parameters. Fixes: 40d829fb2ec6 ("usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets") Cc: stable <stable@kernel.org> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Link: https://lore.kernel.org/r/20220704141812.1532306-2-m.grzeschik@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:59 +02:00
Kunihiko Hayashi	f0782cf2dc	arm64: dts: uniphier: Fix USB interrupts for PXs3 SoC commit fe17b91a7777df140d0f1433991da67ba658796c upstream. An interrupt for USB device are shared with USB host. Set interrupt-names property to common "dwc_usb3" instead of "host" and "peripheral". Cc: stable@vger.kernel.org Fixes: d7b9beb830d7 ("arm64: dts: uniphier: Add USB3 controller nodes") Reported-by: Ryuta NAKANISHI <nakanishi.ryuta@socionext.com> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:58 +02:00
Kunihiko Hayashi	148a7fe3cd	ARM: dts: uniphier: Fix USB interrupts for PXs2 SoC commit 9b0dc7abb5cc43a2dbf90690c3c6011dcadc574d upstream. An interrupt for USB device are shared with USB host. Set interrupt-names property to common "dwc_usb3" instead of "host" and "peripheral". Cc: stable@vger.kernel.org Fixes: 45be1573ad19 ("ARM: dts: uniphier: Add USB3 controller nodes") Reported-by: Ryuta NAKANISHI <nakanishi.ryuta@socionext.com> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:58 +02:00
Weitao Wang	b76ea430e9	USB: HCD: Fix URB giveback issue in tasklet function commit 26c6c2f8a907c9e3a2f24990552a4d77235791e6 upstream. Usb core introduce the mechanism of giveback of URB in tasklet context to reduce hardware interrupt handling time. On some test situation(such as FIO with 4KB block size), when tasklet callback function called to giveback URB, interrupt handler add URB node to the bh->head list also. If check bh->head list again after finish all URB giveback of local_list, then it may introduce a "dynamic balance" between giveback URB and add URB to bh->head list. This tasklet callback function may not exit for a long time, which will cause other tasklet function calls to be delayed. Some real-time applications(such as KB and Mouse) will see noticeable lag. In order to prevent the tasklet function from occupying the cpu for a long time at a time, new URBS will not be added to the local_list even though the bh->head list is not empty. But also need to ensure the left URB giveback to be processed in time, so add a member high_prio for structure giveback_urb_bh to prioritize tasklet and schelule this tasklet again if bh->head list is not empty. At the same time, we are able to prioritize tasklet through structure member high_prio. So, replace the local high_prio_bh variable with this structure member in usb_hcd_giveback_urb. Fixes: 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet context") Cc: stable <stable@kernel.org> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com> Link: https://lore.kernel.org/r/20220726074918.5114-1-WeitaoWang-oc@zhaoxin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:58 +02:00
Linyu Yuan	269c917837	usb: typec: ucsi: Acknowledge the GET_ERROR_STATUS command completion commit a7dc438b5e446afcd1b3b6651da28271400722f2 upstream. We found PPM will not send any notification after it report error status and OPM issue GET_ERROR_STATUS command to read the details about error. According UCSI spec, PPM may clear the Error Status Data after the OPM has acknowledged the command completion. This change add operation to acknowledge the command completion from PPM. Fixes: bdc62f2bae8f (usb: typec: ucsi: Simplified registration and I/O API) Cc: <stable@vger.kernel.org> # 5.10 Signed-off-by: Jack Pham <quic_jackp@quicinc.com> Signed-off-by: Linyu Yuan <quic_linyyuan@quicinc.com> Link: https://lore.kernel.org/r/1658817949-4632-1-git-send-email-quic_linyyuan@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:58 +02:00
Suzuki K Poulose	b49b29ee11	coresight: Clear the connection field properly commit 2af89ebacf299b7fba5f3087d35e8a286ec33706 upstream. coresight devices track their connections (output connections) and hold a reference to the fwnode. When a device goes away, we walk through the devices on the coresight bus and make sure that the references are dropped. This happens both ways: a) For all output connections from the device, drop the reference to the target device via coresight_release_platform_data() b) Iterate over all the devices on the coresight bus and drop the reference to fwnode if this device is the target of the output connection, via coresight_remove_conns()->coresight_remove_match(). However, the coresight_remove_match() doesn't clear the fwnode field, after dropping the reference, this causes use-after-free and additional refcount drops on the fwnode. e.g., if we have two devices, A and B, with a connection, A -> B. If we remove B first, B would clear the reference on B, from A via coresight_remove_match(). But when A is removed, it still has a connection with fwnode still pointing to B. Thus it tries to drops the reference in coresight_release_platform_data(), raising the bells like : [ 91.990153] ------------[ cut here ]------------ [ 91.990163] refcount_t: addition on 0; use-after-free. [ 91.990212] WARNING: CPU: 0 PID: 461 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144 [ 91.990260] Modules linked in: coresight_funnel coresight_replicator coresight_etm4x(-) crct10dif_ce coresight ip_tables x_tables ipv6 [last unloaded: coresight_cpu_debug] [ 91.990398] CPU: 0 PID: 461 Comm: rmmod Tainted: G W T 5.19.0-rc2+ #53 [ 91.990418] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb 1 2019 [ 91.990434] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 91.990454] pc : refcount_warn_saturate+0xa0/0x144 [ 91.990476] lr : refcount_warn_saturate+0xa0/0x144 [ 91.990496] sp : ffff80000c843640 [ 91.990509] x29: ffff80000c843640 x28: ffff800009957c28 x27: ffff80000c8439a8 [ 91.990560] x26: ffff00097eff1990 x25: ffff8000092b6ad8 x24: ffff00097eff19a8 [ 91.990610] x23: ffff80000c8439a8 x22: 0000000000000000 x21: ffff80000c8439c2 [ 91.990659] x20: 0000000000000000 x19: ffff00097eff1a10 x18: ffff80000ab99c40 [ 91.990708] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000abf6fa0 [ 91.990756] x14: 000000000000001d x13: 0a2e656572662d72 x12: 657466612d657375 [ 91.990805] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffff8000081aba28 [ 91.990854] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 746e756f63666572 [ 91.990903] x5 : ffff00097648ec58 x4 : 0000000000000000 x3 : 0000000000000027 [ 91.990952] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00080260ba00 [ 91.991000] Call trace: [ 91.991012] refcount_warn_saturate+0xa0/0x144 [ 91.991034] kobject_get+0xac/0xb0 [ 91.991055] of_node_get+0x2c/0x40 [ 91.991076] of_fwnode_get+0x40/0x60 [ 91.991094] fwnode_handle_get+0x3c/0x60 [ 91.991116] fwnode_get_nth_parent+0xf4/0x110 [ 91.991137] fwnode_full_name_string+0x48/0xc0 [ 91.991158] device_node_string+0x41c/0x530 [ 91.991178] pointer+0x320/0x3ec [ 91.991198] vsnprintf+0x23c/0x750 [ 91.991217] vprintk_store+0x104/0x4b0 [ 91.991238] vprintk_emit+0x8c/0x360 [ 91.991257] vprintk_default+0x44/0x50 [ 91.991276] vprintk+0xcc/0xf0 [ 91.991295] _printk+0x68/0x90 [ 91.991315] of_node_release+0x13c/0x14c [ 91.991334] kobject_put+0x98/0x114 [ 91.991354] of_node_put+0x24/0x34 [ 91.991372] of_fwnode_put+0x40/0x5c [ 91.991390] fwnode_handle_put+0x38/0x50 [ 91.991411] coresight_release_platform_data+0x74/0xb0 [coresight] [ 91.991472] coresight_unregister+0x64/0xcc [coresight] [ 91.991525] etm4_remove_dev+0x64/0x78 [coresight_etm4x] [ 91.991563] etm4_remove_amba+0x1c/0x2c [coresight_etm4x] [ 91.991598] amba_remove+0x3c/0x19c Reproducible by: (Build all coresight components as modules): #!/bin/sh while true do for m in tmc stm cpu_debug etm4x replicator funnel do modprobe coresight_${m} done for m in tmc stm cpu_debug etm4x replicator funnel do rmmode coresight_${m} done done Cc: stable@vger.kernel.org Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Fixes: 37ea1ffddffa ("coresight: Use fwnode handle instead of device names") Link: https://lore.kernel.org/r/20220614214024.3005275-1-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:58 +02:00
Huacai Chen	e41db8a9ce	MIPS: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK commit e1a534f5d074db45ae5cbac41d8912b98e96a006 upstream. When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS is selected, cpu_max_bits_warn() generates a runtime warning similar as below while we show /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit) instead of NR_CPUS to iterate CPUs. [ 3.052463] ------------[ cut here ]------------ [ 3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0 [ 3.070072] Modules linked in: efivarfs autofs4 [ 3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052 [ 3.084034] Hardware name: Loongson Loongson-3A4000-7A1000-1w-V0.1-CRB/Loongson-LS3A4000-7A1000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V2.0.04082-beta7 04/27 [ 3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000 [ 3.109127] 9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430 [ 3.118774] 90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff [ 3.128412] 0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890 [ 3.138056] 0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa [ 3.147711] ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000 [ 3.157364] 900000000101c998 0000000000000004 9000000000ef7430 0000000000000000 [ 3.167012] 0000000000000009 000000000000006c 0000000000000000 0000000000000000 [ 3.176641] 9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286 [ 3.186260] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c [ 3.195868] ... [ 3.199917] Call Trace: [ 3.203941] [<98000000002086d8>] show_stack+0x38/0x14c [ 3.210666] [<9800000000cf846c>] dump_stack_lvl+0x60/0x88 [ 3.217625] [<980000000023d268>] __warn+0xd0/0x100 [ 3.223958] [<9800000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc [ 3.231150] [<9800000000210220>] show_cpuinfo+0x5e8/0x5f0 [ 3.238080] [<98000000004f578c>] seq_read_iter+0x354/0x4b4 [ 3.245098] [<98000000004c2e90>] new_sync_read+0x17c/0x1c4 [ 3.252114] [<98000000004c5174>] vfs_read+0x138/0x1d0 [ 3.258694] [<98000000004c55f8>] ksys_read+0x70/0x100 [ 3.265265] [<9800000000cfde9c>] do_syscall+0x7c/0x94 [ 3.271820] [<9800000000202fe4>] handle_syscall+0xc4/0x160 [ 3.281824] ---[ end trace 8b484262b4b8c24c ]--- Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:58 +02:00
Michael Ellerman	db68d474cf	powerpc/powernv: Avoid crashing if rng is NULL commit 90b5d4fe0b3ba7f589c6723c6bfb559d9e83956a upstream. On a bare-metal Power8 system that doesn't have an "ibm,power-rng", a malicious QEMU and guest that ignore the absence of the KVM_CAP_PPC_HWRNG flag, and calls H_RANDOM anyway, will dereference a NULL pointer. In practice all Power8 machines have an "ibm,power-rng", but let's not rely on that, add a NULL check and early return in powernv_get_random_real_mode(). Fixes: e928e9cb3601 ("KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220727143219.2684192-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:58 +02:00
Christophe Leroy	552a29e471	powerpc/ptdump: Fix display of RW pages on FSL_BOOK3E commit dd8de84b57b02ba9c1fe530a6d916c0853f136bd upstream. On FSL_BOOK3E, _PAGE_RW is defined with two bits, one for user and one for supervisor. As soon as one of the two bits is set, the page has to be display as RW. But the way it is implemented today requires both bits to be set in order to display it as RW. Instead of display RW when _PAGE_RW bits are set and R otherwise, reverse the logic and display R when _PAGE_RW bits are all 0 and RW otherwise. This change has no impact on other platforms as _PAGE_RW is a single bit on all of them. Fixes: 8eb07b187000 ("powerpc/mm: Dump linux pagetables") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0c33b96317811edf691e81698aaee8fa45ec3449.1656427391.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:58 +02:00
Pali Rohár	79da7a5f8f	powerpc/fsl-pci: Fix Class Code of PCIe Root Port commit 0c551abfa004ce154d487d91777bf221c808a64f upstream. By default old pre-3.0 Freescale PCIe controllers reports invalid PCI Class Code 0x0b20 for PCIe Root Port. It can be seen by lspci -b output on P2020 board which has this pre-3.0 controller: $ lspci -bvnn 00:00.0 Power PC [0b20]: Freescale Semiconductor Inc P2020E [1957:0070] (rev 21) !!! Invalid class 0b20 for header type 01 Capabilities: [4c] Express Root Port (Slot-), MSI 00 Fix this issue by programming correct PCI Class Code 0x0604 for PCIe Root Port to the Freescale specific PCIe register 0x474. With this change lspci -b output is: $ lspci -bvnn 00:00.0 PCI bridge [0604]: Freescale Semiconductor Inc P2020E [1957:0070] (rev 21) (prog-if 00 [Normal decode]) Capabilities: [4c] Express Root Port (Slot-), MSI 00 Without any "Invalid class" error. So class code was properly reflected into standard (read-only) PCI register 0x08. Same fix is already implemented in U-Boot pcie_fsl.c driver in commit: `d18d06ac35` Fix activated by U-Boot stay active also after booting Linux kernel. But boards which use older U-Boot version without that fix are affected and still require this fix. So implement this class code fix also in kernel fsl_pci.c driver. Cc: stable@vger.kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220706101043.4867-1-pali@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:57 +02:00
Pali Rohár	fdf7590591	PCI: Add defines for normal and subtractive PCI bridges commit 904b10fb189cc15376e9bfce1ef0282e68b0b004 upstream. Add these PCI class codes to pci_ids.h: PCI_CLASS_BRIDGE_PCI_NORMAL PCI_CLASS_BRIDGE_PCI_SUBTRACTIVE Use these defines in all kernel code for describing PCI class codes for normal and subtractive PCI bridges. [bhelgaas: similar change in pci-mvebu.c] Link: https://lore.kernel.org/r/20220214114109.26809-1-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Guenter Roeck <linux@roeck-us.net>a Cc: Naresh Kamboju <naresh.kamboju@linaro.org> [ gregkh - take only the pci_ids.h portion for stable backports ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:57 +02:00
Alexander Lobakin	ae6620a397	ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr() commit e5a16a5c4602c119262f350274021f90465f479d upstream. test_bit(), as any other bitmap op, takes `unsigned long ` as a second argument (pointer to the actual bitmap), as any bitmap itself is an array of unsigned longs. However, the ia64_get_irr() code passes a ref to `u64` as a second argument. This works with the ia64 bitops implementation due to that they have `void ` as the second argument and then cast it later on. This works with the bitmap API itself due to that `unsigned long` has the same size on ia64 as `u64` (`unsigned long long`), but from the compiler PoV those two are different. Define @irr as `unsigned long` to fix that. That implies no functional changes. Has been hidden for 16 years! Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations") Cc: stable@vger.kernel.org # 2.6.16+ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:57 +02:00
Xiaomeng Tong	74d6428453	media: [PATCH] pci: atomisp_cmd: fix three missing checks on list iterator commit 09b204eb9de9fdf07d028c41c4331b5cfeb70dd7 upstream. The three bugs are here: __func__, s3a_buf->s3a_data->exp_id); __func__, md_buf->metadata->exp_id); __func__, dis_buf->dis_data->exp_id); The list iterator 's3a_buf/md_buf/dis_buf' will point to a bogus position containing HEAD if the list is empty or no element is found. This case must be checked before any use of the iterator, otherwise it will lead to a invalid memory access. To fix this bug, add an check. Use a new variable '_iter' as the list iterator, while use the old variable '_buf' as a dedicated pointer to point to the found element. Link: https://lore.kernel.org/linux-media/20220414041415.3342-1-xiam0nd.tong@gmail.com Cc: stable@vger.kernel.org Fixes: ad85094b293e4 ("Revert "media: staging: atomisp: Remove driver"") Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:57 +02:00
Jan Kara	ddc7fadd05	mbcache: add functions to delete entry if unused commit 3dc96bba65f53daa217f0a8f43edad145286a8f5 upstream. Add function mb_cache_entry_delete_or_get() to delete mbcache entry if it is unused and also add a function to wait for entry to become unused - mb_cache_entry_wait_unused(). We do not share code between the two deleting function as one of them will go away soon. CC: stable@vger.kernel.org Fixes: 82939d7999df ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:57 +02:00
Jan Kara	1250557d3b	mbcache: don't reclaim used entries commit 58318914186c157477b978b1739dfe2f1b9dc0fe upstream. Do not reclaim entries that are currently used by somebody from a shrinker. Firstly, these entries are likely useful. Secondly, we will need to keep such entries to protect pending increment of xattr block refcount. CC: stable@vger.kernel.org Fixes: 82939d7999df ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:57 +02:00
Mikulas Patocka	0f4d18cbea	md-raid10: fix KASAN warning commit d17f744e883b2f8d13cca252d71cfe8ace346f7d upstream. There's a KASAN warning in raid10_remove_disk when running the lvm test lvconvert-raid-reshape.sh. We fix this warning by verifying that the value "number" is valid. BUG: KASAN: slab-out-of-bounds in raid10_remove_disk+0x61/0x2a0 [raid10] Read of size 8 at addr ffff889108f3d300 by task mdX_raid10/124682 CPU: 3 PID: 124682 Comm: mdX_raid10 Not tainted 5.19.0-rc6 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? raid10_remove_disk+0x61/0x2a0 [raid10] kasan_report+0xa8/0xe0 ? raid10_remove_disk+0x61/0x2a0 [raid10] raid10_remove_disk+0x61/0x2a0 [raid10] Buffer I/O error on dev dm-76, logical block 15344, async page read ? __mutex_unlock_slowpath.constprop.0+0x1e0/0x1e0 remove_and_add_spares+0x367/0x8a0 [md_mod] ? super_written+0x1c0/0x1c0 [md_mod] ? mutex_trylock+0xac/0x120 ? _raw_spin_lock+0x72/0xc0 ? _raw_spin_lock_bh+0xc0/0xc0 md_check_recovery+0x848/0x960 [md_mod] raid10d+0xcf/0x3360 [raid10] ? sched_clock_cpu+0x185/0x1a0 ? rb_erase+0x4d4/0x620 ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid10_sync_request+0x36c0/0x36c0 [raid10] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? _raw_spin_unlock_irq+0x11/0x24 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 124495: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 setup_conf+0x140/0x5c0 [raid10] raid10_run+0x4cd/0x740 [raid10] md_run+0x6f9/0x1300 [md_mod] raid_ctr+0x2531/0x4ac0 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x9e/0xc0 kvfree_call_rcu+0x84/0x480 timerfd_release+0x82/0x140 L __fput+0xfa/0x400 task_work_run+0x80/0xc0 exit_to_user_mode_prepare+0x155/0x160 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x42/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x9e/0xc0 kvfree_call_rcu+0x84/0x480 timerfd_release+0x82/0x140 __fput+0xfa/0x400 task_work_run+0x80/0xc0 exit_to_user_mode_prepare+0x155/0x160 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x42/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The buggy address belongs to the object at ffff889108f3d200 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 0 bytes to the right of 256-byte region [ffff889108f3d200, ffff889108f3d300) The buggy address belongs to the physical page: page:000000007ef2a34c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1108f3c head:000000007ef2a34c order:2 compound_mapcount:0 compound_pincount:0 flags: 0x4000000000010200(slab\|head\|zone=2) raw: 4000000000010200 0000000000000000 dead000000000001 ffff889100042b40 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff889108f3d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff889108f3d280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff889108f3d300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff889108f3d380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff889108f3d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:57 +02:00
Mikulas Patocka	c5e4cdd443	md-raid: destroy the bitmap after destroying the thread commit e151db8ecfb019b7da31d076130a794574c89f6f upstream. When we ran the lvm test "shell/integrity-blocksize-3.sh" on a kernel with kasan, we got failure in write_page. The reason for the failure is that md_bitmap_destroy is called before destroying the thread and the thread may be waiting in the function write_page for the bio to complete. When the thread finishes waiting, it executes "if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))", which triggers the kasan warning. Note that the commit 48df498daf62 that caused this bug claims that it is neede for md-cluster, you should check md-cluster and possibly find another bugfix for it. BUG: KASAN: use-after-free in write_page+0x18d/0x680 [md_mod] Read of size 8 at addr ffff889162030c78 by task mdX_raid1/5539 CPU: 10 PID: 5539 Comm: mdX_raid1 Not tainted 5.19.0-rc2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? write_page+0x18d/0x680 [md_mod] kasan_report+0xa8/0xe0 ? write_page+0x18d/0x680 [md_mod] kasan_check_range+0x13f/0x180 write_page+0x18d/0x680 [md_mod] ? super_sync+0x4d5/0x560 [dm_raid] ? md_bitmap_file_kick+0xa0/0xa0 [md_mod] ? rs_set_dev_and_array_sectors+0x2e0/0x2e0 [dm_raid] ? mutex_trylock+0x120/0x120 ? preempt_count_add+0x6b/0xc0 ? preempt_count_sub+0xf/0xc0 md_update_sb+0x707/0xe40 [md_mod] md_reap_sync_thread+0x1b2/0x4a0 [md_mod] md_check_recovery+0x533/0x960 [md_mod] raid1d+0xc8/0x2a20 [raid1] ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid1_end_read_request+0x2a0/0x2a0 [raid1] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 5522: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 md_bitmap_create+0xa8/0xe80 [md_mod] md_run+0x777/0x1300 [md_mod] raid_ctr+0x249c/0x4a30 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 5680: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x40 kasan_set_free_info+0x20/0x40 __kasan_slab_free+0xf7/0x140 kfree+0x80/0x240 md_bitmap_free+0x1c3/0x280 [md_mod] __md_stop+0x21/0x120 [md_mod] md_stop+0x9/0x40 [md_mod] raid_dtr+0x1b/0x40 [dm_raid] dm_table_destroy+0x98/0x1e0 [dm_mod] __dm_destroy+0x199/0x360 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:56 +02:00
Narendra Hadke	351ec3d68c	serial: mvebu-uart: uart2 error bits clearing commit a7209541239e5dd44d981289e5f9059222d40fd1 upstream. For mvebu uart2, error bits are not cleared on buffer read. This causes interrupt loop and system hang. Cc: stable@vger.kernel.org Reviewed-by: Yi Guo <yi.guo@cavium.com> Reviewed-by: Nadav Haklai <nadavh@marvell.com> Signed-off-by: Narendra Hadke <nhadke@marvell.com> Signed-off-by: Pali Rohár <pali@kernel.org> Link: https://lore.kernel.org/r/20220726091221.12358-1-pali@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:56 +02:00
Miklos Szeredi	ec8e701f9e	fuse: ioctl: translate ENOSYS commit 02c0cab8e7345b06f1c0838df444e2902e4138d3 upstream. Overlayfs may fail to complete updates when a filesystem lacks fileattr/xattr syscall support and responds with an ENOSYS error code, resulting in an unexpected "Function not implemented" error. This bug may occur with FUSE filesystems, such as davfs2. Steps to reproduce: # install davfs2, e.g., apk add davfs2 mkdir /test mkdir /test/lower /test/upper /test/work /test/mnt yes '' \| mount -t davfs -o ro http://some-web-dav-server/path \ /test/lower mount -t overlay -o upperdir=/test/upper,lowerdir=/test/lower \ -o workdir=/test/work overlay /test/mnt # when "some-file" exists in the lowerdir, this fails with "Function # not implemented", with dmesg showing "overlayfs: failed to retrieve # lower fileattr (/some-file, err=-38)" touch /test/mnt/some-file The underlying cause of this regresion is actually in FUSE, which fails to translate the ENOSYS error code returned by userspace filesystem (which means that the ioctl operation is not supported) to ENOTTY. Reported-by: Christian Kohlschütter <christian@kohlschutter.com> Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags") Fixes: 59efec7b9039 ("fuse: implement ioctl support") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:56 +02:00
Miklos Szeredi	daa9cfb862	fuse: limit nsec commit 47912eaa061a6a81e4aa790591a1874c650733c0 upstream. Limit nanoseconds to 0..999999999. Fixes: d8a5ba45457e ("[PATCH] FUSE - core") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:56 +02:00
Namjae Jeon	a54c509c32	ksmbd: fix use-after-free bug in smb2_tree_disconect commit cf6531d98190fa2cf92a6d8bbc8af0a4740a223c upstream. smb2_tree_disconnect() freed the struct ksmbd_tree_connect, but it left the dangling pointer. It can be accessed again under compound requests. This bug can lead an oops looking something link: [ 1685.468014 ] BUG: KASAN: use-after-free in ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468068 ] Read of size 4 at addr ffff888102172180 by task kworker/1:2/4807 ... [ 1685.468130 ] Call Trace: [ 1685.468132 ] <TASK> [ 1685.468135 ] dump_stack_lvl+0x49/0x5f [ 1685.468141 ] print_report.cold+0x5e/0x5cf [ 1685.468145 ] ? ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468157 ] kasan_report+0xaa/0x120 [ 1685.468194 ] ? ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468206 ] __asan_report_load4_noabort+0x14/0x20 [ 1685.468210 ] ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468222 ] smb2_tree_disconnect+0x175/0x250 [ksmbd] [ 1685.468235 ] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 1685.468247 ] process_one_work+0x778/0x11c0 [ 1685.468251 ] ? _raw_spin_lock_irq+0x8e/0xe0 [ 1685.468289 ] worker_thread+0x544/0x1180 [ 1685.468293 ] ? __cpuidle_text_end+0x4/0x4 [ 1685.468297 ] kthread+0x282/0x320 [ 1685.468301 ] ? process_one_work+0x11c0/0x11c0 [ 1685.468305 ] ? kthread_complete_and_exit+0x30/0x30 [ 1685.468309 ] ret_from_fork+0x1f/0x30 Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17816 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:56 +02:00
Hyunchul Lee	5776196055	ksmbd: prevent out of bound read for SMB2_TREE_CONNNECT commit 824d4f64c20093275f72fc8101394d75ff6a249e upstream. if Status is not 0 and PathLength is long, smb_strndup_from_utf16 could make out of bound read in smb2_tree_connnect. This bug can lead an oops looking something like: [ 1553.882047] BUG: KASAN: slab-out-of-bounds in smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882064] Read of size 2 at addr ffff88802c4eda04 by task kworker/0:2/42805 ... [ 1553.882095] Call Trace: [ 1553.882098] <TASK> [ 1553.882101] dump_stack_lvl+0x49/0x5f [ 1553.882107] print_report.cold+0x5e/0x5cf [ 1553.882112] ? smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882122] kasan_report+0xaa/0x120 [ 1553.882128] ? smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882139] __asan_report_load_n_noabort+0xf/0x20 [ 1553.882143] smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882155] ? smb_strtoUTF16+0x3b0/0x3b0 [ksmbd] [ 1553.882166] ? __kmalloc_node+0x185/0x430 [ 1553.882171] smb2_tree_connect+0x140/0xab0 [ksmbd] [ 1553.882185] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 1553.882197] process_one_work+0x778/0x11c0 [ 1553.882201] ? _raw_spin_lock_irq+0x8e/0xe0 [ 1553.882206] worker_thread+0x544/0x1180 [ 1553.882209] ? __cpuidle_text_end+0x4/0x4 [ 1553.882214] kthread+0x282/0x320 [ 1553.882218] ? process_one_work+0x11c0/0x11c0 [ 1553.882221] ? kthread_complete_and_exit+0x30/0x30 [ 1553.882225] ret_from_fork+0x1f/0x30 [ 1553.882231] </TASK> There is no need to check error request validation in server. This check allow invalid requests not to validate message. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17818 Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:56 +02:00
Namjae Jeon	dd4e4c8118	ksmbd: fix memory leak in smb2_handle_negotiate commit aa7253c2393f6dcd6a1468b0792f6da76edad917 upstream. The allocated memory didn't free under an error path in smb2_handle_negotiate(). Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17815 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:56 +02:00
Srinivas Kandagatla	dead7f484a	soundwire: qcom: Check device status before reading devid commit aa1262ca66957183ea1fb32a067e145b995f3744 upstream. As per hardware datasheet its recommended that we check the device status before reading devid assigned by auto-enumeration. Without this patch we see SoundWire devices with invalid enumeration addresses on the bus. Cc: stable@vger.kernel.org Fixes: a6e6581942ca ("soundwire: qcom: add auto enumeration support") Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220706095644.5852-1-srinivas.kandagatla@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:56 +02:00
Bikash Hazarika	71bc3b75e9	scsi: qla2xxx: Zero undefined mailbox IN registers commit 6c96a3c7d49593ef15805f5e497601c87695abc9 upstream. While requesting a new mailbox command, driver does not write any data to unused registers. Initialize the unused register value to zero while requesting a new mailbox command to prevent stale entry access by firmware. Link: https://lore.kernel.org/r/20220713052045.10683-4-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bikash Hazarika <bhazarika@marvell.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:55 +02:00
Bikash Hazarika	a659c7f811	scsi: qla2xxx: Fix incorrect display of max frame size commit cf3b4fb655796674e605268bd4bfb47a47c8bce6 upstream. Replace display field with the correct field. Link: https://lore.kernel.org/r/20220713052045.10683-3-njavali@marvell.com Fixes: 8777e4314d39 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine") Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bikash Hazarika <bhazarika@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:55 +02:00
Tony Battersby	8c004b7dbb	scsi: sg: Allow waiting for commands to complete on removed device commit 3455607fd7be10b449f5135c00dc306b85dc0d21 upstream. When a SCSI device is removed while in active use, currently sg will immediately return -ENODEV on any attempt to wait for active commands that were sent before the removal. This is problematic for commands that use SG_FLAG_DIRECT_IO since the data buffer may still be in use by the kernel when userspace frees or reuses it after getting ENODEV, leading to corrupted userspace memory (in the case of READ-type commands) or corrupted data being sent to the device (in the case of WRITE-type commands). This has been seen in practice when logging out of a iscsi_tcp session, where the iSCSI driver may still be processing commands after the device has been marked for removal. Change the policy to allow userspace to wait for active sg commands even when the device is being removed. Return -ENODEV only when there are no more responses to read. Link: https://lore.kernel.org/r/5ebea46f-fe83-2d0b-233d-d0dcb362dd0a@cybernetics.com Cc: <stable@vger.kernel.org> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:55 +02:00

1 2 3 4 5 ...

1054068 Commits