linux

iv/linux

Author	SHA1	Message	Date
Bartosz Golaszewski	da3ccf5b20	rtc: rx8010: don't modify the global rtc ops commit d3b14296da69adb7825022f3224ac6137eb30abf upstream. The way the driver is implemented is buggy for the (admittedly unlikely) use case where there are two RTCs with one having an interrupt configured and the second not. This is caused by the fact that we use a global rtc_class_ops struct which we modify depending on whether the irq number is present or not. Fix it by using two const ops structs with and without alarm operations. While at it: not being able to request a configured interrupt is an error so don't ignore it and bail out of probe(). Fixes: ed13d89b08e3 ("rtc: Add Epson RX8010SJ RTC driver") Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200914154601.32245-2-brgl@bgdev.pl Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:33 +01:00
Krzysztof Kozlowski	e17afa6d1d	ia64: fix build error with !COREDUMP commit 7404840d87557c4092bf0272bce5e0354c774bf9 upstream. Fix linkage error when CONFIG_BINFMT_ELF is selected but CONFIG_COREDUMP is not: ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_phdrs': elfcore.c:(.text+0x172): undefined reference to `dump_emit' ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_data': elfcore.c:(.text+0x2b2): undefined reference to `dump_emit' Fixes: 1fcccbac89f5 ("elf coredump: replace ELF_CORE_EXTRA_* macros by functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200819064146.12529-1-krzk@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:33 +01:00
Zhihao Cheng	da3bb6fa23	ubi: check kthread_should_stop() after the setting of task state commit d005f8c6588efcfbe88099b6edafc6f58c84a9c1 upstream. A detach hung is possible when a race occurs between the detach process and the ubi background thread. The following sequences outline the race: ubi thread: if (list_empty(&ubi->works)... ubi detach: set_bit(KTHREAD_SHOULD_STOP, &kthread->flags) => by kthread_stop() wake_up_process() => ubi thread is still running, so 0 is returned ubi thread: set_current_state(TASK_INTERRUPTIBLE) schedule() => ubi thread will never be scheduled again ubi detach: wait_for_completion() => hung task! To fix that, we need to check kthread_should_stop() after we set the task state, so the ubi thread will either see the stop bit and exit or the task state is reset to runnable such that it isn't scheduled out indefinitely. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> Fixes: 801c135ce73d5df1ca ("UBI: Unsorted Block Images") Reported-by: syzbot+853639d0cb16c31c7a14@syzkaller.appspotmail.com Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:33 +01:00
Vineet Gupta	6d0beeebd1	ARC: perf: redo the pct irq missing in device-tree handling commit 8c42a5c02bec6c7eccf08957be3c6c8fccf9790b upstream. commit feb92d7d3813456c11dce21 "(ARC: perf: don't bail setup if pct irq missing in device-tree)" introduced a silly brown-paper bag bug: The assignment and comparison in an if statement were not bracketed correctly leaving the order of evaluation undefined. \| \| if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) { \| ^^^ ^^^^ And given such a chance, the compiler will bite you hard, fully entitled to generating this piece of beauty: \| \| # if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) { \| \| bl.d @platform_get_irq <-- irq returned in r0 \| \| setge r2, r0, 0 <-- r2 is bool 1 or 0 if irq >= 0 true/false \| brlt.d r0, 0, @.L114 \| \| st_s r2,[sp] <-- irq saved is bool 1 or 0, not actual return val \| st 1,[r3,160] # arc_pmu.18_29->irq <-- drops bool and assumes 1 \| \| # return __request_percpu_irq(irq, handler, 0, \| \| bl.d @__request_percpu_irq; \| mov_s r0,1 <-- drops even bool and assumes 1 which fails With the snafu fixed, everything is as expected. \| bl.d @platform_get_irq <-- returns irq in r0 \| \| mov_s r2,r0 \| brlt.d r2, 0, @.L112 \| \| st_s r0,[sp] <-- irq isaved is actual return value above \| st r0,[r13,160] #arc_pmu.18_27->irq \| \| bl.d @__request_percpu_irq <-- r0 unchanged so actual irq returned \| add r4,r4,r12 #, tmp363, __ptr Cc: <stable@vger.kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:33 +01:00
Jiri Olsa	4688115958	perf python scripting: Fix printable strings in python3 scripts commit 6fcd5ddc3b1467b3586972ef785d0d926ae4cdf4 upstream. Hagen reported broken strings in python3 tracepoint scripts: make PYTHON=python3 perf record -e sched:sched_switch -a -- sleep 5 perf script --gen-script py perf script -s ./perf-script.py [..] sched__sched_switch 7 563231.759525792 0 swapper prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'), The problem is in the is_printable_array function that does not take the zero byte into account and claim such string as not printable, so the code will create byte array instead of string. Committer testing: After this fix: sched__sched_switch 3 484522.497072626 1158680 kworker/3:0-eve prev_comm=kworker/3:0, prev_pid=1158680, prev_prio=120, prev_state=I, next_comm=swapper/3, next_pid=0, next_prio=120 Sample: {addr=0, cpu=3, datasrc=84410401, datasrc_decode=N/A\|SNP N/A\|TLB N/A\|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1158680, tid=1158680, time=484522497072626, transaction=0, values=[(0, 0)], weight=0} sched__sched_switch 4 484522.497085610 1225814 perf prev_comm=perf, prev_pid=1225814, prev_prio=120, prev_state=, next_comm=migration/4, next_pid=30, next_prio=0 Sample: {addr=0, cpu=4, datasrc=84410401, datasrc_decode=N/A\|SNP N/A\|TLB N/A\|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1225814, tid=1225814, time=484522497085610, transaction=0, values=[(0, 0)], weight=0} Fixes: 249de6e07458 ("perf script python: Fix string vs byte array resolving") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200928201135.3633850-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:32 +01:00
Zhihao Cheng	a99cbd20a5	ubifs: mount_ubifs: Release authentication resource in error handling path commit e2a05cc7f8229e150243cdae40f2af9021d67a4a upstream. Release the authentication related resource in some error handling branches in mount_ubifs(). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:32 +01:00
Zhihao Cheng	9ba6324ca9	ubifs: Don't parse authentication mount options in remount process commit bb674a4d4de1032837fcbf860a63939e66f0b7ad upstream. There is no need to dump authentication options while remounting, because authentication initialization can only be doing once in the first mount process. Dumping authentication mount options in remount process may cause memory leak if UBIFS has already been mounted with old authentication mount options. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:32 +01:00
Zhihao Cheng	748057df47	ubifs: Fix a memleak after dumping authentication mount options commit 47f6d9ce45b03a40c34b668a9884754c58122b39 upstream. Fix a memory leak after dumping authentication mount options in error handling branch. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> # 4.20+ Fixes: d8a22773a12c6d7 ("ubifs: Enable authentication support") Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:32 +01:00
Richard Weinberger	bc202c839b	ubifs: journal: Make sure to not dirty twice for auth nodes commit 78c7d49f55d8631b67c09f9bfbe8155211a9ea06 upstream. When removing the last reference of an inode the size of an auth node is already part of write_len. So we must not call ubifs_add_auth_dirt(). Call it only when needed. Cc: <stable@vger.kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Kristof Havasi <havasiefr@gmail.com> Fixes: 6a98bc4614de ("ubifs: Add authentication nodes to journal") Reported-and-tested-by: Kristof Havasi <havasiefr@gmail.com> Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:32 +01:00
Zhihao Cheng	a779274697	ubifs: xattr: Fix some potential memory leaks while iterating entries commit f2aae745b82c842221f4f233051f9ac641790959 upstream. Fix some potential memory leaks in error handling branches while iterating xattr entries. For example, function ubifs_tnc_remove_ino() forgets to free pxent if it exists. Similar problems also exist in ubifs_purge_xattrs(), ubifs_add_orphan() and ubifs_jnl_write_inode(). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> Fixes: 1e51764a3c2ac05a2 ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:32 +01:00
Zhihao Cheng	213c836b23	ubifs: dent: Fix some potential memory leaks while iterating entries commit 58f6e78a65f1fcbf732f60a7478ccc99873ff3ba upstream. Fix some potential memory leaks in error handling branches while iterating dent entries. For example, function dbg_check_dir() forgets to free pdent if it exists. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: <stable@vger.kernel.org> Fixes: 1e51764a3c2ac05a2 ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:32 +01:00
Chuck Lever	c1ea3c4a43	NFSD: Add missing NFSv2 .pc_func methods commit 6b3dccd48de8a4c650b01499a0b09d1e2279649e upstream. There's no protection in nfsd_dispatch() against a NULL .pc_func helpers. A malicious NFS client can trigger a crash by invoking the unused/unsupported NFSv2 ROOT or WRITECACHE procedures. The current NFSD dispatcher does not support returning a void reply to a non-NULL procedure, so the reply to both of these is wrong, for the moment. Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:31 +01:00
Olga Kornievskaia	da86bb4c21	NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag commit 8c39076c276be0b31982e44654e2c2357473258a upstream. RFC 7862 introduced a new flag that either client or server is allowed to set: EXCHGID4_FLAG_SUPP_FENCE_OPS. Client needs to update its bitmask to allow for this flag value. v2: changed minor version argument to unsigned int Signed-off-by: Olga Kornievskaia <kolga@netapp.com> CC: <stable@vger.kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:31 +01:00
Benjamin Coddington	c342001cab	NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE commit b4868b44c5628995fdd8ef2e24dda73cef963a75 upstream. Since commit 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a CLOSE races with the update of the nfs_state: Process 1 Process 2 Server ========= ========= ======== OPEN file OPEN file Reply OPEN (1) Reply OPEN (2) Update state (1) CLOSE file (1) Reply OLD_STATEID (1) CLOSE file (2) Reply CLOSE (-1) Update state (2) wait for state change OPEN file wake CLOSE file OPEN file wake CLOSE file ... ... We can avoid this situation by not issuing an immediate retry with a bumped seqid when CLOSE/OPEN_DOWNGRADE receives NFS4ERR_OLD_STATEID. Instead, take the same approach used by OPEN and wait at least 5 seconds for outstanding stateid updates to complete if we can detect that we're out of sequence. Note that after this change it is still possible (though unlikely) that CLOSE waits a full 5 seconds, bumps the seqid, and retries -- and that attempt races with another OPEN at the same time. In order to avoid this race (which would result in the livelock), update nfs_need_update_open_stateid() to handle the case where: - the state is NFS_OPEN_STATE, and - the stateid doesn't match the current open stateid Finally, nfs_need_update_open_stateid() is modified to be idempotent and renamed to better suit the purpose of signaling that the stateid passed is the next stateid in sequence. Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:31 +01:00
Michael Neuling	415043c3ec	powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation commit 1da4a0272c5469169f78cd76cf175ff984f52f06 upstream. __get_user_atomic_128_aligned() stores to kaddr using stvx which is a VMX store instruction, hence kaddr must be 16 byte aligned otherwise the store won't occur as expected. Unfortunately when we call __get_user_atomic_128_aligned() in p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't guaranteed to be 16B aligned. This means that the write to vbuf in __get_user_atomic_128_aligned() has the bottom bits of the address truncated. This results in other local variables being overwritten. Also vbuf will not contain the correct data which results in the userspace emulation being wrong and hence undetected user data corruption. In the past we've been mostly lucky as vbuf has ended up aligned but this is fragile and isn't always true. CONFIG_STACKPROTECTOR in particular can change the stack arrangement enough that our luck runs out. This issue only occurs on POWER9 Nimbus <= DD2.1 bare metal. The fix is to align vbuf to a 16 byte boundary. Fixes: 5080332c2c89 ("powerpc/64s: Add workaround for P9 vector CI load issue") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201013043741.743413-1-mikey@neuling.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:31 +01:00
Christophe Leroy	94e27f1369	powerpc/powermac: Fix low_sleep_handler with KUAP and KUEP commit 2c637d2df4ee4830e9d3eb2bd5412250522ce96e upstream. low_sleep_handler() has an hardcoded restore of segment registers that doesn't take KUAP and KUEP into account. Use head_32's load_segment_registers() routine instead. Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection") Fixes: 31ed2b13c48d ("powerpc/32s: Implement Kernel Userspace Execution Prevention.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/21b05f7298c1b18f73e6e5b4cd5005aafa24b6da.1599820109.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:31 +01:00
Mahesh Salgaonkar	61ed8c1b94	powerpc/powernv/elog: Fix race while processing OPAL error log event. commit aea948bb80b478ddc2448f7359d574387521a52d upstream. Every error log reported by OPAL is exported to userspace through a sysfs interface and notified using kobject_uevent(). The userspace daemon (opal_errd) then reads the error log and acknowledges the error log is saved safely to disk. Once acknowledged the kernel removes the respective sysfs file entry causing respective resources to be released including kobject. However it's possible the userspace daemon may already be scanning elog entries when a new sysfs elog entry is created by the kernel. User daemon may read this new entry and ack it even before kernel can notify userspace about it through kobject_uevent() call. If that happens then we have a potential race between elog_ack_store->kobject_put() and kobject_uevent which can lead to use-after-free of a kernfs object resulting in a kernel crash. eg: BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6bfb Faulting instruction address: 0xc0000000008ff2a0 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV CPU: 27 PID: 805 Comm: irq/29-opal-elo Not tainted 5.9.0-rc2-gcc-8.2.0-00214-g6f56a67bcbb5-dirty #363 ... NIP kobject_uevent_env+0xa0/0x910 LR elog_event+0x1f4/0x2d0 Call Trace: 0x5deadbeef0000122 (unreliable) elog_event+0x1f4/0x2d0 irq_thread_fn+0x4c/0xc0 irq_thread+0x1c0/0x2b0 kthread+0x1c4/0x1d0 ret_from_kernel_thread+0x5c/0x6c This patch fixes this race by protecting the sysfs file creation/notification by holding a reference count on kobject until we safely send kobject_uevent(). The function create_elog_obj() returns the elog object which if used by caller function will end up in use-after-free problem again. However, the return value of create_elog_obj() function isn't being used today and there is no need as well. Hence change it to return void to make this fix complete. Fixes: 774fea1a38c6 ("powerpc/powernv: Read OPAL error log and export it through sysfs") Cc: stable@vger.kernel.org # v3.15+ Reported-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Rework the logic to use a single return, reword comments, add oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201006122051.190176-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:31 +01:00
Aneesh Kumar K.V	7850dd0851	powerpc/memhotplug: Make lmb size 64bit commit 301d2ea6572386245c5d2d2dc85c3b5a737b85ac upstream. Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic") make sure different variables tracking lmb_size are updated to be 64 bit. This was found by code audit. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007114836.282468-3-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:31 +01:00
Joel Stanley	3fa03b7f21	powerpc: Warn about use of smt_snooze_delay commit a02f6d42357acf6e5de6ffc728e6e77faf3ad217 upstream. It's not done anything for a long time. Save the percpu variable, and emit a warning to remind users to not expect it to do anything. This uses pr_warn_once instead of pr_warn_ratelimit as testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system showed the warning to be noisy, as the online/offline loop is slow. Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.") Cc: stable@vger.kernel.org # v3.14 Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902000012.3440389-1-joel@jms.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:31 +01:00
Andrew Donnellan	240baebeda	powerpc/rtas: Restrict RTAS requests from userspace commit bd59380c5ba4147dcbaad3e582b55ccfd120b764 upstream. A number of userspace utilities depend on making calls to RTAS to retrieve information and update various things. The existing API through which we expose RTAS to userspace exposes more RTAS functionality than we actually need, through the sys_rtas syscall, which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they want with arbitrary arguments. Many RTAS calls take the address of a buffer as an argument, and it's up to the caller to specify the physical address of the buffer as an argument. We allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can access, and then expose the physical address and size of this buffer in /proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address, poke at the buffer using /dev/mem, and pass an address in the RMO buffer to the RTAS call. However, there's nothing stopping the caller from specifying whatever address they want in the RTAS call, and it's easy to construct a series of RTAS calls that can overwrite arbitrary bytes (even without /dev/mem access). Additionally, there are some RTAS calls that do potentially dangerous things and for which there are no legitimate userspace use cases. In the past, this would not have been a particularly big deal as it was assumed that root could modify all system state freely, but with Secure Boot and lockdown we need to care about this. We can't fundamentally change the ABI at this point, however we can address this by implementing a filter that checks RTAS calls against a list of permitted calls and forces the caller to use addresses within the RMO buffer. The list is based off the list of calls that are used by the librtas userspace library, and has been tested with a number of existing userspace RTAS utilities. For compatibility with any applications we are not aware of that require other calls, the filter can be turned off at build time. Cc: stable@vger.kernel.org Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200820044512.7543-1-ajd@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:30 +01:00
Sven Schnelle	551bf7c4bc	s390/stp: add locking to sysfs functions commit b3bd02495cb339124f13135d51940cf48d83e5cb upstream. The sysfs function might race with stp_work_fn. To prevent that, add the required locking. Another issue is that the sysfs functions are checking the stp_online flag, but this flag just holds the user setting whether STP is enabled. Add a flag to clock_sync_flag whether stp_info holds valid data and use that instead. Cc: stable@vger.kernel.org Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:30 +01:00
Maciej W. Rozycki	58a7dc5f52	MIPS: DEC: Restore bootmem reservation for firmware working memory area commit cf3af0a4d3b62ab48e0b90180ea161d0f5d4953f upstream. Fix a crash on DEC platforms starting with: VFS: Mounted root (nfs filesystem) on device 0:11. Freeing unused PROM memory: 124k freed BUG: Bad page state in process swapper pfn:00001 page:(ptrval) refcount:0 mapcount:-128 mapping:00000000 index:0x1 pfn:0x1 flags: 0x0() raw: 00000000 00000100 00000122 00000000 00000001 00000000 ffffff7f 00000000 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-00858-g865c50e1d279 #1 Stack : 8065dc48 0000000b 8065d2b8 9bc27dcc 80645bfc 9bc259a4 806a1b97 80703124 80710000 8064a900 00000001 80099574 806b116c 1000ec00 9bc27d88 806a6f30 00000000 00000000 80645bfc 00000000 31232039 80706ba4 2e392e35 8039f348 2d383538 00000070 0000000a 35363867 00000000 806c2830 80710000 806b0000 80710000 8064a900 00000001 81000000 00000000 00000000 8035af2c 80700000 ... Call Trace: [<8004bc5c>] show_stack+0x34/0x104 [<8015675c>] bad_page+0xfc/0x128 [<80157714>] free_pcppages_bulk+0x1f4/0x5dc [<801591cc>] free_unref_page+0xc0/0x130 [<8015cb04>] free_reserved_area+0x144/0x1d8 [<805abd78>] kernel_init+0x20/0x100 [<80046070>] ret_from_kernel_thread+0x14/0x1c Disabling lock debugging due to kernel taint caused by an attempt to free bootmem space that as from commit b93ddc4f9156 ("mips: Reserve memory for the kernel image resources") has not been anymore reserved due to the removal of generic MIPS arch code that used to reserve all the memory from the beginning of RAM up to the kernel load address. This memory does need to be reserved on DEC platforms however as it is used by REX firmware as working area, as per the TURBOchannel firmware specification[1]: Table 2-2 REX Memory Regions ------------------------------------------------------------------------- Starting Ending Region Address Address Use ------------------------------------------------------------------------- 0 0xa0000000 0xa000ffff Restart block, exception vectors, REX stack and bss 1 0xa0010000 0xa0017fff Keyboard or tty drivers 2 0xa0018000 0xa001f3ff 1) CRT driver 3 0xa0020000 0xa002ffff boot, cnfg, init and t objects 4 0xa0020000 0xa002ffff 64KB scratch space ------------------------------------------------------------------------- 1) Note that the last 3 Kbytes of region 2 are reserved for backward compatibility with previous system software. ------------------------------------------------------------------------- (this table uses KSEG2 unmapped virtual addresses, which in the MIPS architecture are offset from physical addresses by a fixed value of 0xa0000000 and therefore the regions referred do correspond to the beginning of the physical address space) and we call into the firmware on several occasions throughout the bootstrap process. It is believed that pre-REX firmware used with non-TURBOchannel DEC platforms has the same requirements, as hinted by note #1 cited. Recreate the discarded reservation then, in DEC platform code, removing the crash. References: [1] "TURBOchannel Firmware Specification", On-line version, EK-TCAAD-FS-004, Digital Equipment Corporation, January 1993, Chapter 2 "System Module Firmware", p. 2-5 Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Fixes: b93ddc4f9156 ("mips: Reserve memory for the kernel image resources") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	2020-11-05 11:43:30 +01:00
Aneesh Kumar K.V	73597ab2a9	powerpc/drmem: Make lmb_size 64 bit commit ec72024e35dddb88a81e40071c87ceb18b5ee835 upstream. Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic") make sure different variables tracking lmb_size are updated to be 64 bit. This was found by code audit. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007114836.282468-2-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:30 +01:00
Jonathan Cameron	829c0a9634	iio:gyro:itg3200: Fix timestamp alignment and prevent data leak. commit 10ab7cfd5522f0041028556dac864a003e158556 upstream. One of a class of bugs pointed out by Lars in a recent review. iio_push_to_buffers_with_timestamp assumes the buffer used is aligned to the size of the timestamp (8 bytes). This is not guaranteed in this driver which uses a 16 byte array of smaller elements on the stack. This is fixed by using an explicit c structure. As there are no holes in the structure, there is no possiblity of data leakage in this case. The explicit alignment of ts is not strictly necessary but potentially makes the code slightly less fragile. It also removes the possibility of this being cut and paste into another driver where the alignment isn't already true. Fixes: 36e0371e7764 ("iio:itg3200: Use iio_push_to_buffers_with_timestamp()") Reported-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200722155103.979802-6-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:30 +01:00
Jonathan Cameron	9f4f75df4b	iio:adc:ti-adc12138 Fix alignment issue with timestamp commit 293e809b2e8e608b65a949101aaf7c0bd1224247 upstream. One of a class of bugs pointed out by Lars in a recent review. iio_push_to_buffers_with_timestamp assumes the buffer used is aligned to the size of the timestamp (8 bytes). This is not guaranteed in this driver which uses an array of smaller elements on the stack. We move to a suitable structure in the iio_priv() data with alignment explicitly requested. This data is allocated with kzalloc so no data can leak apart from previous readings. Note that previously no leak at all could occur, but previous readings should never be a problem. In this case the timestamp location depends on what other channels are enabled. As such we can't use a structure without misleading by suggesting only one possible timestamp location. Fixes: 50a6edb1b6e0 ("iio: adc: add ADC12130/ADC12132/ADC12138 ADC driver") Reported-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200722155103.979802-26-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:30 +01:00
Jonathan Cameron	96a5134423	iio:adc:ti-adc0832 Fix alignment issue with timestamp commit 39e91f3be4cba51c1560bcda3a343ed1f64dc916 upstream. One of a class of bugs pointed out by Lars in a recent review. iio_push_to_buffers_with_timestamp assumes the buffer used is aligned to the size of the timestamp (8 bytes). This is not guaranteed in this driver which uses an array of smaller elements on the stack. We fix this issues by moving to a suitable structure in the iio_priv() data with alignment explicitly requested. This data is allocated with kzalloc so no data can leak apart from previous readings. Note that previously no data could leak 'including' previous readings but I don't think it is an issue to potentially leak them like this now does. In this case the postioning of the timestamp is depends on what other channels are enabled. As such we cannot use a structure to make the alignment explicit as it would be missleading by suggesting only one possible location for the timestamp. Fixes: 815bbc87462a ("iio: ti-adc0832: add triggered buffer support") Reported-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200722155103.979802-25-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:30 +01:00
Tobias Jordan	a8c59abdbc	iio: adc: gyroadc: fix leak of device node iterator commit da4410d4078ba4ead9d6f1027d6db77c5a74ecee upstream. Add missing of_node_put calls when exiting the for_each_child_of_node loop in rcar_gyroadc_parse_subdevs early. Also add goto-exception handling for the error paths in that loop. Fixes: 059c53b32329 ("iio: adc: Add Renesas GyroADC driver") Signed-off-by: Tobias Jordan <kernel@cdqe.de> Link: https://lore.kernel.org/r/20200926161946.GA10240@agrajag.zerfleddert.de Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:30 +01:00
Jonathan Cameron	ad877be5b9	iio:light:si1145: Fix timestamp alignment and prevent data leak. commit 0456ecf34d466261970e0ff92b2b9c78a4908637 upstream. One of a class of bugs pointed out by Lars in a recent review. iio_push_to_buffers_with_timestamp assumes the buffer used is aligned to the size of the timestamp (8 bytes). This is not guaranteed in this driver which uses a 24 byte array of smaller elements on the stack. As Lars also noted this anti pattern can involve a leak of data to userspace and that indeed can happen here. We close both issues by moving to a suitable array in the iio_priv() data with alignment explicitly requested. This data is allocated with kzalloc so no data can leak appart from previous readings. Depending on the enabled channels, the location of the timestamp can be at various aligned offsets through the buffer. As such we any use of a structure to enforce this alignment would incorrectly suggest a single location for the timestamp. Comments adjusted to express this clearly in the code. Fixes: ac45e57f1590 ("iio: light: Add driver for Silabs si1132, si1141/2/3 and si1145/6/7 ambient light, uv index and proximity sensors") Reported-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200722155103.979802-9-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:29 +01:00
Paul Cercueil	a4f02a81c7	dmaengine: dma-jz4780: Fix race in jz4780_dma_tx_status commit baf6fd97b16ea8f981b8a8b04039596f32fc2972 upstream. The jz4780_dma_tx_status() function would check if a channel's cookie state was set to 'completed', and if not, it would enter the critical section. However, in that time frame, the jz4780_dma_chan_irq() function was able to set the cookie to 'completed', and clear the jzchan->vchan pointer, which was deferenced in the critical section of the first function. Fix this race by checking the channel's cookie state after entering the critical function and not before. Fixes: d894fc6046fe ("dmaengine: jz4780: add driver for the Ingenic JZ4780 DMA controller") Cc: stable@vger.kernel.org # v4.0 Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reported-by: Artur Rojek <contact@artur-rojek.eu> Tested-by: Artur Rojek <contact@artur-rojek.eu> Link: https://lore.kernel.org/r/20201004140307.885556-1-paul@crapouillou.net Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:29 +01:00
Jan Kara	f707ccb2f1	udf: Fix memory leak when mounting commit a7be300de800e755714c71103ae4a0d205e41e99 upstream. udf_process_sequence() allocates temporary array for processing partition descriptors on volume which it fails to free. Free the array when it is not needed anymore. Fixes: 7b78fd02fb19 ("udf: Fix handling of Partition Descriptors") CC: stable@vger.kernel.org Reported-by: syzbot+128f4dd6e796c98b3760@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:29 +01:00
Jason Gerecke	93da9dcee2	HID: wacom: Avoid entering wacom_wac_pen_report for pad / battery commit d9216d753b2b1406b801243b12aaf00a5ce5b861 upstream. It has recently been reported that the "heartbeat" report from devices like the 2nd-gen Intuos Pro (PTH-460, PTH-660, PTH-860) or the 2nd-gen Bluetooth-enabled Intuos tablets (CTL-4100WL, CTL-6100WL) can cause the driver to send a spurious BTN_TOUCH=0 once per second in the middle of drawing. This can result in broken lines while drawing on Chrome OS. The source of the issue has been traced back to a change which modified the driver to only call `wacom_wac_pad_report()` once per report instead of once per collection. As part of this change, pad-handling code was removed from `wacom_wac_collection()` under the assumption that the `WACOM_PEN_FIELD` and `WACOM_TOUCH_FIELD` checks would not be satisfied when a pad or battery collection was being processed. To be clear, the macros `WACOM_PAD_FIELD` and `WACOM_PEN_FIELD` do not currently check exclusive conditions. In fact, most "pad" fields will also appear to be "pen" fields simply due to their presence inside of a Digitizer application collection. Because of this, the removal of the check from `wacom_wac_collection()` just causes pad / battery collections to instead trigger a call to `wacom_wac_pen_report()` instead. The pen report function in turn resets the tip switch state just prior to exiting, resulting in the observed BTN_TOUCH=0 symptom. To correct this, we restore a version of the `WACOM_PAD_FIELD` check in `wacom_wac_collection()` and return early. This effectively prevents pad / battery collections from being reported until the very end of the report as originally intended. Fixes: d4b8efeb46d9 ("HID: wacom: generic: Correct pad syncing") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Reviewed-by: Ping Cheng <ping.cheng@wacom.com> Tested-by: Ping Cheng <ping.cheng@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:29 +01:00
Jiri Slaby	87d398f348	vt: keyboard, extend func_buf_lock to readers commit 82e61c3909db51d91b9d3e2071557b6435018b80 upstream. Both read-side users of func_table/func_buf need locking. Without that, one can easily confuse the code by repeatedly setting altering strings like: while (1) for (a = 0; a < 2; a++) { struct kbsentry kbs = {}; strcpy((char *)kbs.kb_string, a ? ".\n" : "88888\n"); ioctl(fd, KDSKBSENT, &kbs); } When that program runs, one can get unexpected output by holding F1 (note the unxpected period on the last line): . 88888 .8888 So protect all accesses to 'func_table' (and func_buf) by preexisting 'func_buf_lock'. It is easy in 'k_fn' handler as 'puts_queue' is expected not to sleep. On the other hand, KDGKBSENT needs a local (atomic) copy of the string because copy_to_user can sleep. Use already allocated, but unused 'kbs->kb_string' for that purpose. Note that the program above needs at least CAP_SYS_TTY_CONFIG. This depends on the previous patch and on the func_buf_lock lock added in commit 46ca3f735f34 (tty/vt: fix write/write race in ioctl(KDSKBSENT) handler) in 5.2. Likely fixes CVE-2020-25656. Cc: <stable@vger.kernel.org> Reported-by: Minh Yuan <yuanmingbuaa@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20201019085517.10176-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:29 +01:00
Jiri Slaby	eb4c460e2e	vt: keyboard, simplify vt_kdgkbsent commit 6ca03f90527e499dd5e32d6522909e2ad390896b upstream. Use 'strlen' of the string, add one for NUL terminator and simply do 'copy_to_user' instead of the explicit 'for' loop. This makes the KDGKBSENT case more compact. The only thing we need to take care about is NULL 'func_table[i]'. Use an empty string in that case. The original check for overflow could never trigger as the func_buf strings are always shorter or equal to 'struct kbsentry's. Cc: <stable@vger.kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20201019085517.10176-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:29 +01:00
Chris Wilson	8c16ca6006	drm/i915: Force VT'd workarounds when running as a guest OS commit 8195400f7ea95399f721ad21f4d663a62c65036f upstream. If i915.ko is being used as a passthrough device, it does not know if the host is using intel_iommu. Mixing the iommu and gfx causes a few issues (such as scanout overfetch) which we need to workaround inside the driver, so if we detect we are running under a hypervisor, also assume the device access is being virtualised. Reported-by: Stefan Fritsch <sf@sfritsch.de> Suggested-by: Stefan Fritsch <sf@sfritsch.de> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Stefan Fritsch <sf@sfritsch.de> Cc: stable@vger.kernel.org Tested-by: Stefan Fritsch <sf@sfritsch.de> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019101523.4145-1-chris@chris-wilson.co.uk (cherry picked from commit f566fdcd6cc49a9d5b5d782f56e3e7cb243f01b8) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:29 +01:00
Ran Wang	94478c1dc5	usb: host: fsl-mph-dr-of: check return of dma_set_mask() commit 3cd54a618834430a26a648d880dd83d740f2ae30 upstream. fsl_usb2_device_register() should stop init if dma_set_mask() return error. Fixes: cae058610465 ("drivers/usb/host: fsl: Set DMA_MASK of usb platform device") Reviewed-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Ran Wang <ran.wang_1@nxp.com> Link: https://lore.kernel.org/r/20201010060308.33693-1-ran.wang_1@nxp.com Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:29 +01:00
Li Jun	75d0d4ff59	usb: typec: tcpm: reset hard_reset_count for any disconnect commit 2d9c6442a9c81f4f8dee678d0b3c183173ab1e2d upstream. Current tcpm_detach() only reset hard_reset_count if port->attached is true, this may cause this counter clear is missed if the CC disconnect event is generated after tcpm_port_reset() is done by other events, e.g. VBUS off comes first before CC disconect for a power sink, in that case the first tcpm_detach() will only clear port->attached flag but leave hard_reset_count there because tcpm_port_is_disconnected() is still false, then later tcpm_detach() by CC disconnect will directly return due to port->attached is cleared, finally this will result tcpm will not try hard reset or error recovery for later attach. ChiYuan reported this issue on his platform with below tcpm trace: After power sink session setup after hard reset 2 times, detach from the power source and then attach: [ 4848.046358] VBUS off [ 4848.046384] state change SNK_READY -> SNK_UNATTACHED [ 4848.050908] Setting voltage/current limit 0 mV 0 mA [ 4848.050936] polarity 0 [ 4848.052593] Requesting mux state 0, usb-role 0, orientation 0 [ 4848.053222] Start toggling [ 4848.086500] state change SNK_UNATTACHED -> TOGGLING [ 4848.089983] CC1: 0 -> 0, CC2: 3 -> 3 [state TOGGLING, polarity 0, connected] [ 4848.089993] state change TOGGLING -> SNK_ATTACH_WAIT [ 4848.090031] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @200 ms [ 4848.141162] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_ATTACH_WAIT, polarity 0, disconnected] [ 4848.141170] state change SNK_ATTACH_WAIT -> SNK_ATTACH_WAIT [ 4848.141184] pending state change SNK_ATTACH_WAIT -> SNK_UNATTACHED @20 ms [ 4848.163156] state change SNK_ATTACH_WAIT -> SNK_UNATTACHED [delayed 20 ms] [ 4848.163162] Start toggling [ 4848.216918] CC1: 0 -> 0, CC2: 0 -> 3 [state TOGGLING, polarity 0, connected] [ 4848.216954] state change TOGGLING -> SNK_ATTACH_WAIT [ 4848.217080] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @200 ms [ 4848.231771] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_ATTACH_WAIT, polarity 0, disconnected] [ 4848.231800] state change SNK_ATTACH_WAIT -> SNK_ATTACH_WAIT [ 4848.231857] pending state change SNK_ATTACH_WAIT -> SNK_UNATTACHED @20 ms [ 4848.256022] state change SNK_ATTACH_WAIT -> SNK_UNATTACHED [delayed20 ms] [ 4848.256049] Start toggling [ 4848.871148] VBUS on [ 4848.885324] CC1: 0 -> 0, CC2: 0 -> 3 [state TOGGLING, polarity 0, connected] [ 4848.885372] state change TOGGLING -> SNK_ATTACH_WAIT [ 4848.885548] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @200 ms [ 4849.088240] state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED [delayed200 ms] [ 4849.088284] state change SNK_DEBOUNCED -> SNK_ATTACHED [ 4849.088291] polarity 1 [ 4849.088769] Requesting mux state 1, usb-role 2, orientation 2 [ 4849.088895] state change SNK_ATTACHED -> SNK_STARTUP [ 4849.088907] state change SNK_STARTUP -> SNK_DISCOVERY [ 4849.088915] Setting voltage/current limit 5000 mV 0 mA [ 4849.088927] vbus=0 charge:=1 [ 4849.090505] state change SNK_DISCOVERY -> SNK_WAIT_CAPABILITIES [ 4849.090828] pending state change SNK_WAIT_CAPABILITIES -> SNK_READY @240 ms [ 4849.335878] state change SNK_WAIT_CAPABILITIES -> SNK_READY [delayed240 ms] this patch fix this issue by clear hard_reset_count at any cases of cc disconnect, í.e. don't check port->attached flag. Fixes: 4b4e02c83167 ("typec: tcpm: Move out of staging") Cc: stable@vger.kernel.org Reported-and-tested-by: ChiYuan Huang <cy_huang@richtek.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Li Jun <jun.li@nxp.com> Link: https://lore.kernel.org/r/1602500592-3817-1-git-send-email-jun.li@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:28 +01:00
Jerome Brunet	543432d078	usb: cdc-acm: fix cooldown mechanism commit 38203b8385bf6283537162bde7d499f830964711 upstream. Commit a4e7279cd1d1 ("cdc-acm: introduce a cool down") is causing regression if there is some USB error, such as -EPROTO. This has been reported on some samples of the Odroid-N2 using the Combee II Zibgee USB dongle. > struct acm *acm = container_of(work, struct acm, work) is incorrect in case of a delayed work and causes warnings, usually from the workqueue: > WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:1474 __queue_work+0x480/0x528. When this happens, USB eventually stops working completely after a while. Also the ACM_ERROR_DELAY bit is never set, so the cooldown mechanism previously introduced cannot be triggered and acm_submit_read_urb() is never called. This changes makes the cdc-acm driver use a single delayed work, fixing the pointer arithmetic in acm_softint() and set the ACM_ERROR_DELAY when the cooldown mechanism appear to be needed. Fixes: a4e7279cd1d1 ("cdc-acm: introduce a cool down") Cc: Oliver Neukum <oneukum@suse.com> Reported-by: Pascal Vizeli <pascal.vizeli@nabucasa.com> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20201019170702.150534-1-jbrunet@baylibre.com Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:28 +01:00
Thinh Nguyen	2850f148cd	usb: dwc3: gadget: END_TRANSFER before CLEAR_STALL command commit d97c78a1908e59a1fdbcbece87cd0440b5d7a1f2 upstream. According the programming guide (for all DWC3 IPs), when the driver handles ClearFeature(halt) request, it should issue CLEAR_STALL command _after_ the END_TRANSFER command completes. The END_TRANSFER command may take some time to complete. So, delay the ClearFeature(halt) request control status stage and wait for END_TRANSFER command completion interrupt. Only after END_TRANSFER command completes that the driver may issue CLEAR_STALL command. Cc: stable@vger.kernel.org Fixes: cb11ea56f37a ("usb: dwc3: gadget: Properly handle ClearFeature(halt)") Signed-off-by: Thinh Nguyen <thinhn@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:28 +01:00
Thinh Nguyen	206dcd6ce8	usb: dwc3: gadget: Resume pending requests after CLEAR_STALL commit c503672abe1348f10f5a54a662336358c6e1a297 upstream. The function driver may queue new requests right after halting the endpoint (i.e. queue new requests while the endpoint is stalled). There's no restriction preventing it from doing so. However, dwc3 currently drops those requests after CLEAR_STALL. The driver should only drop started requests. Keep the pending requests in the pending list to resume and process them after the host issues ClearFeature(Halt) to the endpoint. Cc: stable@vger.kernel.org Fixes: cb11ea56f37a ("usb: dwc3: gadget: Properly handle ClearFeature(halt)") Signed-off-by: Thinh Nguyen <thinhn@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:28 +01:00
Li Jun	97224cdc04	usb: dwc3: core: don't trigger runtime pm when remove driver commit 266d0493900ac5d6a21cdbe6b1624ed2da94d47a upstream. No need to trigger runtime pm in driver removal, otherwise if user disable auto suspend via sys file, runtime suspend may be entered, which will call dwc3_core_exit() again and there will be clock disable not balance warning: [ 2026.820154] xhci-hcd xhci-hcd.0.auto: remove, state 4 [ 2026.825268] usb usb2: USB disconnect, device number 1 [ 2026.831017] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered [ 2026.836806] xhci-hcd xhci-hcd.0.auto: remove, state 4 [ 2026.842029] usb usb1: USB disconnect, device number 1 [ 2026.848029] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered [ 2026.865889] ------------[ cut here ]------------ [ 2026.870506] usb2_ctrl_root_clk already disabled [ 2026.875082] WARNING: CPU: 0 PID: 731 at drivers/clk/clk.c:958 clk_core_disable+0xa0/0xa8 [ 2026.883170] Modules linked in: dwc3(-) phy_fsl_imx8mq_usb [last unloaded: dwc3] [ 2026.890488] CPU: 0 PID: 731 Comm: rmmod Not tainted 5.8.0-rc7-00280-g9d08cca-dirty #245 [ 2026.898489] Hardware name: NXP i.MX8MQ EVK (DT) [ 2026.903020] pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--) [ 2026.908594] pc : clk_core_disable+0xa0/0xa8 [ 2026.912777] lr : clk_core_disable+0xa0/0xa8 [ 2026.916958] sp : ffff8000121b39a0 [ 2026.920271] x29: ffff8000121b39a0 x28: ffff0000b11f3700 [ 2026.925583] x27: 0000000000000000 x26: ffff0000b539c700 [ 2026.930895] x25: 000001d7e44e1232 x24: ffff0000b76fa800 [ 2026.936208] x23: ffff0000b76fa6f8 x22: ffff800008d01040 [ 2026.941520] x21: ffff0000b539ce00 x20: ffff0000b7105000 [ 2026.946832] x19: ffff0000b7105000 x18: 0000000000000010 [ 2026.952144] x17: 0000000000000001 x16: 0000000000000000 [ 2026.957456] x15: ffff0000b11f3b70 x14: ffffffffffffffff [ 2026.962768] x13: ffff8000921b36f7 x12: ffff8000121b36ff [ 2026.968080] x11: ffff8000119e1000 x10: ffff800011bf26d0 [ 2026.973392] x9 : 0000000000000000 x8 : ffff800011bf3000 [ 2026.978704] x7 : ffff800010695d68 x6 : 0000000000000252 [ 2026.984016] x5 : ffff0000bb9881f0 x4 : 0000000000000000 [ 2026.989327] x3 : 0000000000000027 x2 : 0000000000000023 [ 2026.994639] x1 : ac2fa471aa7cab00 x0 : 0000000000000000 [ 2026.999951] Call trace: [ 2027.002401] clk_core_disable+0xa0/0xa8 [ 2027.006238] clk_core_disable_lock+0x20/0x38 [ 2027.010508] clk_disable+0x1c/0x28 [ 2027.013911] clk_bulk_disable+0x34/0x50 [ 2027.017758] dwc3_core_exit+0xec/0x110 [dwc3] [ 2027.022122] dwc3_suspend_common+0x84/0x188 [dwc3] [ 2027.026919] dwc3_runtime_suspend+0x74/0x9c [dwc3] [ 2027.031712] pm_generic_runtime_suspend+0x28/0x40 [ 2027.036419] genpd_runtime_suspend+0xa0/0x258 [ 2027.040777] __rpm_callback+0x88/0x140 [ 2027.044526] rpm_callback+0x20/0x80 [ 2027.048015] rpm_suspend+0xd0/0x418 [ 2027.051503] __pm_runtime_suspend+0x58/0xa0 [ 2027.055693] dwc3_runtime_idle+0x7c/0x90 [dwc3] [ 2027.060224] __rpm_callback+0x88/0x140 [ 2027.063973] rpm_idle+0x78/0x150 [ 2027.067201] __pm_runtime_idle+0x58/0xa0 [ 2027.071130] dwc3_remove+0x64/0xc0 [dwc3] [ 2027.075140] platform_drv_remove+0x28/0x48 [ 2027.079239] device_release_driver_internal+0xf4/0x1c0 [ 2027.084377] driver_detach+0x4c/0xd8 [ 2027.087954] bus_remove_driver+0x54/0xa8 [ 2027.091877] driver_unregister+0x2c/0x58 [ 2027.095799] platform_driver_unregister+0x10/0x18 [ 2027.100509] dwc3_driver_exit+0x14/0x1408 [dwc3] [ 2027.105129] __arm64_sys_delete_module+0x178/0x218 [ 2027.109922] el0_svc_common.constprop.0+0x68/0x160 [ 2027.114714] do_el0_svc+0x20/0x80 [ 2027.118031] el0_sync_handler+0x88/0x190 [ 2027.121953] el0_sync+0x140/0x180 [ 2027.125267] ---[ end trace 027f4f8189958f1f ]--- [ 2027.129976] ------------[ cut here ]------------ Fixes: fc8bb91bc83e ("usb: dwc3: implement runtime PM") Cc: <stable@vger.kernel.org> Signed-off-by: Li Jun <jun.li@nxp.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:28 +01:00
Li Jun	726f638e7c	usb: dwc3: core: add phy cleanup for probe error handling commit 03c1fd622f72c7624c81b64fdba4a567ae5ee9cb upstream. Add the phy cleanup if dwc3 mode init fail, which is the missing part of de-init for dwc3 core init. Fixes: c499ff71ff2a ("usb: dwc3: core: re-factor init and exit paths") Cc: <stable@vger.kernel.org> Signed-off-by: Li Jun <jun.li@nxp.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:28 +01:00
Thinh Nguyen	f935b70cf7	usb: dwc3: gadget: Check MPS of the request length commit ca3df3468eec87f6374662f7de425bc44c3810c1 upstream. When preparing for SG, not all the entries are prepared at once. When resume, don't use the remaining request length to calculate for MPS alignment. Use the entire request->length to do that. Cc: stable@vger.kernel.org Fixes: 5d187c0454ef ("usb: dwc3: gadget: Don't setup more than requested") Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:28 +01:00
Thinh Nguyen	1c9e86c933	usb: dwc3: ep0: Fix ZLP for OUT ep0 requests commit 66706077dc89c66a4777a4c6298273816afb848c upstream. The current ZLP handling for ep0 requests is only for control IN requests. For OUT direction, DWC3 needs to check and setup for MPS alignment. Usually, control OUT requests can indicate its transfer size via the wLength field of the control message. So usb_request->zero is usually not needed for OUT direction. To handle ZLP OUT for control endpoint, make sure the TRB is MPS size. Cc: stable@vger.kernel.org Fixes: c7fcdeb2627c ("usb: dwc3: ep0: simplify EP0 state machine") Fixes: d6e5a549cc4d ("usb: dwc3: simplify ZLP handling") Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:27 +01:00
Raymond Tan	3468cbceb5	usb: dwc3: pci: Allow Elkhart Lake to utilize DSM method for PM functionality commit a609ce2a13360d639b384b6ca783b38c1247f2db upstream. Similar to some other IA platforms, Elkhart Lake too depends on the PMU register write to request transition of Dx power state. Thus, we add the PCI_DEVICE_ID_INTEL_EHLLP to the list of devices that shall execute the ACPI _DSM method during D0/D3 sequence. [heikki.krogerus@linux.intel.com: included Fixes tag] Fixes: dbb0569de852 ("usb: dwc3: pci: Add Support for Intel Elkhart Lake Devices") Cc: stable@vger.kernel.org Signed-off-by: Raymond Tan <raymond.tan@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:27 +01:00
Sandeep Singh	2600a131e1	usb: xhci: Workaround for S3 issue on AMD SNPS 3.0 xHC commit 2a632815683d2d34df52b701a36fe5ac6654e719 upstream. On some platform of AMD, S3 fails with HCE and SRE errors. To fix this, need to disable a bit which is enable in sparse controller. Cc: stable@vger.kernel.org #v4.19+ Signed-off-by: Sanket Goswami <Sanket.Goswami@amd.com> Signed-off-by: Sandeep Singh <sandeep.singh@amd.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20201028203124.375344-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:27 +01:00
Filipe Manana	c964d386e8	btrfs: fix readahead hang and use-after-free after removing a device commit 66d204a16c94f24ad08290a7663ab67e7fc04e82 upstream. Very sporadically I had test case btrfs/069 from fstests hanging (for years, it is not a recent regression), with the following traces in dmesg/syslog: [162301.160628] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg started [162301.181196] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0 [162301.287162] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg finished [162513.513792] INFO: task btrfs-transacti:1356167 blocked for more than 120 seconds. [162513.514318] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.514522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.514747] task:btrfs-transacti state:D stack: 0 pid:1356167 ppid: 2 flags:0x00004000 [162513.514751] Call Trace: [162513.514761] __schedule+0x5ce/0xd00 [162513.514765] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.514771] schedule+0x46/0xf0 [162513.514844] wait_current_trans+0xde/0x140 [btrfs] [162513.514850] ? finish_wait+0x90/0x90 [162513.514864] start_transaction+0x37c/0x5f0 [btrfs] [162513.514879] transaction_kthread+0xa4/0x170 [btrfs] [162513.514891] ? btrfs_cleanup_transaction+0x660/0x660 [btrfs] [162513.514894] kthread+0x153/0x170 [162513.514897] ? kthread_stop+0x2c0/0x2c0 [162513.514902] ret_from_fork+0x22/0x30 [162513.514916] INFO: task fsstress:1356184 blocked for more than 120 seconds. [162513.515192] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.515431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.515680] task:fsstress state:D stack: 0 pid:1356184 ppid:1356177 flags:0x00004000 [162513.515682] Call Trace: [162513.515688] __schedule+0x5ce/0xd00 [162513.515691] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.515697] schedule+0x46/0xf0 [162513.515712] wait_current_trans+0xde/0x140 [btrfs] [162513.515716] ? finish_wait+0x90/0x90 [162513.515729] start_transaction+0x37c/0x5f0 [btrfs] [162513.515743] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.515753] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.515758] ? __ia32_sys_fdatasync+0x20/0x20 [162513.515761] iterate_supers+0x87/0xf0 [162513.515765] ksys_sync+0x60/0xb0 [162513.515768] __do_sys_sync+0xa/0x10 [162513.515771] do_syscall_64+0x33/0x80 [162513.515774] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.515781] RIP: 0033:0x7f5238f50bd7 [162513.515782] Code: Bad RIP value. [162513.515784] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.515786] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.515788] RDX: 00000000ffffffff RSI: 000000000daf0e74 RDI: 000000000000003a [162513.515789] RBP: 0000000000000032 R08: 000000000000000a R09: 00007f5239019be0 [162513.515791] R10: fffffffffffff24f R11: 0000000000000206 R12: 000000000000003a [162513.515792] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.515804] INFO: task fsstress:1356185 blocked for more than 120 seconds. [162513.516064] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.516329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.516617] task:fsstress state:D stack: 0 pid:1356185 ppid:1356177 flags:0x00000000 [162513.516620] Call Trace: [162513.516625] __schedule+0x5ce/0xd00 [162513.516628] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.516634] schedule+0x46/0xf0 [162513.516647] wait_current_trans+0xde/0x140 [btrfs] [162513.516650] ? finish_wait+0x90/0x90 [162513.516662] start_transaction+0x4d7/0x5f0 [btrfs] [162513.516679] btrfs_setxattr_trans+0x3c/0x100 [btrfs] [162513.516686] __vfs_setxattr+0x66/0x80 [162513.516691] __vfs_setxattr_noperm+0x70/0x200 [162513.516697] vfs_setxattr+0x6b/0x120 [162513.516703] setxattr+0x125/0x240 [162513.516709] ? lock_acquire+0xb1/0x480 [162513.516712] ? mnt_want_write+0x20/0x50 [162513.516721] ? rcu_read_lock_any_held+0x8e/0xb0 [162513.516723] ? preempt_count_add+0x49/0xa0 [162513.516725] ? __sb_start_write+0x19b/0x290 [162513.516727] ? preempt_count_add+0x49/0xa0 [162513.516732] path_setxattr+0xba/0xd0 [162513.516739] __x64_sys_setxattr+0x27/0x30 [162513.516741] do_syscall_64+0x33/0x80 [162513.516743] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.516745] RIP: 0033:0x7f5238f56d5a [162513.516746] Code: Bad RIP value. [162513.516748] RSP: 002b:00007fff67b97868 EFLAGS: 00000202 ORIG_RAX: 00000000000000bc [162513.516750] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5238f56d5a [162513.516751] RDX: 000055b1fbb0d5a0 RSI: 00007fff67b978a0 RDI: 000055b1fbb0d470 [162513.516753] RBP: 000055b1fbb0d5a0 R08: 0000000000000001 R09: 00007fff67b97700 [162513.516754] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004 [162513.516756] R13: 0000000000000024 R14: 0000000000000001 R15: 00007fff67b978a0 [162513.516767] INFO: task fsstress:1356196 blocked for more than 120 seconds. [162513.517064] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.517365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.517763] task:fsstress state:D stack: 0 pid:1356196 ppid:1356177 flags:0x00004000 [162513.517780] Call Trace: [162513.517786] __schedule+0x5ce/0xd00 [162513.517789] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.517796] schedule+0x46/0xf0 [162513.517810] wait_current_trans+0xde/0x140 [btrfs] [162513.517814] ? finish_wait+0x90/0x90 [162513.517829] start_transaction+0x37c/0x5f0 [btrfs] [162513.517845] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.517857] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.517862] ? __ia32_sys_fdatasync+0x20/0x20 [162513.517865] iterate_supers+0x87/0xf0 [162513.517869] ksys_sync+0x60/0xb0 [162513.517872] __do_sys_sync+0xa/0x10 [162513.517875] do_syscall_64+0x33/0x80 [162513.517878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.517881] RIP: 0033:0x7f5238f50bd7 [162513.517883] Code: Bad RIP value. [162513.517885] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.517887] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.517889] RDX: 0000000000000000 RSI: 000000007660add2 RDI: 0000000000000053 [162513.517891] RBP: 0000000000000032 R08: 0000000000000067 R09: 00007f5239019be0 [162513.517893] R10: fffffffffffff24f R11: 0000000000000206 R12: 0000000000000053 [162513.517895] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.517908] INFO: task fsstress:1356197 blocked for more than 120 seconds. [162513.518298] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.518672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.519157] task:fsstress state:D stack: 0 pid:1356197 ppid:1356177 flags:0x00000000 [162513.519160] Call Trace: [162513.519165] __schedule+0x5ce/0xd00 [162513.519168] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.519174] schedule+0x46/0xf0 [162513.519190] wait_current_trans+0xde/0x140 [btrfs] [162513.519193] ? finish_wait+0x90/0x90 [162513.519206] start_transaction+0x4d7/0x5f0 [btrfs] [162513.519222] btrfs_create+0x57/0x200 [btrfs] [162513.519230] lookup_open+0x522/0x650 [162513.519246] path_openat+0x2b8/0xa50 [162513.519270] do_filp_open+0x91/0x100 [162513.519275] ? find_held_lock+0x32/0x90 [162513.519280] ? lock_acquired+0x33b/0x470 [162513.519285] ? do_raw_spin_unlock+0x4b/0xc0 [162513.519287] ? _raw_spin_unlock+0x29/0x40 [162513.519295] do_sys_openat2+0x20d/0x2d0 [162513.519300] do_sys_open+0x44/0x80 [162513.519304] do_syscall_64+0x33/0x80 [162513.519307] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.519309] RIP: 0033:0x7f5238f4a903 [162513.519310] Code: Bad RIP value. [162513.519312] RSP: 002b:00007fff67b97758 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 [162513.519314] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f5238f4a903 [162513.519316] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 000055b1fbb0d470 [162513.519317] RBP: 00007fff67b978c0 R08: 0000000000000001 R09: 0000000000000002 [162513.519319] R10: 00007fff67b974f7 R11: 0000000000000246 R12: 0000000000000013 [162513.519320] R13: 00000000000001b6 R14: 00007fff67b97906 R15: 000055b1fad1c620 [162513.519332] INFO: task btrfs:1356211 blocked for more than 120 seconds. [162513.519727] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.520115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.520508] task:btrfs state:D stack: 0 pid:1356211 ppid:1356178 flags:0x00004002 [162513.520511] Call Trace: [162513.520516] __schedule+0x5ce/0xd00 [162513.520519] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.520525] schedule+0x46/0xf0 [162513.520544] btrfs_scrub_pause+0x11f/0x180 [btrfs] [162513.520548] ? finish_wait+0x90/0x90 [162513.520562] btrfs_commit_transaction+0x45a/0xc30 [btrfs] [162513.520574] ? start_transaction+0xe0/0x5f0 [btrfs] [162513.520596] btrfs_dev_replace_finishing+0x6d8/0x711 [btrfs] [162513.520619] btrfs_dev_replace_by_ioctl.cold+0x1cc/0x1fd [btrfs] [162513.520639] btrfs_ioctl+0x2a25/0x36f0 [btrfs] [162513.520643] ? do_sigaction+0xf3/0x240 [162513.520645] ? find_held_lock+0x32/0x90 [162513.520648] ? do_sigaction+0xf3/0x240 [162513.520651] ? lock_acquired+0x33b/0x470 [162513.520655] ? _raw_spin_unlock_irq+0x24/0x50 [162513.520657] ? lockdep_hardirqs_on+0x7d/0x100 [162513.520660] ? _raw_spin_unlock_irq+0x35/0x50 [162513.520662] ? do_sigaction+0xf3/0x240 [162513.520671] ? __x64_sys_ioctl+0x83/0xb0 [162513.520672] __x64_sys_ioctl+0x83/0xb0 [162513.520677] do_syscall_64+0x33/0x80 [162513.520679] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.520681] RIP: 0033:0x7fc3cd307d87 [162513.520682] Code: Bad RIP value. [162513.520684] RSP: 002b:00007ffe30a56bb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [162513.520686] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3cd307d87 [162513.520687] RDX: 00007ffe30a57a30 RSI: 00000000ca289435 RDI: 0000000000000003 [162513.520689] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [162513.520690] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000003 [162513.520692] R13: 0000557323a212e0 R14: 00007ffe30a5a520 R15: 0000000000000001 [162513.520703] Showing all locks held in the system: [162513.520712] 1 lock held by khungtaskd/54: [162513.520713] #0: ffffffffb40a91a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x197 [162513.520728] 1 lock held by in:imklog/596: [162513.520729] #0: ffff8f3f0d781400 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60 [162513.520782] 1 lock held by btrfs-transacti/1356167: [162513.520784] #0: ffff8f3d810cc848 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x4a/0x170 [btrfs] [162513.520798] 1 lock held by btrfs/1356190: [162513.520800] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0x60 [162513.520805] 1 lock held by fsstress/1356184: [162513.520806] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520811] 3 locks held by fsstress/1356185: [162513.520812] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520815] #1: ffff8f3d80a650b8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: vfs_setxattr+0x50/0x120 [162513.520820] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520833] 1 lock held by fsstress/1356196: [162513.520834] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520838] 3 locks held by fsstress/1356197: [162513.520839] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520843] #1: ffff8f3d506465e8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: path_openat+0x2a7/0xa50 [162513.520846] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520858] 2 locks held by btrfs/1356211: [162513.520859] #0: ffff8f3d810cde30 (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.}-{3:3}, at: btrfs_dev_replace_finishing+0x52/0x711 [btrfs] [162513.520877] #1: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] This was weird because the stack traces show that a transaction commit, triggered by a device replace operation, is blocking trying to pause any running scrubs but there are no stack traces of blocked tasks doing a scrub. After poking around with drgn, I noticed there was a scrub task that was constantly running and blocking for shorts periods of time: >>> t = find_task(prog, 1356190) >>> prog.stack_trace(t) #0 __schedule+0x5ce/0xcfc #1 schedule+0x46/0xe4 #2 schedule_timeout+0x1df/0x475 #3 btrfs_reada_wait+0xda/0x132 #4 scrub_stripe+0x2a8/0x112f #5 scrub_chunk+0xcd/0x134 #6 scrub_enumerate_chunks+0x29e/0x5ee #7 btrfs_scrub_dev+0x2d5/0x91b #8 btrfs_ioctl+0x7f5/0x36e7 #9 __x64_sys_ioctl+0x83/0xb0 #10 do_syscall_64+0x33/0x77 #11 entry_SYSCALL_64+0x7c/0x156 Which corresponds to: int btrfs_reada_wait(void handle) { struct reada_control rc = handle; struct btrfs_fs_info fs_info = rc->fs_info; while (atomic_read(&rc->elems)) { if (!atomic_read(&fs_info->reada_works_cnt)) reada_start_machine(fs_info); wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0, (HZ + 9) / 10); } (...) So the counter "rc->elems" was set to 1 and never decreased to 0, causing the scrub task to loop forever in that function. Then I used the following script for drgn to check the readahead requests: $ cat dump_reada.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info ', mnt.mnt.mnt_sb.s_fs_info) def dump_re(re): nzones = re.nzones.value_() print(f're at {hex(re.value_())}') print(f'\t logical {re.logical.value_()}') print(f'\t refcnt {re.refcnt.value_()}') print(f'\t nzones {nzones}') for i in range(nzones): dev = re.zones[i].device name = dev.name.str.string_() print(f'\t\t dev id {dev.devid.value_()} name {name}') print() for _, e in radix_tree_for_each(fs_info.reada_tree): re = cast('struct reada_extent ', e) dump_re(re) $ drgn dump_reada.py re at 0xffff8f3da9d25ad8 logical 38928384 refcnt 1 nzones 1 dev id 0 name b'/dev/sdd' $ So there was one readahead extent with a single zone corresponding to the source device of that last device replace operation logged in dmesg/syslog. Also the ID of that zone's device was 0 which is a special value set in the source device of a device replace operation when the operation finishes (constant BTRFS_DEV_REPLACE_DEVID set at btrfs_dev_replace_finishing()), confirming again that device /dev/sdd was the source of a device replace operation. Normally there should be as many zones in the readahead extent as there are devices, and I wasn't expecting the extent to be in a block group with a 'single' profile, so I went and confirmed with the following drgn script that there weren't any single profile block groups: $ cat dump_block_groups.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info) BTRFS_BLOCK_GROUP_DATA = (1 << 0) BTRFS_BLOCK_GROUP_SYSTEM = (1 << 1) BTRFS_BLOCK_GROUP_METADATA = (1 << 2) BTRFS_BLOCK_GROUP_RAID0 = (1 << 3) BTRFS_BLOCK_GROUP_RAID1 = (1 << 4) BTRFS_BLOCK_GROUP_DUP = (1 << 5) BTRFS_BLOCK_GROUP_RAID10 = (1 << 6) BTRFS_BLOCK_GROUP_RAID5 = (1 << 7) BTRFS_BLOCK_GROUP_RAID6 = (1 << 8) BTRFS_BLOCK_GROUP_RAID1C3 = (1 << 9) BTRFS_BLOCK_GROUP_RAID1C4 = (1 << 10) def bg_flags_string(bg): flags = bg.flags.value_() ret = '' if flags & BTRFS_BLOCK_GROUP_DATA: ret = 'data' if flags & BTRFS_BLOCK_GROUP_METADATA: if len(ret) > 0: ret += '\|' ret += 'meta' if flags & BTRFS_BLOCK_GROUP_SYSTEM: if len(ret) > 0: ret += '\|' ret += 'system' if flags & BTRFS_BLOCK_GROUP_RAID0: ret += ' raid0' elif flags & BTRFS_BLOCK_GROUP_RAID1: ret += ' raid1' elif flags & BTRFS_BLOCK_GROUP_DUP: ret += ' dup' elif flags & BTRFS_BLOCK_GROUP_RAID10: ret += ' raid10' elif flags & BTRFS_BLOCK_GROUP_RAID5: ret += ' raid5' elif flags & BTRFS_BLOCK_GROUP_RAID6: ret += ' raid6' elif flags & BTRFS_BLOCK_GROUP_RAID1C3: ret += ' raid1c3' elif flags & BTRFS_BLOCK_GROUP_RAID1C4: ret += ' raid1c4' else: ret += ' single' return ret def dump_bg(bg): print() print(f'block group at {hex(bg.value_())}') print(f'\t start {bg.start.value_()} length {bg.length.value_()}') print(f'\t flags {bg.flags.value_()} - {bg_flags_string(bg)}') bg_root = fs_info.block_group_cache_tree.address_of_() for bg in rbtree_inorder_for_each_entry('struct btrfs_block_group', bg_root, 'cache_node'): dump_bg(bg) $ drgn dump_block_groups.py block group at 0xffff8f3d673b0400 start 22020096 length 16777216 flags 258 - system raid6 block group at 0xffff8f3d53ddb400 start 38797312 length 536870912 flags 260 - meta raid6 block group at 0xffff8f3d5f4d9c00 start 575668224 length 2147483648 flags 257 - data raid6 block group at 0xffff8f3d08189000 start 2723151872 length 67108864 flags 258 - system raid6 block group at 0xffff8f3db70ff000 start 2790260736 length 1073741824 flags 260 - meta raid6 block group at 0xffff8f3d5f4dd800 start 3864002560 length 67108864 flags 258 - system raid6 block group at 0xffff8f3d67037000 start 3931111424 length 2147483648 flags 257 - data raid6 $ So there were only 2 reasons left for having a readahead extent with a single zone: reada_find_zone(), called when creating a readahead extent, returned NULL either because we failed to find the corresponding block group or because a memory allocation failed. With some additional and custom tracing I figured out that on every further ocurrence of the problem the block group had just been deleted when we were looping to create the zones for the readahead extent (at reada_find_extent()), so we ended up with only one zone in the readahead extent, corresponding to a device that ends up getting replaced. So after figuring that out it became obvious why the hang happens: 1) Task A starts a scrub on any device of the filesystem, except for device /dev/sdd; 2) Task B starts a device replace with /dev/sdd as the source device; 3) Task A calls btrfs_reada_add() from scrub_stripe() and it is currently starting to scrub a stripe from block group X. This call to btrfs_reada_add() is the one for the extent tree. When btrfs_reada_add() calls reada_add_block(), it passes the logical address of the extent tree's root node as its 'logical' argument - a value of 38928384; 4) Task A then enters reada_find_extent(), called from reada_add_block(). It finds there isn't any existing readahead extent for the logical address 38928384, so it proceeds to the path of creating a new one. It calls btrfs_map_block() to find out which stripes exist for the block group X. On the first iteration of the for loop that iterates over the stripes, it finds the stripe for device /dev/sdd, so it creates one zone for that device and adds it to the readahead extent. Before getting into the second iteration of the loop, the cleanup kthread deletes block group X because it was empty. So in the iterations for the remaining stripes it does not add more zones to the readahead extent, because the calls to reada_find_zone() returned NULL because they couldn't find block group X anymore. As a result the new readahead extent has a single zone, corresponding to the device /dev/sdd; 4) Before task A returns to btrfs_reada_add() and queues the readahead job for the readahead work queue, task B finishes the device replace and at btrfs_dev_replace_finishing() swaps the device /dev/sdd with the new device /dev/sdg; 5) Task A returns to reada_add_block(), which increments the counter "->elems" of the reada_control structure allocated at btrfs_reada_add(). Then it returns back to btrfs_reada_add() and calls reada_start_machine(). This queues a job in the readahead work queue to run the function reada_start_machine_worker(), which calls __reada_start_machine(). At __reada_start_machine() we take the device list mutex and for each device found in the current device list, we call reada_start_machine_dev() to start the readahead work. However at this point the device /dev/sdd was already freed and is not in the device list anymore. This means the corresponding readahead for the extent at 38928384 is never started, and therefore the "->elems" counter of the reada_control structure allocated at btrfs_reada_add() never goes down to 0, causing the call to btrfs_reada_wait(), done by the scrub task, to wait forever. Note that the readahead request can be made either after the device replace started or before it started, however in pratice it is very unlikely that a device replace is able to start after a readahead request is made and is able to complete before the readahead request completes - maybe only on a very small and nearly empty filesystem. This hang however is not the only problem we can have with readahead and device removals. When the readahead extent has other zones other than the one corresponding to the device that is being removed (either by a device replace or a device remove operation), we risk having a use-after-free on the device when dropping the last reference of the readahead extent. For example if we create a readahead extent with two zones, one for the device /dev/sdd and one for the device /dev/sde: 1) Before the readahead worker starts, the device /dev/sdd is removed, and the corresponding btrfs_device structure is freed. However the readahead extent still has the zone pointing to the device structure; 2) When the readahead worker starts, it only finds device /dev/sde in the current device list of the filesystem; 3) It starts the readahead work, at reada_start_machine_dev(), using the device /dev/sde; 4) Then when it finishes reading the extent from device /dev/sde, it calls __readahead_hook() which ends up dropping the last reference on the readahead extent through the last call to reada_extent_put(); 5) At reada_extent_put() it iterates over each zone of the readahead extent and attempts to delete an element from the device's 'reada_extents' radix tree, resulting in a use-after-free, as the device pointer of the zone for /dev/sdd is now stale. We can also access the device after dropping the last reference of a zone, through reada_zone_release(), also called by reada_extent_put(). And a device remove suffers the same problem, however since it shrinks the device size down to zero before removing the device, it is very unlikely to still have readahead requests not completed by the time we free the device, the only possibility is if the device has a very little space allocated. While the hang problem is exclusive to scrub, since it is currently the only user of btrfs_reada_add() and btrfs_reada_wait(), the use-after-free problem affects any path that triggers readhead, which includes btree_readahead_hook() and __readahead_hook() (a readahead worker can trigger readahed for the children of a node) for example - any path that ends up calling reada_add_block() can trigger the use-after-free after a device is removed. So fix this by waiting for any readahead requests for a device to complete before removing a device, ensuring that while waiting for existing ones no new ones can be made. This problem has been around for a very long time - the readahead code was added in 2011, device remove exists since 2008 and device replace was introduced in 2013, hard to pick a specific commit for a git Fixes tag. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:27 +01:00
Filipe Manana	dfda50e882	btrfs: fix use-after-free on readahead extent after failure to create it commit 83bc1560e02e25c6439341352024ebe8488f4fbd upstream. If we fail to find suitable zones for a new readahead extent, we end up leaving a stale pointer in the global readahead extents radix tree (fs_info->reada_tree), which can trigger the following trace later on: [13367.696354] BUG: kernel NULL pointer dereference, address: 00000000000000b0 [13367.696802] #PF: supervisor read access in kernel mode [13367.697249] #PF: error_code(0x0000) - not-present page [13367.697721] PGD 0 P4D 0 [13367.698171] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [13367.698632] CPU: 6 PID: 851214 Comm: btrfs Tainted: G W 5.9.0-rc6-btrfs-next-69 #1 [13367.699100] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [13367.700069] RIP: 0010:__lock_acquire+0x20a/0x3970 [13367.700562] Code: ff 1f 0f b7 c0 48 0f (...) [13367.701609] RSP: 0018:ffffb14448f57790 EFLAGS: 00010046 [13367.702140] RAX: 0000000000000000 RBX: 29b935140c15e8cf RCX: 0000000000000000 [13367.702698] RDX: 0000000000000002 RSI: ffffffffb3d66bd0 RDI: 0000000000000046 [13367.703240] RBP: ffff8a52ba8ac040 R08: 00000c2866ad9288 R09: 0000000000000001 [13367.703783] R10: 0000000000000001 R11: 00000000b66d9b53 R12: ffff8a52ba8ac9b0 [13367.704330] R13: 0000000000000000 R14: ffff8a532b6333e8 R15: 0000000000000000 [13367.704880] FS: 00007fe1df6b5700(0000) GS:ffff8a5376600000(0000) knlGS:0000000000000000 [13367.705438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [13367.705995] CR2: 00000000000000b0 CR3: 000000022cca8004 CR4: 00000000003706e0 [13367.706565] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [13367.707127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [13367.707686] Call Trace: [13367.708246] ? ___slab_alloc+0x395/0x740 [13367.708820] ? reada_add_block+0xae/0xee0 [btrfs] [13367.709383] lock_acquire+0xb1/0x480 [13367.709955] ? reada_add_block+0xe0/0xee0 [btrfs] [13367.710537] ? reada_add_block+0xae/0xee0 [btrfs] [13367.711097] ? rcu_read_lock_sched_held+0x5d/0x90 [13367.711659] ? kmem_cache_alloc_trace+0x8d2/0x990 [13367.712221] ? lock_acquired+0x33b/0x470 [13367.712784] _raw_spin_lock+0x34/0x80 [13367.713356] ? reada_add_block+0xe0/0xee0 [btrfs] [13367.713966] reada_add_block+0xe0/0xee0 [btrfs] [13367.714529] ? btrfs_root_node+0x15/0x1f0 [btrfs] [13367.715077] btrfs_reada_add+0x117/0x170 [btrfs] [13367.715620] scrub_stripe+0x21e/0x10d0 [btrfs] [13367.716141] ? kvm_sched_clock_read+0x5/0x10 [13367.716657] ? __lock_acquire+0x41e/0x3970 [13367.717184] ? scrub_chunk+0x60/0x140 [btrfs] [13367.717697] ? find_held_lock+0x32/0x90 [13367.718254] ? scrub_chunk+0x60/0x140 [btrfs] [13367.718773] ? lock_acquired+0x33b/0x470 [13367.719278] ? scrub_chunk+0xcd/0x140 [btrfs] [13367.719786] scrub_chunk+0xcd/0x140 [btrfs] [13367.720291] scrub_enumerate_chunks+0x270/0x5c0 [btrfs] [13367.720787] ? finish_wait+0x90/0x90 [13367.721281] btrfs_scrub_dev+0x1ee/0x620 [btrfs] [13367.721762] ? rcu_read_lock_any_held+0x8e/0xb0 [13367.722235] ? preempt_count_add+0x49/0xa0 [13367.722710] ? __sb_start_write+0x19b/0x290 [13367.723192] btrfs_ioctl+0x7f5/0x36f0 [btrfs] [13367.723660] ? __fget_files+0x101/0x1d0 [13367.724118] ? find_held_lock+0x32/0x90 [13367.724559] ? __fget_files+0x101/0x1d0 [13367.724982] ? __x64_sys_ioctl+0x83/0xb0 [13367.725399] __x64_sys_ioctl+0x83/0xb0 [13367.725802] do_syscall_64+0x33/0x80 [13367.726188] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [13367.726574] RIP: 0033:0x7fe1df7add87 [13367.726948] Code: 00 00 00 48 8b 05 09 91 (...) [13367.727763] RSP: 002b:00007fe1df6b4d48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [13367.728179] RAX: ffffffffffffffda RBX: 000055ce1fb596a0 RCX: 00007fe1df7add87 [13367.728604] RDX: 000055ce1fb596a0 RSI: 00000000c400941b RDI: 0000000000000003 [13367.729021] RBP: 0000000000000000 R08: 00007fe1df6b5700 R09: 0000000000000000 [13367.729431] R10: 00007fe1df6b5700 R11: 0000000000000246 R12: 00007ffd922b07de [13367.729842] R13: 00007ffd922b07df R14: 00007fe1df6b4e40 R15: 0000000000802000 [13367.730275] Modules linked in: btrfs blake2b_generic xor (...) [13367.732638] CR2: 00000000000000b0 [13367.733166] ---[ end trace d298b6805556acd9 ]--- What happens is the following: 1) At reada_find_extent() we don't find any existing readahead extent for the metadata extent starting at logical address X; 2) So we proceed to create a new one. We then call btrfs_map_block() to get information about which stripes contain extent X; 3) After that we iterate over the stripes and create only one zone for the readahead extent - only one because reada_find_zone() returned NULL for all iterations except for one, either because a memory allocation failed or it couldn't find the block group of the extent (it may have just been deleted); 4) We then add the new readahead extent to the readahead extents radix tree at fs_info->reada_tree; 5) Then we iterate over each zone of the new readahead extent, and find that the device used for that zone no longer exists, because it was removed or it was the source device of a device replace operation. Since this left 'have_zone' set to 0, after finishing the loop we jump to the 'error' label, call kfree() on the new readahead extent and return without removing it from the radix tree at fs_info->reada_tree; 6) Any future call to reada_find_extent() for the logical address X will find the stale pointer in the readahead extents radix tree, increment its reference counter, which can trigger the use-after-free right away or return it to the caller reada_add_block() that results in the use-after-free of the example trace above. So fix this by making sure we delete the readahead extent from the radix tree if we fail to setup zones for it (when 'have_zone = 0'). Fixes: 319450211842ba ("btrfs: reada: bypass adding extent when all zone failed") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:27 +01:00
Daniel Xu	834a61b212	btrfs: tree-checker: validate number of chunk stripes and parity commit 85d07fbe09efd1c529ff3e025e2f0d2c6c96a1b7 upstream. If there's no parity and num_stripes < ncopies, a crafted image can trigger a division by zero in calc_stripe_length(). The image was generated through fuzzing. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209587 Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:27 +01:00
Josef Bacik	1cedc54ad3	btrfs: cleanup cow block on error commit 572c83acdcdafeb04e70aa46be1fa539310be20c upstream. In fstest btrfs/064 a transaction abort in __btrfs_cow_block could lead to a system lockup. It gets stuck trying to write back inodes, and the write back thread was trying to lock an extent buffer: $ cat /proc/2143497/stack [<0>] __btrfs_tree_lock+0x108/0x250 [<0>] lock_extent_buffer_for_io+0x35e/0x3a0 [<0>] btree_write_cache_pages+0x15a/0x3b0 [<0>] do_writepages+0x28/0xb0 [<0>] __writeback_single_inode+0x54/0x5c0 [<0>] writeback_sb_inodes+0x1e8/0x510 [<0>] wb_writeback+0xcc/0x440 [<0>] wb_workfn+0xd7/0x650 [<0>] process_one_work+0x236/0x560 [<0>] worker_thread+0x55/0x3c0 [<0>] kthread+0x13a/0x150 [<0>] ret_from_fork+0x1f/0x30 This is because we got an error while COWing a block, specifically here if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) { ret = btrfs_reloc_cow_block(trans, root, buf, cow); if (ret) { btrfs_abort_transaction(trans, ret); return ret; } } [16402.241552] BTRFS: Transaction aborted (error -2) [16402.242362] WARNING: CPU: 1 PID: 2563188 at fs/btrfs/ctree.c:1074 __btrfs_cow_block+0x376/0x540 [16402.249469] CPU: 1 PID: 2563188 Comm: fsstress Not tainted 5.9.0-rc6+ #8 [16402.249936] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [16402.250525] RIP: 0010:__btrfs_cow_block+0x376/0x540 [16402.252417] RSP: 0018:ffff9cca40e578b0 EFLAGS: 00010282 [16402.252787] RAX: 0000000000000025 RBX: 0000000000000002 RCX: ffff9132bbd19388 [16402.253278] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9132bbd19380 [16402.254063] RBP: ffff9132b41a49c0 R08: 0000000000000000 R09: 0000000000000000 [16402.254887] R10: 0000000000000000 R11: ffff91324758b080 R12: ffff91326ef17ce0 [16402.255694] R13: ffff91325fc0f000 R14: ffff91326ef176b0 R15: ffff9132815e2000 [16402.256321] FS: 00007f542c6d7b80(0000) GS:ffff9132bbd00000(0000) knlGS:0000000000000000 [16402.256973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [16402.257374] CR2: 00007f127b83f250 CR3: 0000000133480002 CR4: 0000000000370ee0 [16402.257867] Call Trace: [16402.258072] btrfs_cow_block+0x109/0x230 [16402.258356] btrfs_search_slot+0x530/0x9d0 [16402.258655] btrfs_lookup_file_extent+0x37/0x40 [16402.259155] __btrfs_drop_extents+0x13c/0xd60 [16402.259628] ? btrfs_block_rsv_migrate+0x4f/0xb0 [16402.259949] btrfs_replace_file_extents+0x190/0x820 [16402.260873] btrfs_clone+0x9ae/0xc00 [16402.261139] btrfs_extent_same_range+0x66/0x90 [16402.261771] btrfs_remap_file_range+0x353/0x3b1 [16402.262333] vfs_dedupe_file_range_one.part.0+0xd5/0x140 [16402.262821] vfs_dedupe_file_range+0x189/0x220 [16402.263150] do_vfs_ioctl+0x552/0x700 [16402.263662] __x64_sys_ioctl+0x62/0xb0 [16402.264023] do_syscall_64+0x33/0x40 [16402.264364] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [16402.264862] RIP: 0033:0x7f542c7d15cb [16402.266901] RSP: 002b:00007ffd35944ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [16402.267627] RAX: ffffffffffffffda RBX: 00000000009d1968 RCX: 00007f542c7d15cb [16402.268298] RDX: 00000000009d2490 RSI: 00000000c0189436 RDI: 0000000000000003 [16402.268958] RBP: 00000000009d2520 R08: 0000000000000036 R09: 00000000009d2e64 [16402.269726] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 [16402.270659] R13: 000000000001f000 R14: 00000000009d1970 R15: 00000000009d2e80 [16402.271498] irq event stamp: 0 [16402.271846] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [16402.272497] hardirqs last disabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0 [16402.273343] softirqs last enabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0 [16402.273905] softirqs last disabled at (0): [<0000000000000000>] 0x0 [16402.274338] ---[ end trace 737874a5a41a8236 ]--- [16402.274669] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry [16402.276179] BTRFS info (device dm-9): forced readonly [16402.277046] BTRFS: error (device dm-9) in btrfs_replace_file_extents:2723: errno=-2 No such entry [16402.278744] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry [16402.279968] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry [16402.280582] BTRFS info (device dm-9): balance: ended with status: -30 The problem here is that as soon as we allocate the new block it is locked and marked dirty in the btree inode. This means that we could attempt to writeback this block and need to lock the extent buffer. However we're not unlocking it here and thus we deadlock. Fix this by unlocking the cow block if we have any errors inside of __btrfs_cow_block, and also free it so we do not leak it. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:27 +01:00
Qu Wenruo	d3ce2d0fb8	btrfs: tree-checker: fix false alert caused by legacy btrfs root item commit 1465af12e254a68706e110846f59cf0f09683184 upstream. Commit 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") introduced btrfs root item size check, however btrfs root item has two versions, the legacy one which just ends before generation_v2 member, is smaller than current btrfs root item size. This caused btrfs kernel to reject valid but old tree root leaves. Fix this problem by also allowing legacy root item, since kernel can already handle them pretty well and upgrade to newer root item format when needed. Reported-by: Martin Steigerwald <martin@lichtvoll.de> Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") CC: stable@vger.kernel.org # 5.4+ Tested-By: Martin Steigerwald <martin@lichtvoll.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-05 11:43:27 +01:00

1 2 3 4 5 ...

882911 Commits