IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit edf7ddbf1c5eb98b720b063b73e20e8a4a1ce673 ]
Missing calls to mntget() (or equivalently, too many calls to mntput())
are hard to detect because mntput() delays freeing mounts using
task_work_add(), then again using call_rcu(). As a result, mnt_count
can often be decremented to -1 without getting a KASAN use-after-free
report. Such cases are still bugs though, and they point to real
use-after-frees being possible.
For an example of this, see the bug fixed by commit 1b0b9cc8d379
("vfs: fsmount: add missing mntget()"), discussed at
https://lkml.kernel.org/linux-fsdevel/20190605135401.GB30925@xxxxxxxxxxxxxxxxxxxxxxxxx/T/#u.
This bug *should* have been trivial to find. But actually, it wasn't
found until syzkaller happened to use fchdir() to manipulate the
reference count just right for the bug to be noticeable.
Address this by making mntput_no_expire() issue a WARN if mnt_count has
become negative.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38dc717e97153e46375ee21797aa54777e5498f3 ]
Apparently there has been a longstanding race between udev/systemd and
the module loader. Currently, the module loader sends a uevent right
after sysfs initialization, but before the module calls its init
function. However, some udev rules expect that the module has
initialized already upon receiving the uevent.
This race has been triggered recently (see link in references) in some
systemd mount unit files. For instance, the configfs module creates the
/sys/kernel/config mount point in its init function, however the module
loader issues the uevent before this happens. sys-kernel-config.mount
expects to be able to mount /sys/kernel/config upon receipt of the
module loading uevent, but if the configfs module has not called its
init function yet, then this directory will not exist and the mount unit
fails. A similar situation exists for sys-fs-fuse-connections.mount, as
the fuse sysfs mount point is created during the fuse module's init
function. If udev is faster than module initialization then the mount
unit would fail in a similar fashion.
To fix this race, delay the module KOBJ_ADD uevent until after the
module has finished calling its init routine.
References: https://github.com/systemd/systemd/issues/17586
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-By: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b6d49ecd1081740b6e632366428b960461f8158b ]
When returning the layout in nfs4_evict_inode(), we need to ensure that
the layout is actually done being freed before we can proceed to free the
inode itself.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]
I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1eab0fea2514b269e384c117f5b5772b882761f0 ]
When devm_rtc_allocate_device is failed in pl031_probe, it should release
mem regions with device.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20201112093139.32566-1-zhengliang6@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 10f04d40a9fa29785206c619f80d8beedb778837 ]
The on-disk quota format supports quota files with upto 2^32 blocks. Be
careful when computing quota file offsets in the quota files from block
numbers as they can overflow 32-bit types. Since quota files larger than
4GB would require ~26 millions of quota users, this is mostly a
theoretical concern now but better be careful, fuzzers would find the
problem sooner or later anyway...
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e8ed280dab9eeabc1ba0b2db5dbe9fe6debb6b5 ]
If a module fails to load due to an error in prepare_coming_module(),
the following error handling in load_module() runs with
MODULE_STATE_COMING in module's state. Fix it by correctly setting
MODULE_STATE_GOING under "bug_cleanup" label.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28d211919e422f58c1e6c900e5810eee4f1ce4c8 ]
When clk_hw_register_fixed_rate_with_accuracy() fails,
clk_data should be freed. It's the same for the subsequent
two error paths, but we should also unregister the already
registered clocks in them.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201020061226.6572-1-dinghao.liu@zju.edu.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8d1ddb5e79374fb277985a6b3faa2ed8631c5b4c upstream.
Syzbot reports a potential deadlock found by the newly added recursive
read deadlock detection in lockdep:
[...] ========================================================
[...] WARNING: possible irq lock inversion dependency detected
[...] 5.9.0-rc2-syzkaller #0 Not tainted
[...] --------------------------------------------------------
[...] syz-executor.1/10214 just changed the state of lock:
[...] ffff88811f506338 (&f->f_owner.lock){.+..}-{2:2}, at: send_sigurg+0x1d/0x200
[...] but this lock was taken by another, HARDIRQ-safe lock in the past:
[...] (&dev->event_lock){-...}-{2:2}
[...]
[...]
[...] and interrupts could create inverse lock ordering between them.
[...]
[...]
[...] other info that might help us debug this:
[...] Chain exists of:
[...] &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
[...]
[...] Possible interrupt unsafe locking scenario:
[...]
[...] CPU0 CPU1
[...] ---- ----
[...] lock(&f->f_owner.lock);
[...] local_irq_disable();
[...] lock(&dev->event_lock);
[...] lock(&new->fa_lock);
[...] <Interrupt>
[...] lock(&dev->event_lock);
[...]
[...] *** DEADLOCK ***
The corresponding deadlock case is as followed:
CPU 0 CPU 1 CPU 2
read_lock(&fown->lock);
spin_lock_irqsave(&dev->event_lock, ...)
write_lock_irq(&filp->f_owner.lock); // wait for the lock
read_lock(&fown-lock); // have to wait until the writer release
// due to the fairness
<interrupted>
spin_lock_irqsave(&dev->event_lock); // wait for the lock
The lock dependency on CPU 1 happens if there exists a call sequence:
input_inject_event():
spin_lock_irqsave(&dev->event_lock,...);
input_handle_event():
input_pass_values():
input_to_handler():
handler->event(): // evdev_event()
evdev_pass_values():
spin_lock(&client->buffer_lock);
__pass_event():
kill_fasync():
kill_fasync_rcu():
read_lock(&fa->fa_lock);
send_sigio():
read_lock(&fown->lock);
To fix this, make the reader in send_sigurg() and send_sigio() use
read_lock_irqsave() and read_lock_irqrestore().
Reported-by: syzbot+22e87cdf94021b984aa6@syzkaller.appspotmail.com
Reported-by: syzbot+c5e32344981ad9f33750@syzkaller.appspotmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc889b8d4a8122549feabe99eead04e6b23b6513 upstream.
Make the printk() [bfs "printf" macro] seem less severe by changing
"WARNING:" to "NOTE:".
<asm-generic/bug.h> warns us about using WARNING or BUG in a format string
other than in WARN() or BUG() family macros. bfs/inode.c is doing just
that in a normal printk() call, so change the "WARNING" string to be
"NOTE".
Link: https://lkml.kernel.org/r/20201203212634.17278-1-rdunlap@infradead.org
Reported-by: syzbot+3fd34060f26e766536ff@syzkaller.appspotmail.com
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Tigran A. Aivazian" <aivazian.tigran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ebd47037027c4beae99680bff3b20fdee5d7c1e upstream.
The snd_seq_queue struct contains various flags in the bit fields.
Those are categorized to two different use cases, both of which are
protected by different spinlocks. That implies that there are still
potential risks of the bad operations for bit fields by concurrent
accesses.
For addressing the problem, this patch rearranges those flags to be
a standard bool instead of a bit field.
Reported-by: syzbot+63cbe31877bb80ef58f5@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20201206083456.21110-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0ac1a26ed5943127cb0156148735f5f52a07075 upstream.
As reported on:
https://lore.kernel.org/linux-media/20190627222020.45909-1-willemdebruijn.kernel@gmail.com/
if gp8psk_usb_in_op() returns an error, the status var is not
initialized. Yet, this var is used later on, in order to
identify:
- if the device was already started;
- if firmware has loaded;
- if the LNBf was powered on.
Using status = 0 seems to ensure that everything will be
properly powered up.
So, instead of the proposed solution, let's just set
status = 0.
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70f259a3f4276b71db365b1d6ff1eab805ea6ec3 upstream.
When h5_close() gets called, the memory allocated for the hu gets
freed only if hu->serdev doesn't exist. This leads to a memory leak.
So when h5_close() is requested, close the serdev device instance and
free the memory allocated to the hu entirely instead.
Fixes: https://syzkaller.appspot.com/bug?extid=6ce141c55b2f7aafd1c4
Reported-by: syzbot+6ce141c55b2f7aafd1c4@syzkaller.appspotmail.com
Tested-by: syzbot+6ce141c55b2f7aafd1c4@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5812b32e01c6d86ba7a84110702b46d8a8531fe9 upstream.
Specify type alignment when declaring linker-section match-table entries
to prevent gcc from increasing alignment and corrupting the various
tables with padding (e.g. timers, irqchips, clocks, reserved memory).
This is specifically needed on x86 where gcc (typically) aligns larger
objects like struct of_device_id with static extent on 32-byte
boundaries which at best prevents matching on anything but the first
entry. Specifying alignment when declaring variables suppresses this
optimisation.
Here's a 64-bit example where all entries are corrupt as 16 bytes of
padding has been inserted before the first entry:
ffffffff8266b4b0 D __clk_of_table
ffffffff8266b4c0 d __of_table_fixed_factor_clk
ffffffff8266b5a0 d __of_table_fixed_clk
ffffffff8266b680 d __clk_of_table_sentinel
And here's a 32-bit example where the 8-byte-aligned table happens to be
placed on a 32-byte boundary so that all but the first entry are corrupt
due to the 28 bytes of padding inserted between entries:
812b3ec0 D __irqchip_of_table
812b3ec0 d __of_table_irqchip1
812b3fa0 d __of_table_irqchip2
812b4080 d __of_table_irqchip3
812b4160 d irqchip_of_match_end
Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte
alignment), and on arm using gcc-7.2.
Note that there are no in-tree users of these tables on x86 currently
(even if they are included in the image).
Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations")
Fixes: f6e916b82022 ("irqchip: add basic infrastructure")
Cc: stable <stable@vger.kernel.org> # 3.9
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20201123102319.8090-2-johan@kernel.org
[ johan: adjust context to 5.4 ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ebcdd702f49aeb0ad2e2d894f8c124a0acc6e23 upstream.
For a null_blk device with zoned mode enabled is currently initialized
with a number of zones equal to the device capacity divided by the zone
size, without considering if the device capacity is a multiple of the
zone size. If the zone size is not a divisor of the capacity, the zones
end up not covering the entire capacity, potentially resulting is out
of bounds accesses to the zone array.
Fix this by adding one last smaller zone with a size equal to the
remainder of the disk capacity divided by the zone size if the capacity
is not a multiple of the zone size. For such smaller last zone, the zone
capacity is also checked so that it does not exceed the smaller zone
size.
Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: ca4b2a011948 ("null_blk: add zone support")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ddcdea5b54492f54700f427f58690cf1e187e5e upstream.
To pick up the changes in:
a85cbe6159ffc973 ("uapi: move constants from <linux/kernel.h> to <linux/const.h>")
That causes no changes in tooling, just addresses this perf build
warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/const.h' differs from latest version at 'include/uapi/linux/const.h'
diff -u tools/include/uapi/linux/const.h include/uapi/linux/const.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a85cbe6159ffc973e5702f70a3bd5185f8f3c38d upstream.
and include <linux/const.h> in UAPI headers instead of <linux/kernel.h>.
The reason is to avoid indirect <linux/sysinfo.h> include when using
some network headers: <linux/netlink.h> or others -> <linux/kernel.h>
-> <linux/sysinfo.h>.
This indirect include causes on MUSL redefinition of struct sysinfo when
included both <sys/sysinfo.h> and some of UAPI headers:
In file included from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/kernel.h:5,
from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/netlink.h:5,
from ../include/tst_netlink.h:14,
from tst_crypto.c:13:
x86_64-buildroot-linux-musl/sysroot/usr/include/linux/sysinfo.h:8:8: error: redefinition of `struct sysinfo'
struct sysinfo {
^~~~~~~
In file included from ../include/tst_safe_macros.h:15,
from ../include/tst_test.h:93,
from tst_crypto.c:11:
x86_64-buildroot-linux-musl/sysroot/usr/include/sys/sysinfo.h:10:8: note: originally defined here
Link: https://lkml.kernel.org/r/20201015190013.8901-1-petr.vorel@gmail.com
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Suggested-by: Rich Felker <dalias@aerifal.cx>
Acked-by: Rich Felker <dalias@libc.org>
Cc: Peter Korsgaard <peter@korsgaard.com>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fa4d0f1992a96f6d7c988ef423e3127e613f6ac9 upstream.
With the current implementation the following race can happen:
* blk_pre_runtime_suspend() calls blk_freeze_queue_start() and
blk_mq_unfreeze_queue().
* blk_queue_enter() calls blk_queue_pm_only() and that function returns
true.
* blk_queue_enter() calls blk_pm_request_resume() and that function does
not call pm_request_resume() because the queue runtime status is
RPM_ACTIVE.
* blk_pre_runtime_suspend() changes the queue status into RPM_SUSPENDING.
Fix this race by changing the queue runtime status into RPM_SUSPENDING
before switching q_usage_counter to atomic mode.
Link: https://lore.kernel.org/r/20201209052951.16136-2-bvanassche@acm.org
Fixes: 986d413b7c15 ("blk-mq: Enable support for runtime power management")
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Co-developed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cd3ed3c73ac671ff6b0230ccb72b8300292d3643 ]
Set rp_size to zero will be ignore during remounting.
The method to identify whether we input a remounting option of
rp_size is to check if the rp_size input is zero. It can not work
well if we pass "rp_size=0".
This patch add a bool variable "set_rp_size" to fix this problem.
Reported-by: Jubin Zhong <zhongjubin@huawei.com>
Signed-off-by: lizhe <lizhe67@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1891ef21d92c4801ea082ee8ed478e304ddc6749 ]
fls() and fls64() are using __builtin_ctz() and _builtin_ctzll().
On powerpc, those builtins trivially use ctlzw and ctlzd power
instructions.
Allthough those instructions provide the expected result with
input argument 0, __builtin_ctz() and __builtin_ctzll() are
documented as undefined for value 0.
The easiest fix would be to use fls() and fls64() functions
defined in include/asm-generic/bitops/builtin-fls.h and
include/asm-generic/bitops/fls64.h, but GCC output is not optimal:
00000388 <testfls>:
388: 2c 03 00 00 cmpwi r3,0
38c: 41 82 00 10 beq 39c <testfls+0x14>
390: 7c 63 00 34 cntlzw r3,r3
394: 20 63 00 20 subfic r3,r3,32
398: 4e 80 00 20 blr
39c: 38 60 00 00 li r3,0
3a0: 4e 80 00 20 blr
000003b0 <testfls64>:
3b0: 2c 03 00 00 cmpwi r3,0
3b4: 40 82 00 1c bne 3d0 <testfls64+0x20>
3b8: 2f 84 00 00 cmpwi cr7,r4,0
3bc: 38 60 00 00 li r3,0
3c0: 4d 9e 00 20 beqlr cr7
3c4: 7c 83 00 34 cntlzw r3,r4
3c8: 20 63 00 20 subfic r3,r3,32
3cc: 4e 80 00 20 blr
3d0: 7c 63 00 34 cntlzw r3,r3
3d4: 20 63 00 40 subfic r3,r3,64
3d8: 4e 80 00 20 blr
When the input of fls(x) is a constant, just check x for nullity and
return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction
directly.
For fls64() on PPC64, do the same but with __builtin_clzll() and
cntlzd instruction. On PPC32, lets take the generic fls64() which
will use our fls(). The result is as expected:
00000388 <testfls>:
388: 7c 63 00 34 cntlzw r3,r3
38c: 20 63 00 20 subfic r3,r3,32
390: 4e 80 00 20 blr
000003a0 <testfls64>:
3a0: 2c 03 00 00 cmpwi r3,0
3a4: 40 82 00 10 bne 3b4 <testfls64+0x14>
3a8: 7c 83 00 34 cntlzw r3,r4
3ac: 20 63 00 20 subfic r3,r3,32
3b0: 4e 80 00 20 blr
3b4: 7c 63 00 34 cntlzw r3,r3
3b8: 20 63 00 40 subfic r3,r3,64
3bc: 4e 80 00 20 blr
Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39485ed95d6b83b62fa75c06c2c4d33992e0d971 ]
Until commit e7c587da1252 ("x86/speculation: Use synthetic bits for
IBRS/IBPB/STIBP"), KVM was testing both Intel and AMD CPUID bits before
allowing the guest to write MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD.
Testing only Intel bits on VMX processors, or only AMD bits on SVM
processors, fails if the guests are created with the "opposite" vendor
as the host.
While at it, also tweak the host CPU check to use the vendor-agnostic
feature bit X86_FEATURE_IBPB, since we only care about the availability
of the MSR on the host here and not about specific CPUID bits.
Fixes: e7c587da1252 ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP")
Cc: stable@vger.kernel.org
Reported-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df7e8818926eb4712b67421442acf7d568fe2645 ]
Userspace that does not know about the AMD_IBRS bit might still
allow the guest to protect itself with MSR_IA32_SPEC_CTRL using
the Intel SPEC_CTRL bit. However, svm.c disallows this and will
cause a #GP in the guest when writing to the MSR. Fix this by
loosening the test and allowing the Intel CPUID bit, and in fact
allow the AMD_STIBP bit as well since it allows writing to
MSR_IA32_SPEC_CTRL too.
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6441fa6178f5456d1d4b512c08798888f99db185 ]
If the guest is configured to have SPEC_CTRL but the host does not
(which is a nonsensical configuration but these are not explicitly
forbidden) then a host-initiated MSR write can write vmx->spec_ctrl
(respectively svm->spec_ctrl) and trigger a #GP when KVM tries to
restore the host value of the MSR. Add a more comprehensive check
for valid bits of SPEC_CTRL, covering host CPUID flags and,
since we are at it and it is more correct that way, guest CPUID
flags too.
For AMD, remove the unnecessary is_guest_mode check around setting
the MSR interception bitmap, so that the code looks the same as
for Intel.
Cc: Jim Mattson <jmattson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b08070eca9e247f60ab39d79b2c25d274750441f ]
ext4_handle_error() with errors=continue mount option can accidentally
remount the filesystem read-only when the system is rebooting. Fix that.
Fixes: 1dc1097ff60e ("ext4: avoid panic during forced reboot")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20201127113405.26867-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f458a3873ae94efe1f37c8b96c97e7298769e98 ]
When defragmenting we skip ranges that have holes or inline extents, so that
we don't do unnecessary IO and waste space. We do this check when calling
should_defrag_range() at btrfs_defrag_file(). However we do it without
holding the inode's lock. The reason we do it like this is to avoid
blocking other tasks for too long, that possibly want to operate on other
file ranges, since after the call to should_defrag_range() and before
locking the inode, we trigger a synchronous page cache readahead. However
before we were able to lock the inode, some other task might have punched
a hole in our range, or we may now have an inline extent there, in which
case we should not set the range for defrag anymore since that would cause
unnecessary IO and make us waste space (i.e. allocating extents to contain
zeros for a hole).
So after we locked the inode and the range in the iotree, check again if
we have holes or an inline extent, and if we do, just skip the range.
I hit this while testing my next patch that fixes races when updating an
inode's number of bytes (subject "btrfs: update the number of bytes used
by an inode atomically"), and it depends on this change in order to work
correctly. Alternatively I could rework that other patch to detect holes
and flag their range with the 'new delalloc' bit, but this itself fixes
an efficiency problem due a race that from a functional point of view is
not harmful (it could be triggered with btrfs/062 from fstests).
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 16b8fe4caf499ae8e12d2ab1b1324497e36a7b83 ]
In case an error occurs in vfio_pci_enable() before the call to
vfio_pci_probe_mmaps(), vfio_pci_disable() will try to iterate
on an uninitialized list and cause a kernel panic.
Lets move to the initialization to vfio_pci_probe() to fix the
issue.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Fixes: 05f0c03fbac1 ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive")
CC: Stable <stable@vger.kernel.org> # v4.7+
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3ceb6543e9cf6ed87cc1fbc6f23ca2db903564cd upstream.
There isn't really any valid reason to use __FSCRYPT_MODE_MAX or
FSCRYPT_POLICY_FLAGS_VALID in a userspace program. These constants are
only meant to be used by the kernel internally, and they are defined in
the UAPI header next to the mode numbers and flags only so that kernel
developers don't forget to update them when adding new modes or flags.
In https://lkml.kernel.org/r/20201005074133.1958633-2-satyat@google.com
there was an example of someone wanting to use __FSCRYPT_MODE_MAX in a
user program, and it was wrong because the program would have broken if
__FSCRYPT_MODE_MAX were ever increased. So having this definition
available is harmful. FSCRYPT_POLICY_FLAGS_VALID has the same problem.
So, remove these definitions from the UAPI header. Replace
FSCRYPT_POLICY_FLAGS_VALID with just listing the valid flags explicitly
in the one kernel function that needs it. Move __FSCRYPT_MODE_MAX to
fscrypt_private.h, remove the double underscores (which were only
present to discourage use by userspace), and add a BUILD_BUG_ON() and
comments to (hopefully) ensure it is kept in sync.
Keep the old name FS_POLICY_FLAGS_VALID, since it's been around for
longer and there's a greater chance that removing it would break source
compatibility with some program. Indeed, mtd-utils is using it in
an #ifdef, and removing it would introduce compiler warnings (about
FS_POLICY_FLAGS_PAD_* being redefined) into the mtd-utils build.
However, reduce its value to 0x07 so that it only includes the flags
with old names (the ones present before Linux 5.4), and try to make it
clear that it's now "frozen" and no new flags should be added to it.
Fixes: 2336d0deb2d4 ("fscrypt: use FSCRYPT_ prefix for uapi constants")
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://lore.kernel.org/r/20201024005132.495952-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 159e1de201b6fca10bfec50405a3b53a561096a8 upstream.
It's possible to create a duplicate filename in an encrypted directory
by creating a file concurrently with adding the encryption key.
Specifically, sys_open(O_CREAT) (or sys_mkdir(), sys_mknod(), or
sys_symlink()) can lookup the target filename while the directory's
encryption key hasn't been added yet, resulting in a negative no-key
dentry. The VFS then calls ->create() (or ->mkdir(), ->mknod(), or
->symlink()) because the dentry is negative. Normally, ->create() would
return -ENOKEY due to the directory's key being unavailable. However,
if the key was added between the dentry lookup and ->create(), then the
filesystem will go ahead and try to create the file.
If the target filename happens to already exist as a normal name (not a
no-key name), a duplicate filename may be added to the directory.
In order to fix this, we need to fix the filesystems to prevent
->create(), ->mkdir(), ->mknod(), and ->symlink() on no-key names.
(->rename() and ->link() need it too, but those are already handled
correctly by fscrypt_prepare_rename() and fscrypt_prepare_link().)
In preparation for this, add a helper function fscrypt_is_nokey_name()
that filesystems can use to do this check. Use this helper function for
the existing checks that fs/crypto/ does for rename and link.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bfc2b7e8518999003a61f91c1deb5e88ed77b07d upstream.
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
create a duplicate filename in an encrypted directory by creating a file
concurrently with adding the directory's encryption key.
Fix this bug on f2fs by rejecting no-key dentries in f2fs_add_link().
Note that the weird check for the current task in f2fs_do_add_link()
seems to make this bug difficult to reproduce on f2fs.
Fixes: 9ea97163c6da ("f2fs crypto: add filename encryption for f2fs_add_link")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76786a0f083473de31678bdb259a3d4167cf756d upstream.
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
create a duplicate filename in an encrypted directory by creating a file
concurrently with adding the directory's encryption key.
Fix this bug on ubifs by rejecting no-key dentries in ubifs_create(),
ubifs_mkdir(), ubifs_mknod(), and ubifs_symlink().
Note that ubifs doesn't actually report the duplicate filenames from
readdir, but rather it seems to replace the original dentry with a new
one (which is still wrong, just a different effect from ext4).
On ubifs, this fixes xfstest generic/595 as well as the new xfstest I
wrote specifically for this bug.
Fixes: f4f61d2cc6d8 ("ubifs: Implement encrypted filenames")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75d18cd1868c2aee43553723872c35d7908f240f upstream.
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
create a duplicate filename in an encrypted directory by creating a file
concurrently with adding the directory's encryption key.
Fix this bug on ext4 by rejecting no-key dentries in ext4_add_entry().
Note that the duplicate check in ext4_find_dest_de() sometimes prevented
this bug. However in many cases it didn't, since ext4_find_dest_de()
doesn't examine every dentry.
Fixes: 4461471107b7 ("ext4 crypto: enable filename encryption")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 236761f19a4f373354f1dcf399b57753f1f4b871 upstream.
If state has not changed successfully and we updated cpufreq_state,
next time when the new state is equal to cpufreq_state (not changed
successfully last time), we will return directly and miss a
freq_qos_update_request() that should have been.
Fixes: 5130802ddbb1 ("thermal: cpu_cooling: Switch to QoS requests for freq limits")
Cc: v5.4+ <stable@vger.kernel.org> # v5.4+
Signed-off-by: Zhuguangqing <zhuguangqing@xiaomi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201106092243.15574-1-zhuguangqing83@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93decc563637c4288380912eac0eb42fb246cc04 upstream.
In __make_request() a new r10bio is allocated and passed to
raid10_read_request(). The read_slot member of the bio is not
initialized, and the raid10_read_request() uses it to index an
array. This leads to occasional panics.
Fix by initializing the field to invalid value and checking for
valid value in raid10_read_request().
Cc: stable@vger.kernel.org
Signed-off-by: Kevin Vigor <kvigor@gmail.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 028c221ed1904af9ac3c5162ee98f48966de6b3d ]
AMD systems provide a "NodeId" value that represents a global ID
indicating to which "Node" a logical CPU belongs. The "Node" is a
physical structure equivalent to a Die, and it should not be confused
with logical structures like NUMA nodes. Logical nodes can be adjusted
based on firmware or other settings whereas the physical nodes/dies are
fixed based on hardware topology.
The NodeId value can be used when a physical ID is needed by software.
Save the AMD NodeId to struct cpuinfo_x86.cpu_die_id. Use the value
from CPUID or MSR as appropriate. Default to phys_proc_id otherwise.
Do so for both AMD and Hygon systems.
Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no
longer needed.
Update the x86 topology documentation.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-2-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit adab66b71abfe206a020f11e561f4df41f0b2aba upstream.
It was believed that metag was the only architecture that required the ring
buffer to keep 8 byte words aligned on 8 byte architectures, and with its
removal, it was assumed that the ring buffer code did not need to handle
this case. It appears that sparc64 also requires this.
The following was reported on a sparc64 boot up:
kernel: futex hash table entries: 65536 (order: 9, 4194304 bytes, linear)
kernel: Running postponed tracer tests:
kernel: Testing tracer function:
kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140
kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140
kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
kernel: PASSED
Need to put back the 64BIT aligned code for the ring buffer.
Link: https://lore.kernel.org/r/CADxRZqzXQRYgKc=y-KV=S_yHL+Y8Ay2mh5ezeZUnpRvg+syWKw@mail.gmail.com
Cc: stable@vger.kernel.org
Fixes: 86b3de60a0b6 ("ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS")
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 291de1d102fafef0798cdad9666cd4f8da7da7cc upstream.
When I use the axp20x chip to power my SDIO device on the 5.4 kernel,
the output voltage of DLDO2 is wrong. After comparing the register
manual and source code of the chip, I found that the mask bit of the
driver register of the port was wrong. I fixed this error by modifying
the mask register of the source code. This error seems to be a copy
error of the macro when writing the code. Now the voltage output of
the DLDO2 port of axp20x is correct. My development environment is
Allwinner A40I of arm architecture, and the kernel version is 5.4.
Signed-off-by: DingHua Ma <dinghua.ma.sz@gmail.com>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Cc: <stable@vger.kernel.org>
Fixes: db4a555f7c4c ("regulator: axp20x: use defines for masks")
Link: https://lore.kernel.org/r/20201201001000.22302-1-dinghua.ma.sz@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4684709bf81a2d98152ed6b610e3d5c403f9bced upstream.
If kobject_init_and_add() fails, pci_slot_release() is called to delete
slot->list from parent->slots. But slot->list hasn't been initialized
yet, so we dereference a NULL pointer:
Unable to handle kernel NULL pointer dereference at virtual address
00000000
...
CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.4.240 #197
task: ffffeb398a45ef10 task.stack: ffffeb398a470000
PC is at __list_del_entry_valid+0x5c/0xb0
LR is at pci_slot_release+0x84/0xe4
...
__list_del_entry_valid+0x5c/0xb0
pci_slot_release+0x84/0xe4
kobject_put+0x184/0x1c4
pci_create_slot+0x17c/0x1b4
__pci_hp_initialize+0x68/0xa4
pciehp_probe+0x1a4/0x2fc
pcie_port_probe_service+0x58/0x84
driver_probe_device+0x320/0x470
Initialize slot->list before calling kobject_init_and_add() to avoid this.
Fixes: 8a94644b440e ("PCI: Fix pci_create_slot() reference count leak")
Link: https://lore.kernel.org/r/1606876422-117457-1-git-send-email-zhongjubin@huawei.com
Signed-off-by: Jubin Zhong <zhongjubin@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe6000990394639ed374cb76c313be3640714f47 upstream.
This 2-in-1 model (Product name: Switch SA5-271) features a SW_TABLET_MODE
that works as it would be expected, both when detaching the keyboard and
when folding it behind the tablet body.
It used to work until the introduction of the allow list at
commit 8169bd3e6e193 ("platform/x86: intel-vbtn: Switch to an allow-list
for SW_TABLET_MODE reporting"). Add this model to it, so that the Virtual
Buttons device announces the EV_SW features again.
Fixes: 8169bd3e6e193 ("platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Garnacho <carlosg@gnome.org>
Link: https://lore.kernel.org/r/20201201135727.212917-1-carlosg@gnome.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>