IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit af44a387e743ab7aa39d3fb5e29c0a973cf91bdc upstream.
This partially reverts commit 882213990d32 ("xen: fix p2m size in dom0
for disabled memory hotplug case")
There's no need to special case XEN_UNPOPULATED_ALLOC anymore in order
to correctly size the p2m. The generic memory hotplug option has
already been tied together with the Xen hotplug limit, so enabling
memory hotplug should already trigger a properly sized p2m on Xen PV.
Note that XEN_UNPOPULATED_ALLOC depends on ZONE_DEVICE which pulls in
MEMORY_HOTPLUG.
Leave the check added to __set_phys_to_machine and the adjusted
comment about EXTRA_MEM_RATIO.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210324122424.58685-3-roger.pau@citrix.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[boris: fixed formatting issues]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
commit 8249d17d3194eac064a8ca5bc5ca0abc86feecde upstream.
The pfn variable contains the page frame number as returned by the
pXX_pfn() functions, shifted to the right by PAGE_SHIFT to remove the
page bits. After page protection computations are done to it, it gets
shifted back to the physical address using page_level_shift().
That is wrong, of course, because that function determines the shift
length based on the level of the page in the page table but in all the
cases, it was shifted by PAGE_SHIFT before.
Therefore, shift it back using PAGE_SHIFT to get the correct physical
address.
[ bp: Rewrite commit message. ]
Fixes: dfaaec9033b8 ("x86: Add support for changing memory encryption attribute in early boot")
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/81abbae1657053eccc535c16151f63cd049dcb97.1616098294.git.isaku.yamahata@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2b514ec72706a31bea0c3b97e622b81535b5323a ]
The Xen memory hotplug limit should depend on the memory hotplug
generic option, rather than the Xen balloon configuration. It's
possible to have a kernel with generic memory hotplug enabled, but
without Xen balloon enabled, at which point memory hotplug won't work
correctly due to the size limitation of the p2m.
Rename the option to XEN_MEMORY_HOTPLUG_LIMIT since it's no longer
tied to ballooning.
Fixes: 9e2369c06c8a18 ("xen: add helpers to allocate unpopulated memory")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210324122424.58685-2-roger.pau@citrix.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 141f8202cfa4192c3af79b6cbd68e7760bb01b5a ]
The ppos points to a position in the old kernel memory (and in case of
arm64 in the crash kernel since elfcorehdr is passed as a segment). The
function should update the ppos by the amount that was read. This bug is
not exposed by accident, but other platforms update this value properly.
So, fix it in ARM64 version of elfcorehdr_read() as well.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Fixes: e62aaeac426a ("arm64: kdump: provide /proc/vmcore file")
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210319205054.743368-1-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e4817a1b6b77db538bc0141c3b138f2df803ce87 ]
For NAND Ecc layout, there is a dependency from old kernel's nand driver
setting and current. if old kernel use 4 bit ecc , we should use 4 bit
in new kernel either. else will run into following error at filesystem
mounting.
So, enable fsl,use-minimum-ecc from device tree, to fix this mismatch
[ 9.449265] ubi0: scanning is finished
[ 9.463968] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading
22528 bytes from PEB 513:4096, read only 22528 bytes, retry
[ 9.486940] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading
22528 bytes from PEB 513:4096, read only 22528 bytes, retry
[ 9.509906] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading
22528 bytes from PEB 513:4096, read only 22528 bytes, retry
[ 9.532845] ubi0 error: ubi_io_read: error -74 (ECC error) while reading
22528 bytes from PEB 513:4096, read 22528 bytes
Fixes: f9ecf10cb88c ("ARM: dts: imx6ull: add MYiR MYS-6ULX SBC")
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fbfa463be8dc7957ee4f81556e9e1ea2a951807d ]
When I dropped legacy data for omap4 and dra7 smartreflex in favor of
device tree based data, it seems I only testd for the "SmartReflex Class3
initialized" line in dmesg. I missed the fact that there is also
omap_devinit_smartreflex() that happens later, and now it produces an
error on boot for "No Voltage table for the corresponding vdd. Cannot
create debugfs entries for n-values".
This happens as we no longer have the smartreflex instance legacy data,
and have not yet moved completely to device tree based booting for the
driver. Let's fix the issue by changing the smartreflex init to use names.
This should all eventually go away in favor of doing the init in the
driver based on devicetree compatible value.
Note that dra7xx_init_early() is not calling any voltage domain init like
omap54xx_voltagedomains_init(), or a dra7 specific voltagedomains init.
This means that on dra7 smartreflex is still not fully initialized, and
also seems to be missing the related devicetree nodes.
Fixes: a6b1e717e942 ("ARM: OMAP2+: Drop legacy platform data for omap4 smartreflex")
Fixes: e54740b4afe8 ("ARM: OMAP2+: Drop legacy platform data for dra7 smartreflex")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 221c3a09ddf70a0a51715e6c2878d8305e95c558 upstream.
Fix the phy address to 7 for Ethernet PHY on SAMA5D27 SOM1. No
connection established if phy address 0 is used.
The board uses the 24 pins version of the KSZ8081RNA part, KSZ8081RNA
pin 16 REFCLK as PHYAD bit [2] has weak internal pull-down. But at
reset, connected to PD09 of the MPU it's connected with an internal
pull-up forming PHYAD[2:0] = 7.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Fixes: 2f61929eb10a ("ARM: dts: at91: at91-sama5d27_som1: fix PHY ID")
Cc: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c69c8a1736eace8de491d480e6e577a27c2087c upstream.
Fix the whole mux-mask table according to datasheet for the sam9x60
product. Too much functions for pins were disabled leading to
misunderstandings when enabling more peripherals or taking this table
as an example for another board.
Take advantage of this fix to move the mux-mask in the SoC file where it
belongs and use lower case letters for hex numbers like everywhere in
the file.
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Fixes: 1e5f532c2737 ("ARM: dts: at91: sam9x60: add device tree for soc and board")
Cc: <stable@vger.kernel.org> # 5.6+
Cc: Sandeep Sheriker Mallikarjun <sandeepsheriker.mallikarjun@microchip.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20210310152006.15018-1-nicolas.ferre@microchip.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 664979bba8169d775959452def968d1a7c03901f upstream.
According to the datasheet PA7 can be set to either function A, B or
C (see table 6-2 of DS60001579D). The previous value would permit just
configuring with function C.
Signed-off-by: Federico Pellegrin <fede@evolware.org>
Fixes: 1e5f532c2737 ("ARM: dts: at91: sam9x60: add device tree for soc and board")
Cc: <stable@vger.kernel.org> # 5.6+
Cc: Sandeep Sheriker Mallikarjun <sandeepsheriker.mallikarjun@microchip.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fb3a074755b7737c4081cffe0ccfa08c2f2d29d upstream.
Crypto engine (CAAM) on LS1043A platform is configured HW-coherent,
mark accordingly the DT node.
Lack of "dma-coherent" property for an IP that is configured HW-coherent
can lead to problems, similar to what has been reported for LS1046A.
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: 63dac35b58f4 ("arm64: dts: ls1043a: add crypto node")
Link: https://lore.kernel.org/linux-crypto/fe6faa24-d8f7-d18f-adfa-44fa0caa1598@arm.com
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba8da03fa7dff59d9400250aebd38f94cde3cb0f upstream.
Crypto engine (CAAM) on LS1012A platform is configured HW-coherent,
mark accordingly the DT node.
Lack of "dma-coherent" property for an IP that is configured HW-coherent
can lead to problems, similar to what has been reported for LS1046A.
Cc: <stable@vger.kernel.org> # v4.12+
Fixes: 85b85c569507 ("arm64: dts: ls1012a: add crypto node")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c607ab4f916d4d5259072eca34055d3f5a795c21 upstream.
We recently converted arm64 to use arch_stack_walk() in commit:
5fc57df2f6fd ("arm64: stacktrace: Convert to ARCH_STACKWALK")
The core stacktrace code expects that (when tracing the current task)
arch_stack_walk() starts a trace at its caller, and does not include
itself in the trace. However, arm64's arch_stack_walk() includes itself,
and so traces include one more entry than callers expect. The core
stacktrace code which calls arch_stack_walk() tries to skip a number of
entries to prevent itself appearing in a trace, and the additional entry
prevents skipping one of the core stacktrace functions, leaving this in
the trace unexpectedly.
We can fix this by having arm64's arch_stack_walk() begin the trace with
its caller. The first value returned by the trace will be
__builtin_return_address(0), i.e. the caller of arch_stack_walk(). The
first frame record to be unwound will be __builtin_frame_address(1),
i.e. the caller's frame record. To prevent surprises, arch_stack_walk()
is also marked noinline.
While __builtin_frame_address(1) is not safe in portable code, local GCC
developers have confirmed that it is safe on arm64. To find the caller's
frame record, the builtin can safely dereference the current function's
frame record or (in theory) could stash the original FP into another GPR
at function entry time, neither of which are problematic.
Prior to this patch, the tracing code would unexpectedly show up in
traces of the current task, e.g.
| # cat /proc/self/stack
| [<0>] stack_trace_save_tsk+0x98/0x100
| [<0>] proc_pid_stack+0xb4/0x130
| [<0>] proc_single_show+0x60/0x110
| [<0>] seq_read_iter+0x230/0x4d0
| [<0>] seq_read+0xdc/0x130
| [<0>] vfs_read+0xac/0x1e0
| [<0>] ksys_read+0x6c/0xfc
| [<0>] __arm64_sys_read+0x20/0x30
| [<0>] el0_svc_common.constprop.0+0x60/0x120
| [<0>] do_el0_svc+0x24/0x90
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0_sync_handler+0x1a4/0x1b0
| [<0>] el0_sync+0x170/0x180
After this patch, the tracing code will not show up in such traces:
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xb4/0x130
| [<0>] proc_single_show+0x60/0x110
| [<0>] seq_read_iter+0x230/0x4d0
| [<0>] seq_read+0xdc/0x130
| [<0>] vfs_read+0xac/0x1e0
| [<0>] ksys_read+0x6c/0xfc
| [<0>] __arm64_sys_read+0x20/0x30
| [<0>] el0_svc_common.constprop.0+0x60/0x120
| [<0>] do_el0_svc+0x24/0x90
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0_sync_handler+0x1a4/0x1b0
| [<0>] el0_sync+0x170/0x180
Erring on the side of caution, I've given this a spin with a bunch of
toolchains, verifying the output of /proc/self/stack and checking that
the assembly looked sound. For GCC (where we require version 5.1.0 or
later) I tested with the kernel.org crosstool binares for versions
5.5.0, 6.4.0, 6.5.0, 7.3.0, 7.5.0, 8.1.0, 8.3.0, 8.4.0, 9.2.0, and
10.1.0. For clang (where we require version 10.0.1 or later) I tested
with the llvm.org binary releases of 11.0.0, and 11.0.1.
Fixes: 5fc57df2f6fd ("arm64: stacktrace: Convert to ARCH_STACKWALK")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jun <chenjun102@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> # 5.10.x
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210319184106.5688-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b318e8decf6b9ef1bcf4ca06fae6d6a2cb5d5c5c ]
Fix a plethora of issues with MSR filtering by installing the resulting
filter as an atomic bundle instead of updating the live filter one range
at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
the hardware MSR bitmaps won't be updated until the next VM-Enter, but
the relevant software struct is atomically updated, which is what KVM
really needs.
Similar to the approach used for modifying memslots, make arch.msr_filter
a SRCU-protected pointer, do all the work configuring the new filter
outside of kvm->lock, and then acquire kvm->lock only when the new filter
has been vetted and created. That way vCPU readers either see the old
filter or the new filter in their entirety, not some half-baked state.
Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a
TOCTOU bug, but that's just the tip of the iceberg...
- Nothing is __rcu annotated, making it nigh impossible to audit the
code for correctness.
- kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel
coding style aside, the lack of a smb_rmb() anywhere casts all code
into doubt.
- kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
count before taking the lock.
- kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug.
The entire approach of updating the live filter is also flawed. While
installing a new filter is inherently racy if vCPUs are running, fixing
the above issues also makes it trivial to ensure certain behavior is
deterministic, e.g. KVM can provide deterministic behavior for MSRs with
identical settings in the old and new filters. An atomic update of the
filter also prevents KVM from getting into a half-baked state, e.g. if
installing a filter fails, the existing approach would leave the filter
in a half-baked state, having already committed whatever bits of the
filter were already processed.
[*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210316184436.2544875-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 73f44fe19d359635a607e8e8daa0da4001c1cfc2 ]
When exporting static_call_key; with EXPORT_STATIC_CALL*(), the module
can use static_call_update() to change the function called. This is
not desirable in general.
Not exporting static_call_key however also disallows usage of
static_call(), since objtool needs the key to construct the
static_call_site.
Solve this by allowing objtool to create the static_call_site using
the trampoline address when it builds a module and cannot find the
static_call_key symbol. The module loader will then try and map the
trampole back to a key before it constructs the normal sites list.
Doing this requires a trampoline -> key associsation, so add another
magic section that keeps those.
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ceb1ace4a2778e34a5414e5349712ae4dc41d85 ]
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called via
glibc's syscall() wrapper.
ia64 has two ways to call syscalls from userspace: via `break` and via
`eps` instructions.
The difference is in stack layout:
1. `eps` creates simple stack frame: no locals, in{0..7} == out{0..8}
2. `break` uses userspace stack frame: may be locals (glibc provides
one), in{0..7} == out{0..8}.
Both work fine in syscall handling cde itself.
But `ptrace(PTRACE_GET_SYSCALL_INFO)` uses unwind mechanism to
re-extract syscall arguments but it does not account for locals.
The change always skips locals registers. It should not change `eps`
path as kernel's handler already enforces locals=0 and fixes `break`.
Tested on v5.10 on rx3600 machine (ia64 9040 CPU).
Link: https://lkml.kernel.org/r/20210221002554.333076-1-slyfox@gentoo.org
Link: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e5e8b80d352ec999d2bba3ea584f541c83f4ca3f ]
is_no_fault_exception() has two bugs which were discovered via random
opcode testing with stress-ng. Both are caused by improper filtering
of opcodes.
The first bug can be triggered by a floating point store with a no-fault
ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.
The code first tests op3[5] (0x1000000), which denotes a floating
point instruction, and then tests op3[2] (0x200000), which denotes a
store instruction. But these bits are not mutually exclusive, and the
above mentioned opcode has both bits set. The intent is to filter out
stores, so the test for stores must be done first in order to have
any effect.
The second bug can be triggered by a floating point load with one of
the invalid ASI values 0x8e or 0x8f, which pass this check in
is_no_fault_exception():
if ((asi & 0xf2) == ASI_PNF)
An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
opcode CF95D1EF. Asi values greater than 0x8b (ASI_SNFL) are fatal
in handle_ldf_stq(), and is_no_fault_exception() must not allow these
invalid asi values to make it that far.
In both of these cases, handle_ldf_stq() reacts by calling
sun4v_data_access_exception() or spitfire_data_access_exception(),
which call is_no_fault_exception() and results in an infinite
recursion.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eead089311f4d935ab5d1d8fbb0c42ad44699ada ]
lkp reported a build error in fsp2.o:
CC arch/powerpc/platforms/44x/fsp2.o
{standard input}:577: Error: unsupported relocation against base
Which comes from:
pr_err("GESR0: 0x%08x\n", mfdcr(base + PLB4OPB_GESR0));
Where our mfdcr() macro is stringifying "base + PLB4OPB_GESR0", and
passing that to the assembler, which obviously doesn't work.
The mfdcr() macro already checks that the argument is constant using
__builtin_constant_p(), and if not calls the out-of-line version of
mfdcr(). But in this case GCC is smart enough to notice that "base +
PLB4OPB_GESR0" will be constant, even though it's not something we can
immediately stringify into a register number.
Segher pointed out that passing the register number to the inline asm
as a constant would be better, and in fact it fixes the build error,
presumably because it gives GCC a chance to resolve the value.
While we're at it, change mtdcr() similarly.
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210218123058.748882-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit dd926880da8dbbe409e709c1d3c1620729a94732 upstream.
Architectures that describe the CPU topology in devicetree and do not have
an identity mapping between physical and logical CPU ids must override the
default implementation of arch_match_cpu_phys_id().
Failing to do so breaks CPU devicetree-node lookups using of_get_cpu_node()
and of_cpu_device_node_get() which several drivers rely on. It also causes
the CPU struct devices exported through sysfs to point to the wrong
devicetree nodes.
On x86, CPUs are described in devicetree using their APIC ids and those
do not generally coincide with the logical ids, even if CPU0 typically
uses APIC id 0.
Add the missing implementation of arch_match_cpu_phys_id() so that CPU-node
lookups work also with SMP.
Apart from fixing the broken sysfs devicetree-node links this likely does
not affect current users of mainline kernels on x86.
Fixes: 4e07db9c8db8 ("x86/devicetree: Use CPU description from Device Tree")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210312092033.26317-1-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c150ba2fb5995c84a7a43848250d444a3329a7d upstream.
The comment in get_nr_restart_syscall() says:
* The problem is that we can get here when ptrace pokes
* syscall-like values into regs even if we're not in a syscall
* at all.
Yes, but if not in a syscall then the
status & (TS_COMPAT|TS_I386_REGS_POKED)
check below can't really help:
- TS_COMPAT can't be set
- TS_I386_REGS_POKED is only set if regs->orig_ax was changed by
32bit debugger; and even in this case get_nr_restart_syscall()
is only correct if the tracee is 32bit too.
Suppose that a 64bit debugger plays with a 32bit tracee and
* Tracee calls sleep(2) // TS_COMPAT is set
* User interrupts the tracee by CTRL-C after 1 sec and does
"(gdb) call func()"
* gdb saves the regs by PTRACE_GETREGS
* does PTRACE_SETREGS to set %rip='func' and %orig_rax=-1
* PTRACE_CONT // TS_COMPAT is cleared
* func() hits int3.
* Debugger catches SIGTRAP.
* Restore original regs by PTRACE_SETREGS.
* PTRACE_CONT
get_nr_restart_syscall() wrongly returns __NR_restart_syscall==219, the
tracee calls ia32_sys_call_table[219] == sys_madvise.
Add the sticky TS_COMPAT_RESTART flag which survives after return to user
mode. It's going to be removed in the next step again by storing the
information in the restart block. As a further cleanup it might be possible
to remove also TS_I386_REGS_POKED with that.
Test-case:
$ cvs -d :pserver:anoncvs:anoncvs@sourceware.org:/cvs/systemtap co ptrace-tests
$ gcc -o erestartsys-trap-debuggee ptrace-tests/tests/erestartsys-trap-debuggee.c --m32
$ gcc -o erestartsys-trap-debugger ptrace-tests/tests/erestartsys-trap-debugger.c -lutil
$ ./erestartsys-trap-debugger
Unexpected: retval 1, errno 22
erestartsys-trap-debugger: ptrace-tests/tests/erestartsys-trap-debugger.c:421
Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174709.GA17895@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66c1b6d74cd7035e85c426f0af4aede19e805c8a upstream.
Move TS_COMPAT back to asm/thread_info.h, close to TS_I386_REGS_POKED.
It was moved to asm/processor.h by b9d989c7218a ("x86/asm: Move the
thread_info::status field to thread_struct"), then later 37a8f7c38339
("x86/asm: Move 'status' from thread_struct to thread_info") moved the
'status' field back but TS_COMPAT was forgotten.
Preparatory patch to fix the COMPAT case for get_nr_restart_syscall()
Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174649.GA17880@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a501b048a95b79e1e34f03cac3c87ff1e9f229ad upstream.
Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where
the matrix allocator claimed to be out of vectors. He analyzed it down to
the point that IRQ2, the PIC cascade interrupt, which is supposed to be not
ever routed to the IO/APIC ended up having an interrupt vector assigned
which got moved during unplug of CPU0.
The underlying issue is that IRQ2 for various reasons (see commit
af174783b925 ("x86: I/O APIC: Never configure IRQ2" for details) is treated
as a reserved system vector by the vector core code and is not accounted as
a regular vector. The Amazon BIOS has an routing entry of pin2 to IRQ2
which causes the IO/APIC setup to claim that interrupt which is granted by
the vector domain because there is no sanity check. As a consequence the
allocation counter of CPU0 underflows which causes a subsequent unplug to
fail with:
[ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU
There is another sanity check missing in the matrix allocator, but the
underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic
during the conversion to irqdomains.
For almost 6 years nobody complained about this wreckage, which might
indicate that this requirement could be lifted, but for any system which
actually has a PIC IRQ2 is unusable by design so any routing entry has no
effect and the interrupt cannot be connected to a device anyway.
Due to that and due to history biased paranoia reasons restore the IRQ2
ignore logic and treat it as non existent despite a routing entry claiming
otherwise.
Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2dc0572f2cef87425147658698dce2600b799bd3 upstream.
On a Haswell machine, the perf_fuzzer managed to trigger this message:
[117248.075892] unchecked MSR access error: WRMSR to 0x3f1 (tried to
write 0x0400000000000000) at rIP: 0xffffffff8106e4f4
(native_write_msr+0x4/0x20)
[117248.089957] Call Trace:
[117248.092685] intel_pmu_pebs_enable_all+0x31/0x40
[117248.097737] intel_pmu_enable_all+0xa/0x10
[117248.102210] __perf_event_task_sched_in+0x2df/0x2f0
[117248.107511] finish_task_switch.isra.0+0x15f/0x280
[117248.112765] schedule_tail+0xc/0x40
[117248.116562] ret_from_fork+0x8/0x30
A fake event called VLBR_EVENT may use the bit 58 of the PEBS_ENABLE, if
the precise_ip is set. The bit 58 is reserved by the HW. Accessing the
bit causes the unchecked MSR access error.
The fake event doesn't support PEBS. The case should be rejected.
Fixes: 097e4311cda9 ("perf/x86: Add constraint to create guest LBR event without hw counter")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1615555298-140216-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d88d05a9e0b6d9356e97129d4ff9942d765f46ea upstream.
A repeatable crash can be triggered by the perf_fuzzer on some Haswell
system.
https://lore.kernel.org/lkml/7170d3b-c17f-1ded-52aa-cc6d9ae999f4@maine.edu/
For some old CPUs (HSW and earlier), the PEBS status in a PEBS record
may be mistakenly set to 0. To minimize the impact of the defect, the
commit was introduced to try to avoid dropping the PEBS record for some
cases. It adds a check in the intel_pmu_drain_pebs_nhm(), and updates
the local pebs_status accordingly. However, it doesn't correct the PEBS
status in the PEBS record, which may trigger the crash, especially for
the large PEBS.
It's possible that all the PEBS records in a large PEBS have the PEBS
status 0. If so, the first get_next_pebs_record_by_bit() in the
__intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large
PEBS, the 'count' parameter must > 1. The second
get_next_pebs_record_by_bit() will crash.
Besides the local pebs_status, correct the PEBS status in the PEBS
record as well.
Fixes: 01330d7288e0 ("perf/x86: Allow zero PEBS status with only single active event")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1615555298-140216-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eed5fae00593ab9d261a0c1ffc1bdb786a87a55a upstream.
The code relies on constant folding of cpu_has_feature() based
on possible and always true values as defined per
CPU_FTRS_ALWAYS and CPU_FTRS_POSSIBLE.
Build failure is encountered with for instance
book3e_all_defconfig on kisskb in the AMDGPU driver which uses
cpu_has_feature(CPU_FTR_VSX_COMP) to decide whether calling
kernel_enable_vsx() or not.
The failure is due to cpu_has_feature() not being inlined with
that configuration with gcc 4.9.
In the same way as commit acdad8fb4a15 ("powerpc: Force inlining of
mmu_has_feature to fix build failure"), for inlining of
cpu_has_feature().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b231dfa040ce4cc37f702f5c3a595fdeabfe0462.1615378209.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fc4cac4cfc437659ce445c3c47b807e1cc625b66 ]
Commit 1e35918ad9d1 ("MIPS: Enable Undefined Behavior Sanitizer
UBSAN") added a possibility to build the entire kernel with UBSAN
instrumentation for MIPS, with the exception for VDSO.
However, self-extracting head wasn't been added to exceptions, so
this occurs:
mips-alpine-linux-musl-ld: arch/mips/boot/compressed/decompress.o:
in function `FSE_buildDTable_wksp':
decompress.c:(.text.FSE_buildDTable_wksp+0x278): undefined reference
to `__ubsan_handle_shift_out_of_bounds'
mips-alpine-linux-musl-ld: decompress.c:(.text.FSE_buildDTable_wksp+0x2a8):
undefined reference to `__ubsan_handle_shift_out_of_bounds'
mips-alpine-linux-musl-ld: decompress.c:(.text.FSE_buildDTable_wksp+0x2c4):
undefined reference to `__ubsan_handle_shift_out_of_bounds'
mips-alpine-linux-musl-ld: arch/mips/boot/compressed/decompress.o:
decompress.c:(.text.FSE_buildDTable_raw+0x9c): more undefined references
to `__ubsan_handle_shift_out_of_bounds' follow
Add UBSAN_SANITIZE := n to mips/boot/compressed/Makefile to exclude
it from instrumentation scope and fix this issue.
Fixes: 1e35918ad9d1 ("MIPS: Enable Undefined Behavior Sanitizer UBSAN")
Cc: stable@vger.kernel.org # 5.0+
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22b89ba178dd0a66a26699ead014a3e73ff8e044 ]
Commit 8813ff49607e ("powerpc/sstep: Check instruction validity
against ISA version before emulation") introduced a proper way to skip
unknown instructions. This makes sure that the same is used for the
darn instruction when the range selection bits have a reserved value.
Fixes: a23987ef267a ("powerpc: sstep: Add support for darn instruction")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210204080744.135785-2-sandipan@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bbda4b6c7d7c7f79da71f95c92a5d76be22c3efd ]
The Power ISA says that the fixed-point load and update instructions
must neither use R0 for the base address (RA) nor have the
destination (RT) and the base address (RA) as the same register.
Similarly, for fixed-point stores and floating-point loads and stores,
the instruction is invalid when R0 is used as the base address (RA).
This is applicable to the following instructions.
* Load Byte and Zero with Update (lbzu)
* Load Byte and Zero with Update Indexed (lbzux)
* Load Halfword and Zero with Update (lhzu)
* Load Halfword and Zero with Update Indexed (lhzux)
* Load Halfword Algebraic with Update (lhau)
* Load Halfword Algebraic with Update Indexed (lhaux)
* Load Word and Zero with Update (lwzu)
* Load Word and Zero with Update Indexed (lwzux)
* Load Word Algebraic with Update Indexed (lwaux)
* Load Doubleword with Update (ldu)
* Load Doubleword with Update Indexed (ldux)
* Load Floating Single with Update (lfsu)
* Load Floating Single with Update Indexed (lfsux)
* Load Floating Double with Update (lfdu)
* Load Floating Double with Update Indexed (lfdux)
* Store Byte with Update (stbu)
* Store Byte with Update Indexed (stbux)
* Store Halfword with Update (sthu)
* Store Halfword with Update Indexed (sthux)
* Store Word with Update (stwu)
* Store Word with Update Indexed (stwux)
* Store Doubleword with Update (stdu)
* Store Doubleword with Update Indexed (stdux)
* Store Floating Single with Update (stfsu)
* Store Floating Single with Update Indexed (stfsux)
* Store Floating Double with Update (stfdu)
* Store Floating Double with Update Indexed (stfdux)
E.g. the following behaviour is observed for an invalid load and
update instruction having RA = RT.
While a userspace program having an instruction word like 0xe9ce0001,
i.e. ldu r14, 0(r14), runs without getting receiving a SIGILL on a
Power system (observed on P8 and P9), the outcome of executing that
instruction word varies and its behaviour can be considered to be
undefined.
Attaching an uprobe at that instruction's address results in emulation
which currently performs the load as well as writes the effective
address back to the base register. This might not match the outcome
from hardware.
To remove any inconsistencies, this adds additional checks for the
aforementioned instructions to make sure that the emulation
infrastructure treats them as unknown. The kernel can then fallback to
executing such instructions on hardware.
Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210204080744.135785-1-sandipan@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a5406a7ff56e63376c210b06072aa0ef23473366 upstream.
There are two issues for RV32,
1) if use FLATMEM, it is useless to enable SPARSEMEM_STATIC.
2) if use SPARSMEM, both SPARSEMEM_VMEMMAP and SPARSEMEM_STATIC is enabled.
Fixes: d95f1a542c3d ("RISC-V: Implement sparsemem")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b13525c20febcfecccf6fc1db5969727401317d upstream.
In commit 05bc1be6db4b2 ("s390/pci: create zPCI bus") we removed the
pci_dev_put() call matching the earlier pci_get_slot() done as part of
__zpci_event_availability(). This was based on the wrong understanding
that the device_put() done as part of pci_destroy_device() would counter
the pci_get_slot() when it only counters the initial reference. This
same understanding and existing bad example also lead to not doing
a pci_dev_put() in zpci_remove_device().
Since releasing the PCI devices, unlike releasing the PCI slot, does not
print any debug message for testing I added one in pci_release_dev().
This revealed that we are indeed leaking the PCI device on PCI
hotunplug. Further testing also revealed another missing pci_dev_put() in
disable_slot().
Fix this by adding the missing pci_dev_put() in disable_slot() and fix
zpci_remove_device() with the correct pci_dev_put() calls. Also instead
of calling pci_get_slot() in __zpci_event_availability() to determine if
a PCI device is registered and then doing the same again in
zpci_remove_device() do this once in zpci_remove_device() which makes
sure that the pdev in __zpci_event_availability() is only used for the
result of pci_scan_single_device() which does not need a reference count
decremnt as its ownership goes to the PCI bus.
Also move the check if zdev->zbus->bus is set into zpci_remove_device()
since it may be that we're removing a device with devfn != 0 which never
had a PCI bus. So we can still set the pdev->error_state to indicate
that the device is not usable anymore, add a flag to set the error state.
Fixes: 05bc1be6db4b2 ("s390/pci: create zPCI bus")
Cc: <stable@vger.kernel.org> # 5.8+: e1bff843cde6 s390/pci: remove superfluous zdev->zbus check
Cc: <stable@vger.kernel.org> # 5.8+: ba764dd703fe s390/pci: refactor zpci_create_device()
Cc: <stable@vger.kernel.org> # 5.8+
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1bff843cde62a45a287b7f9b4cd5e824e8e49e2 upstream.
Checking zdev->zbus for NULL in __zpci_event_availability() is
superfluous as it can never be NULL at this point. While harmless this
check causes smatch warnings because we later access zdev->zbus with
only having checked zdev != NULL which is sufficient.
The reason zdev->zbus can never be NULL is since with zdev != NULL given
we know the zdev came from get_zdev_by_fid() and thus the zpci_list.
Now on first glance at zpci_create_device() one may assume that there is
a window where the zdev is in the list without a zdev, however this
window can't overlap with __zpci_event_availability() as
zpci_create_device() either runs on the same kthread as part of
availability events, or during the initial CLP List PCI at which point
the __zpci_event_availability() is not yet called as zPCI is not yet
initialized.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba764dd703feacb5a9c410d191af1b6cfbe96845 upstream.
Currently zpci_create_device() is only called in clp_add_pci_device()
which allocates the memory for the struct zpci_dev being created. There
is little separation of concerns as only both functions together can
create a zpci_dev and the only CLP specific code in
clp_add_pci_device() is a call to clp_query_pci_fn().
Improve this by removing clp_add_pci_device() and refactor
zpci_create_device() such that it alone creates and initializes the
zpci_dev given the FID and Function Handle. For this we need to make
clp_query_pci_fn() non-static. While at it remove the function handle
parameter since we can just take that from the zpci_dev. Also move
adding to the zpci_list to after the zdev has been fully created which
eliminates a window where a partially initialized zdev can be found by
get_zdev_by_fid().
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d54cb7d54877d529bc1e0e1f47a3dd082f73add3 upstream.
Commit 152e9b8676c6e ("s390/vtime: steal time exponential moving average")
inadvertently changed the input value for account_steal_time() from
"cputime_to_nsecs(steal)" to just "steal", resulting in broken increased
steal time accounting.
Fix this by changing it back to "cputime_to_nsecs(steal)".
Fixes: 152e9b8676c6e ("s390/vtime: steal time exponential moving average")
Cc: <stable@vger.kernel.org> # 5.1
Reported-by: Sabine Forkel <sabine.forkel@de.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 86ad60a65f29dd862a11c22bb4b5be28d6c5cef1 ]
The XTS asm helper arrangement is a bit odd: the 8-way stride helper
consists of back-to-back calls to the 4-way core transforms, which
are called indirectly, based on a boolean that indicates whether we
are performing encryption or decryption.
Given how costly indirect calls are on x86, let's switch to direct
calls, and given how the 8-way stride doesn't really add anything
substantial, use a 4-way stride instead, and make the asm core
routine deal with any multiple of 4 blocks. Since 512 byte sectors
or 4 KB blocks are the typical quantities XTS operates on, increase
the stride exported to the glue helper to 512 bytes as well.
As a result, the number of indirect calls is reduced from 3 per 64 bytes
of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
when operating on 1 KB blocks (measured on a Intel(R) Core(TM) i7-8650U CPU)
Fixes: 9697fa39efd3f ("x86/retpoline/crypto: Convert crypto assembler indirect jumps")
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 032d049ea0f45b45c21f3f02b542aa18bc6b6428 ]
CMP $0,%reg can't set overflow flag, so we can use shorter TEST %reg,%reg
instruction when only zero and sign flags are checked (E,L,LE,G,GE conditions).
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit c4b000c3928d4f20acef79dccf3a65ae3795e0b0 upstream.
When panicking from the nVHE hyp and restoring the host context, x29 is
expected to hold a pointer to the host context. This wasn't being done
so fix it to make sure there's a valid pointer the host context being
used.
Rather than passing a boolean indicating whether or not the host context
should be restored, instead pass the pointer to the host context. NULL
is passed to indicate that no context should be restored.
Fixes: a2e102e20fd6 ("KVM: arm64: nVHE: Handle hyp panics")
Cc: stable@vger.kernel.org # 5.10.y only
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210219122406.1337626-1-ascull@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 01dc9262ff5797b675c32c0c6bc682777d23de05 upstream.
It recently became apparent that the ARMv8 architecture has interesting
rules regarding attributes being used when fetching instructions
if the MMU is off at Stage-1.
In this situation, the CPU is allowed to fetch from the PoC and
allocate into the I-cache (unless the memory is mapped with
the XN attribute at Stage-2).
If we transpose this to vcpus sharing a single physical CPU,
it is possible for a vcpu running with its MMU off to influence
another vcpu running with its MMU on, as the latter is expected to
fetch from the PoU (and self-patching code doesn't flush below that
level).
In order to solve this, reuse the vcpu-private TLB invalidation
code to apply the same policy to the I-cache, nuking it every time
the vcpu runs on a physical CPU that ran another vcpu of the same
VM in the past.
This involve renaming __kvm_tlb_flush_local_vmid() to
__kvm_flush_cpu_context(), and inserting a local i-cache invalidation
there.
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210303164505.68492-1-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 262b003d059c6671601a19057e9fe1a5e7f23722 upstream.
When registering a memslot, we check the size and location of that
memslot against the IPA size to ensure that we can provide guest
access to the whole of the memory.
Unfortunately, this check rejects memslot that end-up at the exact
limit of the addressing capability for a given IPA size. For example,
it refuses the creation of a 2GB memslot at 0x8000000 with a 32bit
IPA space.
Fix it by relaxing the check to accept a memslot reaching the
limit of the IPA space.
Fixes: c3058d5da222 ("arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20210311100016.3830038-3-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d717558dd5ef10d28866750d5c24ff892ea3778 upstream.
KVM/arm64 has forever used a 40bit default IPA space, partially
due to its 32bit heritage (where the only choice is 40bit).
However, there are implementations in the wild that have a *cough*
much smaller *cough* IPA space, which leads to a misprogramming of
VTCR_EL2, and a guest that is stuck on its first memory access
if userspace dares to ask for the default IPA setting (which most
VMMs do).
Instead, blundly reject the creation of such VM, as we can't
satisfy the requirements from userspace (with a one-off warning).
Also clarify the boot warning, and document that the VM creation
will fail when an unsupported IPA size is provided.
Although this is an ABI change, it doesn't really change much
for userspace:
- the guest couldn't run before this change, but no error was
returned. At least userspace knows what is happening.
- a memory slot that was accepted because it did fit the default
IPA space now doesn't even get a chance to be registered.
The other thing that is left doing is to convince userspace to
actually use the IPA space setting instead of relying on the
antiquated default.
Fixes: 233a7cb23531 ("kvm: arm64: Allow tuning the physical address size for VM")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b96b0c5de685df82019e16826a282d53d86d112c upstream.
The nVHE KVM hyp drains and disables the SPE buffer, before
entering the guest, as the EL1&0 translation regime
is going to be loaded with that of the guest.
But this operation is performed way too late, because :
- The owning translation regime of the SPE buffer
is transferred to EL2. (MDCR_EL2_E2PB == 0)
- The guest Stage1 is loaded.
Thus the flush could use the host EL1 virtual address,
but use the EL2 translations instead of host EL1, for writing
out any cached data.
Fix this by moving the SPE buffer handling early enough.
The restore path is doing the right thing.
Fixes: 014c4c77aad7 ("KVM: arm64: Improve debug register save/restore flow")
Cc: stable@vger.kernel.org
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210302120345.3102874-1-suzuki.poulose@arm.com
Message-Id: <20210305185254.3730990-2-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31948332d5fa392ad933f4a6a10026850649ed76 upstream.
Commit 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest
context") tracks the currently running vCPU, clearing the pointer to
NULL on exit from a guest.
Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the
kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS
code to go off into the weeds when it saves the DISR assuming that the
CPU context is embedded in a struct vCPU.
Leave x1 alone and use x3 as a temporary register instead when clearing
the vCPU on the guest exit path.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andrew Scull <ascull@google.com>
Cc: <stable@vger.kernel.org>
Fixes: 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context")
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210226181211.14542-1-will@kernel.org
Message-Id: <20210305185254.3730990-3-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 357ad203d45c0f9d76a8feadbd5a1c5d460c638b upstream.
When walking the page tables at a given level, and if the start
address for the range isn't aligned for that level, we propagate
the misalignment on each iteration at that level.
This results in the walker ignoring a number of entries (depending
on the original misalignment) on each subsequent iteration.
Properly aligning the address before the next iteration addresses
this issue.
Cc: stable@vger.kernel.org
Reported-by: Howard Zhang <Howard.Zhang@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jia He <justin.he@arm.com>
Fixes: b1e57de62cfb ("KVM: arm64: Add stand-alone page-table walker infrastructure")
[maz: rewrite commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210303024225.2591-1-justin.he@arm.com
Message-Id: <20210305185254.3730990-9-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7eb79c6290c7ae4561418544072e0a3266e7384 upstream.
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-63
Off-line CPU(s) list: 64-87
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro
rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US .UTF-8 no-kvmclock-vsyscall
# echo 1 > /sys/devices/system/cpu/cpu76/online
-bash: echo: write error: Cannot allocate memory
The per-cpu vsyscall pvclock data pointer assigns either an element of the
static array hv_clock_boot (#vCPU <= 64) or dynamically allocated memory
hvclock_mem (vCPU > 64), the dynamically memory will not be allocated if
kvmclock vsyscall is disabled, this can result in cpu hotpluged fails in
kvmclock_setup_percpu() which returns -ENOMEM. It's broken for no-vsyscall
and sometimes you end up with vsyscall disabled if the host does something
strange. This patch fixes it by allocating this dynamically memory
unconditionally even if vsyscall is disabled.
Fixes: 6a1cac56f4 ("x86/kvm: Use __bss_decrypted attribute in shared variables")
Reported-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: stable@vger.kernel.org#v4.19-rc5+
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit beda430177f56656e7980dcce93456ffaa35676b upstream.
When posting a deadline timer interrupt, open code the checks guarding
__kvm_wait_lapic_expire() in order to skip the lapic_timer_int_injected()
check in kvm_wait_lapic_expire(). The injection check will always fail
since the interrupt has not yet be injected. Moving the call after
injection would also be wrong as that wouldn't actually delay delivery
of the IRQ if it is indeed sent via posted interrupt.
Fixes: 010fd37fddf6 ("KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305021808.3769732-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d5675df792ff67e74a500c4c94db0f99e6a10ef upstream.
On a 32-bit fast syscall that fails to read its arguments from user
memory, the kernel currently does syscall exit work but not
syscall entry work. This confuses audit and ptrace. For example:
$ ./tools/testing/selftests/x86/syscall_arg_fault_32
...
strace: pid 264258: entering, ptrace_syscall_info.op == 2
...
This is a minimal fix intended for ease of backporting. A more
complete cleanup is coming.
Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bffe30dd9f1f3b2608a87ac909a224d6be472485 upstream.
The #VC handler must run in atomic context and cannot sleep. This is a
problem when it tries to fetch instruction bytes from user-space via
copy_from_user().
Introduce a insn_fetch_from_user_inatomic() helper which uses
__copy_from_user_inatomic() to safely copy the instruction bytes to
kernel memory in the #VC handler.
Fixes: 5e3427a7bc432 ("x86/sev-es: Handle instruction fetches from user-space")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-6-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6be002bcd1dd1dedb926abf3c90c794eacb77dc upstream.
Lockdep state handling on NMI enter and exit is nothing specific to X86. It's
not any different on other architectures. Also the extra state type is not
necessary, irqentry_state_t can carry the necessary information as well.
Move it to common code and extend irqentry_state_t to carry lockdep state.
[ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive
between the IRQ and NMI exceptions, and add kernel documentation for
struct irqentry_state_t ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201102205320.1458656-7-ira.weiny@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>