linux

iv/linux

Author	SHA1	Message	Date
Michael Ellerman	d93e5e2d03	powerpc/64: Update Speculation_Store_Bypass in /proc/<pid>/status Currently we don't report anything useful in /proc/<pid>/status: $ grep Speculation_Store_Bypass /proc/self/status Speculation_Store_Bypass: unknown Our mitigation is currently always a barrier instruction, which doesn't map that well onto the existing possibilities for the PR_SPEC values. However even if we added a "barrier" type PR_SPEC value, userspace would still need to consult some other source to work out which type of barrier to use. So reporting "vulnerable" seems sufficient, as userspace can see that and then consult its source to determine what barrier to use. Signed-off-by: Gustavo Walbon <gwalbon@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200402124929.3574166-1-mpe@ellerman.id.au	2020-05-18 21:58:43 +10:00
Michael Ellerman	93900337b9	drivers/macintosh: Fix memleak in windfarm_pm112 driver create_cpu_loop() calls smu_sat_get_sdb_partition() which does kmalloc() and returns the allocated buffer. In fact it's called twice, and neither buffer is freed. This results in a memory leak as reported by Erhard: unreferenced object 0xc00000047081f840 (size 32): comm "kwindfarm", pid 203, jiffies 4294880630 (age 5552.877s) hex dump (first 32 bytes): c8 06 02 7f ff 02 ff 01 fb bf 00 41 00 20 00 00 ...........A. .. 00 07 89 37 00 a0 00 00 00 00 00 00 00 00 00 00 ...7............ backtrace: [<0000000083f0a65c>] .smu_sat_get_sdb_partition+0xc4/0x2d0 [windfarm_smu_sat] [<000000003010fcb7>] .pm112_wf_notify+0x104c/0x13bc [windfarm_pm112] [<00000000b958b2dd>] .notifier_call_chain+0xa8/0x180 [<0000000070490868>] .blocking_notifier_call_chain+0x64/0x90 [<00000000131d8149>] .wf_thread_func+0x114/0x1a0 [<000000000d54838d>] .kthread+0x13c/0x190 [<00000000669b72bc>] .ret_from_kernel_thread+0x58/0x64 unreferenced object 0xc0000004737089f0 (size 16): comm "kwindfarm", pid 203, jiffies 4294880879 (age 5552.050s) hex dump (first 16 bytes): c4 04 01 7f 22 11 e0 e6 ff 55 7b 12 ec 11 00 00 ...."....U{..... backtrace: [<0000000083f0a65c>] .smu_sat_get_sdb_partition+0xc4/0x2d0 [windfarm_smu_sat] [<00000000b94ef7e1>] .pm112_wf_notify+0x1294/0x13bc [windfarm_pm112] [<00000000b958b2dd>] .notifier_call_chain+0xa8/0x180 [<0000000070490868>] .blocking_notifier_call_chain+0x64/0x90 [<00000000131d8149>] .wf_thread_func+0x114/0x1a0 [<000000000d54838d>] .kthread+0x13c/0x190 [<00000000669b72bc>] .ret_from_kernel_thread+0x58/0x64 Fix it by rearranging the logic so we deal with each buffer separately, which then makes it easy to free the buffer once we're done with it. Fixes: ac171c46667c ("[PATCH] powerpc: Thermal control for dual core G5s") Cc: stable@vger.kernel.org # v2.6.16+ Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Erhard F. <erhard_f@mailbox.org> Link: https://lore.kernel.org/r/20200423060038.3308530-1-mpe@ellerman.id.au	2020-05-15 11:58:56 +10:00
Michael Ellerman	7481cad474	selftests/powerpc: Add a test of counting larx/stcx This is based on the count_instructions test. However this one also counts the number of failed stcx's, and in conjunction with knowing the size of the stcx loop, can calculate the total number of instructions executed even in the face of non-deterministic stcx failures. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200426114410.3917383-1-mpe@ellerman.id.au	2020-05-15 11:58:55 +10:00
Michael Ellerman	24ac99e97f	powerpc: Drop unneeded cast in task_pt_regs() There's no need to cast in task_pt_regs() as tsk->thread.regs should already be a struct pt_regs. If someone's using task_pt_regs() on something that's not a task but happens to have a thread.regs then we'll deal with them later. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200428123152.73566-1-mpe@ellerman.id.au	2020-05-15 11:58:55 +10:00
Michael Ellerman	7ffa8b7dc1	powerpc/64: Don't initialise init_task->thread.regs Aneesh increased the size of struct pt_regs by 16 bytes and started seeing this WARN_ON: smp: Bringing up secondary CPUs ... ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 giveup_all+0xb4/0x110 Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318 NIP: c00000000001a2b4 LR: c00000000001a29c CTR: c0000000031d0000 REGS: c0000000026d3980 TRAP: 0700 Not tainted (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+) MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48048224 XER: 00000000 CFAR: c000000000019cc8 IRQMASK: 1 GPR00: c00000000001a264 c0000000026d3c20 c0000000026d7200 800000000280b033 GPR04: 0000000000000001 0000000000000000 0000000000000077 30206d7372203164 GPR08: 0000000000002000 0000000002002000 800000000280b033 3230303030303030 GPR12: 0000000000008800 c0000000031d0000 0000000000800050 0000000002000066 GPR16: 000000000309a1a0 000000000309a4b0 000000000309a2d8 000000000309a890 GPR20: 00000000030d0098 c00000000264da40 00000000fd620000 c0000000ff798080 GPR24: c00000000264edf0 c0000001007469f0 00000000fd620000 c0000000020e5e90 GPR28: c00000000264edf0 c00000000264d200 000000001db60000 c00000000264d200 NIP [c00000000001a2b4] giveup_all+0xb4/0x110 LR [c00000000001a29c] giveup_all+0x9c/0x110 Call Trace: [c0000000026d3c20] [c00000000001a264] giveup_all+0x64/0x110 (unreliable) [c0000000026d3c90] [c00000000001ae34] __switch_to+0x104/0x480 [c0000000026d3cf0] [c000000000e0b8a0] __schedule+0x320/0x970 [c0000000026d3dd0] [c000000000e0c518] schedule_idle+0x38/0x70 [c0000000026d3df0] [c00000000019c7c8] do_idle+0x248/0x3f0 [c0000000026d3e70] [c00000000019cbb8] cpu_startup_entry+0x38/0x40 [c0000000026d3ea0] [c000000000011bb0] rest_init+0xe0/0xf8 [c0000000026d3ed0] [c000000002004820] start_kernel+0x990/0x9e0 [c0000000026d3f90] [c00000000000c49c] start_here_common+0x1c/0x400 Which was unexpected. The warning is checking the thread.regs->msr value of the task we are switching from: usermsr = tsk->thread.regs->msr; ... WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC))); ie. if MSR_VSX is set then both of MSR_FP and MSR_VEC are also set. Dumping tsk->thread.regs->msr we see that it's: 0x1db60000 Which is not a normal looking MSR, in fact the only valid bit is MSR_VSX, all the other bits are reserved in the current definition of the MSR. We can see from the oops that it was swapper/0 that we were switching from when we hit the warning, ie. init_task. So its thread.regs points to the base (high addresses) in init_stack. Dumping the content of init_task->thread.regs, with the members of pt_regs annotated (the 16 bytes larger version), we see: 0000000000000000 c000000002780080 gpr[0] gpr[1] 0000000000000000 c000000002666008 gpr[2] gpr[3] c0000000026d3ed0 0000000000000078 gpr[4] gpr[5] c000000000011b68 c000000002780080 gpr[6] gpr[7] 0000000000000000 0000000000000000 gpr[8] gpr[9] c0000000026d3f90 0000800000002200 gpr[10] gpr[11] c000000002004820 c0000000026d7200 gpr[12] gpr[13] 000000001db60000 c0000000010aabe8 gpr[14] gpr[15] c0000000010aabe8 c0000000010aabe8 gpr[16] gpr[17] c00000000294d598 0000000000000000 gpr[18] gpr[19] 0000000000000000 0000000000001ff8 gpr[20] gpr[21] 0000000000000000 c00000000206d608 gpr[22] gpr[23] c00000000278e0cc 0000000000000000 gpr[24] gpr[25] 000000002fff0000 c000000000000000 gpr[26] gpr[27] 0000000002000000 0000000000000028 gpr[28] gpr[29] 000000001db60000 0000000004750000 gpr[30] gpr[31] 0000000002000000 000000001db60000 nip msr 0000000000000000 0000000000000000 orig_r3 ctr c00000000000c49c 0000000000000000 link xer 0000000000000000 0000000000000000 ccr softe 0000000000000000 0000000000000000 trap dar 0000000000000000 0000000000000000 dsisr result 0000000000000000 0000000000000000 ppr kuap 0000000000000000 0000000000000000 pad[2] pad[3] This looks suspiciously like stack frames, not a pt_regs. If we look closely we can see return addresses from the stack trace above, c000000002004820 (start_kernel) and c00000000000c49c (start_here_common). init_task->thread.regs is setup at build time in processor.h: #define INIT_THREAD { \ .ksp = INIT_SP, \ .regs = (struct pt_regs )INIT_SP - 1, / XXX bogus, I think / \ The early boot code where we setup the initial stack is: LOAD_REG_ADDR(r3,init_thread_union) / set up a stack pointer */ LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) add r1,r3,r1 li r0,0 stdu r0,-STACK_FRAME_OVERHEAD(r1) Which creates a stack frame of size 112 bytes (STACK_FRAME_OVERHEAD). Which is far too small to contain a pt_regs. So the result is init_task->thread.regs is pointing at some stack frames on the init stack, not at a pt_regs. We have gotten away with this for so long because with pt_regs at its current size the MSR happens to point into the first frame, at a location that is not written to by the early asm. With the 16 byte expansion the MSR falls into the second frame, which is used by the compiler, and collides with a saved register that tends to be non-zero. As far as I can see this has been wrong since the original merge of 64-bit ppc support, back in 2002. Conceptually swapper should have no regs, it never entered from userspace, and in fact that's what we do on 32-bit. It's also presumably what the "bogus" comment is referring to. So I think the right fix is to just not-initialise regs at all. I'm slightly worried this will break some code that isn't prepared for a NULL regs, but we'll have to see. Remove the comment in head_64.S which refers to us setting up the regs (even though we never did), and is otherwise not really accurate any more. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200428123130.73078-1-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Gustavo A. R. Silva	02bddf21c3	powerpc/mm: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507185755.GA15014@embeddedor	2020-05-15 11:58:54 +10:00
Gustavo A. R. Silva	0f6be41c60	powerpc: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507185749.GA14994@embeddedor	2020-05-15 11:58:54 +10:00
Nicholas Piggin	4e0e45b07d	powerpc: Use trap metadata to prevent double restart rather than zeroing trap It's not very nice to zero trap for this, because then system calls no longer have trap_is_syscall(regs) invariant, and we can't distinguish between sc and scv system calls (in a later patch). Take one last unused bit from the low bits of the pt_regs.trap word for this instead. There is not a really good reason why it should be in trap as opposed to another field, but trap has some concept of flags and it exists. Ideally I think we would move trap to 2-byte field and have 2 more bytes available independently. Add a selftests case for this, which can be seen to fail if trap_norestart() is changed to return false. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make them static inlines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-4-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	912237ea16	powerpc: trap_is_syscall() helper to hide syscall trap number A new system call interrupt will be added with a new trap number. Hide the explicit 0xc00 test behind an accessor to reduce churn in callers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make it a static inline] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-3-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	db30144b5c	powerpc: Use set_trap() and avoid open-coding trap masking The pt_regs.trap field keeps 4 low bits for some metadata about the trap or how it was handled, which is masked off in order to test the architectural trap number. Add a set_trap() accessor to set this, equivalent to TRAP() for returning it. This is actually not quite the equivalent of TRAP() because it always clears the low bits, which may be harmless if it can only be updated via ptrace syscall, but it seems dangerous. In fact settting TRAP from ptrace doesn't seem like a great idea so maybe it's better deleted. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make it a static inline rather than a shouty macro] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-2-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	feb9df3462	powerpc/64s: Always has full regs, so remove remnant checks Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-1-mpe@ellerman.id.au	2020-05-15 11:58:53 +10:00
Christophe Leroy	edbadaf067	powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT When CONFIG_KASAN is selected, the stack usage is increased. In the same way as x86 and arm64 architectures, increase THREAD_SHIFT when CONFIG_KASAN is selected. Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support") Reported-by: <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=207129 Link: https://lore.kernel.org/r/2c50f3b1c9bbaa4217c9a98f3044bd2a36c46a4f.1586361277.git.christophe.leroy@c-s.fr	2020-05-11 23:15:16 +10:00
Christophe Leroy	4cdb2da654	powerpc: Remove _ALIGN_UP(), _ALIGN_DOWN() and _ALIGN() These three powerpc macros have been replaced by equivalent generic macros and are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bb0a6081f7b95ee64ca20f92483e5b9661cbacb2.1587407777.git.christophe.leroy@c-s.fr	2020-05-11 23:15:16 +10:00
Christophe Leroy	d3f3d3bf76	powerpc: Replace _ALIGN() by ALIGN() _ALIGN() is specific to powerpc ALIGN() is generic and does the same Replace _ALIGN() by ALIGN() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/4006d9c8e69f8eaccee954899f6b5fb76240d00b.1587407777.git.christophe.leroy@c-s.fr	2020-05-11 23:15:16 +10:00
Christophe Leroy	b711531641	powerpc: Replace _ALIGN_UP() by ALIGN() _ALIGN_UP() is specific to powerpc ALIGN() is generic and does the same Replace _ALIGN_UP() by ALIGN() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/8a6d7e45f7904c73a0af539642d3962e2a3c7268.1587407777.git.christophe.leroy@c-s.fr	2020-05-11 23:15:15 +10:00
Christophe Leroy	e96d904ede	powerpc: Replace _ALIGN_DOWN() by ALIGN_DOWN() _ALIGN_DOWN() is specific to powerpc ALIGN_DOWN() is generic and does the same Replace _ALIGN_DOWN() by ALIGN_DOWN() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/3911a86d6b5bfa7ad88cd7c82416fbe6bb47e793.1587407777.git.christophe.leroy@c-s.fr	2020-05-11 23:15:15 +10:00
Christophe Leroy	7bfc3c84cb	drivers/powerpc: Replace _ALIGN_UP() by ALIGN() _ALIGN_UP() is specific to powerpc ALIGN() is generic and does the same Replace _ALIGN_UP() by ALIGN() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/a5945463f86c984151962a475a3ee56a2893e85d.1587407777.git.christophe.leroy@c-s.fr	2020-05-11 23:15:15 +10:00
Christophe Leroy	679d74abc4	powerpc/8xx: Update email address in MAINTAINERS Since 01 May 2020, our email adresses have changed to @csgroup.eu Update MAINTAINERS accordingly. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9fd0f9a827ebbeae64ad7a6f6c595d242f4dd5fc.1588747860.git.christophe.leroy@csgroup.eu	2020-05-11 23:15:15 +10:00
Wolfram Sang	ad0f522df1	powerpc/5200: update contact email My 'pengutronix' address is defunct for years. Merge the entries and use the proper contact address. Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200502142642.18979-1-wsa@kernel.org	2020-05-11 23:15:14 +10:00
Andrey Abramov	bac7ca7b98	powerpc: module_[32\|64].c: replace swap function with built-in one Replace relaswap with built-in one, because relaswap does a simple byte to byte swap. Since Spectre mitigations have made indirect function calls more expensive, and the default simple byte copies swap is implemented without them, an "optimized" custom swap function is now a waste of time as well as code. Signed-off-by: Andrey Abramov <st5pub@yandex.ru> Reviewed-by: George Spelvin <lkml@sdf.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/994931554238042@iva8-b333b7f98ab0.qloud-c.yandex.net	2020-05-11 23:15:14 +10:00
Christophe JAILLET	2f62870ca5	powerpc/powernv: Fix a warning message Fix a cut'n'paste error in a warning message. This should be 'cpu-idle-state-residency-ns' to match the property searched in the previous 'of_property_read_u32_array()' Fixes: 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into global structure") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200502115949.139000-1-christophe.jaillet@wanadoo.fr	2020-05-11 23:15:14 +10:00
Michael Ellerman	1f12096aca	Merge the lockless page table walk rework into next This merges the lockless page table walk rework series from Aneesh. Because it touches powerpc KVM code we are sharing it with the kvm-ppc tree in our topic/ppc-kvm branch. This is the cover letter from Aneesh: Avoid IPI while updating page table entries. Problem Summary: Slow termination of KVM guest with large guest RAM config due to a large number of IPIs that were caused by clearing level 1 PTE entries (THP) entries. This is shown in the stack trace below. - qemu-system-ppc [kernel.vmlinux] [k] smp_call_function_many - smp_call_function_many - 36.09% smp_call_function_many serialize_against_pte_lookup radix__pmdp_huge_get_and_clear zap_huge_pmd unmap_page_range unmap_vmas unmap_region __do_munmap __vm_munmap sys_munmap system_call __munmap qemu_ram_munmap qemu_anon_ram_free reclaim_ramblock call_rcu_thread qemu_thread_start start_thread __clone Why we need to do IPI when clearing PMD entries: This was added as part of commit: 13bd817bb884 ("powerpc/thp: Serialize pmd clear against a linux page table walk") serialize_against_pte_lookup makes sure that all parallel lockless page table walk completes before we convert a PMD pte entry to regular pmd entry. We end up doing that conversion in the below scenarios 1) __split_huge_zero_page_pmd 2) do_huge_pmd_wp_page_fallback 3) MADV_DONTNEED running parallel to page faults. local_irq_disable and lockless page table walk: The lockless page table walk work with the assumption that we can dereference the page table contents without holding a lock. For this to work, we need to make sure we read the page table contents atomically and page table pages are not going to be freed/released while we are walking the table pages. We can achieve by using a rcu based freeing for page table pages or if the architecture implements broadcast tlbie, we can block the IPI as we walk the page table pages. To support both the above framework, lockless page table walk is done with irq disabled instead of rcu_read_lock() We do have two interface for lockless page table walk, gup fast and __find_linux_pte. This patch series makes __find_linux_pte table walk safe against the conversion of PMD PTE to regular PMD. gup fast: gup fast is already safe against THP split because kernel now differentiate between a pmd split and a compound page split. gup fast can run parallel to a pmd split and we prevent a parallel gup fast to a hugepage split, by freezing the page refcount and failing the speculative page ref increment. Similar to how gup is safe against parallel pmd split, this patch series updates the __find_linux_pte callers to be safe against a parallel pmd split. We do that by enforcing the following rules. 1) Don't reload the pte value, because that can be updated in parallel. 2) Code should be able to work with a stale PTE value and not the recent one. ie, the pte value that we are looking at may not be the latest value in the page table. 3) Before looking at pte value check for _PAGE_PTE bit. We now do this as part of pte_present() check. Performance: This speeds up Qemu guest RAM del/unplug time as below 128 core, 496GB guest: Without patch: munmap start: timer = 13162 ms, PID=7684 munmap finish: timer = 95312 ms, PID=7684 - delta = 82150 ms With patch (upto removing IPI) munmap start: timer = 196449 ms, PID=6681 munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms With patch (with adding the tlb invalidate in pmdp_huge_get_and_clear_full) munmap start: timer = 196345 ms, PID=6879 munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms Link: https://lore.kernel.org/r/20200505071729.54912-1-aneesh.kumar@linux.ibm.com	2020-05-06 15:53:24 +10:00
Aneesh Kumar K.V	75358ea359	powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race MADV_DONTNEED holds mmap_sem in read mode and that implies a parallel page fault is possible and the kernel can end up with a level 1 PTE entry (THP entry) converted to a level 0 PTE entry without flushing the THP TLB entry. Most architectures including POWER have issues with kernel instantiating a level 0 PTE entry while holding level 1 TLB entries. The code sequence I am looking at is down_read(mmap_sem) down_read(mmap_sem) zap_pmd_range() zap_huge_pmd() pmd lock held pmd_cleared table details added to mmu_gather pmd_unlock() insert a level 0 PTE entry() tlb_finish_mmu(). Fix this by forcing a tlb flush before releasing pmd lock if this is not a fullmm invalidate. We can safely skip this invalidate for task exit case (fullmm invalidate) because in that case we are sure there can be no parallel fault handlers. This do change the Qemu guest RAM del/unplug time as below 128 core, 496GB guest: Without patch: munmap start: timer = 196449 ms, PID=6681 munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms With patch: munmap start: timer = 196345 ms, PID=6879 munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-23-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V	e21dfbf013	powerpc/mm/book3s64: Avoid sending IPI on clearing PMD Now that all the lockless page table walk is careful w.r.t the PTE address returned, we can now revert commit: 13bd817bb884 ("powerpc/thp: Serialize pmd clear against a linux page table walk.") We also drop the equivalent IPI from other pte updates routines. We still keep IPI in hash pmdp collapse and that is to take care of parallel hash page table insert. The radix pmdp collapse flush can possibly be removed once I am sure generic code doesn't have the any expectations around parallel gup walk. This speeds up Qemu guest RAM del/unplug time as below 128 core, 496GB guest: Without patch: munmap start: timer = 13162 ms, PID=7684 munmap finish: timer = 95312 ms, PID=7684 - delta = 82150 ms With patch: munmap start: timer = 196449 ms, PID=6681 munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-21-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V	0e11df9649	powerpc/kvm/book3s: Use pte_present instead of opencoding _PAGE_PRESENT check This adds _PAGE_PTE check and makes sure we validate the pte value returned via find_kvm_host_pte. NOTE: this also considers _PAGE_INVALID to the software valid bit. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-20-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V	9fd4236faa	powerpc/kvm/book3s: Use find_kvm_host_pte in kvmppc_get_hpa Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-19-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V	bda3deaa6f	powerpc/kvm/book3s: use find_kvm_host_pte in kvmppc_book3s_instantiate_page Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-18-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V	3ff8df1430	powerpc/kvm/book3s: Avoid using rmap to protect parallel page table update. We now depend on kvm->mmu_lock Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-17-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V	7769a3394b	powerpc/kvm/book3s: use find_kvm_host_pte in pute_tce functions Current code just hold rmap lock to ensure parallel page table update is prevented. That is not sufficient. The kernel should also check whether a mmu_notifer callback was running in parallel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-16-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V	e3d8ed5518	powerpc/kvm/book3s: Use find_kvm_host_pte in h_enter Since kvmppc_do_h_enter can get called in realmode use low level arch_spin_lock which is safe to be called in realmode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-15-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V	9781e759b3	powerpc/kvm/book3s: Use find_kvm_host_pte in page fault handler Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-14-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V	35528876a9	powerpc/kvm/book3s: Add helper for host page table walk Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-13-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V	6cdf30375f	powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table update kvmppc_hv_handle_set_rc to use find_kvm_nested_guest_pte and find_kvm_secondary_pte Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-12-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V	dc891849e0	powerpc/kvm/nested: Add helper to walk nested shadow linux page table. The locking rules for walking nested shadow linux page table is different from process scoped table. Hence add a helper for nested page table walk and also add check whether we are holding the right locks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-11-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V	4b99412ed6	powerpc/kvm/book3s: Add helper to walk partition scoped linux page table. The locking rules for walking partition scoped table is different from process scoped table. Hence add a helper for secondary linux page table walk and also add check whether we are holding the right locks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-10-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V	87013f9c60	powerpc/kvm/book3s: switch from raw_spin_*lock to arch_spin_lock. These functions can get called in realmode. Hence use low level arch_spin_lock which is safe to be called in realmode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-9-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V	15759cb054	powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slow read_user_stack_slow is called with interrupts soft disabled and it copies contents from the page which we find mapped to a specific address. To convert userspace address to pfn, the kernel now uses lockless page table walk. The kernel needs to make sure the pfn value read remains stable and is not released and reused for another process while the contents are read from the page. This can only be achieved by holding a page reference. One of the first approaches I tried was to check the pte value after the kernel copies the contents from the page. But as shown below we can still get it wrong CPU0 CPU1 pte = READ_ONCE(ptep); pte_clear(pte); put_page(page); page = alloc_page(); memcpy(page_address(page), "secret password", nr); memcpy(buf, kaddr + offset, nb); put_page(page); handle_mm_fault() page = alloc_page(); set_pte(pte, page); if (pte_val(pte) != pte_val(ptep)) Hence switch to __get_user_pages_fast. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-8-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V	0da81b658b	powerpc/mce: Don't reload pte val in addr_to_pfn A lockless page table walk should be safe against parallel THP collapse, THP split and madvise(MADV_DONTNEED)/parallel fault. This patch makes sure kernel won't reload the pteval when checking for different conditions. The patch also added a check for pte_present to make sure the kernel is indeed operating on a PTE and not a pointer to level 0 table page. The pfn value we find here can be different from the actual pfn on which machine check happened. This can happen if we raced with a parallel update of the page table. In such a scenario we end up isolating a wrong pfn. But that doesn't have any other side effect. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-7-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V	2f92447f9f	powerpc/book3s64/hash: Use the pte_t address from the caller Don't fetch the pte value using lockless page table walk. Instead use the value from the caller. hash_preload is called with ptl lock held. So it is safe to use the pte_t address directly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-6-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V	7900757ce1	powerpc/hash64: Restrict page table lookup using init_mm with __flush_hash_table_range This is only used with init_mm currently. Walking init_mm is much simpler because we don't need to handle concurrent page table like other mm_context Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-5-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V	ec4abf1e70	powerpc/mm/hash64: use _PAGE_PTE when checking for pte_present This makes the pte_present check stricter by checking for additional _PAGE_PTE bit. A level 1 pte pointer (THP pte) can be switched to a pointer to level 0 pte page table page by following two operations. 1) THP split. 2) madvise(MADV_DONTNEED) in parallel to page fault. A lockless page table walk need to make sure we can handle such changes gracefully. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-4-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V	c46241a370	powerpc/pkeys: Check vma before returning key fault error to the user If multiple threads in userspace keep changing the protection keys mapping a range, there can be a scenario where kernel takes a key fault but the pkey value found in the siginfo struct is a permissive one. This can confuse the userspace as shown in the below test case. /* use this to control the number of test iterations / static void pkeyreg_set(int pkey, unsigned long rights) { unsigned long reg, shift; shift = (NR_PKEYS - pkey - 1) PKEY_BITS_PER_PKEY; asm volatile("mfspr %0, 0xd" : "=r"(reg)); reg &= ~(((unsigned long) PKEY_BITS_MASK) << shift); reg \|= (rights & PKEY_BITS_MASK) << shift; asm volatile("mtspr 0xd, %0" : : "r"(reg)); } static unsigned long pkeyreg_get(void) { unsigned long reg; asm volatile("mfspr %0, 0xd" : "=r"(reg)); return reg; } static int sys_pkey_mprotect(void addr, size_t len, int prot, int pkey) { return syscall(SYS_pkey_mprotect, addr, len, prot, pkey); } static int sys_pkey_alloc(unsigned long flags, unsigned long access_rights) { return syscall(SYS_pkey_alloc, flags, access_rights); } static int sys_pkey_free(int pkey) { return syscall(SYS_pkey_free, pkey); } static int faulting_pkey; static int permissive_pkey; static pthread_barrier_t pkey_set_barrier; static pthread_barrier_t mprotect_barrier; static void pkey_handle_fault(int signum, siginfo_t sinfo, void ctx) { unsigned long pkeyreg; / FIXME: printf is not signal-safe but for the current purpose, it gets the job done. / printf("pkey: exp = %d, got = %d\n", faulting_pkey, sinfo->si_pkey); fflush(stdout); assert(sinfo->si_code == SEGV_PKUERR); assert(sinfo->si_pkey == faulting_pkey); / clear pkey permissions to let the faulting instruction continue / pkeyreg_set(faulting_pkey, 0x0); } static void do_mprotect_fault(void p) { unsigned long rights, pkeyreg, pgsize; unsigned int i; void region; int pkey; srand(time(NULL)); pgsize = sysconf(_SC_PAGESIZE); rights = PKEY_DISABLE_WRITE; region = p; /* allocate key, no permissions / assert((pkey = sys_pkey_alloc(0, PKEY_DISABLE_ACCESS)) > 0); pkeyreg_set(4, 0x0); / cache the pkey here as the faulting pkey for future reference in the signal handler / faulting_pkey = pkey; printf("%s: faulting pkey = %d\n", __func__, faulting_pkey); / try to allocate, mprotect and free pkeys repeatedly / for (i = 0; i < NUM_ITERATIONS; i++) { / sync up with the other thread here / pthread_barrier_wait(&pkey_set_barrier); / make sure that the pkey used by the non-faulting thread is made permissive for this thread's context too so that no faults are triggered because it still might have been set to a restrictive value / // pkeyreg_set(permissive_pkey, 0x0); / sync up with the other thread here / pthread_barrier_wait(&mprotect_barrier); / perform mprotect / assert(!sys_pkey_mprotect(region, pgsize, PROT_READ \| PROT_WRITE, pkey)); / choose a random byte from the protected region and attempt to write to it, this will generate a fault / ((char ) region + (rand() % pgsize)) = rand(); / restore pkey permissions as the signal handler may have cleared the bit out for the sake of continuing / pkeyreg_set(pkey, PKEY_DISABLE_WRITE); } / free pkey / sys_pkey_free(pkey); return NULL; } static void do_mprotect_nofault(void p) { unsigned long pgsize; unsigned int i, j; void region; int pkey; pgsize = sysconf(_SC_PAGESIZE); region = p; /* try to allocate, mprotect and free pkeys repeatedly / for (i = 0; i < NUM_ITERATIONS; i++) { / allocate pkey, all permissions / assert((pkey = sys_pkey_alloc(0, 0)) > 0); permissive_pkey = pkey; / sync up with the other thread here / pthread_barrier_wait(&pkey_set_barrier); pthread_barrier_wait(&mprotect_barrier); / perform mprotect on the common page, no faults will be triggered as this is most permissive / assert(!sys_pkey_mprotect(region, pgsize, PROT_READ \| PROT_WRITE, pkey)); / free pkey / assert(!sys_pkey_free(pkey)); } return NULL; } int main(int argc, char argv) { pthread_t fault_thread, nofault_thread; unsigned long pgsize; struct sigaction act; pthread_attr_t attr; cpu_set_t fault_cpuset, nofault_cpuset; unsigned int i; void region; /* allocate memory region to protect / pgsize = sysconf(_SC_PAGESIZE); assert(region = memalign(pgsize, pgsize)); CPU_ZERO(&fault_cpuset); CPU_SET(0, &fault_cpuset); CPU_ZERO(&nofault_cpuset); CPU_SET(8, &nofault_cpuset); assert(!pthread_attr_init(&attr)); / setup sigsegv signal handler / act.sa_handler = 0; act.sa_sigaction = pkey_handle_fault; assert(!sigprocmask(SIG_SETMASK, 0, &act.sa_mask)); act.sa_flags = SA_SIGINFO; act.sa_restorer = 0; assert(!sigaction(SIGSEGV, &act, NULL)); / setup barrier for the two threads / pthread_barrier_init(&pkey_set_barrier, NULL, 2); pthread_barrier_init(&mprotect_barrier, NULL, 2); / setup and start threads / assert(!pthread_create(&fault_thread, &attr, &do_mprotect_fault, region)); assert(!pthread_setaffinity_np(fault_thread, sizeof(cpu_set_t), &fault_cpuset)); assert(!pthread_create(&nofault_thread, &attr, &do_mprotect_nofault, region)); assert(!pthread_setaffinity_np(nofault_thread, sizeof(cpu_set_t), &nofault_cpuset)); / cleanup */ assert(!pthread_attr_destroy(&attr)); assert(!pthread_join(fault_thread, NULL)); assert(!pthread_join(nofault_thread, NULL)); assert(!pthread_barrier_destroy(&pkey_set_barrier)); assert(!pthread_barrier_destroy(&mprotect_barrier)); free(region); puts("PASS"); return EXIT_SUCCESS; } The above test can result the below failure without this patch. pkey: exp = 3, got = 3 pkey: exp = 3, got = 4 a.out: pkey-siginfo-race.c💯 pkey_handle_fault: Assertion `sinfo->si_pkey == faulting_pkey' failed. Aborted Check for vma access before considering this a key fault. If vma pkey allow access retry the acess again. Test case is written by Sandipan Das <sandipan@linux.ibm.com> hence added SOB from him. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-3-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V	fe4a6856cb	powerpc/pkeys: Avoid using lockless page table walk Fetch pkey from vma instead of linux page table. Also document the fact that in some cases the pkey returned in siginfo won't be the same as the one we took keyfault on. Even with linux page table walk, we can end up in a similar scenario. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-2-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:13 +10:00
Aneesh Kumar K.V	93a98695f2	mm: change pmdp_huge_get_and_clear_full take vm_area_struct as arg We will use this in later patch to do tlb flush when clearing pmd entries. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-22-aneesh.kumar@linux.ibm.com	2020-05-05 21:20:13 +10:00
Michael Ellerman	f2b8d76dc6	PPC KVM fix for 5.7 - Fix a regression introduced in the last merge window, which results in guests in HPT mode dying randomly. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJeni/pAAoJEJ2a6ncsY3GfTRoIANAQjIZi96AfJcfnrYQ4yUF7 scxawTiJ9VavvsEJLJ7vsozrJ4xxmvmA0fFWC84uw9+BwPqoLFFvZTjazbGEDVvF FGwNBR/k7nfFVMIHS3K9iy9KjvYL3xkL26AgFTDJFq8hmOO9pH0txuk4r7SXb+NX bGG0mScAD/Dg/HwAHAS6EP3jT35QtGTK62p8foqVTziTNcmBn9Ywtg0lEzAcq2iY Y1BUD4Ov3cggshMI9SqHE8Yyq0XA2Wi6ggcyz/gVzvcbdFQmtg57Tri8nN8661LX XKh+VTpYSIxNs5GgjwlNesJzJ9h6CSynJF556qrjQ0XsXcNqvn8fcZdNQ+hnRYw= =Y19W -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-fixes-5.7-1' into topic/ppc-kvm This brings in a fix from the kvm-ppc tree that was merged to mainline after rc2, and so isn't in the base of our topic branch. We'd like it in the topic branch because it interacts with patches we plan to carry in this branch.	2020-05-05 21:16:47 +10:00
Hari Bathini	140777a3d8	powerpc/fadump: consider reserved ranges while reserving memory Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for memory reservations") enabled support to parse reserved-ranges DT node and reserve kernel memory falling in these ranges for F/W purposes. Memory reserved for FADump should not overlap with these ranges as it could corrupt memory meant for F/W or crash'ed kernel memory to be exported as vmcore. But since commit 579ca1a27675 ("powerpc/fadump: make use of memblock's bottom up allocation mode"), memblock_find_in_range() is being used to find the appropriate area to reserve memory for FADump, which can't account for reserved-ranges as these ranges are reserved only after FADump memory reservation. With reserved-ranges now being populated during early boot, look out for these memory ranges while reserving memory for FADump. Without this change, MPIPL on PowerNV systems aborts with hostboot failure, when memory reserved for FADump is less than 4096MB. Fixes: 579ca1a27675 ("powerpc/fadump: make use of memblock's bottom up allocation mode") Cc: stable@vger.kernel.org Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/158737297693.26700.16193820746269425424.stgit@hbathini.in.ibm.com	2020-05-04 22:29:58 +10:00
Hari Bathini	02c04e374e	powerpc/fadump: use static allocation for reserved memory ranges At times, memory ranges have to be looked up during early boot, when kernel couldn't be initialized for dynamic memory allocation. In fact, reserved-ranges look up is needed during FADump memory reservation. Without accounting for reserved-ranges in reserving memory for FADump, MPIPL boot fails with memory corruption issues. So, extend memory ranges handling to support static allocation and populate reserved memory ranges during early boot. Fixes: dda9dbfeeb7a ("powerpc/fadump: consider reserved ranges while releasing memory") Cc: stable@vger.kernel.org Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/158737294432.26700.4830263187856221314.stgit@hbathini.in.ibm.com	2020-05-04 22:29:58 +10:00
Xiongfeng Wang	43c8a496fa	powerpc/ps3: Move static keyword to the front of declaration Move the static keyword to the front of declaration of 'vuart_bus_priv', and resolve the following compiler warning that can be seen when building with warnings enabled (W=1): drivers/ps3/ps3-vuart.c:867:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration] } static vuart_bus_priv; ^ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1588154448-56759-1-git-send-email-wangxiongfeng2@huawei.com	2020-05-04 22:29:37 +10:00
Naveen N. Rao	57b3ed941b	powerpc/64: Have MPROFILE_KERNEL depend on FUNCTION_TRACER Currently, it is possible to have CONFIG_FUNCTION_TRACER disabled, but CONFIG_MPROFILE_KERNEL enabled. Though all existing users of MPROFILE_KERNEL are doing the right thing, it is weird to have MPROFILE_KERNEL enabled when the function tracer isn't. Fix this by making MPROFILE_KERNEL depend on FUNCTION_TRACER. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200422092612.514301-1-naveen.n.rao@linux.vnet.ibm.com	2020-04-30 12:56:28 +10:00
Gautham R. Shenoy	bde752c3d6	Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr Add documentation for the following sysfs interfaces: /sys/devices/system/cpu/cpuX/purr /sys/devices/system/cpu/cpuX/spurr /sys/devices/system/cpu/cpuX/idle_purr /sys/devices/system/cpu/cpuX/idle_spurr Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1586249263-14048-6-git-send-email-ego@linux.vnet.ibm.com	2020-04-30 12:35:27 +10:00

1 2 3 4 5 ...

915814 Commits